diff: diff
Andrew Bromage
bromage at cs.mu.oz.au
Thu Jan 8 14:04:06 AEDT 1998
G'day all.
Nobody really needs to review this (it's not a critical part of the
distribution), but just in case anyone has any comments, I'll hold
off committing for a few days.
Part two of the diff follows.
Cheers,
Andrew Bromage
Estimated hours taken: too many
Modified files
--------------
samples/diff/README:
Info about this new version. Put the existing READMEs in
reverse order, so that they'll be in the most sensible order
for someone reading top-down.
samples/diff/Mmakefile:
Supply more detail about which bit of this program GCC 2.7.2
under Digital Unix 3.2 doesn't compile properly.
samples/diff/diff.m:
Slight reorganisation. Accept options, remove some high
coupling betweem diff__do_diff and lcss.m.
samples/diff/file.m:
Add support for attaching a filename to a file.
samples/diff/lcss.m:
Add more documentation, one slight performance improvement.
lcss__to_diff now works on the reversed LCSS, to avoid going
to the trouble of reversing it.
New files
---------
samples/diff/TODO:
Meaning should be obvious.
samples/diff/diff_out.m:
Replacement for diffs.m (which was never a very good name).
Now contains code to display a diff in lots more output
styles than previously supported.
samples/diff/difftype.m:
Replacement for lcsstype.m. Contains the type of a diff (but
not of an lcss, which is now local to lcss.m, hence the
renaming).
samples/diff/filter.m:
Contains code to filter a diff before being displayed. This
is only required if the user specified that they didn't want
to see part of the diff (e.g. with --ignore-blank-lines).
samples/diff/globals.m:
samples/diff/options.m:
Support for option handling.
Removed files
-------------
samples/diff/diffs.m:
Functionality moved to diff_out.m.
samples/diff/lcsstype.m:
Functionality moved to difftype.m.
Index: Mmakefile
===================================================================
RCS file: /home/staff/zs/imp/mercury/samples/diff/Mmakefile,v
retrieving revision 1.3
diff -u -r1.3 Mmakefile
--- Mmakefile 1997/09/10 11:01:12 1.3
+++ Mmakefile 1998/01/07 04:11:27
@@ -15,5 +15,8 @@
# 2.7.2 under Digital Unix 3.2 due to a bug in gcc. The bug was fixed
# in 2.7.2.1, so feel free to comment out this line if you're using
# an unbuggy compiler.
+#
+# BTW, the predicate which isn't compiled correctly is diff_out__show_file
+# in diff_out.m.
MGNUCFLAGS=-O0
Index: README
===================================================================
RCS file: /home/staff/zs/imp/mercury/samples/diff/README,v
retrieving revision 1.5
diff -u -r1.5 README
--- README 1997/07/28 20:02:48 1.5
+++ README 1998/01/08 01:36:58
@@ -1,3 +1,55 @@
+
+This is now looking a LOT more like the standard "diff" utility. There
+are a few features missing (e.g. we can't do directory diffs), but apart
+from that, it seems to work.
+
+The major changes in this version are:
+
+ - We now accept command-line options. In particular, we
+ recognise all options that are accepted by GNU diff,
+ though a lot of them result in error reports and a few
+ which have do nothing to do with the output format or
+ semantics, but are merely for efficiency, are accepted
+ and ignored.
+
+ - We support different output formats, in particular all
+ of the output formats supported by GNU diff. There are
+ a number of modifiers to the output formats (for example,
+ --expand-tabs) which we don't yet support.
+
+ - Just about everything (except the actual diff algorithm)
+ has been modified to support the above changes.
+
+ - Lots of cleanups, lots more documentation.
+
+Examine the file TODO to see what's still missing.
+
+Andrew Bromage 8 Jan 1998
+
+===========================================================================
+
+The version which appears here is a re-hacked version of Marnix Klooster's
+hacked version of my original. Special thanks to him for making my code
+a lot more maintainable than it originally was. :-)
+
+The changes from the previous version:
+
+ - Bug fix for a problem which was causing it to bomb out if
+ the two files were identical.
+
+ - Changed indenting so it more closely matches the Mercury
+ compiler coding standard.
+
+ - Update to use unique arrays (now called array.m).
+
+ - Various minor documentation tweaks.
+
+Oh, and it still runs in nowhere near the speed of GNU diff.
+
+Andrew Bromage 28 Jul 1997
+
+===========================================================================
+
The Mercury modules in this directory have been derived from the
'diff' sample distributed with Mercury 0.6. That sample carries the
following copyright information, description and to-do list (in
@@ -60,26 +112,4 @@
--
Marnix Klooster
marnix at worldonline.nl
-
----------------------------------
-
-The version which appears here is a re-hacked version of Marnix Klooster's
-hacked version of my original. Special thanks to him for making my code
-a lot more maintainable than it originally was. :-)
-
-The changes from the previous version:
-
- - Bug fix for a problem which was causing it to bomb out if
- the two files were identical.
-
- - Changed indenting so it more closely matches the Mercury
- compiler coding standard.
-
- - Update to use unique arrays (now called array.m).
-
- - Various minor documentation tweaks.
-
-Oh, and it still runs in nowhere near the speed of GNU diff.
-
-Andrew Bromage 28 Jul 1997
Index: diff.m
===================================================================
RCS file: /home/staff/zs/imp/mercury/samples/diff/diff.m,v
retrieving revision 1.8
diff -u -r1.8 diff.m
--- diff.m 1997/07/28 06:00:53 1.8
+++ diff.m 1998/01/07 06:08:56
@@ -9,10 +9,6 @@
% Something very similar to the standard diff utility. Sort of. :-)
-% At present no options are recognized. To simulate the --rcs option,
-% find the call to diffs__display_diff, replace it by
-% diffs__display_diff_rcs, and recompile.
-
%-----------------------------------------------------------------------------%
:- module diff.
@@ -25,17 +21,29 @@
%-----------------------------------------------------------------------------%
:- implementation.
-:- import_module string, list, file, lcss, diffs, std_util, require.
+:- import_module options, lcss, diff_out, globals, filter.
+:- import_module string, list, file, std_util, require, getopt.
%-----------------------------------------------------------------------------%
% main: top-level predicate.
main -->
io__command_line_arguments(Args0),
-% { getopt__process_options(Args0, Args, Result0) },
-% postprocess_options(Result0, Result),
-% main_2(Result, Args).
- main_2(no, Args0).
+ { options__get_option_ops(OptionOps) },
+ { getopt__process_options(OptionOps, Args0, Args, Result0) },
+ postprocess_options(Result0, Result),
+ ( { Result = yes(Msg) },
+ usage_error(Msg)
+ ; { Result = no },
+ globals__io_get_output_style(OutputStyle),
+ ( { OutputStyle = help_only } ->
+ usage
+ ; { OutputStyle = version_only } ->
+ version
+ ;
+ main_2(Args)
+ )
+ ).
%-----------------------------------------------------------------------------%
@@ -55,20 +63,22 @@
:- pred usage(io__state :: di, io__state :: uo) is det.
usage -->
- io__write_string("Usage: diff [options] from-file to-file\n"),
- io__write_string("Options:\n"),
- io__write_string("\tnone yet :-)\n").
+ io__write_string("Usage: diff [options] from-file to-file\n\n"),
+ options_help.
+
+:- pred version(io__state :: di, io__state :: uo) is det.
+version -->
+ io__write_string("diff - Mercury diff version 0.4\n").
%-----------------------------------------------------------------------------%
- % main_2
-:- pred main_2(maybe(string), list(string), io__state, io__state).
-:- mode main_2(in, in, di, uo) is det.
-main_2(yes(Msg), _) -->
- usage_error(Msg).
-main_2(no, []) -->
+ % main_2 analyses the command-line arguments which are not
+ % options and calls diff__do_diff.
+:- pred main_2(list(string), io__state, io__state).
+:- mode main_2(in, di, uo) is det.
+main_2([]) -->
usage_error("missing operand").
-main_2(no, [Fname1 | Rest]) -->
+main_2([Fname1 | Rest]) -->
( { Rest = [Fname2 | _] },
( { Fname1 = Fname2 } ->
% There are no differences between identical files.
@@ -78,11 +88,11 @@
% (Note: Both can't be "-" since that was dealt with
% in the previous case.)
( { Fname1 = "-" } ->
- file__read_input(Contents1),
+ file__read_input(Fname1, Contents1),
file__read_file(Fname2, Contents2)
; { Fname2 = "-" } ->
file__read_file(Fname1, Contents1),
- file__read_input(Contents2)
+ file__read_input(Fname2, Contents2)
;
% Otherwise read the files normally.
file__read_file(Fname1, Contents1),
@@ -106,38 +116,32 @@
%-----------------------------------------------------------------------------%
% diff__do_diff takes the files plus all the command
- % line options (all zero of them) and determines what
- % to do with them.
-:- pred diff__do_diff(file, file, io__state, io__state).
-:- mode diff__do_diff(in, in, di, uo) is det.
-diff__do_diff(File1, File2) -->
- { diff__find_diff(File1, File2, Diff) },
- diffs__display_diff(File1, File2, Diff).
-
-%-----------------------------------------------------------------------------%
-
- % diff__find_diff takes two files and finds their
- % differences.
-:- pred diff__find_diff(file, file, diff).
-:- mode diff__find_diff(in, in, out) is det.
-
- % The process to "diff" two files is:
+ % line options and determines what to do with them.
%
- % - Convert the files to lists.
+ % At the moment, we're organised into three passes:
%
- % - Identify the longest common subsequence
- % in the lists.
+ % - diff_by_lcss takes the two files and produces
+ % a diff using the LCSS algorithm.
+ % - filter_diff analyses the diff, filtering out
+ % any edits which the user said that they didn't
+ % want to see (using the appropriate command-line
+ % options).
+ % - display_diff outputs the diff in whatever output
+ % format the user chose.
%
- % - Use this information to determine the
- % set of operations required to convert
- % one list to the other.
-diff__find_diff(File1, File2, Diff) :-
- file__to_list(File1, File1list),
- file__to_list(File2, File2list),
- file__get_numlines(File1, Length1),
- file__get_numlines(File2, Length2),
- lcss__find_lcss(File1list, File2list, Length1, Length2, Lcss),
- diffs__to_diff(Lcss, Diff).
+ % TO DO: Options like --ignore-case are probably best handled
+ % by a pass taking place _before_ the diff algorithm is
+ % run. This pass would have the benefit of determining
+ % whether or not there are any differences or not, in
+ % the case where the output style chosen doesn't require
+ % output if there is no diff. It would also speed up
+ % the --brief output style.
+:- pred diff__do_diff(file, file, io__state, io__state).
+:- mode diff__do_diff(in, in, di, uo) is det.
+diff__do_diff(File1, File2) -->
+ { diff_by_lcss(File1, File2, Diff0) },
+ filter_diff(Diff0, File1, File2, Diff),
+ display_diff(File1, File2, Diff).
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
Index: file.m
===================================================================
RCS file: /home/staff/zs/imp/mercury/samples/diff/file.m,v
retrieving revision 1.9
diff -u -r1.9 file.m
--- file.m 1997/07/28 06:00:54 1.9
+++ file.m 1998/01/07 03:54:10
@@ -10,6 +10,8 @@
% This module provides file input. One can read a file entirely,
% select a single line from a read file, get the number of lines
% in a read file, and convert a read file to a list of strings.
+%
+% Every file has a filename attached to it.
%-----------------------------------------------------------------------------%
@@ -26,8 +28,8 @@
% file__read_input reads a file from the input
% stream.
-:- pred file__read_input(io__res(file), io__state, io__state).
-:- mode file__read_input(out, di, uo) is det.
+:- pred file__read_input(string, io__res(file), io__state, io__state).
+:- mode file__read_input(in, out, di, uo) is det.
% file__get_line retrieves a line from a file.
% (Lines are numbered from 0.)
@@ -40,39 +42,56 @@
:- pred file__get_numlines(file, int).
:- mode file__get_numlines(in, out) is det.
+ % file__from_list converts a list of lines to a file.
+:- pred file__from_list(string, list(string), file).
+:- mode file__from_list(in, in, out) is det.
+
% file__to_list converts a file into a list of
% lines.
:- pred file__to_list(file, list(string)).
:- mode file__to_list(in, out) is det.
+ % file__get_file_name returns the name of the file.
+:- pred file__get_file_name(file, string).
+:- mode file__get_file_name(in, out) is det.
+
+ % file__set_file_name sets the name of the file.
+:- pred file__set_file_name(file, string, file).
+:- mode file__set_file_name(in, in, out) is det.
+
%-----------------------------------------------------------------------------%
:- implementation.
-:- import_module array, require, int.
+:- import_module array, require, int, bool.
%-----------------------------------------------------------------------------%
-:- type file == array(string).
+:- type file
+ ---> file(
+ string, % File name
+ array(string) % Contents
+ ).
% Open the stream, read from the stream, then close
% the stream.
-file__read_file(FileName, Contents) -->
+file__read_file(FileName, File) -->
io__open_input(FileName, Res),
( { Res = ok(InputStream) },
- file__read_stream(InputStream, Contents0),
+ file__read_stream(InputStream, Contents),
io__close_input(InputStream),
- { Contents = ok(Contents0) }
+ { File = ok(file(FileName, Contents)) }
; { Res = error(Error) },
- { Contents = error(Error) }
+ { File = error(Error) }
).
% Get the input stream, then read from it.
-file__read_input(ok(Contents)) -->
+file__read_input(FileName, ok(file(FileName, Contents))) -->
io__input_stream(InputStream),
file__read_stream(InputStream, Contents).
% file__read_stream is the "real" file reader.
-:- pred file__read_stream(io__input_stream, file, io__state, io__state).
+:- pred file__read_stream(io__input_stream, array(string),
+ io__state, io__state).
:- mode file__read_stream(in, array_uo, di, uo) is det.
file__read_stream(Stream, File) -->
file__read_stream2(Stream, 0, File).
@@ -81,7 +100,8 @@
% read, fill File[LinesIn] to File[LinesOut-1] with the rest
% of the lines. LinesOut is the number of lines in the file.
% (Note that line numbering starts at zero.)
-:- pred file__read_stream2(io__input_stream, int, file, io__state, io__state).
+:- pred file__read_stream2(io__input_stream, int, array(string),
+ io__state, io__state).
:- mode file__read_stream2(in, in, array_uo, di, uo) is det.
file__read_stream2(Stream, LineNo, File) -->
io__read_line(Stream, Res),
@@ -99,15 +119,26 @@
%-----------------------------------------------------------------------------%
-file__get_line(File, LineNo, Line) :-
- array__semidet_lookup(File, LineNo, Line).
+file__get_line(file(_, Contents), LineNo, Line) :-
+ array__semidet_lookup(Contents, LineNo, Line).
-file__get_numlines(File, NumLines) :-
- array__bounds(File, _, NumLines1),
+file__get_numlines(file(_, Contents), NumLines) :-
+ array__bounds(Contents, _, NumLines1),
NumLines is NumLines1 + 1.
-file__to_list(File, List) :-
- array__to_list(File, List).
+%-----------------------------------------------------------------------------%
+
+file__to_list(file(_, Contents), List) :-
+ array__to_list(Contents, List).
+
+file__from_list(FileName, List, file(FileName, Contents)) :-
+ array__from_list(List, Contents).
+
+%-----------------------------------------------------------------------------%
+
+file__get_file_name(file(FileName, _), FileName).
+
+file__set_file_name(file(_, B), FileName, file(FileName, B)).
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
Index: lcss.m
===================================================================
RCS file: /home/staff/zs/imp/mercury/samples/diff/lcss.m,v
retrieving revision 1.12
diff -u -r1.12 lcss.m
--- lcss.m 1997/07/28 06:00:55 1.12
+++ lcss.m 1998/01/07 05:11:29
@@ -36,33 +36,36 @@
:- module lcss.
:- interface.
-:- import_module lcsstype.
+:- import_module difftype, file.
-:- pred lcss__find_lcss(list(A) :: in, list(A) :: in, int :: in, int :: in,
- lcss :: out) is det.
+:- pred diff_by_lcss(file, file, diff).
+:- mode diff_by_lcss(in, in, out) is det.
%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
:- implementation.
:- import_module map, require, std_util, int, list, char, array.
-%-----------------------------------------------------------------------------%
+diff_by_lcss(File1, File2, Diff) :-
+ file__to_list(File1, File1list),
+ file__to_list(File2, File2list),
+ file__get_numlines(File1, Length1),
+ file__get_numlines(File2, Length2),
+ lcss__find_diff(File1list, File2list, Length1, Length2, Diff).
-:- import_module io.
+%-----------------------------------------------------------------------------%
- % For debugging only. Will be removed in the
- % final version.
-:- pred lcss__show_lcss(lcss :: in, io__state :: di, io__state :: uo) is det.
-lcss__show_lcss([]) -->
- io__write_string("[]").
-lcss__show_lcss([X - Y | Lcss]) -->
- io__write_int(X),
- io__write_char('-'),
- io__write_int(Y),
- io__write_char(','),
- lcss__show_lcss(Lcss).
+ % The longest common subsequence of two lists can be
+ % represented as an ordered list of "matches". A match is a
+ % pair of the form I-J where I is the number of an item in
+ % item 1 and J is the number of an item in list 2. (Note that
+ % an item is numbered by the position immediately preceding
+ % it, i.e., numbering starts at 0.)
+:- type lcss == list(pair(pos,pos)).
%-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
% Find a longest common subsequence. The algorithm
% used is very similar to that in:
@@ -91,12 +94,15 @@
% reverse, and Thresh[I] contains its 'end' (see above).
% Otherwise, Thresh[I] contains 'infinity.'
-lcss__find_lcss(List1, List2, L1, L2, Lcss) :-
+:- pred lcss__find_diff(list(string) :: in, list(string) :: in,
+ int :: in, int :: in, diff :: out) is det.
+
+lcss__find_diff(List1, List2, L1, L2, Diff) :-
% The original version swapped List1 and List2, so that the
% first was the largest. Is this swapping really worthwile?
% It doesn't seem to be faster, nor does it consume less
- % memory. Therefore I removed it.
+ % memory. Therefore I removed it. -- MK
% The consequence is that build_thresh and build_lcss need a
% value representing 'infinity'; previously we could use the
@@ -122,7 +128,7 @@
lcss__build_matchlist(List1, List2, MatchList),
lcss__build_thresh(N, MatchList, Inf, Thresh, Link),
- lcss__build_lcss(N, Inf, Thresh, Link, L1, L2, Lcss).
+ lcss__build_diff(N, Inf, Thresh, Link, L1, L2, Diff).
%-----------------------------------------------------------------------------%
@@ -133,7 +139,7 @@
% in increasing order, since this is required by build_thresh
% to go through the matches in lexicographical order.
-:- pred lcss__build_matchlist(list(A), list(A), list(list(int))).
+:- pred lcss__build_matchlist(list(string), list(string), list(list(int))).
:- mode lcss__build_matchlist(in, in, out) is det.
lcss__build_matchlist(List1, List2, MatchList) :-
@@ -150,7 +156,7 @@
lcss__match_map_to_matchlist(List1, Map, MatchList).
-:- pred lcss__build_match_map(int, list(A), map(A,list(int))).
+:- pred lcss__build_match_map(int, list(string), map(string,list(int))).
:- mode lcss__build_match_map(in, in, out) is det.
lcss__build_match_map(_, [], Map) :-
map__init(Map).
@@ -164,7 +170,7 @@
),
map__set(MapIn, S, Ns1, MapOut).
-:- pred lcss__match_map_to_matchlist(list(A), map(A,list(int)),
+:- pred lcss__match_map_to_matchlist(list(string), map(string,list(int)),
list(list(int))).
:- mode lcss__match_map_to_matchlist(in, in, out) is det.
lcss__match_map_to_matchlist([], _, []).
@@ -196,7 +202,9 @@
N1 is N + 1, % Why this size? Suppose we have two identical
% files of length N. Then the links will be
% [], [0-0], [0-0,1-1], ... [0-0..N-N], which
- % makes N+1 links in total.
+ % makes N+1 links in total. We could use N
+ % if we pre-screened the files to see if they
+ % were identical or not.
array__init(N1, Inf, Thresh0), % Thresh[0..N] := Inf
array__set(Thresh0, 0, -1, Thresh1), % Thresh[0] := -1
@@ -207,8 +215,7 @@
:- pred lcss__build_thresh2(int, int, list(list(int)),
- array(int), array(lcss),
- array(int), array(lcss)).
+ array(int), array(lcss), array(int), array(lcss)).
:- mode lcss__build_thresh2(in, in, in,
array_di, array_di, array_uo, array_uo) is det.
lcss__build_thresh2(_N, _I, [], Thresh0, Link0, Thresh0, Link0).
@@ -286,15 +293,12 @@
% that Thresh[K]<Inf, and Link[K] should be the Lcss in
% reverse.
- % Note that it is here that we add the 'end-of-list match'
- % L1-L2 to the Lcss.
-
-:- pred lcss__build_lcss(int, int, array(int), array(lcss), int, int, lcss).
-:- mode lcss__build_lcss(in, in, in, in, in, in, out) is det.
-lcss__build_lcss(N, Inf, Thresh, Link, L1, L2, Lcss) :-
+:- pred lcss__build_diff(int, int, array(int), array(lcss), int, int, diff).
+:- mode lcss__build_diff(in, in, in, in, in, in, out) is det.
+lcss__build_diff(N, Inf, Thresh, Link, L1, L2, Diff) :-
lcss__build_lcss2(N, Inf, Thresh, K),
- array__lookup(Link, K, Lcss1),
- list__reverse([L1 - L2 | Lcss1], Lcss).
+ array__lookup(Link, K, Lcss),
+ lcss__to_diff(L1, L2, Lcss, [], Diff).
% Find the largest K<=N for which Thresh[K]<Inf.
@@ -315,6 +319,29 @@
;
K = N
).
+
+ % lcss__to_diff turns the (reversed) longest common subsequence
+ % of two lists into a list of edits.
+:- pred lcss__to_diff(int, int, lcss, diff, diff).
+:- mode lcss__to_diff(in, in, in, in, out) is det.
+lcss__to_diff(_X, _Y, [], Diff, Diff).
+lcss__to_diff(X, Y, [X1 - Y1 | Lcss], Diff0, Diff) :-
+ X2 is X - 1, Y2 is Y - 1,
+ X3 is X1 + 1, Y3 is Y1 + 1,
+ ( X1 = X2 ->
+ ( Y1 = Y2 ->
+ Diff1 = Diff0
+ ;
+ Diff1 = [add(X, Y3 - Y) | Diff0]
+ )
+ ;
+ ( Y1 = Y2 ->
+ Diff1 = [delete(X3 - X, Y) | Diff0]
+ ;
+ Diff1 = [change(X3 - X, Y3 - Y) | Diff0]
+ )
+ ),
+ lcss__to_diff(X1, Y1, Lcss, Diff1, Diff).
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
More information about the developers
mailing list