diff: diff

Andrew Bromage bromage at cs.mu.oz.au
Thu Jan 8 14:04:06 AEDT 1998


G'day all.

Nobody really needs to review this (it's not a critical part of the
distribution), but just in case anyone has any comments, I'll hold
off committing for a few days.

Part two of the diff follows.

Cheers,
Andrew Bromage


Estimated hours taken: too many

Modified files
--------------

samples/diff/README:
	Info about this new version.  Put the existing READMEs in
	reverse order, so that they'll be in the most sensible order
	for someone reading top-down.

samples/diff/Mmakefile:
	Supply more detail about which bit of this program GCC 2.7.2
	under Digital Unix 3.2 doesn't compile properly.

samples/diff/diff.m:
	Slight reorganisation.  Accept options, remove some high
	coupling betweem diff__do_diff and lcss.m.

samples/diff/file.m:
	Add support for attaching a filename to a file.

samples/diff/lcss.m:
	Add more documentation, one slight performance improvement.
	lcss__to_diff now works on the reversed LCSS, to avoid going
	to the trouble of reversing it.

New files
---------

samples/diff/TODO:
	Meaning should be obvious.

samples/diff/diff_out.m:
	Replacement for diffs.m (which was never a very good name).
	Now contains code to display a diff in lots more output
	styles than previously supported.

samples/diff/difftype.m:
	Replacement for lcsstype.m.  Contains the type of a diff (but
	not of an lcss, which is now local to lcss.m, hence the
	renaming).

samples/diff/filter.m:
	Contains code to filter a diff before being displayed.  This
	is only required if the user specified that they didn't want
	to see part of the diff (e.g. with --ignore-blank-lines).

samples/diff/globals.m:
samples/diff/options.m:
	Support for option handling.

Removed files
-------------

samples/diff/diffs.m:
	Functionality moved to diff_out.m.

samples/diff/lcsstype.m:
	Functionality moved to difftype.m.


Index: Mmakefile
===================================================================
RCS file: /home/staff/zs/imp/mercury/samples/diff/Mmakefile,v
retrieving revision 1.3
diff -u -r1.3 Mmakefile
--- Mmakefile	1997/09/10 11:01:12	1.3
+++ Mmakefile	1998/01/07 04:11:27
@@ -15,5 +15,8 @@
 # 2.7.2 under Digital Unix 3.2 due to a bug in gcc.  The bug was fixed
 # in 2.7.2.1, so feel free to comment out this line if you're using
 # an unbuggy compiler.
+#
+# BTW, the predicate which isn't compiled correctly is diff_out__show_file
+# in diff_out.m.
 MGNUCFLAGS=-O0
 
Index: README
===================================================================
RCS file: /home/staff/zs/imp/mercury/samples/diff/README,v
retrieving revision 1.5
diff -u -r1.5 README
--- README	1997/07/28 20:02:48	1.5
+++ README	1998/01/08 01:36:58
@@ -1,3 +1,55 @@
+
+This is now looking a LOT more like the standard "diff" utility.  There
+are a few features missing (e.g. we can't do directory diffs), but apart
+from that, it seems to work.
+
+The major changes in this version are:
+
+	- We now accept command-line options.  In particular, we
+	  recognise all options that are accepted by GNU diff,
+	  though a lot of them result in error reports and a few
+	  which have do nothing to do with the output format or
+	  semantics, but are merely for efficiency, are accepted
+	  and ignored.
+
+	- We support different output formats, in particular all
+	  of the output formats supported by GNU diff. There are
+	  a number of modifiers to the output formats (for example,
+	  --expand-tabs) which we don't yet support.
+
+	- Just about everything (except the actual diff algorithm)
+	  has been modified to support the above changes.
+
+	- Lots of cleanups, lots more documentation.
+
+Examine the file TODO to see what's still missing.
+
+Andrew Bromage  8 Jan 1998
+
+===========================================================================
+
+The version which appears here is a re-hacked version of Marnix Klooster's
+hacked version of my original.  Special thanks to him for making my code
+a lot more maintainable than it originally was.  :-)
+
+The changes from the previous version:
+
+	- Bug fix for a problem which was causing it to bomb out if
+	  the two files were identical.
+
+	- Changed indenting so it more closely matches the Mercury
+	  compiler coding standard.
+
+	- Update to use unique arrays (now called array.m).
+
+	- Various minor documentation tweaks.
+
+Oh, and it still runs in nowhere near the speed of GNU diff.
+
+Andrew Bromage  28 Jul 1997
+
+===========================================================================
+
 The Mercury modules in this directory have been derived from the
 'diff' sample distributed with Mercury 0.6.  That sample carries the
 following copyright information, description and to-do list (in
@@ -60,26 +112,4 @@
 --
 Marnix Klooster
 marnix at worldonline.nl
-
----------------------------------
-
-The version which appears here is a re-hacked version of Marnix Klooster's
-hacked version of my original.  Special thanks to him for making my code
-a lot more maintainable than it originally was.  :-)
-
-The changes from the previous version:
-
-	- Bug fix for a problem which was causing it to bomb out if
-	  the two files were identical.
-
-	- Changed indenting so it more closely matches the Mercury
-	  compiler coding standard.
-
-	- Update to use unique arrays (now called array.m).
-
-	- Various minor documentation tweaks.
-
-Oh, and it still runs in nowhere near the speed of GNU diff.
-
-Andrew Bromage  28 Jul 1997
 
Index: diff.m
===================================================================
RCS file: /home/staff/zs/imp/mercury/samples/diff/diff.m,v
retrieving revision 1.8
diff -u -r1.8 diff.m
--- diff.m	1997/07/28 06:00:53	1.8
+++ diff.m	1998/01/07 06:08:56
@@ -9,10 +9,6 @@
 
 % Something very similar to the standard diff utility.  Sort of.  :-)
 
-% At present no options are recognized.  To simulate the --rcs option,
-% find the call to diffs__display_diff, replace it by
-% diffs__display_diff_rcs, and recompile.
-
 %-----------------------------------------------------------------------------%
 
 :- module diff.
@@ -25,17 +21,29 @@
 %-----------------------------------------------------------------------------%
 
 :- implementation.
-:- import_module string, list, file, lcss, diffs, std_util, require.
+:- import_module options, lcss, diff_out, globals, filter.
+:- import_module string, list, file, std_util, require, getopt.
 
 %-----------------------------------------------------------------------------%
 
 	% main: top-level predicate.
 main -->
 	io__command_line_arguments(Args0),
-%	{ getopt__process_options(Args0, Args, Result0) },
-%	postprocess_options(Result0, Result), 
-%	main_2(Result, Args).
-	main_2(no, Args0).
+	{ options__get_option_ops(OptionOps) },
+	{ getopt__process_options(OptionOps, Args0, Args, Result0) },
+	postprocess_options(Result0, Result), 
+	( { Result = yes(Msg) },
+		usage_error(Msg)
+	; { Result = no },
+		globals__io_get_output_style(OutputStyle),
+		( { OutputStyle = help_only } ->
+			usage
+		; { OutputStyle = version_only } ->
+			version
+		;
+			main_2(Args)
+		)
+	).
 
 %-----------------------------------------------------------------------------%
 
@@ -55,20 +63,22 @@
 
 :- pred usage(io__state :: di, io__state :: uo) is det.
 usage -->
-	io__write_string("Usage: diff [options] from-file to-file\n"),
-	io__write_string("Options:\n"),
-	io__write_string("\tnone yet :-)\n").
+	io__write_string("Usage: diff [options] from-file to-file\n\n"),
+	options_help.
+
+:- pred version(io__state :: di, io__state :: uo) is det.
+version -->
+	io__write_string("diff - Mercury diff version 0.4\n").
 
 %-----------------------------------------------------------------------------%
 
-	% main_2 
-:- pred main_2(maybe(string), list(string), io__state, io__state).
-:- mode main_2(in, in, di, uo) is det.
-main_2(yes(Msg), _) -->
-	usage_error(Msg).
-main_2(no, []) -->
+	% main_2 analyses the command-line arguments which are not
+	% options and calls diff__do_diff.
+:- pred main_2(list(string), io__state, io__state).
+:- mode main_2(in, di, uo) is det.
+main_2([]) -->
 	usage_error("missing operand").
-main_2(no, [Fname1 | Rest]) -->
+main_2([Fname1 | Rest]) -->
 	( { Rest = [Fname2 | _] },
 		( { Fname1 = Fname2 } ->
 		% There are no differences between identical files.
@@ -78,11 +88,11 @@
 			% (Note: Both can't be "-" since that was dealt with
 			% in the previous case.)
 			( { Fname1 = "-" } ->
-				file__read_input(Contents1),
+				file__read_input(Fname1, Contents1),
 				file__read_file(Fname2, Contents2)
 			; { Fname2 = "-" } ->
 				file__read_file(Fname1, Contents1),
-				file__read_input(Contents2)
+				file__read_input(Fname2, Contents2)
 			;
 			% Otherwise read the files normally.
 				file__read_file(Fname1, Contents1),
@@ -106,38 +116,32 @@
 %-----------------------------------------------------------------------------%
 
 	% diff__do_diff takes the files plus all the command
-	% line options (all zero of them) and determines what
-	% to do with them.
-:- pred diff__do_diff(file, file, io__state, io__state).
-:- mode diff__do_diff(in, in, di, uo) is det.
-diff__do_diff(File1, File2) -->
-	{ diff__find_diff(File1, File2, Diff) },
-	diffs__display_diff(File1, File2, Diff).
-
-%-----------------------------------------------------------------------------%
-
-	% diff__find_diff takes two files and finds their
-	% differences.
-:- pred diff__find_diff(file, file, diff).
-:- mode diff__find_diff(in, in, out) is det.
-
-	% The process to "diff" two files is:
+	% line options and determines what to do with them.
 	%
-	%	- Convert the files to lists.
+	% At the moment, we're organised into three passes:
 	%
-	%	- Identify the longest common subsequence
-	%	  in the lists.
+	%	- diff_by_lcss takes the two files and produces
+	%	  a diff using the LCSS algorithm.
+	%	- filter_diff analyses the diff, filtering out
+	%	  any edits which the user said that they didn't
+	%	  want to see (using the appropriate command-line
+	%	  options).
+	%	- display_diff outputs the diff in whatever output
+	%	  format the user chose.
 	%
-	%	- Use this information to determine the
-	%	  set of operations required to convert
-	%	  one list to the other.
-diff__find_diff(File1, File2, Diff) :-
-	file__to_list(File1, File1list),
-	file__to_list(File2, File2list),
-	file__get_numlines(File1, Length1),
-	file__get_numlines(File2, Length2),
-	lcss__find_lcss(File1list, File2list, Length1, Length2, Lcss),
-	diffs__to_diff(Lcss, Diff).
+	% TO DO: Options like --ignore-case are probably best handled
+	%        by a pass taking place _before_ the diff algorithm is
+	%        run.  This pass would have the benefit of determining
+	%        whether or not there are any differences or not, in
+	%        the case where the output style chosen doesn't require
+	%        output if there is no diff.  It would also speed up
+	%        the --brief output style.
+:- pred diff__do_diff(file, file, io__state, io__state).
+:- mode diff__do_diff(in, in, di, uo) is det.
+diff__do_diff(File1, File2) -->
+	{ diff_by_lcss(File1, File2, Diff0) },
+	filter_diff(Diff0, File1, File2, Diff),
+	display_diff(File1, File2, Diff).
 
 %-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%
Index: file.m
===================================================================
RCS file: /home/staff/zs/imp/mercury/samples/diff/file.m,v
retrieving revision 1.9
diff -u -r1.9 file.m
--- file.m	1997/07/28 06:00:54	1.9
+++ file.m	1998/01/07 03:54:10
@@ -10,6 +10,8 @@
 % This module provides file input.  One can read a file entirely,
 % select a single line from a read file, get the number of lines
 % in a read file, and convert a read file to a list of strings.
+%
+% Every file has a filename attached to it.
 
 %-----------------------------------------------------------------------------%
 
@@ -26,8 +28,8 @@
 
 	% file__read_input reads a file from the input
 	% stream.
-:- pred file__read_input(io__res(file), io__state, io__state).
-:- mode file__read_input(out, di, uo) is det.
+:- pred file__read_input(string, io__res(file), io__state, io__state).
+:- mode file__read_input(in, out, di, uo) is det.
 
 	% file__get_line retrieves a line from a file.
 	% (Lines are numbered from 0.)
@@ -40,39 +42,56 @@
 :- pred file__get_numlines(file, int).
 :- mode file__get_numlines(in, out) is det.
 
+	% file__from_list converts a list of lines to a file.
+:- pred file__from_list(string, list(string), file).
+:- mode file__from_list(in, in, out) is det.
+
 	% file__to_list converts a file into a list of
 	% lines.
 :- pred file__to_list(file, list(string)).
 :- mode file__to_list(in, out) is det.
 
+	% file__get_file_name returns the name of the file.
+:- pred file__get_file_name(file, string).
+:- mode file__get_file_name(in, out) is det.
+
+	% file__set_file_name sets the name of the file.
+:- pred file__set_file_name(file, string, file).
+:- mode file__set_file_name(in, in, out) is det.
+
 %-----------------------------------------------------------------------------%
 
 :- implementation.
-:- import_module array, require, int.
+:- import_module array, require, int, bool.
 
 %-----------------------------------------------------------------------------%
 
-:- type file == array(string).
+:- type file
+	--->	file(
+			string,			% File name
+			array(string)		% Contents
+		).
 
 	% Open the stream, read from the stream, then close
 	% the stream.
-file__read_file(FileName, Contents) -->
+file__read_file(FileName, File) -->
 	io__open_input(FileName, Res),
 	( { Res = ok(InputStream) },
-		file__read_stream(InputStream, Contents0),
+		file__read_stream(InputStream, Contents),
 		io__close_input(InputStream),
-		{ Contents = ok(Contents0) }
+		{ File = ok(file(FileName, Contents)) }
 	; { Res = error(Error) },
-		{ Contents = error(Error) }
+		{ File = error(Error) }
 	).
 
 	% Get the input stream, then read from it.
-file__read_input(ok(Contents)) -->
+file__read_input(FileName, ok(file(FileName, Contents))) -->
 	io__input_stream(InputStream),
 	file__read_stream(InputStream, Contents).
 
 	% file__read_stream is the "real" file reader.
-:- pred file__read_stream(io__input_stream, file, io__state, io__state).
+:- pred file__read_stream(io__input_stream, array(string),
+		io__state, io__state).
 :- mode file__read_stream(in, array_uo, di, uo) is det.
 file__read_stream(Stream, File) -->
 	file__read_stream2(Stream, 0, File).
@@ -81,7 +100,8 @@
 	% read, fill File[LinesIn] to File[LinesOut-1] with the rest
 	% of the lines.  LinesOut is the number of lines in the file.
 	% (Note that line numbering starts at zero.)
-:- pred file__read_stream2(io__input_stream, int, file, io__state, io__state).
+:- pred file__read_stream2(io__input_stream, int, array(string),
+		io__state, io__state).
 :- mode file__read_stream2(in, in, array_uo, di, uo) is det.
 file__read_stream2(Stream, LineNo, File) -->
 	io__read_line(Stream, Res),
@@ -99,15 +119,26 @@
 
 %-----------------------------------------------------------------------------%
 
-file__get_line(File, LineNo, Line) :-
-	array__semidet_lookup(File, LineNo, Line).
+file__get_line(file(_, Contents), LineNo, Line) :-
+	array__semidet_lookup(Contents, LineNo, Line).
 
-file__get_numlines(File, NumLines) :-
-	array__bounds(File, _, NumLines1),
+file__get_numlines(file(_, Contents), NumLines) :-
+	array__bounds(Contents, _, NumLines1),
 	NumLines is NumLines1 + 1.
 
-file__to_list(File, List) :-
-	array__to_list(File, List).
+%-----------------------------------------------------------------------------%
+
+file__to_list(file(_, Contents), List) :-
+	array__to_list(Contents, List).
+
+file__from_list(FileName, List, file(FileName, Contents)) :-
+	array__from_list(List, Contents).
+
+%-----------------------------------------------------------------------------%
+
+file__get_file_name(file(FileName, _), FileName).
+
+file__set_file_name(file(_, B), FileName, file(FileName, B)).
 
 %-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%
Index: lcss.m
===================================================================
RCS file: /home/staff/zs/imp/mercury/samples/diff/lcss.m,v
retrieving revision 1.12
diff -u -r1.12 lcss.m
--- lcss.m	1997/07/28 06:00:55	1.12
+++ lcss.m	1998/01/07 05:11:29
@@ -36,33 +36,36 @@
 :- module lcss.
 
 :- interface.
-:- import_module lcsstype.
+:- import_module difftype, file.
 
-:- pred lcss__find_lcss(list(A) :: in, list(A) :: in, int :: in, int :: in,
-			lcss :: out) is det.
+:- pred diff_by_lcss(file, file, diff).
+:- mode diff_by_lcss(in, in, out) is det.
 
 %-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
 
 :- implementation.
 :- import_module map, require, std_util, int, list, char, array.
 
-%-----------------------------------------------------------------------------%
+diff_by_lcss(File1, File2, Diff) :-
+	file__to_list(File1, File1list),
+	file__to_list(File2, File2list),
+	file__get_numlines(File1, Length1),
+	file__get_numlines(File2, Length2),
+	lcss__find_diff(File1list, File2list, Length1, Length2, Diff).
 
-:- import_module io.
+%-----------------------------------------------------------------------------%
 
-	% For debugging only.  Will be removed in the
-	% final version.
-:- pred lcss__show_lcss(lcss :: in, io__state :: di, io__state :: uo) is det.
-lcss__show_lcss([]) -->
-	io__write_string("[]").
-lcss__show_lcss([X - Y | Lcss]) -->
-	io__write_int(X),
-	io__write_char('-'),
-	io__write_int(Y),
-	io__write_char(','),
-	lcss__show_lcss(Lcss).
+	% The longest common subsequence of two lists can be
+	% represented as an ordered list of "matches".  A match is a
+	% pair of the form I-J where I is the number of an item in
+	% item 1 and J is the number of an item in list 2.  (Note that
+	% an item is numbered by the position immediately preceding
+	% it, i.e., numbering starts at 0.)
+:- type lcss == list(pair(pos,pos)).
 
 %-----------------------------------------------------------------------------%
+%-----------------------------------------------------------------------------%
 
 	% Find a longest common subsequence.  The algorithm
 	% used is very similar to that in:
@@ -91,12 +94,15 @@
 	% reverse, and Thresh[I] contains its 'end' (see above).
 	% Otherwise, Thresh[I] contains 'infinity.'
 
-lcss__find_lcss(List1, List2, L1, L2, Lcss) :-
+:- pred lcss__find_diff(list(string) :: in, list(string) :: in,
+			int :: in, int :: in, diff :: out) is det.
+
+lcss__find_diff(List1, List2, L1, L2, Diff) :-
 
 	% The original version swapped List1 and List2, so that the
 	% first was the largest.  Is this swapping really worthwile?
 	% It doesn't seem to be faster, nor does it consume less
-	% memory.  Therefore I removed it.
+	% memory.  Therefore I removed it.  -- MK
 
 	% The consequence is that build_thresh and build_lcss need a
 	% value representing 'infinity'; previously we could use the
@@ -122,7 +128,7 @@
 
 	lcss__build_matchlist(List1, List2, MatchList),
 	lcss__build_thresh(N, MatchList, Inf, Thresh, Link),
-	lcss__build_lcss(N, Inf, Thresh, Link, L1, L2, Lcss).
+	lcss__build_diff(N, Inf, Thresh, Link, L1, L2, Diff).
 
 %-----------------------------------------------------------------------------%
 
@@ -133,7 +139,7 @@
 	% in increasing order, since this is required by build_thresh
 	% to go through the matches in lexicographical order.
 
-:- pred lcss__build_matchlist(list(A), list(A), list(list(int))).
+:- pred lcss__build_matchlist(list(string), list(string), list(list(int))).
 :- mode lcss__build_matchlist(in, in, out) is det.
 lcss__build_matchlist(List1, List2, MatchList) :-
 
@@ -150,7 +156,7 @@
 
 	lcss__match_map_to_matchlist(List1, Map, MatchList).
 
-:- pred lcss__build_match_map(int, list(A), map(A,list(int))).
+:- pred lcss__build_match_map(int, list(string), map(string,list(int))).
 :- mode lcss__build_match_map(in, in, out) is det.
 lcss__build_match_map(_, [], Map) :-
 	map__init(Map).
@@ -164,7 +170,7 @@
 	),
 	map__set(MapIn, S, Ns1, MapOut).
 
-:- pred lcss__match_map_to_matchlist(list(A), map(A,list(int)), 
+:- pred lcss__match_map_to_matchlist(list(string), map(string,list(int)), 
 		list(list(int))).
 :- mode lcss__match_map_to_matchlist(in, in, out) is det.
 lcss__match_map_to_matchlist([], _, []).
@@ -196,7 +202,9 @@
 	N1 is N + 1,	% Why this size?  Suppose we have two identical
 			% files of length N.  Then the links will be
 			% [], [0-0], [0-0,1-1], ... [0-0..N-N], which
-			% makes N+1 links in total.
+			% makes N+1 links in total.  We could use N
+			% if we pre-screened the files to see if they
+			% were identical or not.
 
 	array__init(N1, Inf, Thresh0),		% Thresh[0..N] := Inf
 	array__set(Thresh0, 0, -1, Thresh1),	% Thresh[0] := -1
@@ -207,8 +215,7 @@
 
 
 :- pred lcss__build_thresh2(int, int, list(list(int)),
-		array(int), array(lcss),
-		array(int), array(lcss)).
+		array(int), array(lcss), array(int), array(lcss)).
 :- mode lcss__build_thresh2(in, in, in,
 		array_di, array_di, array_uo, array_uo) is det.
 lcss__build_thresh2(_N, _I, [], Thresh0, Link0, Thresh0, Link0).
@@ -286,15 +293,12 @@
 	% that Thresh[K]<Inf, and Link[K] should be the Lcss in
 	% reverse.
 
-	% Note that it is here that we add the 'end-of-list match'
-	% L1-L2 to the Lcss.
-
-:- pred lcss__build_lcss(int, int, array(int), array(lcss), int, int, lcss).
-:- mode lcss__build_lcss(in, in, in, in, in, in, out) is det.
-lcss__build_lcss(N, Inf, Thresh, Link, L1, L2, Lcss) :-
+:- pred lcss__build_diff(int, int, array(int), array(lcss), int, int, diff).
+:- mode lcss__build_diff(in, in, in, in, in, in, out) is det.
+lcss__build_diff(N, Inf, Thresh, Link, L1, L2, Diff) :-
 	lcss__build_lcss2(N, Inf, Thresh, K),
-	array__lookup(Link, K, Lcss1),
-	list__reverse([L1 - L2 | Lcss1], Lcss).
+	array__lookup(Link, K, Lcss),
+	lcss__to_diff(L1, L2, Lcss, [], Diff).
 
 
 	% Find the largest K<=N for which Thresh[K]<Inf.
@@ -315,6 +319,29 @@
 	;
 		K = N
 	).
+
+	% lcss__to_diff turns the (reversed) longest common subsequence
+	% of two lists into a list of edits.
+:- pred lcss__to_diff(int, int, lcss, diff, diff).
+:- mode lcss__to_diff(in, in, in, in, out) is det.
+lcss__to_diff(_X, _Y, [], Diff, Diff).
+lcss__to_diff(X, Y, [X1 - Y1 | Lcss], Diff0, Diff) :-
+	X2 is X - 1, Y2 is Y - 1,
+	X3 is X1 + 1, Y3 is Y1 + 1,
+	( X1 = X2 ->
+		( Y1 = Y2 ->
+			Diff1 = Diff0
+		;
+			Diff1 = [add(X, Y3 - Y) | Diff0]
+		)
+	;
+		( Y1 = Y2 ->
+			Diff1 = [delete(X3 - X, Y) | Diff0]
+		;
+			Diff1 = [change(X3 - X, Y3 - Y) | Diff0]
+		)
+	),
+	lcss__to_diff(X1, Y1, Lcss, Diff1, Diff).
 
 %-----------------------------------------------------------------------------%
 %-----------------------------------------------------------------------------%




More information about the developers mailing list