[m-rev.] for review: Add a tool to help maintain Copyright lines.

Peter Wang novalazy at gmail.com
Fri Feb 9 18:17:00 AEDT 2024


On Fri, 09 Feb 2024 15:55:00 +1100 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> 
> On 2024-02-09 15:16 +11:00 AEDT, "Peter Wang" <novalazy at gmail.com> wrote:
> > tools/update_copyright.m:
> > tools/.gitignore:
> >     Add the update_copyright program. It is deliberately not integrated
> >     into the build system as it should be compiled once, manually, then
> >     left in the workspace across checkouts, mmake clean, etc.
> 
> The update_copyright program should go into its own directory.
> I wouldn't want to start mixing Mercury code with sh code.
> 

What should I name the directory, update_copyright? Or maybe git_hooks?

> > +++ b/tools/update_copyright.m
> > @@ -0,0 +1,356 @@
> > +%---------------------------------------------------------------------------%
> > +% vim: ft=mercury ts=4 sw=4 et
> > +%---------------------------------------------------------------------------%
> > +% Copyright (C) 2024 YesLogic Pty. Ltd.
> > +% Copyright (C) 2024 The Mercury team.
> > +% This file may only be copied under the terms of the GNU General
> > +% Public License - see the file COPYING in the Mercury distribution.
> > +%---------------------------------------------------------------------------%
> > +%
> > +% update_copyright [OPTIONS] [FILES]...
> > +%
> > +% Options:
> > +%
> > +%   -s, --suffix STR    Match only Copyright lines with STR in the suffix.
> > +%
> > +% This program updates the first matching Copyright line of each file to
> > +% include the current year. If one or more file names are specified on the
> > +% command line, then those files will be updated in-place (files already
> > +% containing up-to-date Copyright lines will not be modified).
> > +% If no file names are specified, the input will be read from standard input,
> > +% and the output written to standard output.
> > +%
> > +%---------------------------------------------------------------------------%
> 
> Put the usage info *after* the paragraph that explains
> what the program does.

Ok.

> > +:- type mod_state
> > +    --->    unmodified
> > +    ;       found_unmodified
> > +    ;       found_modified.
> 
> I would add something to "mod_state" to indicate that what it is
> talking about is the copyright line; that is what is found/not found
> and modified or not modified.
> 

Ok.

> > +:- pred option_default(option::out, option_data::out) is multi.
> > +
> > +option_default(suffix, maybe_string(no)).
> > +option_default(dummy, int(0)).
> 
> I would add a comment about dummy being there to make the pred multi,
> or, even better I would replace that "option" with another, -q for quiet
> (since -s for silent would clash with suffix) which shuts up any non-error output.
> 

You can add it if you like.

> > +:- pred process_file(int::in, maybe(string)::in, string::in, bool::out,
> > +    io::di, io::uo) is det.
> > +
> > +process_file(CurrentYear, MaybeExpectSuffix, FileName, Continue, !IO) :-
> > +    io.open_input(FileName, OpenInputRes, !IO),
> > +    (
> > +        OpenInputRes = ok(InputStream),
> > +        read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
> > +            [], RevLines, unmodified, ModState, !IO),
> > +        io.close_input(InputStream, !IO),
> 
> Is there some reason why you didn't use read_named_file_as_lines?
> 

I forgot it exists.

I tried it, but I don't find it better for two reasons:

There is no read_input_stream_as_lines so the stdin case ends up
having to call read_file_as_string and split_into_lines.
(We could delete the code for stdin as well; /dev/stdin can be passed on
the command line if it's useful.)

read_named_file_as_lines loses the newlines at the end of each line,
which will need to be added to the output. Files SHOULD have a final
newline, but I'd rather preserve the input as much as possible.

> > +:- pred read_lines_loop(io.text_input_stream::in, int::in, maybe(string)::in,
> > +    list(string)::in, list(string)::out, mod_state::in, mod_state::out,
> > +    io::di, io::uo) is det.
> > +
> > +read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
> > +        !AccLines, !ModState, !IO) :-
> 
> Rename !AccLines to !RevLines. The name for this data structure should be
> the same in caller and callee, and the lines *are* reversed.
> 

Fine.

> > +:- pred parse_copyright_line(string::in, maybe(string)::in,
> > +    string::out, list(year_range)::out, string::out) is semidet.
> > +
> > +parse_copyright_line(Line, MaybeExpectSuffix, Prefix, Ranges, Suffix) :-
> > +    string.sub_string_search(Line, "Copyright ", AfterCopyright),
> > +    find_prefix_end(Line, ' ', AfterCopyright, PrefixEnd),
> > +    find_suffix_start(Line, PrefixEnd, PrefixEnd, SuffixStart),
> 
> You use prefix and suffix to refer to the parts of the line before
> and after the year ranges, but this is quite unusual terminology.
> I would use something more like "find_start_of_range_sequence"
> and "find_end_of_range_sequence". But this is moot, because ...

I don't see how it's unusual, obviously.

> 
> I would also convert the part of Line starting at AfterCopyright
> into a list of chars, and process that. It should be simpler, easier
> to read and understand, and since it would be done only for one line,
> the perf impact would be negligible. And in that case, you would
> need only one pred, named maybe find_year_ranges, and it could
> convert the digits into integers and accumulate the ranges of integers
> as it goes along. In short, I would rewrite this part of the program from scratch.
> 
> If you want, you can commit the code as is, and I would rewrite it along
> the above lines.
> 

You can do it if you want.

> > +:- pred write_rev_lines(io.output_stream::in, list(string)::in,
> > +    io::di, io::uo) is det.
> > +
> > +write_rev_lines(Stream, RevLines, !IO) :-
> > +    list.foldr(io.write_string(Stream), RevLines, !IO).
> 
> I would use list.reverse and then list.foldl, to avoid bad perf for e.g. options.m.
> And I wouldn't create a predicate for them.
> 

Done. Though, I can't measure any difference on a file even 32x as big
as options.m.

> > +++ b/tools/update_copyright.pre-commit
> > @@ -0,0 +1,32 @@
> > +#!/usr/bin/env bash
> > ...
> > +if [ ! -x "$update_copyright" ]; then
> 
> We should use "test" instead of "[".
> 

Done.

> > +changed_files=($(git diff --name-only --cached --diff-filter=ACMR -- \
> > +    '*.[mchly]' '*.java' ))
> 
> Don't we have some .cs files under git control?

Yes, but it's actually a .cs.in file.

> 
> I can never remember whether bash supports *.{m,c,h,l,y,java,cs} syntax,
> but if it does, I would use it.

bash does, but the patterns are git pathspecs, not expanded by bash.

> 
> Note that we also have some shell scripts under git control, and they
> do NOT have suffixes. I guess we could find them by looking for an
> initial #!.

I've grepped for an existing Copyright line, if the file name
doesn't match a handful of common patterns.

> 
> > +if [ "${#changed_files}" -eq 0 ]; then
> 
> As above.
> 
> > +"$update_copyright" --suffix 'Mercury team' -- "${changed_files[@]}" &&
> > +git add -u -- "${changed_files[@]}"
> 
> I saw Peter's comment on "git add" while writing this, and I agree with him.
> 
> Zoltan.

Here is an interdiff.

Peter


diff --git a/update_copyright/.gitignore b/update_copyright/.gitignore
new file mode 100644
index 000000000..a800d003d
--- /dev/null
+++ b/update_copyright/.gitignore
@@ -0,0 +1,14 @@
+update_copyright
+update_copyright.c
+update_copyright.c_date
+update_copyright.d
+update_copyright.date
+update_copyright.dep*
+update_copyright.dv
+update_copyright.err
+update_copyright.int
+update_copyright.int2
+update_copyright.mh
+update_copyright.mih
+update_copyright.o
+update_copyright_init.*
diff --git a/tools/update_copyright.m b/update_copyright/update_copyright.m
similarity index 87%
rename from tools/update_copyright.m
rename to update_copyright/update_copyright.m
index 93ef3e510..0f7742d7a 100644
--- a/tools/update_copyright.m
+++ b/update_copyright/update_copyright.m
@@ -7,12 +7,6 @@
 % Public License - see the file COPYING in the Mercury distribution.
 %---------------------------------------------------------------------------%
 %
-% update_copyright [OPTIONS] [FILES]...
-%
-% Options:
-%
-%   -s, --suffix STR    Match only Copyright lines with STR in the suffix.
-%
 % This program updates the first matching Copyright line of each file to
 % include the current year. If one or more file names are specified on the
 % command line, then those files will be updated in-place (files already
@@ -20,6 +14,18 @@
 % If no file names are specified, the input will be read from standard input,
 % and the output written to standard output.
 %
+% Usage: update_copyright [OPTIONS] [FILES]...
+%
+% Options:
+%
+%   -s, --suffix STR    Match only Copyright lines with STR in the suffix.
+%
+% Exit status:
+%
+%   0   No files modified.
+%   1   One or more files modified.
+%   2   An error occurred.
+%
 %---------------------------------------------------------------------------%
 
 :- module update_copyright.
@@ -51,9 +57,9 @@
     ;       dummy.
 
 :- type mod_state
-    --->    unmodified
-    ;       found_unmodified
-    ;       found_modified.
+    --->    unmodified          % no copyright line found yet
+    ;       found_unmodified    % found copyright, already up-to-date
+    ;       found_modified.     % found copyright, updated
 
 %---------------------------------------------------------------------------%
 
@@ -68,11 +74,7 @@ main(!IO) :-
             MaybeExpectSuffix),
         (
             NonOptionArgs = [],
-            io.input_stream(InputStream, !IO),
-            read_lines_loop(InputStream, Year, MaybeExpectSuffix,
-                [], RevLines, unmodified, _ModState, !IO),
-            io.output_stream(OutputStream, !IO),
-            write_rev_lines(OutputStream, RevLines, !IO)
+            process_stdin(Year, MaybeExpectSuffix, !IO)
         ;
             NonOptionArgs = [_ | _],
             process_files(Year, MaybeExpectSuffix, NonOptionArgs, !IO)
@@ -95,7 +97,7 @@ long_option("suffix", suffix).
 :- pred option_default(option::out, option_data::out) is multi.
 
 option_default(suffix, maybe_string(no)).
-option_default(dummy, int(0)).
+option_default(dummy, int(0)).  % force multi
 
 %---------------------------------------------------------------------------%
 
@@ -106,6 +108,15 @@ current_year(Year, !IO) :-
     localtime(Time, TM, !IO),
     Year = 1900 + TM ^ tm_year.
 
+:- pred process_stdin(int::in, maybe(string)::in, io::di, io::uo) is det.
+
+process_stdin(CurrentYear, MaybeExpectSuffix, !IO) :-
+    io.input_stream(InputStream, !IO),
+    read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
+        [], RevLines, unmodified, _ModState, !IO),
+    io.output_stream(OutputStream, !IO),
+    list.foldl(io.write_string(OutputStream), reverse(RevLines), !IO).
+
 :- pred process_files(int::in, maybe(string)::in, list(string)::in,
     io::di, io::uo) is det.
 
@@ -144,9 +155,11 @@ process_file(CurrentYear, MaybeExpectSuffix, FileName, Continue, !IO) :-
             io.open_output(FileName, OpenOutputRes, !IO),
             (
                 OpenOutputRes = ok(OutputStream),
-                write_rev_lines(OutputStream, RevLines, !IO),
+                list.foldl(io.write_string(OutputStream),
+                    reverse(RevLines), !IO),
                 io.close_output(OutputStream, !IO),
                 io.format("Modified: %s\n", [s(FileName)], !IO),
+                maybe_set_modified_exit_status(!IO),
                 Continue = yes
             ;
                 OpenOutputRes = error(Error),
@@ -167,7 +180,7 @@ process_file(CurrentYear, MaybeExpectSuffix, FileName, Continue, !IO) :-
     io::di, io::uo) is det.
 
 read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
-        !AccLines, !ModState, !IO) :-
+        !RevLines, !ModState, !IO) :-
     io.read_line_as_string(InputStream, ReadRes, !IO),
     (
         ReadRes = ok(Line),
@@ -180,17 +193,17 @@ read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
             sort(Ranges0, Ranges1),
             ( if add_to_ranges(CurrentYear, Ranges1, Ranges) then
                 make_copyright_line(Prefix, Ranges, Suffix, NewLine),
-                !:AccLines = [NewLine | !.AccLines],
+                !:RevLines = [NewLine | !.RevLines],
                 !:ModState = found_modified
             else
-                !:AccLines = [Line | !.AccLines],
+                !:RevLines = [Line | !.RevLines],
                 !:ModState = found_unmodified
             )
         else
-            !:AccLines = [Line | !.AccLines]
+            !:RevLines = [Line | !.RevLines]
         ),
         read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
-            !AccLines, !ModState, !IO)
+            !RevLines, !ModState, !IO)
     ;
         ReadRes = eof
     ;
@@ -331,12 +344,6 @@ range_to_string(Range) = Str :-
 
 %---------------------------------------------------------------------------%
 
-:- pred write_rev_lines(io.output_stream::in, list(string)::in,
-    io::di, io::uo) is det.
-
-write_rev_lines(Stream, RevLines, !IO) :-
-    list.foldr(io.write_string(Stream), RevLines, !IO).
-
 :- pred report_io_error(string::in, io.error::in, io::di, io::uo) is det.
 
 report_io_error(Prefix, Error, !IO) :-
@@ -349,7 +356,18 @@ report_error_message(Message, !IO) :-
     io.stderr_stream(ErrorStream, !IO),
     io.write_string(ErrorStream, Message, !IO),
     io.nl(ErrorStream, !IO),
-    io.set_exit_status(1, !IO).
+    io.set_exit_status(2, !IO).
+
+:- pred maybe_set_modified_exit_status(io::di, io::uo) is det.
+
+maybe_set_modified_exit_status(!IO) :-
+    % Don't override exit status for general errors.
+    io.get_exit_status(ExitStatus0, !IO),
+    ( if ExitStatus0 = 0 then
+        io.set_exit_status(1, !IO)
+    else
+        true
+    ).
 
 %---------------------------------------------------------------------------%
 :- end_module update_copyright.
diff --git a/update_copyright/update_copyright.pre-commit b/update_copyright/update_copyright.pre-commit
new file mode 100755
index 000000000..8673ade1a
--- /dev/null
+++ b/update_copyright/update_copyright.pre-commit
@@ -0,0 +1,50 @@
+#!/usr/bin/env bash
+#
+# You can copy/symlink this script to .git/hooks/pre-commit.
+# It will update the Copyright lines of all files that are going to be
+# modified in the commit, before the commit is made.
+#
+# You need to compile the update_copyright.m program:
+#   mmc --mercury-linkage static update_copyright.m
+#
+# You can clean the intermediate files with:
+#   git clean -fx 'update_copyright.*' 'update_copyright_init.*'
+#
+set -e
+
+rootdir=$( git rev-parse --show-toplevel )
+update_copyright="$rootdir/update_copyright/update_copyright"
+
+# Only continue if the update_copyright program has been compiled.
+if test ! -x "$update_copyright"
+then
+    exit
+fi
+
+# Find changed files for this commit.
+changed_files=($( git diff --name-only --cached --diff-filter=ACMR ))
+update_files=()
+unknown_files=()
+
+for file in "${changed_files[@]}"
+do
+    case $file in
+        *.[mchly] | *.in | *.java | *Mmake*)
+            update_files+=( "$file" )
+            ;;
+        *)
+            unknown_files+=( "$file" )
+            ;;
+    esac
+done
+
+if test "${#unknown_files}" -gt 0
+then
+    update_files+=($( grep -sl 'Copyright .* Mercury team' "${unknown_files[@]}" \
+        || true ))
+fi
+
+if test "${#update_files}" -gt 0
+then
+    exec "$update_copyright" --suffix 'Mercury team' -- "${update_files[@]}"
+fi


More information about the reviews mailing list