[m-rev.] for review: Add a tool to help maintain Copyright lines.
Peter Wang
novalazy at gmail.com
Fri Feb 9 18:17:00 AEDT 2024
On Fri, 09 Feb 2024 15:55:00 +1100 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
>
> On 2024-02-09 15:16 +11:00 AEDT, "Peter Wang" <novalazy at gmail.com> wrote:
> > tools/update_copyright.m:
> > tools/.gitignore:
> > Add the update_copyright program. It is deliberately not integrated
> > into the build system as it should be compiled once, manually, then
> > left in the workspace across checkouts, mmake clean, etc.
>
> The update_copyright program should go into its own directory.
> I wouldn't want to start mixing Mercury code with sh code.
>
What should I name the directory, update_copyright? Or maybe git_hooks?
> > +++ b/tools/update_copyright.m
> > @@ -0,0 +1,356 @@
> > +%---------------------------------------------------------------------------%
> > +% vim: ft=mercury ts=4 sw=4 et
> > +%---------------------------------------------------------------------------%
> > +% Copyright (C) 2024 YesLogic Pty. Ltd.
> > +% Copyright (C) 2024 The Mercury team.
> > +% This file may only be copied under the terms of the GNU General
> > +% Public License - see the file COPYING in the Mercury distribution.
> > +%---------------------------------------------------------------------------%
> > +%
> > +% update_copyright [OPTIONS] [FILES]...
> > +%
> > +% Options:
> > +%
> > +% -s, --suffix STR Match only Copyright lines with STR in the suffix.
> > +%
> > +% This program updates the first matching Copyright line of each file to
> > +% include the current year. If one or more file names are specified on the
> > +% command line, then those files will be updated in-place (files already
> > +% containing up-to-date Copyright lines will not be modified).
> > +% If no file names are specified, the input will be read from standard input,
> > +% and the output written to standard output.
> > +%
> > +%---------------------------------------------------------------------------%
>
> Put the usage info *after* the paragraph that explains
> what the program does.
Ok.
> > +:- type mod_state
> > + ---> unmodified
> > + ; found_unmodified
> > + ; found_modified.
>
> I would add something to "mod_state" to indicate that what it is
> talking about is the copyright line; that is what is found/not found
> and modified or not modified.
>
Ok.
> > +:- pred option_default(option::out, option_data::out) is multi.
> > +
> > +option_default(suffix, maybe_string(no)).
> > +option_default(dummy, int(0)).
>
> I would add a comment about dummy being there to make the pred multi,
> or, even better I would replace that "option" with another, -q for quiet
> (since -s for silent would clash with suffix) which shuts up any non-error output.
>
You can add it if you like.
> > +:- pred process_file(int::in, maybe(string)::in, string::in, bool::out,
> > + io::di, io::uo) is det.
> > +
> > +process_file(CurrentYear, MaybeExpectSuffix, FileName, Continue, !IO) :-
> > + io.open_input(FileName, OpenInputRes, !IO),
> > + (
> > + OpenInputRes = ok(InputStream),
> > + read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
> > + [], RevLines, unmodified, ModState, !IO),
> > + io.close_input(InputStream, !IO),
>
> Is there some reason why you didn't use read_named_file_as_lines?
>
I forgot it exists.
I tried it, but I don't find it better for two reasons:
There is no read_input_stream_as_lines so the stdin case ends up
having to call read_file_as_string and split_into_lines.
(We could delete the code for stdin as well; /dev/stdin can be passed on
the command line if it's useful.)
read_named_file_as_lines loses the newlines at the end of each line,
which will need to be added to the output. Files SHOULD have a final
newline, but I'd rather preserve the input as much as possible.
> > +:- pred read_lines_loop(io.text_input_stream::in, int::in, maybe(string)::in,
> > + list(string)::in, list(string)::out, mod_state::in, mod_state::out,
> > + io::di, io::uo) is det.
> > +
> > +read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
> > + !AccLines, !ModState, !IO) :-
>
> Rename !AccLines to !RevLines. The name for this data structure should be
> the same in caller and callee, and the lines *are* reversed.
>
Fine.
> > +:- pred parse_copyright_line(string::in, maybe(string)::in,
> > + string::out, list(year_range)::out, string::out) is semidet.
> > +
> > +parse_copyright_line(Line, MaybeExpectSuffix, Prefix, Ranges, Suffix) :-
> > + string.sub_string_search(Line, "Copyright ", AfterCopyright),
> > + find_prefix_end(Line, ' ', AfterCopyright, PrefixEnd),
> > + find_suffix_start(Line, PrefixEnd, PrefixEnd, SuffixStart),
>
> You use prefix and suffix to refer to the parts of the line before
> and after the year ranges, but this is quite unusual terminology.
> I would use something more like "find_start_of_range_sequence"
> and "find_end_of_range_sequence". But this is moot, because ...
I don't see how it's unusual, obviously.
>
> I would also convert the part of Line starting at AfterCopyright
> into a list of chars, and process that. It should be simpler, easier
> to read and understand, and since it would be done only for one line,
> the perf impact would be negligible. And in that case, you would
> need only one pred, named maybe find_year_ranges, and it could
> convert the digits into integers and accumulate the ranges of integers
> as it goes along. In short, I would rewrite this part of the program from scratch.
>
> If you want, you can commit the code as is, and I would rewrite it along
> the above lines.
>
You can do it if you want.
> > +:- pred write_rev_lines(io.output_stream::in, list(string)::in,
> > + io::di, io::uo) is det.
> > +
> > +write_rev_lines(Stream, RevLines, !IO) :-
> > + list.foldr(io.write_string(Stream), RevLines, !IO).
>
> I would use list.reverse and then list.foldl, to avoid bad perf for e.g. options.m.
> And I wouldn't create a predicate for them.
>
Done. Though, I can't measure any difference on a file even 32x as big
as options.m.
> > +++ b/tools/update_copyright.pre-commit
> > @@ -0,0 +1,32 @@
> > +#!/usr/bin/env bash
> > ...
> > +if [ ! -x "$update_copyright" ]; then
>
> We should use "test" instead of "[".
>
Done.
> > +changed_files=($(git diff --name-only --cached --diff-filter=ACMR -- \
> > + '*.[mchly]' '*.java' ))
>
> Don't we have some .cs files under git control?
Yes, but it's actually a .cs.in file.
>
> I can never remember whether bash supports *.{m,c,h,l,y,java,cs} syntax,
> but if it does, I would use it.
bash does, but the patterns are git pathspecs, not expanded by bash.
>
> Note that we also have some shell scripts under git control, and they
> do NOT have suffixes. I guess we could find them by looking for an
> initial #!.
I've grepped for an existing Copyright line, if the file name
doesn't match a handful of common patterns.
>
> > +if [ "${#changed_files}" -eq 0 ]; then
>
> As above.
>
> > +"$update_copyright" --suffix 'Mercury team' -- "${changed_files[@]}" &&
> > +git add -u -- "${changed_files[@]}"
>
> I saw Peter's comment on "git add" while writing this, and I agree with him.
>
> Zoltan.
Here is an interdiff.
Peter
diff --git a/update_copyright/.gitignore b/update_copyright/.gitignore
new file mode 100644
index 000000000..a800d003d
--- /dev/null
+++ b/update_copyright/.gitignore
@@ -0,0 +1,14 @@
+update_copyright
+update_copyright.c
+update_copyright.c_date
+update_copyright.d
+update_copyright.date
+update_copyright.dep*
+update_copyright.dv
+update_copyright.err
+update_copyright.int
+update_copyright.int2
+update_copyright.mh
+update_copyright.mih
+update_copyright.o
+update_copyright_init.*
diff --git a/tools/update_copyright.m b/update_copyright/update_copyright.m
similarity index 87%
rename from tools/update_copyright.m
rename to update_copyright/update_copyright.m
index 93ef3e510..0f7742d7a 100644
--- a/tools/update_copyright.m
+++ b/update_copyright/update_copyright.m
@@ -7,12 +7,6 @@
% Public License - see the file COPYING in the Mercury distribution.
%---------------------------------------------------------------------------%
%
-% update_copyright [OPTIONS] [FILES]...
-%
-% Options:
-%
-% -s, --suffix STR Match only Copyright lines with STR in the suffix.
-%
% This program updates the first matching Copyright line of each file to
% include the current year. If one or more file names are specified on the
% command line, then those files will be updated in-place (files already
@@ -20,6 +14,18 @@
% If no file names are specified, the input will be read from standard input,
% and the output written to standard output.
%
+% Usage: update_copyright [OPTIONS] [FILES]...
+%
+% Options:
+%
+% -s, --suffix STR Match only Copyright lines with STR in the suffix.
+%
+% Exit status:
+%
+% 0 No files modified.
+% 1 One or more files modified.
+% 2 An error occurred.
+%
%---------------------------------------------------------------------------%
:- module update_copyright.
@@ -51,9 +57,9 @@
; dummy.
:- type mod_state
- ---> unmodified
- ; found_unmodified
- ; found_modified.
+ ---> unmodified % no copyright line found yet
+ ; found_unmodified % found copyright, already up-to-date
+ ; found_modified. % found copyright, updated
%---------------------------------------------------------------------------%
@@ -68,11 +74,7 @@ main(!IO) :-
MaybeExpectSuffix),
(
NonOptionArgs = [],
- io.input_stream(InputStream, !IO),
- read_lines_loop(InputStream, Year, MaybeExpectSuffix,
- [], RevLines, unmodified, _ModState, !IO),
- io.output_stream(OutputStream, !IO),
- write_rev_lines(OutputStream, RevLines, !IO)
+ process_stdin(Year, MaybeExpectSuffix, !IO)
;
NonOptionArgs = [_ | _],
process_files(Year, MaybeExpectSuffix, NonOptionArgs, !IO)
@@ -95,7 +97,7 @@ long_option("suffix", suffix).
:- pred option_default(option::out, option_data::out) is multi.
option_default(suffix, maybe_string(no)).
-option_default(dummy, int(0)).
+option_default(dummy, int(0)). % force multi
%---------------------------------------------------------------------------%
@@ -106,6 +108,15 @@ current_year(Year, !IO) :-
localtime(Time, TM, !IO),
Year = 1900 + TM ^ tm_year.
+:- pred process_stdin(int::in, maybe(string)::in, io::di, io::uo) is det.
+
+process_stdin(CurrentYear, MaybeExpectSuffix, !IO) :-
+ io.input_stream(InputStream, !IO),
+ read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
+ [], RevLines, unmodified, _ModState, !IO),
+ io.output_stream(OutputStream, !IO),
+ list.foldl(io.write_string(OutputStream), reverse(RevLines), !IO).
+
:- pred process_files(int::in, maybe(string)::in, list(string)::in,
io::di, io::uo) is det.
@@ -144,9 +155,11 @@ process_file(CurrentYear, MaybeExpectSuffix, FileName, Continue, !IO) :-
io.open_output(FileName, OpenOutputRes, !IO),
(
OpenOutputRes = ok(OutputStream),
- write_rev_lines(OutputStream, RevLines, !IO),
+ list.foldl(io.write_string(OutputStream),
+ reverse(RevLines), !IO),
io.close_output(OutputStream, !IO),
io.format("Modified: %s\n", [s(FileName)], !IO),
+ maybe_set_modified_exit_status(!IO),
Continue = yes
;
OpenOutputRes = error(Error),
@@ -167,7 +180,7 @@ process_file(CurrentYear, MaybeExpectSuffix, FileName, Continue, !IO) :-
io::di, io::uo) is det.
read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
- !AccLines, !ModState, !IO) :-
+ !RevLines, !ModState, !IO) :-
io.read_line_as_string(InputStream, ReadRes, !IO),
(
ReadRes = ok(Line),
@@ -180,17 +193,17 @@ read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
sort(Ranges0, Ranges1),
( if add_to_ranges(CurrentYear, Ranges1, Ranges) then
make_copyright_line(Prefix, Ranges, Suffix, NewLine),
- !:AccLines = [NewLine | !.AccLines],
+ !:RevLines = [NewLine | !.RevLines],
!:ModState = found_modified
else
- !:AccLines = [Line | !.AccLines],
+ !:RevLines = [Line | !.RevLines],
!:ModState = found_unmodified
)
else
- !:AccLines = [Line | !.AccLines]
+ !:RevLines = [Line | !.RevLines]
),
read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
- !AccLines, !ModState, !IO)
+ !RevLines, !ModState, !IO)
;
ReadRes = eof
;
@@ -331,12 +344,6 @@ range_to_string(Range) = Str :-
%---------------------------------------------------------------------------%
-:- pred write_rev_lines(io.output_stream::in, list(string)::in,
- io::di, io::uo) is det.
-
-write_rev_lines(Stream, RevLines, !IO) :-
- list.foldr(io.write_string(Stream), RevLines, !IO).
-
:- pred report_io_error(string::in, io.error::in, io::di, io::uo) is det.
report_io_error(Prefix, Error, !IO) :-
@@ -349,7 +356,18 @@ report_error_message(Message, !IO) :-
io.stderr_stream(ErrorStream, !IO),
io.write_string(ErrorStream, Message, !IO),
io.nl(ErrorStream, !IO),
- io.set_exit_status(1, !IO).
+ io.set_exit_status(2, !IO).
+
+:- pred maybe_set_modified_exit_status(io::di, io::uo) is det.
+
+maybe_set_modified_exit_status(!IO) :-
+ % Don't override exit status for general errors.
+ io.get_exit_status(ExitStatus0, !IO),
+ ( if ExitStatus0 = 0 then
+ io.set_exit_status(1, !IO)
+ else
+ true
+ ).
%---------------------------------------------------------------------------%
:- end_module update_copyright.
diff --git a/update_copyright/update_copyright.pre-commit b/update_copyright/update_copyright.pre-commit
new file mode 100755
index 000000000..8673ade1a
--- /dev/null
+++ b/update_copyright/update_copyright.pre-commit
@@ -0,0 +1,50 @@
+#!/usr/bin/env bash
+#
+# You can copy/symlink this script to .git/hooks/pre-commit.
+# It will update the Copyright lines of all files that are going to be
+# modified in the commit, before the commit is made.
+#
+# You need to compile the update_copyright.m program:
+# mmc --mercury-linkage static update_copyright.m
+#
+# You can clean the intermediate files with:
+# git clean -fx 'update_copyright.*' 'update_copyright_init.*'
+#
+set -e
+
+rootdir=$( git rev-parse --show-toplevel )
+update_copyright="$rootdir/update_copyright/update_copyright"
+
+# Only continue if the update_copyright program has been compiled.
+if test ! -x "$update_copyright"
+then
+ exit
+fi
+
+# Find changed files for this commit.
+changed_files=($( git diff --name-only --cached --diff-filter=ACMR ))
+update_files=()
+unknown_files=()
+
+for file in "${changed_files[@]}"
+do
+ case $file in
+ *.[mchly] | *.in | *.java | *Mmake*)
+ update_files+=( "$file" )
+ ;;
+ *)
+ unknown_files+=( "$file" )
+ ;;
+ esac
+done
+
+if test "${#unknown_files}" -gt 0
+then
+ update_files+=($( grep -sl 'Copyright .* Mercury team' "${unknown_files[@]}" \
+ || true ))
+fi
+
+if test "${#update_files}" -gt 0
+then
+ exec "$update_copyright" --suffix 'Mercury team' -- "${update_files[@]}"
+fi
More information about the reviews
mailing list