[m-dev.] foldl2 and recursive_fold2 in dir.m have the wrong result type

Zoltan Somogyi zoltan.somogyi at runbox.com
Sat May 15 17:48:10 AEST 2021


2021-05-15 15:30 GMT+10:00 "Julien Fischer" <jfischer at opturion.com>:
>> Look for only relevant .class files.
> 
> I suggest:
> 
>     Look for only relevant .class files when creating Java archives.

That does not fit in 51 chars :-( I hate that limit.

>> > library/dir.m:
>>     Add a new predicate general_foldl2. Unlike foldl2 and recursive_foldl2,
>>     this predicate can be asked to continue going after finding errors,
> 
> Delete "going".

That now says "keep going", since this diff emulates the behavior of make -k.

>>     in which case it will return a structured description of those errors.
>>
>>     To make this possible, generalize the infrastructure shared by foldl2,
>>     recursive_foldl2 and now general_foldl2.
>> > NEWS:
>>     Announce the new predicate.
>> > compiler/module_cmds.m:
>>     Use the new predicate instead of recursive_foldl2.
> 
> Append:
> 
>       when creating Java archives.

That is not quite right: the predicate whose body this diff changes does
not create Java archives; it only finds stuff for the predicate that does that.
I extended the comment to say that. 

>> +:- type file_error
>> +    --->    file_error(string, file_operation, io.error).
>> +            % file_errorx(PathName, Op, Error) means that
> 
> There's a stray 'x' there.

Deleted.

> Also, there's no shortage of space so I suggest s/Op/Operation/

Done.

>> +            % when we tried to perform Op on PathName, the result was Error.
>> +            % PathName should specify a file name relative to the
> 
> Should?  Why not:
> 
>    PathName specifies a file name relative to ....

Done.

>> +:- type file_operation
>> +    --->    file_open
>> +    ;       file_close
>> +    ;       file_get_id
>> +    ;       file_get_type
>> +    ;       file_check_accessibility
> 
> To address an XXX below: maybe file_check_accessibility should
> encode which access types were tested for?  (i.e. the contents
> of the second argument of io.check_file_accessibility.)

At the moment, the only caller of general_foldl2 pays attention
only the file name in a file_error; it ignores both the operation
and the io_error fields. I have no objection to adding such a field,
but would prefer to see a motivating example first.

>> +    % general_foldl2(Params, Pred, DirName, Data0, Data, Errors, !IO).
>> +    %
>> +    % A generalised version of the above, whose behavior is controlled
>> +    % by setting up Params.
>> +    %
>> +    % Whether we recursively process subdirectories depends on whether
>> +    % the fp_subdirs field of Params is do_not_enter_subdirs or enter_subdirs.
>> +    % If it is do_not_enter_subdirs, then we do not process subdirectories
>> +    % at all. If it is enter_subdirs, then we process subdirectories depth
>> +    % first, processing each subdirectory before its contents.
> 
>    processing each subdirectory in a directory before its other contents.

Nope, neither existing foldl2 predicates nor the new one make any such guarantees.
They all process the entries in a directory in the order in which they are read from
the directory stream, and the stream's generator, the OS, makes no such promises.

What that comment was trying to say, and it seems failing, is that this is a preorder
traversal, i.e. if A is the parent dir of file B, then we do guarantee to call Pred
on A before we call Pred on B.

I would use "preorder" in the description, but I fear that with the state of CS
education in the last couple of decades, far too many people wouldn't know
what that means :-( CS graduates these days seem to think of binary trees
as *advanced* data structures :-(, probably because implementing them
in Java is a pain.

This actually reminds me of another issue. The code in module_cmds.m
wants to accumulate a set of *file*names, not *dir*names, so calling Pred
on directories, as far as it is concerned, is both unnecessary work, and
potentially an unnecessary complication. As in: we need to filter out
directory names as they cannot be the .class files we are looking for,
but a malicious actor *could* in theory put a subdirectory whose name
ends in ".class" and has one of the prefixes we are looking for into the
directory we are searching. This is very unlikely, but I would prefer
this code working by design, and not by accident.

It occurs to me that enter_subdirs could have another field, with a value
that is one of (say) call_pred_on_dir and do_not_call_pred_on_dir,
and we would call Pred on the directory only with the former value.
Do you object?

>> +    % When we encounter an error, such as a failure to open a directory
>> +    % for reading, we record an error, but what happens after that
> 
> s/an/that/

Done.
 
>> +    % depends on the fp_on_error field of Params. If this field is
>> +    % on_error_stop, then we stop the traversal then and there,
>> +    % Therefore with that setting, we will return at most one error.
>> +    % If it is on_error_keep_going, we continue with the traversal.
>> +    % Therefore with that setting, we can return more than one error.
> 
> That last bit is missing some punctuation however I suggest the
> following
> 
>     If the field is on_error_stop, then we stop the traversal and will
>     return at most one error.
>     If the field is on_error_keep_oing, we continue the traversal and
>     may return more than one error.

The two halves of those two sentences don't go with each other,
so I rewrote that prose to combine the best of your proposal
and what I originally wrote. Feel free to edit it post commit.

>  +    % Regardless of the setting of fp_on_error, we stop the traversal
>> +    % if Pred returns Continue = `no'.
>> +    %
>> +    % In all of these cases, the value of Data will reflect the
> 
> Delete "of these".
> 
>> +    % the results of all the invocations of Pred during the traversal
>> +    % up to the time the traversal was stopped either by an error,
>> +    % by Continue = no, or by running out of files to traverse.
> 
> In the previous paragraph no was quoted; here it isn't.

It is now :-)

>> @@ -1682,47 +1812,54 @@ check_dir_readable(DirName, Res, !IO) :-
>>              ; FileType = shared_memory
>>              ; FileType = unknown
>>              ),
>> -            Res = error(make_io_error(
>> -                "dir.foldl2: pathname is not a directory"))
>> +            % XXX The top level caller may not be dir.foldl2.
> 
> That's an existing problem but we should fixe by passing the identity of the
> top-level caller down.

I disagree. I think we should delete the "dir.foldl2: " part , instead of fixing it,
for two reasons. The first is that most errors we return are created by io.m,
and say things like "read failed: ..."; they do NOT specify the identity of
the predicate that *called* the failed operation in io.m. This means that
code that wants to format an error message can generate consistent output
only if it assumes that a prefix such as "dir.foldl2: " is absent. The second
is that callers of foldl2, recursive_foldl2 and general_foldl2 all know
which of these predicates they called: *if* they want the name of the
foldl2 predicate they called in any error message they print, they can
put  it there themselves. I think it is unlikely that do want that. Even in
the motivating example, the relevant pred name was not "dir.foldl2", but
"list_class_files_for_jar".

>> +            % XXX The message is too verbose for use in a full file_error.
> 
> I'm not sure what you mean by that one.

There is an inherent tension between foldl2 and recursive_foldl2 on the
one hand, and general_foldl2 on the other hand. The general version
returns error info in a structured form, with the name of the file and
the identity of the failed operation separate from the I/O error.
For this use case, including the file name or the name of the op
in the string inside the I/O error would be bad, since it would prevent
users from formatting an error message as they liked. On the other hand,
for the original two versions, whose existing interface can return
no other info for an error than the I/O error itself, *not* including
the filename and the op in the I/O error is what is bad, since they
leave the user with no useful info to track down the error.

We could solve this problem by including the filename and the op
in the message string only if a new field in the params says that
the top level call will return full file_errors, and not just I/O errors. 

> The change is otherwise fine.

Thanks for that.

> (I would note, in passing, that the Java code in this module could
> probably revised in light of Java library additions; I will take a look
> at that after this is committed.)

I will wait for your response to the points raised above before I commit.

Zoltan.


More information about the developers mailing list