[m-rev.] [m-dev.] foldl2 and recursive_fold2 in dir.m have the wrong result type

Julien Fischer jfischer at opturion.com
Sat May 15 21:26:35 AEST 2021


On Sat, 15 May 2021, Zoltan Somogyi wrote:

>
> 2021-05-15 15:30 GMT+10:00 "Julien Fischer" <jfischer at opturion.com>:

>>> +    % general_foldl2(Params, Pred, DirName, Data0, Data, Errors, !IO).
>>> +    %
>>> +    % A generalised version of the above, whose behavior is controlled
>>> +    % by setting up Params.
>>> +    %
>>> +    % Whether we recursively process subdirectories depends on whether
>>> +    % the fp_subdirs field of Params is do_not_enter_subdirs or enter_subdirs.
>>> +    % If it is do_not_enter_subdirs, then we do not process subdirectories
>>> +    % at all. If it is enter_subdirs, then we process subdirectories depth
>>> +    % first, processing each subdirectory before its contents.
>>
>>    processing each subdirectory in a directory before its other contents.
>
> Nope, neither existing foldl2 predicates nor the new one make any such guarantees.
> They all process the entries in a directory in the order in which they are read from
> the directory stream, and the stream's generator, the OS, makes no such promises.
>
> What that comment was trying to say, and it seems failing, is that this is a preorder
> traversal, i.e. if A is the parent dir of file B, then we do guarantee to call Pred
> on A before we call Pred on B.
>
> I would use "preorder" in the description, but I fear that with the state of CS
> education in the last couple of decades, far too many people wouldn't know
> what that means :-( CS graduates these days seem to think of binary trees
> as *advanced* data structures :-(, probably because implementing them
> in Java is a pain.

However parlous the state of CS education may be, I think using the term
"preoder" in the standard library documentation is reaonsable.
The standard library docuemntation already assumes that the reader knows
what things like an inorder traversal or topological sort are.

> This actually reminds me of another issue. The code in module_cmds.m
> wants to accumulate a set of *file*names, not *dir*names, so calling Pred
> on directories, as far as it is concerned, is both unnecessary work, and
> potentially an unnecessary complication. As in: we need to filter out
> directory names as they cannot be the .class files we are looking for,
> but a malicious actor *could* in theory put a subdirectory whose name
> ends in ".class" and has one of the prefixes we are looking for into the
> directory we are searching. This is very unlikely, but I would prefer
> this code working by design, and not by accident.
>
> It occurs to me that enter_subdirs could have another field, with a value
> that is one of (say) call_pred_on_dir and do_not_call_pred_on_dir,
> and we would call Pred on the directory only with the former value.
> Do you object?

I have no objections.

>>> @@ -1682,47 +1812,54 @@ check_dir_readable(DirName, Res, !IO) :-
>>>              ; FileType = shared_memory
>>>              ; FileType = unknown
>>>              ),
>>> -            Res = error(make_io_error(
>>> -                "dir.foldl2: pathname is not a directory"))
>>> +            % XXX The top level caller may not be dir.foldl2.
>>
>> That's an existing problem but we should fixe by passing the identity of the
>> top-level caller down.
>
> I disagree. I think we should delete the "dir.foldl2: " part , instead of fixing it,
> for two reasons. The first is that most errors we return are created by io.m,
> and say things like "read failed: ..."; they do NOT specify the identity of
> the predicate that *called* the failed operation in io.m. This means that
> code that wants to format an error message can generate consistent output
> only if it assumes that a prefix such as "dir.foldl2: " is absent. The second
> is that callers of foldl2, recursive_foldl2 and general_foldl2 all know
> which of these predicates they called: *if* they want the name of the
> foldl2 predicate they called in any error message they print, they can
> put  it there themselves. I think it is unlikely that do want that. Even in
> the motivating example, the relevant pred name was not "dir.foldl2", but
> "list_class_files_for_jar".

Fair enough.

>>> +            % XXX The message is too verbose for use in a full file_error.
>>
>> I'm not sure what you mean by that one.
>
> There is an inherent tension between foldl2 and recursive_foldl2 on the
> one hand, and general_foldl2 on the other hand. The general version
> returns error info in a structured form, with the name of the file and
> the identity of the failed operation separate from the I/O error.
> For this use case, including the file name or the name of the op
> in the string inside the I/O error would be bad, since it would prevent
> users from formatting an error message as they liked. On the other hand,
> for the original two versions, whose existing interface can return
> no other info for an error than the I/O error itself, *not* including
> the filename and the op in the I/O error is what is bad, since they
> leave the user with no useful info to track down the error.
>
> We could solve this problem by including the filename and the op
> in the message string only if a new field in the params says that
> the top level call will return full file_errors, and not just I/O errors.

Given that foldl2 and recursive_foldl2 (in their current form) will be
going away in the longer term anyway, I think that XXX can be resolved
once that happens.

Julien.


More information about the reviews mailing list