[m-rev.] for review and benchmarking: speed up file name creation in write_deps_file.m
Zoltan Somogyi
zoltan.somogyi at runbox.com
Fri Sep 8 12:03:15 AEST 2023
On 2023-09-07 15:05 +10:00 AEST, "Peter Wang" <novalazy at gmail.com> wrote:
> I see no difference without or with the patch:
>
> Benchmark #1: ./test.stock
> Time (mean ± σ): 2.149 s ± 0.014 s [User: 2.039 s, System: 0.106 s]
> Range (min … max): 2.128 s … 2.165 s 10 runs
>
> Benchmark #2: ./test.wdf
> Time (mean ± σ): 2.148 s ± 0.041 s [User: 2.036 s, System: 0.107 s]
> Range (min … max): 2.092 s … 2.237 s 10 runs
>
> And also no difference with --use-subdirs added to command line options.
Thanks. Based on this result, I have undone the optimization that turned
out not to be one, and replaced with a long comment describing both
the "optimization" and why it is not being used. I also updated the
commit message accordingly.
> I introduced relative_path_name_from_components specifically to be fast
> when you KNOW the list elements are simple directory components, and you
> only want a relative path name. Directory separators, drive letters, ., ..,
> and any other stuff that dir./ tries to deal with are out of scope as it
> would slow down relative_path_name_from_components for the case that is
> it designed for. We can make that clearer in the documentation as you
> say.
At the moment, dir./ has a complicated structure, that does not make clear
exactly what requirements it imposes on its two operands. I think we should
- create a predicate for checking whether the second arg of dir./
already meets the requirements
- create a predicate for mangling a string into meeting those requirements
- create two more such predicates for the first arg
- create a predicate for checking whether a string meets the requirements
of *both* args in dir./, if this would not duplicate one of the earlier ones
- create a predicate for mangling a string into meeting both requirements
- document that dir.relative_path_from_components mangles the
first dir name on its list to meet the requirements of both args of dir./
and mangles all the other list elements to meet the requirements of
the second arg
- add dir.relative_path_from_unchecked_components (which would be
the current relative_path_from_components, renamed) which is
documented to require its callers to ensure those same conditions.
We could also add a version of dir./, named maybe make_path_name_unchecked,
which would assume the args satisfy its requirements, instead of ensuring that.
I assume that the differences between the requirements of the first
and second args on Unix would just be that the first arg is allowed
to start with "/", while the differences on Windows would be more
extensive.
Opinions?
> It so happens that on Windows if you call
> relative_path_name_from_components(["a/b/c", "d"]), it will return
> "a/b/c\d" which IS valid in that Windows accepts both slashes as
> directory separators, but may not be want you want, e.g. to present to users.
That would not have been a problem, because the "a/b/c" also was
constructed by dir.relative_path_name_from_components, and so
would have used \ as the separator on Windows.
>> diff --git a/compiler/write_deps_file.m b/compiler/write_deps_file.m
>> index d223f652c..5f4eee6f7 100644
>> --- a/compiler/write_deps_file.m
>> +++ b/compiler/write_deps_file.m
> ...
>> @@ -1583,22 +1545,24 @@ generate_dv_file(Globals, SourceFileName, ModuleName, DepsMap,
>> ModuleMakeVarName ++ ".dep_errs",
>> list.map(add_suffix(".dep_err"), SourceFiles)),
>>
>> - make_module_file_names_with_suffix(Globals,
>> + % ZZZ
>> + make_module_file_names_with_ext(Globals,
>
> ZZZ
Deleted.
>> + % This code performs the job of
>> + % module_name_to_file_name(Globals, From, Ext, ModuleName, FileName)
>> + % but with the directory path and extension string parts done
>> + % just done once in make_module_file_names_with_ext, instead of
>> + % being repeated each time here.
>
> done just once
Fixed.
>> + BaseNameNoExt = module_name_to_base_file_name_no_ext(Ext, ModuleName),
>> + BaseName = BaseNameNoExt ++ ExtStr,
>> + (
>> + MaybeDirPath = no,
>> + FileName = BaseName
>> + ;
>> + MaybeDirPath = yes(DirPath),
>> + % XXX This cheats, since DirPath may contain directory separators.
>> + % The documentation of relative_path_name_from_components forbids
>> + % this, but its code does not care, and does the right thing
>> + % anyway.
>> + FileName =
>> + dir.relative_path_name_from_components([DirPath, BaseName])
>
> In this case I think you should just use string.join_list.
Moot, since this code won't be kept.
>> + ),
>> + trace [compile_time(flag("sanity_check_write_deps_file"))] (
>> + module_name_to_file_name(Globals, $pred, Ext,
>> + ModuleName, FileNameB),
>> + expect(unify(FileName, FileNameB), $pred,
>> + "FileName = FileNameB")
>> + ),
>> + % We cache result of the translation, in order to save on
>> + % temporary string construction.
>> + map.det_insert(ModuleNameExt, FileName, !Cache)
>> + ).
>> +
>
> The diff looks fine.
Thank you.
Zoltan.
More information about the reviews
mailing list