[m-rev.] for review and benchmarking: speed up file name creation in write_deps_file.m
Peter Wang
novalazy at gmail.com
Thu Sep 7 15:05:25 AEST 2023
On Thu, 07 Sep 2023 11:16:55 +1000 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> For review and benchmarking by Peter. Can you please
> check out the effect of this diff on the benchmark you used
> to evaluate the effectiveness of the cache that this diff modifies?
The benchmark I used is to make a copy of the compiler directory,
then in that directory run the same command as mmake depend:
mmc --generate-dependencies --grade hlc.gc --mercury-linkage static --flags COMP_FLAGS mercury_compile
with different values of MERCURY_COMPILER.
I see no difference without or with the patch:
Benchmark #1: ./test.stock
Time (mean ± σ): 2.149 s ± 0.014 s [User: 2.039 s, System: 0.106 s]
Range (min … max): 2.128 s … 2.165 s 10 runs
Benchmark #2: ./test.wdf
Time (mean ± σ): 2.148 s ± 0.041 s [User: 2.036 s, System: 0.107 s]
Range (min … max): 2.092 s … 2.237 s 10 runs
And also no difference with --use-subdirs added to command line options.
> Julien, can you please have a look at the comment on
> dir.relative_path_name_from_components? It implies
> that the predicate allows the construction of pathnames such as
> a/b/c/d from ["a", "b", "c", "d"], but does NOT allow
> the construction of the same pathname from ["a/b/c", "d"],
> even though the code does the right thing anyway.
> The relationship of that code, which merely adds
> the dir separator char between the components,
> and the / predicate, which does a lot more,
> should also be better documented.
I introduced relative_path_name_from_components specifically to be fast
when you KNOW the list elements are simple directory components, and you
only want a relative path name. Directory separators, drive letters, ., ..,
and any other stuff that dir./ tries to deal with are out of scope as it
would slow down relative_path_name_from_components for the case that is
it designed for. We can make that clearer in the documentation as you
say.
It so happens that on Windows if you call
relative_path_name_from_components(["a/b/c", "d"]), it will return
"a/b/c\d" which IS valid in that Windows accepts both slashes as
directory separators, but may not be want you want, e.g. to present to users.
> diff --git a/compiler/write_deps_file.m b/compiler/write_deps_file.m
> index d223f652c..5f4eee6f7 100644
> --- a/compiler/write_deps_file.m
> +++ b/compiler/write_deps_file.m
...
> @@ -1583,22 +1545,24 @@ generate_dv_file(Globals, SourceFileName, ModuleName, DepsMap,
> ModuleMakeVarName ++ ".dep_errs",
> list.map(add_suffix(".dep_err"), SourceFiles)),
>
> - make_module_file_names_with_suffix(Globals,
> + % ZZZ
> + make_module_file_names_with_ext(Globals,
ZZZ
> +:- pred make_module_file_name_in_loop(globals::in, ext::in,
> + maybe(dir_name)::in, string::in, module_name::in, file_name::out,
> + module_file_name_cache::in, module_file_name_cache::out,
> + io::di, io::uo) is det.
> +
> +make_module_file_name_in_loop(Globals, Ext, MaybeDirPath, ExtStr,
> + ModuleName, FileName, !Cache, !IO) :-
> + % Keep this code in sync with make_module_file_name below.
> + % NOTE This could be a complete switch on Ext, with an explicit decision
> + % for each category of extension about whether it is worth caching files
> + % with those extensions. However, the statistics below, gathered from
> + % the execution of a bootcheck, show that such discrimination is not
> + % neecessary for good performance.
> + ModuleNameExt = module_name_and_ext(ModuleName, Ext),
> + ( if map.search(!.Cache, ModuleNameExt, FileName0) then
> + trace [
> + compile_time(flag("write_deps_file_cache")),
> + run_time(env("WRITE_DEPS_FILE_CACHE")),
> + io(!TIO)
> + ] (
> + record_cache_hit(Ext, !TIO)
> + ),
> + FileName = FileName0
> + else
> + trace [
> + compile_time(flag("write_deps_file_cache")),
> + run_time(env("WRITE_DEPS_FILE_CACHE")),
> + io(!TIO)
> + ] (
> + record_cache_miss(Ext, !TIO)
> + ),
> + % This code performs the job of
> + % module_name_to_file_name(Globals, From, Ext, ModuleName, FileName)
> + % but with the directory path and extension string parts done
> + % just done once in make_module_file_names_with_ext, instead of
> + % being repeated each time here.
done just once
> + BaseNameNoExt = module_name_to_base_file_name_no_ext(Ext, ModuleName),
> + BaseName = BaseNameNoExt ++ ExtStr,
> + (
> + MaybeDirPath = no,
> + FileName = BaseName
> + ;
> + MaybeDirPath = yes(DirPath),
> + % XXX This cheats, since DirPath may contain directory separators.
> + % The documentation of relative_path_name_from_components forbids
> + % this, but its code does not care, and does the right thing
> + % anyway.
> + FileName =
> + dir.relative_path_name_from_components([DirPath, BaseName])
In this case I think you should just use string.join_list.
> + ),
> + trace [compile_time(flag("sanity_check_write_deps_file"))] (
> + module_name_to_file_name(Globals, $pred, Ext,
> + ModuleName, FileNameB),
> + expect(unify(FileName, FileNameB), $pred,
> + "FileName = FileNameB")
> + ),
> + % We cache result of the translation, in order to save on
> + % temporary string construction.
> + map.det_insert(ModuleNameExt, FileName, !Cache)
> + ).
> +
The diff looks fine.
Since I didn't measure any difference, I think there's no need to keep
make_module_file_names_with_ext and make_module_file_name_in_loop.
It's up to you.
Peter
More information about the reviews
mailing list