[m-rev.] for review: improve dir./
Peter Wang
novalazy at gmail.com
Mon Sep 11 11:42:24 AEST 2023
On Sun, 10 Sep 2023 18:39:30 +1000 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> Improve the structure of DirName/FleName.
>
> library/dir.m:
> Factor out common code in the code that canonicalizes filenames
> by defining and using is_directory_separator_return_canon.
> Improve the variable names used in the canonicalization predicates
> (which previously used different names for the same concepts).
>
> Add a predicate that tests whether a dirname or filename is already
> in its canonical form. and if so, skip the canonicalization process.
>
> Clarify the code of the / predicate by moving some parts of it
> to separate predicates whose names describe their purpose.
> Some of these predicates are called more than once, so they also
> factor out common code.
>
> Switch from using unsafe to safe code to test whether filenames
> name a drive letter on Windows.
> diff --git a/library/dir.m b/library/dir.m
> index a73fb42b8..b71bf320e 100644
> --- a/library/dir.m
> +++ b/library/dir.m
...
> +:- pred is_path_string_canonical(string::in) is semidet.
> +
> +is_path_string_canonical(Path) :-
> + is_path_string_canonical_loop(Path, 0, prev_char_is_not_separator).
> +
> +:- type canon_prev_char
> + ---> prev_char_is_not_separator
> + ; prev_char_is_separator.
> +
> +:- pred is_path_string_canonical_loop(string::in, int::in, canon_prev_char::in)
> + is semidet.
> +
> +is_path_string_canonical_loop(Path, CurIndex, PrevChar) :-
> + ( if string.index_next(Path, CurIndex, NextIndex, Char) then
The problem is string.index_next incurs a call to string.length for each
call, unlike string.unsafe_index_next. It's quite easy to use
unsafe_index_next safely, though.
> + ( if dir.is_directory_separator_return_canon(Char, CanonChar) then
> + % Two consecutive directory separators may not occur
> + % in a canonical path.
> + PrevChar = prev_char_is_not_separator,
> + % A directory separator may not occur in a canonical path
> + % without itself being a canonical separator.
> + Char = CanonChar,
> + is_path_string_canonical_loop(Path, NextIndex,
> + prev_char_is_separator)
> + else
> + is_path_string_canonical_loop(Path, NextIndex,
> + prev_char_is_not_separator)
> + )
> + else
> + % We have come to the end of Path.
> + true
> + ).
> @@ -868,43 +916,32 @@ dotnet_path_name_is_absolute_2(_) :-
> %---------------------------------------------------------------------------%
>
> DirName0/FileName0 = PathName :-
> - DirName = string.from_char_list(canonicalize_path_chars(
> - string.to_char_list(DirName0))),
> - FileName = string.from_char_list(canonicalize_path_chars(
> - string.to_char_list(FileName0))),
> - ( if
> - dir.path_name_is_absolute(FileName)
> - then
> + ( if is_path_string_canonical(DirName0) then
> + DirName = DirName0
> + else
> + DirName = string.from_char_list(canonicalize_path_chars(
> + string.to_char_list(DirName0)))
> + ),
> + ( if is_path_string_canonical(FileName0) then
> + FileName = FileName0
> + else
> + FileName = string.from_char_list(canonicalize_path_chars(
> + string.to_char_list(FileName0)))
> + ),
> + ( if dir.path_name_is_absolute(FileName) then
> unexpected($pred, "second argument is absolute")
> - else if
> - % Check that FileName is not a relative path of the form "C:foo".
> - use_windows_paths,
> - Length = length(FileName),
> - ( if Length >= 2 then
> - char.is_alpha(string.unsafe_index(FileName, 0)),
> - string.unsafe_index(FileName, 1) = (':'),
> - ( if Length > 2 then
> - not is_directory_separator(string.unsafe_index(FileName, 2))
> - else
> - true
> - )
> - else
> - fail
> - )
> - then
> + else if dir.path_name_is_cur_drive_relative(FileName) then
> unexpected($pred, "second argument is a current drive relative path")
Simon Taylor called these types of paths "current drive relative paths"
or "drive current paths" in commit 89a4d190c, but that terminology is
confusing. Continued below.
> else if
> - DirNameLength = length(DirName),
> (
> % Check for construction of relative paths of the form "C:foo".
> - use_windows_paths,
> - DirNameLength = 2,
> - char.is_alpha(string.unsafe_index(DirName, 0)),
> - string.unsafe_index(DirName, 1) = (':')
> + path_name_names_drive_letter(FileName, ThirdCharIndex),
> + not string.index_next(FileName, ThirdCharIndex, _, _)
> ;
> % Do not introduce duplicate directory separators.
> - % On Windows \\foo (a UNC server specification) is not equivalent
> + % On Windows, \\foo (a UNC server specification) is not equivalent
> % to \foo (the directory X:\foo, where X is the current drive).
> + string.length(DirName, DirNameLength),
> ( if DirNameLength > 0 then
> ends_with_directory_separator(DirName, DirNameLength, _)
> else
> @@ -923,6 +960,43 @@ DirName0/FileName0 = PathName :-
> FileName])
> ).
>
> + % Is FileName a relative path of the form "C:foo"?
> + %
> +:- pred path_name_is_cur_drive_relative(string::in) is semidet.
> +
These kinds of paths are not relative to the *current* drive,
but relative to the current directory on the specified drive.
On Windows, each process has a current drive, and also a current
directory per drive.
I'm not sure there is a short, standard name for these kind of paths.
I suggest "drive relative path" or "drive-letter relative path".
> +path_name_is_cur_drive_relative(FileName) :-
> + % In the following, C stands for any drive letter, and xyz for
> + % non-special characters that can occur in filenames.
> + ( if path_name_names_drive_letter(FileName, ThirdCharIndex) then
> + ( if string.index_next(FileName, ThirdCharIndex, _, ThirdChar) then
> + ( if is_directory_separator(ThirdChar) then
> + % FileName is "C:/xyz", which is a cur drive path,
> + % but an absolute one.
> + fail
Write the example the canonical way, i.e. "C:\xyz".
> + else
> + % FileName is "C:xyz", which is a cur drive relative path.
> + true
> + )
> + else
> + % FileName is "C:", which is a cur drive relative path.
> + true
> + )
> + else
> + % FileName cannot start with "C:".
> + fail
> + ).
> +
> +:- pred path_name_names_drive_letter(string::in, int::out) is semidet.
> +
> +path_name_names_drive_letter(FileName, ThirdCharIndex) :-
> + use_windows_paths,
> + string.index_next(FileName, 0, SecondCharIndex, FirstChar),
> + string.index_next(FileName, SecondCharIndex, ThirdCharIndex, SecondChar),
> + % SecondChar is *much* more unlikely to be ':'
> + % than FirstChar is to pass is_alpha.
> + SecondChar = (':'),
> + char.is_alpha(FirstChar).
Peter
More information about the reviews
mailing list