[m-rev.] for review: library changes for `mmc --make' on Windows

Fergus Henderson fjh at cs.mu.OZ.AU
Tue Jul 15 12:47:05 AEST 2003


On 15-Jul-2003, Simon Taylor <stayl at cs.mu.OZ.AU> wrote:
> library/dir.m:
> 	Handle Windows-style paths.
> 
> 	Change the determinism of dir.basename and dir.split_name.
> 	dir.basename now fails for root directories (a new function
> 	dir.basename_det calls error/1 rather than failing).
> 	dir.split_name fails for root directories or if the pathname
> 	passed doesn't contain a directory separator.
> 
> 	Add predicates dir.make_directory and dir.path_name_is_absolute.
> 
> 	Add a multi predicate dir.is_directory separator which
> 	returns all separators for the platform (including '/' on
> 	Windows), not just the standard one.
> 	
> 	Add a function dir.parent_directory (returns "..").
> 	
> 	Add dir.foldl2 and dir.recursive_foldl2, to iterate through
> 	the entries in a directory (and maybe its subdirectories).
> 
> 	Change '/' to correctly handle Windows paths of the form
> 	"C:"/"foo" and "\"/"foo".
> 
> 	Don't add repated directory separators in '/'.

s/repated/repeated/

> diff -u library/dir.m library/dir.m
...
> +	% Note that the predicates and functions in this module may
> +	% change directory separators in paths passed to them
> +	% to the normal separator for the platform.

If that comment is supposed to apply to all predicates and functions in the
module, then I think it should go at the top of the module.

Also, I think any occurrences of implementation dependent behaviour
should be explicitly flagged with the words "implementation dependent"
or "system depend" or "unspecified" or something like that, rather than
just using the word "may".

But in general, implementation dependent behaviour should be avoided
whenever possible.  I think it would be better to explicitly specify
either (a) that the original directory separators will be preserved
(but that any *new* directory separators added will use the platform's
default directory separator), or (b) that *all* directory separators
in path names will be replaced with the platform's default directory
separator.

How difficult would it be to preserve the original
directory separator?

>  	% Returns ".".
>  :- func dir__this_directory = string.
> @@ -39,25 +48,39 @@
>  	% dir__split_name(PathName, DirName, BaseName).
>  	% Split a filename into a directory part and a filename part.
>  	% Fails for root directories or relative filenames not
> -	% containing a directory separator.
> +	% containing directory information.
> +	% Trailing slashes are removed from PathName before splitting.
> +	% DirName may have a trailing slash.
>  :- pred dir__split_name(string::in, string::out, string::out) is semidet.

Here again I think that if the intent is that the behaviour be
implementation-dependent, the documentation should call attention
to that by using the words "implementation-dependent" or "unspecified",
but that it would be better to explicitly specify it.

>  	% dir__basename(PathName) = BaseName.
>  	% Returns the non-directory part of a filename.
> -	% Fails when given a root directory.
> +	% Fails when given a root directory, "." or "..".
> +	% Trailing slashes in PathName are removed first.
>  :- func dir__basename(string) = string is semidet.
>  :- pred dir__basename(string::in, string::out) is semidet.

What does this do for path names such as "C:" or "C:\"?
The documentation does not make that clear.

> +	% dir__dirname(PathName) = DirName.
>  	% Returns the directory part of a filename.
> +	% Returns PathName if it specifies a root directory.
> +	% Returns `dir__this_directory' when given a filename
> +	% without any directory information (e.g. "foo").
> +	% Returns PathName for Windows paths such as "X:"
> +	% (the current directory on drive `X').
> +	% Trailing slashes in PathName are removed first.
> +	% DirName may have a trailing slash.
> +:- func dir__dirname(string) = string.
>  :- pred dir__dirname(string::in, string::out) is det.

This description seems to imply that dirname("C:\") = "C:",
since trailing slashes are removed first, but that would be
semantically wrong, since "C:" is not the directory part of "C:\".

>  	% Is the path name syntactically an absolute path
>  	% (doesn't check whether the path exists).
> +	% On Unix systems, this means that the path starts with '/'.
> +	% On Microsoft Windows systems a root directory starts
> +	% with a drive root (e.g. 'C:\', '\') or a UNC share
> +	% root (e.g. \\server\share\).
>  :- pred dir__path_name_is_absolute(string::in) is semidet.

The predicate checks whether the path name is an absolute path,
but the documentation here says "a _root_ directory starts with ..." --
did you mean to say "an _absolute_ directory starts with ..."?

Also, since `\' is really relative to the current drive,
I think the documentation should draw the programmers attention
to the fact that the routine's behaviour may not completely match
its name.  For example, perhaps add something like the following:

	% Note that on Microsoft Windows systems, path names starting
	% with '\' (or '/') are technically relative to the current drive,
	% but this routine treats them as absolute.

> +dir__is_directory_separator(Char) :-
> +	( Char = dir__directory_separator
> +	; Char = dir__alt_directory_separator, Char \= dir__directory_separator
> +	).

In the case where dir__directory_separator = dir__alt_directory_separator,
the `out' mode of this routine will leave an unnecessary choice point behind.

It would be better to write it with the "\=" in the first clause,

	( Char = dir__directory_separator, Char \= dir__alt_directory_separator
	; Char = dir__alt_directory_separator
	).

or using an if-then-else:

	dir__is_directory_separator(Char) :-
		( dir__directory_separator = dir__alt_directory_separator ->
			Char = dir__directory_separator
		;
			( Char = dir__directory_separator
			; Char = dir__alt_directory_separator
			)
		).

> +canonicalize_path_chars_2([], RevFileName) = reverse(RevFileName).
> +canonicalize_path_chars_2([C0 | FileName0], RevFileName0) =
> +		canonicalize_path_chars_2(FileName0, RevFileName) :-
> +	% Convert all directory separators to the standard
> +	% separator for the platform.
> +	C = ( is_directory_separator(C0) -> directory_separator ; C0 ),
> +	(
> +		C = directory_separator,
> +		FileName0 = [C2 | _],
> +		dir__is_directory_separator(C2)
> +	->
> +		RevFileName = RevFileName0
>  	;
> -		FileName = strip_repeated_dir_separators_2(FileName0,
> -				[C | RevFileName])
> +		RevFileName = [C | RevFileName0]
>  	).

It would help to have a comment "strip repeated directory separators"
before the if-then-else.

> +dir__dotnet_path_name_is_absolute(FileName) :-
> +	dir__dotnet_path_name_is_absolute_2(FileName),
> +
> +	% The .NET CLI function System.IO.Path.IsPathRooted succeeds for
> +	% paths such as `C:', which specifies a directory relative to the
> +	% current directory on drive C.
> +	\+ (
> +		use_windows_paths,
> +		FileNameLen = length(FileName),
> +		( FileNameLen >= 2 ->
> +			char__is_alpha(string__unsafe_index(FileName, 0)),
> +			string__unsafe_index(FileName, 1) = (':'),
> +			( FileNameLen > 2 ->
> +				string__unsafe_index(FileName, 2)
> +					\= directory_separator `with_type` char

Why do you need the "`with_type` char" here?

Also, shouldn't this use `\+ is_directory_separator(...)'  rather than
`... \= directory_separator'?  Windows (or at least the Microsoft.NET
CLI implementation -- I didn't check direct use of the Win32 API) treats
"//server/share" as a UNC path too, not just "\\server/share".

> -dir__make_path_name(DirName, FileName) = PathName :-
> +dir__make_path_name(DirName, FileName) = DirName/FileName.
> +
> +DirName/FileName =
> +	( if
> +	    (
> +		% Check for construction of relative paths of the form "C:foo".
> +		use_windows_paths,
> +		length(DirName) = 2,
> +		char__is_alpha(string__unsafe_index(DirName, 0)),
> +		string__unsafe_index(DirName, 1) = (':')
> +	    ;
> +		% Don't introduce duplicate directory separators.
> +		% On Windows \\foo (a UNC server specification) is
> +		% not equivalent to \foo (the directory X:\foo, where
> +		% X is the current drive).
> +		string__unsafe_index(DirName,
> +		    string__length(DirName) - 1) =
> +			dir__directory_separator `with_type` char
> +	    )

Likewise here.

> +:- pragma promise_pure(dir__make_directory_2/4).
> +dir__make_directory_2(_::in, _::out, _::di, _::uo) :-
> +	private_builtin__sorry("dir__make_directory").

Why is that needed?
Won't that be handled by the automatic stub generation?

> @@ -822,15 +1018,14 @@
>  	ML_check_dir_readable(DirName, &is_readable, &Result);
>  	if (is_readable) {
>  		dir_pattern_len = strlen(DirName);
> -		MR_allocate_aligned_string_msg(dir_pattern,
> -			dir_pattern_len + 2, MR_PROC_LABEL);
> +		dir_pattern = MR_malloc(dir_pattern_len + 2);
>  		strcpy(dir_pattern, DirName);
>  		dir_pattern[dir_pattern_len] = '\\\\';
>  		dir_pattern[dir_pattern_len + 1] = '*';
>  		dir_pattern[dir_pattern_len + 2] = '\\0';

Off-by-one error in the argument to MR_malloc.


-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the reviews mailing list