[m-dev.] for review: string__fmt

Fergus Henderson fjh at cs.mu.OZ.AU
Tue Aug 8 17:26:59 AEST 2000


On 07-Aug-2000, Peter Ross <peter.ross at miscrit.be> wrote:
> 
> library/string.m:
>     Implement a new predicate string__fmt.  It provides almost the same
>     functionality as string__format, but uses a lot less memory and is
>     more compatible with the C version of sprintf seeing as that is what
>     it calls to do all of its formatting.

> Index: library/string.m
> +:- pred string__fmt(string::in,
> +		list(string__poly_type)::in, string::out) is det.
> +% Has almost equivalent behavour to string__format except that it
> +% doesn't handle * modifier for precision and width.
> +% It also rounds numbers by precision value, not truncates.
> +% The main advantage of using this version is that it uses *much* less
> +% memory then string__format and I am a lot more confident of its
> +% correctness.

Comments like "I am a lot more confident of its correctness" are fine
for the log message, but don't belong in the interface documentation,
where it is not at all clear who "I" is.

It seems like poor interface design to have two functions
`string__format' and `string__fmt' that do almost the same thing but
with some subtle differences.  It seems that the differences between
the two are artifacts of their implementation, rather than being a
deliberate interface design choice.  Why not just have one function
that does it right?

For example, couldn't you modify the code for string__fmt to handle
the `*' modifier, and then just replace the old code for string__format
with the new code for string__fmt?

> Index: library/string.m
> @@ -341,8 +340,16 @@
>  %		It is always better to include a `.' to remove ambiguity.  This
>  %		interpretation is non-standard and may change.
>  %
> -%		Numbers are truncated by a precision value, not rounded off.

Why did you remove that comment?
I think it should stay.

> +	% Implementation of append_list that uses C as this minimises the
> +	% amount of garbage created.
> +:- func fast_append_list(list(string)) = string.
> +:- pragma c_code(fast_append_list(Strs::in) = (Str::out),

Why have two seperate procedures `string__append_list' and
`string__fast_append_list'?  Why not just make the ordinary
`string__append_list' fast?

> +:- type specifier
> +	--->	conv(list(char))	% a list of chars which contains
> +					% one conversion specifier.
> +	;	string(list(char)).	% a list of chars which may only
> +					% contain the "%%" conversion
> +					% specifier.

One way to handle `*' conversions would be to change that to

	:- type specifier
		--->	zero_arg_conv(list(char))
				% a format specifier that requires
				% no arguments (i.e. `%%')
		;	one_arg_conv(list(char))
				% a format specifier that requires
				% one argument (e.g. `%d', `%s', etc.)
		;	two_arg_conv(list(char))
				% a format specifier that requires
				% two arguments (e.g. `%*d', `%*s', etc.)
		;	string(list(char)).
				% a list of chars which does not
				% contain `%'

and modify the rest of the code accordingly.

> +:- pred length_mod(list(char)::in, list(char)::out) is semidet.
> +
> +length_mod --> ['h'], ( ['h'] -> [] ; [] ).
> +length_mod --> ['l'], ( ['l'] -> [] ; [] ).
> +length_mod --> ['L'].
> +length_mod --> ['j'].
> +length_mod --> ['z'].
> +length_mod --> ['t'].

Most of these (`hh', `ll', `j', `z', and `t') are new in C99.
They were not included in C89.  If someone uses these length
modifiers, they may run into portability problems.

But in any case, you can't allow arbitrary length modifiers here,
since what you actually pass to sprintf() will have a definite fixed size.
If you pass a value of type `long', but give it the format specifier
`%d' or `%hd', then bad things will happen...

> +	% Create a string from a float using the format string.
> +:- func float_to_string(string, float) = string.

I think it would be clearer to name these functions
`format_float/2', `format_string/2', etc.
rather than `float_to_string/2', `string_to_string/2', etc.

You should document that it is the caller's responsibility to
check the validity of the format string.
	
> +:- pragma c_code(float_to_string(FormatStr::in, Val::in) = (Str::out),
> +		[will_not_call_mercury, thread_safe], "{
> +	char buf[500];
> +	Word tmp;
> +	sprintf(buf, FormatStr, Val);

Here you are passing an `MR_Float' to sprintf().
But the format string might not be valid for an MR_Float.

Probably you should (a) cast the argument to sprintf() from `MR_Float'
to `long double', so that it has a known type, and (b) replace the
length modifier in the format string with a length modifier
appropriate for long double, i.e. "%Lf".
(a) would be done here, while (b) would be done while parsing the
format string.

Likewise for int_to_string, except for that it is a little bit
more complicated.  There are three cases:

	- if MR_Integer is not long long, then you should use `long'
	  and `%ld'

	- if MR_Integer is long long, and the C standard library supports
	  the `%lld' format, then you should use long long and `%lld'.

	- if MR_Integer is long long, and the C standard library does not
	  support the `%lld' format -- as can happen when you're using
	  a modern compiler such as gcc but an old standard library --
	  then you're a bit screwed.  Short of reimplementing printf(),
	  probably the best approach in that situation is to check
	  that the value is in range for `long'; if so, you can cast to
	  long and use `%d', and if not, then you can throw an exception,
	  or perhaps just insert "<overflow>" in the output.

> +	% Create a string from a string using the format string.
> +:- func string_to_string(string, string) = string.
> +:- pragma c_code(string_to_string(FormatStr::in, Val::in) = (Str::out),
> +		[will_not_call_mercury, thread_safe], "{
> +	char buf[500];
> +	Word tmp;
> +	sprintf(buf, FormatStr, Val);
> +	incr_hp_atomic_msg(tmp, (strlen(buf) + sizeof(Word)) / sizeof(Word),
> +		MR_PROC_LABEL, ""string:string/0"");
> +	Str = (char *)tmp;
> +	strcpy(Str, buf);

The fixed-length buffer here is a problem.
At very least, the buffer should be much larger.
But of course that won't solve the problem.

A better approach would be to check the amount of space
you need, and if it doesn't fit in the fixed-size buffer,
then dynamically allocate a big enough buffer.
On systems that support snprintf(), you could use that.
If not, then you could check the length of `Val'...

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>  |  of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3        |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list