[m-dev.] for review: string__fmt

Tyson Dowd trd at cs.mu.OZ.AU
Tue Aug 8 12:59:32 AEST 2000


On 07-Aug-2000, Peter Ross <peter.ross at miscrit.be> wrote:
> Hi,
> 
> For review by anyone.
> 
> 
> ===================================================================
> 
> 
> Estimated hours taken: 8
> 
> library/string.m:
>     Implement a new predicate string__fmt.  It provides almost the same
>     functionality as string__format, but uses a lot less memory and is
>     more compatible with the C version of sprintf seeing as that is what
>     it calls to do all of its formatting.
> 
> 
> Index: library/string.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/library/string.m,v
> retrieving revision 1.122
> diff -u -r1.122 string.m
> --- library/string.m	2000/04/27 10:24:25	1.122
> +++ library/string.m	2000/08/07 14:32:20
> @@ -109,12 +109,11 @@
>  
>  :- pred string__to_char_list(string, list(char)).
>  :- mode string__to_char_list(in, out) is det.
> +:- mode string__to_char_list(out, in) is det.
>  
>  :- pred string__from_char_list(list(char), string).
>  :- mode string__from_char_list(in, out) is det.
> -:- mode string__from_char_list(out, in) is semidet.
> -	% XXX second mode should be det too
> -	% (but this turns out to be tricky to implement)
> +:- mode string__from_char_list(out, in) is det.
>  
>  :- pred string__from_rev_char_list(list(char), string).
>  :- mode string__from_rev_char_list(in, out) is det.
> @@ -341,8 +340,16 @@
>  %		It is always better to include a `.' to remove ambiguity.  This
>  %		interpretation is non-standard and may change.
>  %
> -%		Numbers are truncated by a precision value, not rounded off.
>  
> +:- pred string__fmt(string::in,
> +		list(string__poly_type)::in, string::out) is det.
> +% Has almost equivalent behavour to string__format except that it
> +% doesn't handle * modifier for precision and width.
> +% It also rounds numbers by precision value, not truncates.
> +% The main advantage of using this version is that it uses *much* less
> +% memory then string__format and I am a lot more confident of its
> +% correctness.

Better to make these two a point list, like

% string__fmt has equivalent behaviour to string__format except:
%	- it doesn't handle the * modifier for precision and width
%	- it rounds numbers by precision value instead of truncating 
% The main advantage of using this version is that it uses *much* less
% memory then string__format.

This also fixes some typos.

(Leave the correctness bit for the log message).

It might be worthwhile noting in the code that we could implement 
string__format in terms of string__fmt (if we handle those two cases
somehow).

> +
>  %------------------------------------------------------------------------------%
>  
>  :- type string__poly_type --->
> @@ -576,16 +583,68 @@
>  		string__int_to_base_string_2(NegN1, Base, Str1),
>  		string__append(Str1, DigitString, Str)
>  	).
> +
> +string__from_char_list(CharList, Str) :-
> +	string__to_char_list(Str, CharList).
> +
> +/*-----------------------------------------------------------------------*/
> +
> +/*
> +:- pred string__to_char_list(string, list(char)).
> +:- mode string__to_char_list(in, out) is det.
> +:- mode string__to_char_list(out, in) is det.
> +*/
> +
> +:- pragma c_code(string__to_char_list(Str::in, CharList::out),
> +		[will_not_call_mercury, thread_safe], "{
> +	const char *p = Str + strlen(Str);

Please use MR_ConstString instead of const char *.

> +	CharList = MR_list_empty_msg(MR_PROC_LABEL);
> +	while (p > Str) {
> +		p--;
> +		CharList = MR_list_cons_msg((UnsignedChar) *p, CharList,

Please use MR_UnsignedChar instead of UnsignedChar.

> +			MR_PROC_LABEL);
> +	}
> +}").
> +
> +:- pragma c_code(string__to_char_list(Str::out, CharList::in),
> +		[will_not_call_mercury, thread_safe], "{
> +		/* mode (out, in) is det */
> +	Word char_list_ptr;
> +	size_t size;
> +	Word str_ptr;

Please use MR_Word instead of Word.

> +/*
> +** loop to calculate list length + sizeof(Word) in `size' using list in
> +** `char_list_ptr'
> +*/
> +	size = sizeof(Word);
> +	char_list_ptr = CharList;
> +	while (! MR_list_is_empty(char_list_ptr)) {
> +		size++;
> +		char_list_ptr = MR_list_tail(char_list_ptr);
> +	}
> +/*
> +** allocate (length + 1) bytes of heap space for string
> +** i.e. (length + 1 + sizeof(Word) - 1) / sizeof(Word) words
> +*/
> +	incr_hp_atomic_msg(str_ptr, size / sizeof(Word),
> +		MR_PROC_LABEL, ""string:string/0"");

This should probably be a macro of its own in mercury_string.h.

And it should use MR_Word.

> +	Str = (char *) str_ptr;

MR_String.

But it would be even nicer if a string allocation function just allocated 
an MR_String to begin with, then you can do away with str_ptr
altogether.

> +/*
> +** loop to copy the characters from the char_list to the string
> +*/
> +	size = 0;
> +	char_list_ptr = CharList;
> +	while (! MR_list_is_empty(char_list_ptr)) {
> +		Str[size++] = MR_list_head(char_list_ptr);
> +		char_list_ptr = MR_list_tail(char_list_ptr);
> +	}
> +/*
> +** null terminate the string
> +*/
> +	Str[size] = '\\0';
> +}").
>  
> -% NB: it would be more efficient to do this directly (using pragma c_code)
> -string__to_char_list(String, CharList) :-
> -	string__to_int_list(String, IntList),
> -	string__int_list_to_char_list(IntList, CharList).
> -
> -% NB: it would be more efficient to do this directly (using pragma c_code)
> -string__from_char_list(CharList, String) :-
> -	string__char_list_to_int_list(CharList, IntList),
> -	string__to_int_list(String, IntList).
> +/*-----------------------------------------------------------------------*/
>  
>  %
>  % We could implement from_rev_char_list using list__reverse and from_char_list,
> @@ -755,6 +814,41 @@
>  
>  %-----------------------------------------------------------------------------%
>  
> +	% Implementation of append_list that uses C as this minimises the
> +	% amount of garbage created.
> +:- func fast_append_list(list(string)) = string.
> +:- pragma c_code(fast_append_list(Strs::in) = (Str::out),
> +		[will_not_call_mercury, thread_safe], "{
> +	Word	list = Strs;
> +	Word	tmp;
> +	size_t	len = 0;

MR_Word.

> +
> +		/* Determine the total len of all strings */
> +	while (!MR_list_is_empty(list)) {
> +		len += strlen((char *) MR_list_head(list));
> +		list = MR_list_tail(list);
> +	}
> +
> +		/* Allocate enough word aligned memory for the string */
> +	incr_hp_atomic_msg(tmp, (len + sizeof(Word)) / sizeof(Word),
> +			MR_PROC_LABEL, ""string:string/0"");
> +	Str = (String) tmp;
> +

Same deal.  That macro will pay for itself quickly.

> +		/* Copy the strings into the new memory */
> +	len = 0;
> +	list = Strs;
> +	while (!MR_list_is_empty(list)) {
> +		strcpy((char *) Str + len, (char *) MR_list_head(list));

MR_String

> +		len += strlen((char *) MR_list_head(list));

MR_String

> +		list = MR_list_tail(list);
> +	}
> +
> +		/* Set the last character to the null char */
> +	*(Str + len) = '\\0';

I much prefer Str[len] = '\\0';

> +}").
> +
> +%-----------------------------------------------------------------------------%
> +
>  	% Note - string__hash is also defined in code/imp.h
>  	% The two definitions must be kept identical.
>  
> @@ -1553,6 +1647,322 @@
>  :- pred string__special_precision_and_width(int).
>  :- mode string__special_precision_and_width(out) is det.
>  string__special_precision_and_width(-1).
> +
> +%-----------------------------------------------------------------------------%
> +
> +	% This predicate has been optimised to produce the least memory
> +	% possible.  Which is very useful for programs which do a lot of
> +	% formatted IO.

Can I play editor?

% This predicate has been optimised to use as little memory
% as possible -- memory usage is a significant problem for programs which 
% do a lot of formatted IO.


> +string__fmt(FormatString, PolyList, String) :-
> +	(
> +		format_string(Specifiers, PolyList, [],
> +				to_char_list(FormatString), [])
> +	->
> +		specifiers_to_string(Specifiers, PolyList, Strings),
> +		String = fast_append_list(Strings)
> +	;
> +		error("string__fmt: format string invalid.")
> +	).
> +
> +:- type specifier
> +	--->	conv(list(char))	% a list of chars which contains
> +					% one conversion specifier.
> +	;	string(list(char)).	% a list of chars which may only
> +					% contain the "%%" conversion
> +					% specifier.
> +
> +:- pred specifiers_to_string(list(specifier)::in, list(string__poly_type)::in,
> +		list(string)::out) is det.

A comment describing what this predicate is intended to do would be
nice.

> +
> +specifiers_to_string([], _Polys, []).

Shouldn't you check that Polys is the empty list?

> +specifiers_to_string([conv(FormatChars) | Specs], Poly0s, Strings) :-
> +	string__from_char_list(FormatChars, FormatString),
> +	( Poly0s = [Poly | Polys] ->
> +		specifiers_to_string(Specs, Polys, Strings0),
> +		(
> +			Poly = i(Int),
> +			SubString = int_to_string(FormatString, Int)
> +		;
> +			Poly = f(Float),
> +			SubString = float_to_string(FormatString, Float)
> +		;
> +			Poly = c(Char),
> +			SubString = char_to_string(FormatString, Char)
> +		;
> +			Poly = s(Str),
> +			SubString = string_to_string(FormatString, Str)
> +		),
> +		Strings = [SubString | Strings0]
> +		% string__append(SubString, String0, String)

This shouldn't be commented out here should it?

> +	;
> +		error("specifiers_to_string")
> +	).
> +specifiers_to_string([string(Chars) | Specs], Polys, Strings) :-
> +	specifiers_to_string(Specs, Polys, Strings0),
> +		% We need to use sprintf to print the string as it may
> +		% contain the "%%" string.
> +	Strings = [string_to_string(string__from_char_list(Chars), "") |
> +			Strings0].
> +
> +	%
> +	% Parse a format string.
> +	%
> +:- pred format_string(list(specifier)::out, list(string__poly_type)::in,
> +		list(string__poly_type)::out,
> +		list(char)::in, list(char)::out) is semidet.
> +
> +format_string(Results, PolyTypes0, PolyTypes) -->
> +	=(Init),
> +	( other(PolyTypes0) ->
> +		=(Final),
> +		format_string_2(Result0s, PolyTypes0, PolyTypes),
> +		{ list__remove_suffix(Init, Final, Prefix) },
> +		{ Results = [string(Prefix) | Result0s] }
> +	;
> +		format_string_2(Results, PolyTypes0, PolyTypes)
> +	).
> +
> +	%
> +	% Parse a string beginning with a '%' that is a conversion
> +	% specification.
> +	%
> +:- pred format_string_2(list(specifier)::out, list(string__poly_type)::in,
> +		list(string__poly_type)::out,
> +		list(char)::in, list(char)::out) is semidet.
> +
> +format_string_2(Results, PolyTypes0, PolyTypes) -->
> +	=(Init),
> +	( ['%'] ->
> +		conversion_specification_and_other(PolyTypes0),
> +		=(Final),
> +		{ PolyTypes0 = [_ | PolyTypes1] },
> +		format_string_2(Result0s, PolyTypes1, PolyTypes),
> +		{ list__remove_suffix(Init, Final, Prefix) },
> +		{ Results = [conv(Prefix) | Result0s] }
> +	;
> +		{ Results = [] },
> +		{ PolyTypes = PolyTypes0 }
> +	).
> +
> +	%
> +	% Parse a conversion specification followed by any characters
> +	% which don't belong to a conversion specification.
> +	%
> +:- pred conversion_specification_and_other(list(string__poly_type)::in,
> +		list(char)::in, list(char)::out) is semidet.
> +
> +conversion_specification_and_other(PolyTypes0) -->
> +	conversion_specification(PolyTypes0),
> +	{ PolyTypes0 = [_ | PolyTypes] },
> +	other(PolyTypes).

This code is confusing.  

Is there a good reason why you can't just call 
`conversion_specification' and *then* call `other' in format_string_2?
If you did this then you could probably just use the first element of
the PolyTypes to check the `conversion_specification', and the second
element of the PolyTypes to check `other'.

But since `other' just calls conversion_specification in a negation, it
seems you run this check twice for each conversion specifier (once in a
negation to check for the end of "other" and once in the
convesion_specification itself.  It seems that this is a bit wasteful
(for a high performance formatter), couldn't you restructure the code to
do it just once?

Some explanantion of the overall design is missing.  I'm not entirely
sure how it all hangs together.  So I am not convinced about the
correctness of this code, and I'm also not convinced it's necessary for
it to have these little inefficiencies.

> +
> +	%
> +	% Each conversion specification is introduced by the character
> +	% '%',  and ends with a conversion specifier.  In between there
> +	% may be (in this order)  zero  or more  flags,  an optional
> +	% minimum field width, an optional precision and an optional
> +	% length modifier.
> +	% Note we *don't* parse the '%' character as a valid conversion
> +	% specifier, this ensures that if we parse a conversion
> +	% specification we should consume one of the string__poly_type
> +	% tokens.
> +	%
> +:- pred conversion_specification(list(string__poly_type)::in,
> +		list(char)::in, list(char)::out) is semidet.
> +
> +conversion_specification(PolyTypes0) -->
> +	zero_or_more_occurences(flag_mod),
> +	zero_or_one_occurence(width_mod),
> +	zero_or_one_occurence(prec_mod),
> +	zero_or_one_occurence(length_mod),
> +	spec(PolyTypes0).
> +
> +	%
> +	% Do we have a flag?
> +	%
> +:- pred flag_mod(list(char)::in, list(char)::out) is semidet.
> +
> +flag_mod --> ['#'].
> +flag_mod --> ['0'].
> +flag_mod --> ['-'].
> +flag_mod --> [' '].
> +flag_mod --> ['+'].
> +
> +	%
> +	% Do we have a minimum field width?
> +	%
> +:- pred width_mod(list(char)::in, list(char)::out) is semidet.
> +
> +width_mod --> 
> +	non_zero_digit,
> +	zero_or_more_occurences(digit).
> +
> +	%
> +	% Do we have a precision?
> +	%
> +:- pred prec_mod(list(char)::in, list(char)::out) is semidet.
> +
> +prec_mod --> 
> +	['.'],
> +	non_zero_digit,
> +	zero_or_more_occurences(digit).
> +
> +	%
> +	% Do we have a length modifier?
> +	%
> +:- pred length_mod(list(char)::in, list(char)::out) is semidet.
> +
> +length_mod --> ['h'], ( ['h'] -> [] ; [] ).
> +length_mod --> ['l'], ( ['l'] -> [] ; [] ).
> +length_mod --> ['L'].
> +length_mod --> ['j'].
> +length_mod --> ['z'].
> +length_mod --> ['t'].
> +
> +	%
> +	% Do we have a valid conversion specifier?
> +	% We check to ensure that the specifier also matches the type
> +	% from the input list.
> +	% Note we *don't* parse the '%' character as a valid conversion
> +	% specifier, this ensures that if we parse a conversion
> +	% specification we should consume one of the string__poly_type
> +	% tokens.
> +	%
> +:- pred spec(list(string__poly_type)::in,
> +		list(char)::in, list(char)::out) is semidet.
> +
> +	% valid integer conversion specifiers
> +spec([P | _Ps]) --> { P = i(_) }, ['d'].
> +spec([P | _Ps]) --> { P = i(_) }, ['i'].
> +spec([P | _Ps]) --> { P = i(_) }, ['o'].
> +spec([P | _Ps]) --> { P = i(_) }, ['u'].
> +spec([P | _Ps]) --> { P = i(_) }, ['x'].
> +spec([P | _Ps]) --> { P = i(_) }, ['X'].
> +
> +	% valid float conversion specifiers
> +spec([P | _Ps]) --> { P = f(_) }, ['e'].
> +spec([P | _Ps]) --> { P = f(_) }, ['E'].
> +spec([P | _Ps]) --> { P = f(_) }, ['f'].
> +spec([P | _Ps]) --> { P = f(_) }, ['F'].
> +spec([P | _Ps]) --> { P = f(_) }, ['g'].
> +spec([P | _Ps]) --> { P = f(_) }, ['G'].
> +% spec([P | _Ps]) --> { P = f(_) }, ['a'].
> +% spec([P | _Ps]) --> { P = f(_) }, ['A'].

Why are these commented out?

> +	% Create a string from a float using the format string.
> +:- func float_to_string(string, float) = string.
> +:- pragma c_code(float_to_string(FormatStr::in, Val::in) = (Str::out),
> +		[will_not_call_mercury, thread_safe], "{
> +	char buf[500];
> +	Word tmp;

MR_Word.

> +	sprintf(buf, FormatStr, Val);
> +	incr_hp_atomic_msg(tmp, (strlen(buf) + sizeof(Word)) / sizeof(Word),
> +		MR_PROC_LABEL, ""string:string/0"");

String allocator again.

> +	Str = (char *)tmp;

MR_String.

> +	strcpy(Str, buf);
> +}").
> +
> +	% Create a string from a int using the format string.
> +:- func int_to_string(string, int) = string.
> +:- pragma c_code(int_to_string(FormatStr::in, Val::in) = (Str::out),
> +		[will_not_call_mercury, thread_safe], "{
> +	char buf[500];
> +	Word tmp;

mumble

> +	sprintf(buf, FormatStr, Val);
> +	incr_hp_atomic_msg(tmp, (strlen(buf) + sizeof(Word)) / sizeof(Word),
> +		MR_PROC_LABEL, ""string:string/0"");

mumble

> +	Str = (char *)tmp;

mumble

Same for string_to_string and char_to_string.


If you fix the documentation and MR_ thingies it's fine to check in.
However I'd like to revisit the implementation and see whether the
things I suggested to restructure the code could be done (and make it
any more efficient) or are a load of rubbish.  So can you post another
diff before you commit and we will discuss it.

-- 
       Tyson Dowd           # 
                            #  Surreal humour isn't everyone's cup of fur.
     trd at cs.mu.oz.au        # 
http://www.cs.mu.oz.au/~trd #
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list