[m-dev.] for review: string__fmt

Peter Ross peter.ross at miscrit.be
Tue Aug 8 20:22:45 AEST 2000


On Tue, Aug 08, 2000 at 12:59:32PM +1000, Tyson Dowd wrote:
> On 07-Aug-2000, Peter Ross <peter.ross at miscrit.be> wrote:
> > +/*
> > +** loop to calculate list length + sizeof(Word) in `size' using list in
> > +** `char_list_ptr'
> > +*/
> > +	size = sizeof(Word);
> > +	char_list_ptr = CharList;
> > +	while (! MR_list_is_empty(char_list_ptr)) {
> > +		size++;
> > +		char_list_ptr = MR_list_tail(char_list_ptr);
> > +	}
> > +/*
> > +** allocate (length + 1) bytes of heap space for string
> > +** i.e. (length + 1 + sizeof(Word) - 1) / sizeof(Word) words
> > +*/
> > +	incr_hp_atomic_msg(str_ptr, size / sizeof(Word),
> > +		MR_PROC_LABEL, ""string:string/0"");
> 
> This should probably be a macro of its own in mercury_string.h.
> 
> And it should use MR_Word.
> 
> > +	Str = (char *) str_ptr;
> 
> MR_String.
> 
> But it would be even nicer if a string allocation function just allocated 
> an MR_String to begin with, then you can do away with str_ptr
> altogether.
> 
Will do.

> > +%-----------------------------------------------------------------------------%
> > +
> > +	% This predicate has been optimised to produce the least memory
> > +	% possible.  Which is very useful for programs which do a lot of
> > +	% formatted IO.
> 
> Can I play editor?
> 
> % This predicate has been optimised to use as little memory
> % as possible -- memory usage is a significant problem for programs which 
> % do a lot of formatted IO.
> 
OK.

> > +
> > +specifiers_to_string([], _Polys, []).
> 
> Shouldn't you check that Polys is the empty list?
> 
No, as we check each poly is consumed in the parsing stage.

> > +specifiers_to_string([conv(FormatChars) | Specs], Poly0s, Strings) :-
> > +	string__from_char_list(FormatChars, FormatString),
> > +	( Poly0s = [Poly | Polys] ->
> > +		specifiers_to_string(Specs, Polys, Strings0),
> > +		(
> > +			Poly = i(Int),
> > +			SubString = int_to_string(FormatString, Int)
> > +		;
> > +			Poly = f(Float),
> > +			SubString = float_to_string(FormatString, Float)
> > +		;
> > +			Poly = c(Char),
> > +			SubString = char_to_string(FormatString, Char)
> > +		;
> > +			Poly = s(Str),
> > +			SubString = string_to_string(FormatString, Str)
> > +		),
> > +		Strings = [SubString | Strings0]
> > +		% string__append(SubString, String0, String)
> 
> This shouldn't be commented out here should it?
> 
It is an artifact from a previous implementation, memory wise it is a
lot cheaper to construct a list of strings then call fast_append_list
then it is to append each string onto the start of the string at the end
of each recursive call.


> > +	;
> > +		error("specifiers_to_string")
> > +	).
> > +specifiers_to_string([string(Chars) | Specs], Polys, Strings) :-
> > +	specifiers_to_string(Specs, Polys, Strings0),
> > +		% We need to use sprintf to print the string as it may
> > +		% contain the "%%" string.
> > +	Strings = [string_to_string(string__from_char_list(Chars), "") |
> > +			Strings0].
> > +
> > +	%
> > +	% Parse a format string.
> > +	%
> > +:- pred format_string(list(specifier)::out, list(string__poly_type)::in,
> > +		list(string__poly_type)::out,
> > +		list(char)::in, list(char)::out) is semidet.
> > +
> > +format_string(Results, PolyTypes0, PolyTypes) -->
> > +	=(Init),
> > +	( other(PolyTypes0) ->
> > +		=(Final),
> > +		format_string_2(Result0s, PolyTypes0, PolyTypes),
> > +		{ list__remove_suffix(Init, Final, Prefix) },
> > +		{ Results = [string(Prefix) | Result0s] }
> > +	;
> > +		format_string_2(Results, PolyTypes0, PolyTypes)
> > +	).
> > +
> > +	%
> > +	% Parse a string beginning with a '%' that is a conversion
> > +	% specification.
> > +	%
> > +:- pred format_string_2(list(specifier)::out, list(string__poly_type)::in,
> > +		list(string__poly_type)::out,
> > +		list(char)::in, list(char)::out) is semidet.
> > +
> > +format_string_2(Results, PolyTypes0, PolyTypes) -->
> > +	=(Init),
> > +	( ['%'] ->
> > +		conversion_specification_and_other(PolyTypes0),
> > +		=(Final),
> > +		{ PolyTypes0 = [_ | PolyTypes1] },
> > +		format_string_2(Result0s, PolyTypes1, PolyTypes),
> > +		{ list__remove_suffix(Init, Final, Prefix) },
> > +		{ Results = [conv(Prefix) | Result0s] }
> > +	;
> > +		{ Results = [] },
> > +		{ PolyTypes = PolyTypes0 }
> > +	).
> > +
> > +	%
> > +	% Parse a conversion specification followed by any characters
> > +	% which don't belong to a conversion specification.
> > +	%
> > +:- pred conversion_specification_and_other(list(string__poly_type)::in,
> > +		list(char)::in, list(char)::out) is semidet.
> > +
> > +conversion_specification_and_other(PolyTypes0) -->
> > +	conversion_specification(PolyTypes0),
> > +	{ PolyTypes0 = [_ | PolyTypes] },
> > +	other(PolyTypes).
> 
> This code is confusing.  
> 
> Is there a good reason why you can't just call 
> `conversion_specification' and *then* call `other' in format_string_2?
> If you did this then you could probably just use the first element of
> the PolyTypes to check the `conversion_specification', and the second
> element of the PolyTypes to check `other'.
> 
This is also an artificat from previous implementations.  I assume that
the conversion_specification_and_other will be inlined giving exactly
what you said.

> But since `other' just calls conversion_specification in a negation, it
> seems you run this check twice for each conversion specifier (once in a
> negation to check for the end of "other" and once in the
> convesion_specification itself.  It seems that this is a bit wasteful
> (for a high performance formatter), couldn't you restructure the code to
> do it just once?
> 
Yes, but at the cost of allocating more memory.  Note that conversion
specification strings are quite short so parsing them twice shouldn't be
a major performance hit.  I have taken the approach that in the long run
memory allocation is more expensive then some extra processing.

> Some explanantion of the overall design is missing.  I'm not entirely
> sure how it all hangs together.  So I am not convinced about the
> correctness of this code, and I'm also not convinced it's necessary for
> it to have these little inefficiencies.
> 

It works by taking as many characters as possible from the start of the
format string which don't consume a poly token.  This guarantees that at
the start of format_string_2 we have a '%' which will consume a poly
token.  We then parse this conversion specifier followed by as many
characters as possible which don't consume a poly token leaving us in
the same place.

The lists of characters are constructed to be as long as possible so
as to reduce the amount of memory allocated.

> > +
> > +	%
> > +	% Each conversion specification is introduced by the character
> > +	% '%',  and ends with a conversion specifier.  In between there
> > +	% may be (in this order)  zero  or more  flags,  an optional
> > +	% minimum field width, an optional precision and an optional
> > +	% length modifier.
> > +	% Note we *don't* parse the '%' character as a valid conversion
> > +	% specifier, this ensures that if we parse a conversion
> > +	% specification we should consume one of the string__poly_type
> > +	% tokens.
> > +	%
> > +:- pred conversion_specification(list(string__poly_type)::in,
> > +		list(char)::in, list(char)::out) is semidet.
> > +
> > +conversion_specification(PolyTypes0) -->
> > +	zero_or_more_occurences(flag_mod),
> > +	zero_or_one_occurence(width_mod),
> > +	zero_or_one_occurence(prec_mod),
> > +	zero_or_one_occurence(length_mod),
> > +	spec(PolyTypes0).
> > +
> > +	%
> > +	% Do we have a flag?
> > +	%
> > +:- pred flag_mod(list(char)::in, list(char)::out) is semidet.
> > +
> > +flag_mod --> ['#'].
> > +flag_mod --> ['0'].
> > +flag_mod --> ['-'].
> > +flag_mod --> [' '].
> > +flag_mod --> ['+'].
> > +
> > +	%
> > +	% Do we have a minimum field width?
> > +	%
> > +:- pred width_mod(list(char)::in, list(char)::out) is semidet.
> > +
> > +width_mod --> 
> > +	non_zero_digit,
> > +	zero_or_more_occurences(digit).
> > +
> > +	%
> > +	% Do we have a precision?
> > +	%
> > +:- pred prec_mod(list(char)::in, list(char)::out) is semidet.
> > +
> > +prec_mod --> 
> > +	['.'],
> > +	non_zero_digit,
> > +	zero_or_more_occurences(digit).
> > +
> > +	%
> > +	% Do we have a length modifier?
> > +	%
> > +:- pred length_mod(list(char)::in, list(char)::out) is semidet.
> > +
> > +length_mod --> ['h'], ( ['h'] -> [] ; [] ).
> > +length_mod --> ['l'], ( ['l'] -> [] ; [] ).
> > +length_mod --> ['L'].
> > +length_mod --> ['j'].
> > +length_mod --> ['z'].
> > +length_mod --> ['t'].
> > +
> > +	%
> > +	% Do we have a valid conversion specifier?
> > +	% We check to ensure that the specifier also matches the type
> > +	% from the input list.
> > +	% Note we *don't* parse the '%' character as a valid conversion
> > +	% specifier, this ensures that if we parse a conversion
> > +	% specification we should consume one of the string__poly_type
> > +	% tokens.
> > +	%
> > +:- pred spec(list(string__poly_type)::in,
> > +		list(char)::in, list(char)::out) is semidet.
> > +
> > +	% valid integer conversion specifiers
> > +spec([P | _Ps]) --> { P = i(_) }, ['d'].
> > +spec([P | _Ps]) --> { P = i(_) }, ['i'].
> > +spec([P | _Ps]) --> { P = i(_) }, ['o'].
> > +spec([P | _Ps]) --> { P = i(_) }, ['u'].
> > +spec([P | _Ps]) --> { P = i(_) }, ['x'].
> > +spec([P | _Ps]) --> { P = i(_) }, ['X'].
> > +
> > +	% valid float conversion specifiers
> > +spec([P | _Ps]) --> { P = f(_) }, ['e'].
> > +spec([P | _Ps]) --> { P = f(_) }, ['E'].
> > +spec([P | _Ps]) --> { P = f(_) }, ['f'].
> > +spec([P | _Ps]) --> { P = f(_) }, ['F'].
> > +spec([P | _Ps]) --> { P = f(_) }, ['g'].
> > +spec([P | _Ps]) --> { P = f(_) }, ['G'].
> > +% spec([P | _Ps]) --> { P = f(_) }, ['a'].
> > +% spec([P | _Ps]) --> { P = f(_) }, ['A'].
> 
> Why are these commented out?
> 
They are valid for sprintf but not by the documentation of
string__format.


> 
> If you fix the documentation and MR_ thingies it's fine to check in.
> However I'd like to revisit the implementation and see whether the
> things I suggested to restructure the code could be done (and make it
> any more efficient) or are a load of rubbish.  So can you post another
> diff before you commit and we will discuss it.
> 

I have done all the MR_ stuff as a seperate diff and commited it.

I will look at yours and Fergus review comments and post a new diff
soon.

Pete
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list