[m-dev.] for review: string__fmt
Peter Ross
peter.ross at miscrit.be
Tue Aug 8 20:22:45 AEST 2000
On Tue, Aug 08, 2000 at 12:59:32PM +1000, Tyson Dowd wrote:
> On 07-Aug-2000, Peter Ross <peter.ross at miscrit.be> wrote:
> > +/*
> > +** loop to calculate list length + sizeof(Word) in `size' using list in
> > +** `char_list_ptr'
> > +*/
> > + size = sizeof(Word);
> > + char_list_ptr = CharList;
> > + while (! MR_list_is_empty(char_list_ptr)) {
> > + size++;
> > + char_list_ptr = MR_list_tail(char_list_ptr);
> > + }
> > +/*
> > +** allocate (length + 1) bytes of heap space for string
> > +** i.e. (length + 1 + sizeof(Word) - 1) / sizeof(Word) words
> > +*/
> > + incr_hp_atomic_msg(str_ptr, size / sizeof(Word),
> > + MR_PROC_LABEL, ""string:string/0"");
>
> This should probably be a macro of its own in mercury_string.h.
>
> And it should use MR_Word.
>
> > + Str = (char *) str_ptr;
>
> MR_String.
>
> But it would be even nicer if a string allocation function just allocated
> an MR_String to begin with, then you can do away with str_ptr
> altogether.
>
Will do.
> > +%-----------------------------------------------------------------------------%
> > +
> > + % This predicate has been optimised to produce the least memory
> > + % possible. Which is very useful for programs which do a lot of
> > + % formatted IO.
>
> Can I play editor?
>
> % This predicate has been optimised to use as little memory
> % as possible -- memory usage is a significant problem for programs which
> % do a lot of formatted IO.
>
OK.
> > +
> > +specifiers_to_string([], _Polys, []).
>
> Shouldn't you check that Polys is the empty list?
>
No, as we check each poly is consumed in the parsing stage.
> > +specifiers_to_string([conv(FormatChars) | Specs], Poly0s, Strings) :-
> > + string__from_char_list(FormatChars, FormatString),
> > + ( Poly0s = [Poly | Polys] ->
> > + specifiers_to_string(Specs, Polys, Strings0),
> > + (
> > + Poly = i(Int),
> > + SubString = int_to_string(FormatString, Int)
> > + ;
> > + Poly = f(Float),
> > + SubString = float_to_string(FormatString, Float)
> > + ;
> > + Poly = c(Char),
> > + SubString = char_to_string(FormatString, Char)
> > + ;
> > + Poly = s(Str),
> > + SubString = string_to_string(FormatString, Str)
> > + ),
> > + Strings = [SubString | Strings0]
> > + % string__append(SubString, String0, String)
>
> This shouldn't be commented out here should it?
>
It is an artifact from a previous implementation, memory wise it is a
lot cheaper to construct a list of strings then call fast_append_list
then it is to append each string onto the start of the string at the end
of each recursive call.
> > + ;
> > + error("specifiers_to_string")
> > + ).
> > +specifiers_to_string([string(Chars) | Specs], Polys, Strings) :-
> > + specifiers_to_string(Specs, Polys, Strings0),
> > + % We need to use sprintf to print the string as it may
> > + % contain the "%%" string.
> > + Strings = [string_to_string(string__from_char_list(Chars), "") |
> > + Strings0].
> > +
> > + %
> > + % Parse a format string.
> > + %
> > +:- pred format_string(list(specifier)::out, list(string__poly_type)::in,
> > + list(string__poly_type)::out,
> > + list(char)::in, list(char)::out) is semidet.
> > +
> > +format_string(Results, PolyTypes0, PolyTypes) -->
> > + =(Init),
> > + ( other(PolyTypes0) ->
> > + =(Final),
> > + format_string_2(Result0s, PolyTypes0, PolyTypes),
> > + { list__remove_suffix(Init, Final, Prefix) },
> > + { Results = [string(Prefix) | Result0s] }
> > + ;
> > + format_string_2(Results, PolyTypes0, PolyTypes)
> > + ).
> > +
> > + %
> > + % Parse a string beginning with a '%' that is a conversion
> > + % specification.
> > + %
> > +:- pred format_string_2(list(specifier)::out, list(string__poly_type)::in,
> > + list(string__poly_type)::out,
> > + list(char)::in, list(char)::out) is semidet.
> > +
> > +format_string_2(Results, PolyTypes0, PolyTypes) -->
> > + =(Init),
> > + ( ['%'] ->
> > + conversion_specification_and_other(PolyTypes0),
> > + =(Final),
> > + { PolyTypes0 = [_ | PolyTypes1] },
> > + format_string_2(Result0s, PolyTypes1, PolyTypes),
> > + { list__remove_suffix(Init, Final, Prefix) },
> > + { Results = [conv(Prefix) | Result0s] }
> > + ;
> > + { Results = [] },
> > + { PolyTypes = PolyTypes0 }
> > + ).
> > +
> > + %
> > + % Parse a conversion specification followed by any characters
> > + % which don't belong to a conversion specification.
> > + %
> > +:- pred conversion_specification_and_other(list(string__poly_type)::in,
> > + list(char)::in, list(char)::out) is semidet.
> > +
> > +conversion_specification_and_other(PolyTypes0) -->
> > + conversion_specification(PolyTypes0),
> > + { PolyTypes0 = [_ | PolyTypes] },
> > + other(PolyTypes).
>
> This code is confusing.
>
> Is there a good reason why you can't just call
> `conversion_specification' and *then* call `other' in format_string_2?
> If you did this then you could probably just use the first element of
> the PolyTypes to check the `conversion_specification', and the second
> element of the PolyTypes to check `other'.
>
This is also an artificat from previous implementations. I assume that
the conversion_specification_and_other will be inlined giving exactly
what you said.
> But since `other' just calls conversion_specification in a negation, it
> seems you run this check twice for each conversion specifier (once in a
> negation to check for the end of "other" and once in the
> convesion_specification itself. It seems that this is a bit wasteful
> (for a high performance formatter), couldn't you restructure the code to
> do it just once?
>
Yes, but at the cost of allocating more memory. Note that conversion
specification strings are quite short so parsing them twice shouldn't be
a major performance hit. I have taken the approach that in the long run
memory allocation is more expensive then some extra processing.
> Some explanantion of the overall design is missing. I'm not entirely
> sure how it all hangs together. So I am not convinced about the
> correctness of this code, and I'm also not convinced it's necessary for
> it to have these little inefficiencies.
>
It works by taking as many characters as possible from the start of the
format string which don't consume a poly token. This guarantees that at
the start of format_string_2 we have a '%' which will consume a poly
token. We then parse this conversion specifier followed by as many
characters as possible which don't consume a poly token leaving us in
the same place.
The lists of characters are constructed to be as long as possible so
as to reduce the amount of memory allocated.
> > +
> > + %
> > + % Each conversion specification is introduced by the character
> > + % '%', and ends with a conversion specifier. In between there
> > + % may be (in this order) zero or more flags, an optional
> > + % minimum field width, an optional precision and an optional
> > + % length modifier.
> > + % Note we *don't* parse the '%' character as a valid conversion
> > + % specifier, this ensures that if we parse a conversion
> > + % specification we should consume one of the string__poly_type
> > + % tokens.
> > + %
> > +:- pred conversion_specification(list(string__poly_type)::in,
> > + list(char)::in, list(char)::out) is semidet.
> > +
> > +conversion_specification(PolyTypes0) -->
> > + zero_or_more_occurences(flag_mod),
> > + zero_or_one_occurence(width_mod),
> > + zero_or_one_occurence(prec_mod),
> > + zero_or_one_occurence(length_mod),
> > + spec(PolyTypes0).
> > +
> > + %
> > + % Do we have a flag?
> > + %
> > +:- pred flag_mod(list(char)::in, list(char)::out) is semidet.
> > +
> > +flag_mod --> ['#'].
> > +flag_mod --> ['0'].
> > +flag_mod --> ['-'].
> > +flag_mod --> [' '].
> > +flag_mod --> ['+'].
> > +
> > + %
> > + % Do we have a minimum field width?
> > + %
> > +:- pred width_mod(list(char)::in, list(char)::out) is semidet.
> > +
> > +width_mod -->
> > + non_zero_digit,
> > + zero_or_more_occurences(digit).
> > +
> > + %
> > + % Do we have a precision?
> > + %
> > +:- pred prec_mod(list(char)::in, list(char)::out) is semidet.
> > +
> > +prec_mod -->
> > + ['.'],
> > + non_zero_digit,
> > + zero_or_more_occurences(digit).
> > +
> > + %
> > + % Do we have a length modifier?
> > + %
> > +:- pred length_mod(list(char)::in, list(char)::out) is semidet.
> > +
> > +length_mod --> ['h'], ( ['h'] -> [] ; [] ).
> > +length_mod --> ['l'], ( ['l'] -> [] ; [] ).
> > +length_mod --> ['L'].
> > +length_mod --> ['j'].
> > +length_mod --> ['z'].
> > +length_mod --> ['t'].
> > +
> > + %
> > + % Do we have a valid conversion specifier?
> > + % We check to ensure that the specifier also matches the type
> > + % from the input list.
> > + % Note we *don't* parse the '%' character as a valid conversion
> > + % specifier, this ensures that if we parse a conversion
> > + % specification we should consume one of the string__poly_type
> > + % tokens.
> > + %
> > +:- pred spec(list(string__poly_type)::in,
> > + list(char)::in, list(char)::out) is semidet.
> > +
> > + % valid integer conversion specifiers
> > +spec([P | _Ps]) --> { P = i(_) }, ['d'].
> > +spec([P | _Ps]) --> { P = i(_) }, ['i'].
> > +spec([P | _Ps]) --> { P = i(_) }, ['o'].
> > +spec([P | _Ps]) --> { P = i(_) }, ['u'].
> > +spec([P | _Ps]) --> { P = i(_) }, ['x'].
> > +spec([P | _Ps]) --> { P = i(_) }, ['X'].
> > +
> > + % valid float conversion specifiers
> > +spec([P | _Ps]) --> { P = f(_) }, ['e'].
> > +spec([P | _Ps]) --> { P = f(_) }, ['E'].
> > +spec([P | _Ps]) --> { P = f(_) }, ['f'].
> > +spec([P | _Ps]) --> { P = f(_) }, ['F'].
> > +spec([P | _Ps]) --> { P = f(_) }, ['g'].
> > +spec([P | _Ps]) --> { P = f(_) }, ['G'].
> > +% spec([P | _Ps]) --> { P = f(_) }, ['a'].
> > +% spec([P | _Ps]) --> { P = f(_) }, ['A'].
>
> Why are these commented out?
>
They are valid for sprintf but not by the documentation of
string__format.
>
> If you fix the documentation and MR_ thingies it's fine to check in.
> However I'd like to revisit the implementation and see whether the
> things I suggested to restructure the code could be done (and make it
> any more efficient) or are a load of rubbish. So can you post another
> diff before you commit and we will discuss it.
>
I have done all the MR_ stuff as a seperate diff and commited it.
I will look at yours and Fergus review comments and post a new diff
soon.
Pete
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to: mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions: mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------
More information about the developers
mailing list