[m-rev.] for review: optimize the tests for builtin types when printing terms

Zoltan Somogyi zoltan.somogyi at runbox.com
Thu Oct 17 13:33:24 AEDT 2019



On Thu, 17 Oct 2019 11:57:34 +1100 (AEDT), Julien Fischer <jfischer at opturion.com> wrote:

> > +    % The builtin types within the builtin.m module whose printing
> > +    % we may want to specialize.
> > +    %
> > +:- type builtin_type_in_builtin
> > +    --->    type_builtin_string
> > +    ;       type_builtin_character
> > +    ;       type_builtin_float
> > +    ;       type_builtin_int
> > +    ;       type_builtin_int8
> > +    ;       type_builtin_int16
> > +    ;       type_builtin_int32
> > +    ;       type_builtin_int64
> > +    ;       type_builtin_uint
> > +    ;       type_builtin_uint8
> > +    ;       type_builtin_uint16
> > +    ;       type_builtin_uint32
> > +    ;       type_builtin_uint64
> > +    ;       type_builtin_c_ptr.
> 
> While this works in this context, it would be worth considering a more general
> solution that works in others.  Most pieces of code that serialise generic
> types (here printing a string representation, in term_to_xml.m converting them to
> XML) have a similar structure.  (Older versions of my JSON library supported RTTI
> based serialisation of terms to JSON and that code followed almost exactly the
> same pattern.)
> 
> What is required in all of these use cases is the ability quickly classify a
> type.  The RTTI system already has this information in the form of the
> type_ctor_rep enumeration.

That enumeration does not cover everything that this use case wants to cover.
For example, it does not know about bitmaps or version arrays.
Extending the enumeration to cover them would complicate
all the other users of that enumeration.

> Exposing that enumeration directly to users is not
> suitable as it contains a bunch of implementation details that are (mostly) not
> relevant.  I think what we should provide here is something like the following
> in library/type_desc.m.
> 
>      :- type type_ctor_class
>  	--->    class_string
>  	;	class_character
>  	;	class_float
>  	...
>  	;	class_enum
>  	;	class_du
>   	;	class_array
> 
> etc etc.
> 
>      :- func classify_type_ctor(type_ctor_desc) = type_ctor_class.

How do you propose to implement this function? Using type_ctor_reps
alone? Using a mixture of type_ctor_reps and the kinds of tests done
by this code? More important, do you expect to expose the class type
to users? If yes, we wouldn't be able to extend the type without breaking
the code of anyone else who uses it.

> > @@ -465,37 +483,160 @@ print_cc(Stream, Term, !State) :-
> >      print(Stream, include_details_cc, Term, !State).
> >
> >  print(Stream, NonCanon, Term, !State) :-
> > -    % `string', `char', `uint' and `univ' are special cases for print
> > -    ( if dynamic_cast(Term, String : string) then
> > -        put(Stream, String, !State)
> > -    else if dynamic_cast(Term, Char : char) then
> 
> ...
> 
> > +                    else
> > +                        print_quoted(Stream, NonCanon, Term, !State)
> > +                    )
> > +                ;
> > +                    TB = type_builtin_character,
> > +                    ( if dynamic_cast(Term, Char : char) then
> > +                        put(Stream, Char, !State)
> > +                    else
> > +                        print_quoted(Stream, NonCanon, Term, !State)
> > +                    )
> > +% XXX Check whether string.float_to_string generates the same output
> > +% as the default execution path.
> 
> The default execution path calls put_float/4 which itself calls
> float_to_string/1, so yes.

Thanks.

> (print only special cases those types that it
> represents differently from write.)

That explains why it special cases a different set of types.
But for the types that we special case in *both* write and print,
it would be nice if the print code had a comment explaining
*what the difference was*. I can see e.g. that we could print extra digits
for floats in write for round-trippability, or add i8/u8/... suffixes after
fixed size integral types, but don't see what we can do differently
for int and uint itself. Are they special cased for speed? Only if they are,
should float be special cased as well, for the same reason.

> > +                    TB = type_builtin_string,
> > +                    ( if univ_to_type(Univ, String) then
> > +                        term_io.quote_string(Stream, String, !State)
> > +                    else
> > +                        write_ordinary_term(Stream, NonCanon, Univ, Priority,
> > +                            !State)
> > +                    )
> 
> Why is this branch not just
> 
>      TB = type_builtin_string,
>      det_univ_to_type(Univ, String),
>      term_io.quote_string(Stream, String, !State)
> 
> ?  Similarly, for the other branches.

det_univ_to_type internally does an if-then-else with the same condition,
the only difference is it aborts in the else case. This would not be the right
thing to do in two very unlikely but possible cases: that we add new type
constructors to these modules with the same names but different arities,
and that the user links into their programs their own modules with the
same names but different contents as the standard library modules.
The latter can't happen with builtin.m and private_builtin.m, since
those are always implicitly imported anyway, but can happen with the others.

Zoltan.


More information about the reviews mailing list