[m-rev.] for review: optimize the tests for builtin types when printing terms

Julien Fischer jfischer at opturion.com
Thu Oct 17 11:57:34 AEDT 2019


Hi Zoltan,

On Thu, 17 Oct 2019, Zoltan Somogyi wrote:

> For review by Julien, since it was his mod of term_to_xml.m that
> tickled my memory of this old todo item.
>
> On my laptop, this diff speeds up the attached microbenchmark
> from about 2.6 to about 1.6 seconds.

...

> Optimize the tests for builtin types when printing terms.
> 
> library/stream.string_writer.m:
>     When printing terms to a stream, we special case the handling
>     of a whole bunch of builtin types. We used to test for these one by one,
>     in sequence, which can be very slow.
>
>     Switch to an approach where we first get the identity of the term's type,
>     and we test for a type for one of these types *if, and only if*,
>     the identity says that it is one of the types we want to special-case
>     in one of the modules that has to-be-special-cased types.
>
>     This code is significantly longer than the original, but
>     (a) it should be significantly faster (it speeds up a microbenchmark
>     by a bit more than a third), and (b) it should not slow down
>     if we want to special-case the treatment of more types in the future.
> 
> tests/hard_coded/stream_string_writer_types.{m,exp}:
>     A test to ensure that the assumptions that the new code in
>     stream.string_writer.m makes about which builtin type is defined where
>     are still valid.
> 
> tests/hard_coded/Mmakefile:
>     Enable the new test case.
> diff --git a/library/stream.string_writer.m b/library/stream.string_writer.m
> index c4978f24a..e104b2e95 100644
> --- a/library/stream.string_writer.m
> +++ b/library/stream.string_writer.m
> @@ -71,22 +71,21 @@
>      <= stream.writer(Stream, string, State).

...


> @@ -455,9 +454,28 @@ nl(Stream, !State) :-
>
>  %---------------------------------------------------------------------------%
>  %
> -% Various different versions of print
> +% Various different versions of print.
>  %
> 
> +    % The builtin types within the builtin.m module whose printing
> +    % we may want to specialize.
> +    %
> +:- type builtin_type_in_builtin
> +    --->    type_builtin_string
> +    ;       type_builtin_character
> +    ;       type_builtin_float
> +    ;       type_builtin_int
> +    ;       type_builtin_int8
> +    ;       type_builtin_int16
> +    ;       type_builtin_int32
> +    ;       type_builtin_int64
> +    ;       type_builtin_uint
> +    ;       type_builtin_uint8
> +    ;       type_builtin_uint16
> +    ;       type_builtin_uint32
> +    ;       type_builtin_uint64
> +    ;       type_builtin_c_ptr.

While this works in this context, it would be worth considering a more general
solution that works in others.  Most pieces of code that serialise generic
types (here printing a string representation, in term_to_xml.m converting them to
XML) have a similar structure.  (Older versions of my JSON library supported RTTI
based serialisation of terms to JSON and that code followed almost exactly the
same pattern.)

What is required in all of these use cases is the ability quickly classify a
type.  The RTTI system already has this information in the form of the
type_ctor_rep enumeration.  Exposing that enumeration directly to users is not
suitable as it contains a bunch of implementation details that are (mostly) not
relevant.  I think what we should provide here is something like the following
in library/type_desc.m.

     :- type type_ctor_class
 	--->    class_string
 	;	class_character
 	;	class_float
 	...
 	;	class_enum
 	;	class_du
  	;	class_array

etc etc.

     :- func classify_type_ctor(type_ctor_desc) = type_ctor_class.

(The names above are for illustration only.)

> @@ -465,37 +483,160 @@ print_cc(Stream, Term, !State) :-
>      print(Stream, include_details_cc, Term, !State).
>
>  print(Stream, NonCanon, Term, !State) :-
> -    % `string', `char', `uint' and `univ' are special cases for print
> -    ( if dynamic_cast(Term, String : string) then
> -        put(Stream, String, !State)
> -    else if dynamic_cast(Term, Char : char) then

...

> +                    else
> +                        print_quoted(Stream, NonCanon, Term, !State)
> +                    )
> +                ;
> +                    TB = type_builtin_character,
> +                    ( if dynamic_cast(Term, Char : char) then
> +                        put(Stream, Char, !State)
> +                    else
> +                        print_quoted(Stream, NonCanon, Term, !State)
> +                    )
> +% XXX Check whether string.float_to_string generates the same output
> +% as the default execution path.

The default execution path calls put_float/4 which itself calls
float_to_string/1, so yes.   (print only special cases those types that it
represents differently from write.)

> +%               ;
> +%                   TB = type_builtin_float,
> +%                   ( if dynamic_cast(Term, Float : float) then
> +%                       put(Stream, string.float_to_string(Float), !State)
> +%                   else
> +%                       print_quoted(Stream, NonCanon, Term, !State)
> +%                   )

...

> @@ -570,120 +711,254 @@ do_write_univ(Stream, NonCanon, Univ, !State) :-
>      is cc_multi.
>  :- mode do_write_univ_prio(in, in, in, in, di, uo) is cc_multi.
>  :- pragma type_spec(do_write_univ_prio/6,
> -            (Stream = io.output_stream, State = io.state)).
> +    (Stream = io.output_stream, State = io.state)).
>
>      % We only use the io.stream_db we read impurely when we have
>      % the io.state.
>  :- pragma promise_pure(do_write_univ_prio/6).
>
>  do_write_univ_prio(Stream, NonCanon, Univ, Priority, !State) :-


...


> +    TypeDesc = univ_type(Univ),
> +    type_ctor_and_args(TypeDesc, TypeCtorDesc, ArgTypeDescs),
> +    type_ctor_name_and_arity(TypeCtorDesc,
> +        TypeCtorModuleName, TypeCtorName, _TypeCtorArity),
> +    ( if
> +        ( TypeCtorModuleName = "builtin"
> +        ; TypeCtorModuleName = "private_builtin"
> +        ; TypeCtorModuleName = "bitmap"
> +        ; TypeCtorModuleName = "type_desc"
> +        ; TypeCtorModuleName = "io"
> +        ; TypeCtorModuleName = "array"
> +        ; TypeCtorModuleName = "version_array"
> +        )
> +        % The print predicate also special cases the handling of types
> +        % from integer.m univ.m, and calendar.m.
>      then
> -        has_type(Elem, ElemType),
> -        same_private_builtin_type(PrivateBuiltinTypeInfo, Elem),
> -        det_univ_to_type(Univ, PrivateBuiltinTypeInfo),
> -        write_private_builtin_type_info(Stream, PrivateBuiltinTypeInfo, !State)
> +        (
> +            TypeCtorModuleName = "builtin",
> +            ( if
> +                ( TypeCtorName = "string",      TB = type_builtin_string
> +                ; TypeCtorName = "character",   TB = type_builtin_character
> +                ; TypeCtorName = "float",       TB = type_builtin_float
> +                ; TypeCtorName = "int",         TB = type_builtin_int
> +                ; TypeCtorName = "int8",        TB = type_builtin_int8
> +                ; TypeCtorName = "int16",       TB = type_builtin_int16
> +                ; TypeCtorName = "int32",       TB = type_builtin_int32
> +                ; TypeCtorName = "int64",       TB = type_builtin_int64
> +                ; TypeCtorName = "uint",        TB = type_builtin_uint
> +                ; TypeCtorName = "uint8",       TB = type_builtin_uint8
> +                ; TypeCtorName = "uint16",      TB = type_builtin_uint16
> +                ; TypeCtorName = "uint32",      TB = type_builtin_uint32
> +                ; TypeCtorName = "uint64",      TB = type_builtin_uint64
> +                ; TypeCtorName = "c_pointer",   TB = type_builtin_c_ptr
> +                )
> +            then
> +                (
> +                    TB = type_builtin_string,
> +                    ( if univ_to_type(Univ, String) then
> +                        term_io.quote_string(Stream, String, !State)
> +                    else
> +                        write_ordinary_term(Stream, NonCanon, Univ, Priority,
> +                            !State)
> +                    )

Why is this branch not just

     TB = type_builtin_string,
     det_univ_to_type(Univ, String),
     term_io.quote_string(Stream, String, !State)

?  Similarly, for the other branches.

Julien.


More information about the reviews mailing list