[m-rev.] for review: reduce memory allocation in lexer.m

Julien Fischer jfischer at opturion.com
Tue Jul 30 15:39:21 AEST 2019


Hi Zoltan,

On Tue, 30 Jul 2019, Zoltan Somogyi wrote:

> For review by anyone.

> Drastically reduce memory allocation by the scanner.
> 
> When doing "mmc --make-short-interface *.m" in the compiler, this diff
> reduces the overall amount of memory allocated by almost half, with
> the amount allocated by the scanner going from 76 Mb to 13.5 Mb.

Nicely done!

> The speedup when doing that is around 12%, though it is closer to 0.5%
> on tools/speedtest. (With --make-short-interface, scanning and parsing
> take a much bigger fraction of the compiler's time than with other tasks.)
> 
> library/lexer.m:
>     We used to have two implementations of everything in this module.
>     One read in characters from a stream on demand, the other processed
>     a string containing the already-fully-read-in contents of a file.
>     The latter used a data structure named posn that contained not just
>     the current offset into the file, but the current line number and
>     the offset of the start of the current line.

Is it worth keeping this second one in the long run?  It doesn't seem so to me.

>     Add a third predicate for every existing pair of predicates.
>     This one also processes fully-read-in files as strings, but this one
>     passes around the current offset as a notag wrapper around an integer,
>     which eliminates the need to allocate memory when stepping over *every*
>     character. The current line number and the offset of the start of the
>     current line are in another structure, so we do still allocate memory
>     when stepping over a newline. Storing these two items in separate
>     arguments would allow us to reduce memory allocations even further,

Assuming that neither value is going to exceed UINT32_MAX, you could
replace them with uint32s; once argument packing is enabled the result
should be allocationless on 64-bit machines.
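For illustration only, here is a minimal sketch of that suggestion. It keeps the field names from the line_context type in the diff below and assumes that neither the line number nor the line-start offset ever exceeds UINT32_MAX. The module name, `main`, and the sample offsets are hypothetical. Whether the packed representation actually avoids allocation depends on the compiler's argument-packing support, which this sketch does not demonstrate.

```mercury
:- module uint32_context_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list, string, uint32.

    % Hypothetical uint32 variant of line_context. Assumes both values
    % fit in 32 bits, so that with argument packing enabled both fields
    % could share a single 64-bit word.
:- type line_context
    --->    line_context(
                line_context_current_line_number        :: uint32,
                line_context_offset_of_start_of_line    :: uint32
            ).

main(!IO) :-
    % Start of file: line 1, which begins at offset 0.
    Ctx0 = line_context(1u32, 0u32),
    % Step past a newline at offset 9, so the next line starts at offset 10.
    Ctx0 = line_context(LineNum0, _),
    Ctx = line_context(LineNum0 + 1u32, 10u32),
    Ctx = line_context(LineNum, LineStart),
    io.format("line %d starts at offset %d\n",
        [i(uint32.cast_to_int(LineNum)), i(uint32.cast_to_int(LineStart))],
        !IO).
```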

>     but
>
>     - the increase in the cost of parameter passing would probably be the same
>       when going from passing around 2 variables to passing around 3
>       as it was in going from passing around 1 variable to passing around 2,
>       while
>     - the reduction in memory allocation would be much smaller.
>
>     Thus the cost/benefit ratio is likely to be much worse.
>
>     Try to compute the output arguments in their order in the arg list.
>
>     Delete an obsolete comment.
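To make the scheme described in the log message concrete, here is a minimal sketch of stepping over characters with a notag offset and a separate line context. The helper names step_char and count_chars, the module name, and the sample string are hypothetical and are not taken from lexer.m. The sketch assumes that line numbers start at 1 and uses string.index_next from the standard library to read one code point at a time. A new line_context cell is built only when the character just stepped over is a newline. Because line_posn has exactly one function symbol with exactly one argument, it is a notag type and is represented at runtime as the bare int, so wrapping each new offset allocates nothing.

```mercury
:- module step_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module char, int, list, string.

:- type line_context
    --->    line_context(
                line_context_current_line_number        :: int,
                line_context_offset_of_start_of_line    :: int
            ).

    % One function symbol with one argument: a notag type.
:- type line_posn
    --->    line_posn(
                line_posn_current_offset_in_file        :: int
            ).

    % Hypothetical helper: step past one character, failing at end of string.
    % Only a newline causes a new line_context to be constructed.
:- pred step_char(string::in, char::out,
    line_context::in, line_context::out,
    line_posn::in, line_posn::out) is semidet.

step_char(Str, Char, !LineContext, !LinePosn) :-
    !.LinePosn = line_posn(Offset0),
    string.index_next(Str, Offset0, Offset, Char),
    ( if Char = '\n' then
        !.LineContext = line_context(LineNum0, _),
        !:LineContext = line_context(LineNum0 + 1, Offset)
    else
        true
    ),
    !:LinePosn = line_posn(Offset).

    % Hypothetical driver: step over every character, counting them.
:- pred count_chars(string::in, int::in, int::out,
    line_context::in, line_context::out,
    line_posn::in, line_posn::out) is det.

count_chars(Str, !N, !LineContext, !LinePosn) :-
    ( if step_char(Str, _Char, !LineContext, !LinePosn) then
        !:N = !.N + 1,
        count_chars(Str, !N, !LineContext, !LinePosn)
    else
        true
    ).

main(!IO) :-
    Str = "ab\ncd\ne",
    count_chars(Str, 0, N, line_context(1, 0), LineContext,
        line_posn(0), LinePosn),
    LineContext = line_context(LineNum, LineStart),
    LinePosn = line_posn(Offset),
    io.format("%d chars, line %d, line start %d, offset %d\n",
        [i(N), i(LineNum), i(LineStart), i(Offset)], !IO).
```

For the seven-character input above, the two newlines produce the only two line_context allocations, leaving the scanner on line 3, whose first character is at offset 6.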

...

> diff --git a/library/lexer.m b/library/lexer.m
> index 92162eca2..240bb0743 100644
> --- a/library/lexer.m
> +++ b/library/lexer.m
> @@ -84,6 +84,22 @@
>      --->    token_cons(token, token_context, token_list)
>      ;       token_nil.
> 
> +    % A line_context and a line_posn together contain exaxtly the same


s/exaxtly/exactly/


> +    % fields as a posn, with the same semantics. The difference is that
> +    % stepping past a single character requires no memory allocation
> +    % whatsoever *unless* that character is a newline.
> +
> +:- type line_context
> +    --->    line_context(
> +                line_context_current_line_number        :: int,
> +                line_context_offset_of_start_of_line    :: int
> +            ).
> +
> +:- type line_posn
> +    --->    line_posn(
> +                line_posn_current_offset_in_file        :: int
> +            ).
> +
>      % Read a list of tokens either from the current input stream
>      % or from the specified input stream.
>      % Keep reading until we encounter either an `end' token

The rest looks ok.

Julien.

