[m-rev.] for review: reduce memory allocation in lexer.m
Julien Fischer
jfischer at opturion.com
Tue Jul 30 15:39:21 AEST 2019
Hi Zoltan,
On Tue, 30 Jul 2019, Zoltan Somogyi wrote:
> For review by anyone.
> Drastically reduce memory allocation by the scanner.
>
> When doing "mmc --make-short-interface *.m" in the compiler, this diff
> reduces the overall amount of memory allocated by almost half, with
> the amount allocated by the scanner going from 76 Mb to 13.5 Mb.
Nicely done!
> The speedup when doing that is around 12%, though it is closer to 0.5%
> on tools/speedtest. (With --make-short-interface, scanning and parsing
> take a much bigger fraction of the compiler's time than with other tasks.)
>
> library/lexer.m:
> We used to have two implementations of everything in this module.
> One read in characters from a stream on demand, the other processed
> a string containing the already-fully-read-in contents of a file.
> The latter used a data structure named posn that contained not just
> the current offset into the file, but the current line number and
> the offset of the start of the current line.
Is it worth keeping this second one in the long run? It doesn't seem so to me.
> Add a third predicate for every existing pair of predicates.
> This one also processes fully-read-in files as strings, but it
> passes around the current offset as a notag wrapper around an integer,
> which eliminates the need to allocate memory when stepping over *every*
> character. The current line number and the offset of the start of the
> current line are in another structure, so we do still allocate memory
> when stepping over a newline. Storing these two items in separate
> arguments would allow us to reduce memory allocations even further,
Assuming that neither value is going to exceed UINT32_MAX, you could
replace them with uint32s; once argument packing is enabled the result
should be allocationless on 64-bit machines.
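As a rough, untested sketch of that suggestion: the type and field names below are the ones from the diff, while the module name and the next_line_context helper are hypothetical and not part of lexer.m. It assumes neither value ever exceeds UINT32_MAX.

```mercury
:- module line_context_sketch.
:- interface.

    % Same fields as the line_context type in the diff, but as uint32s.
    % Assumption: neither value ever exceeds UINT32_MAX.
    %
:- type line_context
    --->    line_context(
                line_context_current_line_number        :: uint32,
                line_context_offset_of_start_of_line    :: uint32
            ).

    % Hypothetical helper, not part of lexer.m: compute the context
    % for the line that starts at the given offset in the file.
    %
:- func next_line_context(line_context, int) = line_context.

:- implementation.
:- import_module uint32.

next_line_context(LineContext0, StartOfNextLine) = LineContext :-
    LineContext0 = line_context(LineNum0, _),
    % det_from_int throws an exception if the offset does not fit
    % in a uint32, which makes the assumption above explicit.
    LineContext = line_context(LineNum0 + 1u32,
        uint32.det_from_int(StartOfNextLine)).
```

Whether building such a term really avoids a heap cell depends on argument packing being enabled for the grade in use, so this is worth measuring rather than assuming.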
> but
>
> - the increase in the cost of parameter passing would probably be the same
> when going from passing around 2 variables to passing around 3
> as there was in going from passing around 1 variable to passing around 2,
> while
> - the reduction in memory allocation would be much smaller.
>
> Thus the cost/benefit ratio is likely to be much worse.
>
> Try to compute the output arguments in their order in the arg list.
>
> Delete an obsolete comment.
...
> diff --git a/library/lexer.m b/library/lexer.m
> index 92162eca2..240bb0743 100644
> --- a/library/lexer.m
> +++ b/library/lexer.m
> @@ -84,6 +84,22 @@
>      --->    token_cons(token, token_context, token_list)
>      ;       token_nil.
>
> +    % A line_context and a line_posn together contain exaxtly the same
s/exaxtly/exactly/
> +    % fields as a posn, with the same semantics. The difference is that
> +    % stepping past a single character requires no memory allocation
> +    % whatsoever *unless* that character is a newline.
> +
> +:- type line_context
> +    --->    line_context(
> +                line_context_current_line_number        :: int,
> +                line_context_offset_of_start_of_line    :: int
> +            ).
> +
> +:- type line_posn
> +    --->    line_posn(
> +                line_posn_current_offset_in_file        :: int
> +            ).
> +
>      % Read a list of tokens either from the current input stream
>      % or from the specified input stream.
>      % Keep reading until we encounter either an `end' token
The rest looks ok.
Julien.