[m-rev.] for review: read Mercury files using read_file_as_string

Julien Fischer jfischer at opturion.com
Sat Jul 27 17:47:23 AEST 2019


Hi Zoltan,

On Fri, 26 Jul 2019, Zoltan Somogyi wrote:

> This diff, which is for review by anyone, gets about a 2% speedup
> on my laptop. However, it does raise two issues.
>
> The first is that after this change, the compiler cannot pass the invalid/null_char
> test case. As the attached diff shows, instead of reporting null chars in the items
> in which they occur, it reports simply that there is a null char in the file, and
> does not even attempt to parse it. We could improve on this by reporting e.g.
> the line numbers of all the lines that contain NULs, but we could never reproduce
> the current expected output of invalid/null_char. The question is, is the speedup
> worth this complication in this very rare situation? And if yes, would the above
> mentioned improvement be useful to anyone? (It isn't to me.)

I don't think the less accurate error message is significant; the
compiler doesn't pass that test in non-C grades anyway.  (Because the
target languages are fine with null chars in strings.)

IIRC, that test was not added because of any specific issue with the
compiler, but as part of a change by Simon to prevent null chars being
used in Mercury generally.
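
That said, if anyone did want the line number reporting Zoltan mentions,
a single pass over the file contents would be enough.  A rough sketch
(module and predicate names invented; it assumes the contents are
available as a string in which NULs can actually occur, which is only
the case in the non-C grades):

    :- module nul_lines.
    :- interface.

    :- import_module list.

        % Return, in ascending order, the line numbers (counting from 1)
        % of the lines that contain at least one NUL character.
        %
    :- func nul_line_numbers(string) = list(int).

    :- implementation.

    :- import_module char.
    :- import_module int.
    :- import_module string.

    nul_line_numbers(Str) = Lines :-
        string.foldl2(acc_nul_line, Str, 1, _FinalLineNum, [], RevLines),
        list.reverse(RevLines, Lines).

    :- pred acc_nul_line(char::in, int::in, int::out,
        list(int)::in, list(int)::out) is det.

    acc_nul_line(Char, !LineNum, !RevLines) :-
        ( if Char = '\n' then
            !:LineNum = !.LineNum + 1
        else if char.to_int(Char) = 0 then
            % Record each offending line only once.
            ( if !.RevLines = [!.LineNum | _] then
                true
            else
                !:RevLines = [!.LineNum | !.RevLines]
            )
        else
            true
        ).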

> The second is that the speedup could be slightly higher if read_file_as_string
> returned the max offset in the string it returns, which it has to know in order to
> allocate the right amount of memory. While I could write a new variant of
> the C version of read_file_as_string that does that, maybe called
> read_file_as_string_and_length,

Don't use length, since that's ambiguous.

> I could not do the same for the Java and Erlang versions.  Would
> anyone like to volunteer?

I suggest you add the C implementation and a fallback Mercury implementation
(as we have with read_file_as_string).  I can add the Java version.
My Erlang is probably not up to the task of adding it for that language :-(
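
For the fallback, something along these lines should do; the predicate
name below is just a placeholder (not "length", as per the above), and
the real thing would presumably live in io.m next to a C foreign_proc
version that gets the size for free instead of re-traversing the string:

    :- module read_file_and_size.
    :- interface.

    :- import_module io.

        % Placeholder name, and a module of its own only for the sake of
        % the sketch: return the string read plus its size in code units.
        %
    :- pred read_file_as_string_and_num_code_units(
        io.maybe_partial_res(string)::out, int::out, io::di, io::uo) is det.

    :- implementation.

    :- import_module string.

    read_file_as_string_and_num_code_units(Result, NumCodeUnits, !IO) :-
        io.read_file_as_string(Result, !IO),
        (
            Result = ok(FileString)
        ;
            % Even on a partial read, report the size of what we did get.
            Result = error(FileString, _Error)
        ),
        % string.length counts code units, which is what the lexer needs
        % as its maximum offset.
        NumCodeUnits = string.length(FileString).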

> Read in Mercury files all at once.
> 
> Instead of reading in Mercury files (.m files, .int* files, .*opt files)
> one term (or one item) at a time, using I/O operations that each get
> a single character, read them in all at once, using read_file_as_string,
> and use the parser and lexer predicates that pass around in,out pairs
> of positions instead of di,uo pairs of I/O states.
> 
> On tools/speedtest, this yields a speedup of around 2% in hlc.gc on my laptop.
> 
> compiler/parse_module.m:
>     As above: invoke the !Posn versions of the predicates that parse terms,
>     not the !IO versions. These in turn call the !Posn versions of the
>     predicates in lexer.m.
> 
> library/lexer.m:
>     We used to undo the act of stepping over a character in !Posn predicates

s/character/code point/

>     by scanning UTF codes backwards. Switch to a simpler approach: simply

by scanning backwards to the previous code point.

>     remember the position before the stepping-over.
>
>     Fix a bug each in string_get_hex_2 and string_get_int_dot.
>     Both bugs had to do with undoing just the right such steppings-over.
>
>     Inline two predicate calls in string_get_token_2, which is the main
>     predicate of the !Posn version. This inlining allows us to replace
>     two switches with one, getting us about 0.5% of the overall ~2% speedup.
>
>     Replace several deeply nested if-then-else chains with incomplete switches.
>
>     Put the string we are processing and its length in neighbouring arguments.
>
>     Delete unused arguments from some predicates.
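
Regarding the lexer.m entries: for anyone reading the log without the
diff in front of them, the "remember the position before the
stepping-over" approach amounts to something like this (the names are
invented, and the layout of lexer.posn and the string.unsafe_index_next
signature are written from memory, so treat the details as approximate):

    :- module step_over_sketch.
    :- interface.

    :- import_module char.
    :- import_module lexer.

        % Step over one code point in String, updating the position.
        % MaxOffset should be the length of String in code units;
        % we fail once we reach it.
        %
    :- pred sketch_get_char(string::in, int::in, char::out,
        posn::in, posn::out) is semidet.

    :- implementation.

    :- import_module int.
    :- import_module string.

    sketch_get_char(String, MaxOffset, Char, Posn0, Posn) :-
        Posn0 = posn(LineNum0, LineOffset0, Offset0),
        Offset0 < MaxOffset,
        string.unsafe_index_next(String, Offset0, Offset, Char),
        ( if Char = '\n' then
            % A new line starts just after the newline.
            Posn = posn(LineNum0 + 1, Offset, Offset)
        else
            Posn = posn(LineNum0, LineOffset0, Offset)
        ).

The point is that a caller which steps over a code point and then decides
the current token ended before it can simply resume from the posn it saved
before the call, instead of decoding the UTF-8 backwards.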

The diff looks fine otherwise.
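
And purely for completeness, here is roughly what the string-plus-position
reading style described at the top of the log message looks like from the
caller's side.  The parser.read_term_from_substring argument order below is
from memory, so take it as an approximation rather than a reference:

    :- module read_all_terms.
    :- interface.

    :- import_module io.

    :- pred main(io::di, io::uo) is det.

    :- implementation.

    :- import_module lexer.
    :- import_module list.
    :- import_module parser.
    :- import_module string.
    :- import_module term.
    :- import_module term_io.

    main(!IO) :-
        % Read the whole file in one go ...
        io.read_file_as_string(ReadResult, !IO),
        (
            ReadResult = ok(FileString),
            MaxOffset = string.length(FileString),
            % ... then thread in,out pairs of positions, not I/O states.
            read_terms_loop("<stdin>", FileString, MaxOffset,
                posn(1, 0, 0), !IO)
        ;
            ReadResult = error(_Partial, Error),
            io.error_message(Error, Msg),
            io.write_string(Msg, !IO),
            io.nl(!IO)
        ).

    :- pred read_terms_loop(string::in, string::in, int::in, posn::in,
        io::di, io::uo) is det.

    read_terms_loop(FileName, FileString, MaxOffset, Posn0, !IO) :-
        read_one_term(FileName, FileString, MaxOffset, Posn0, Posn,
            ReadTerm),
        (
            ReadTerm = eof
        ;
            ReadTerm = error(Msg, LineNum),
            io.format("%s:%d: %s\n", [s(FileName), i(LineNum), s(Msg)], !IO)
        ;
            ReadTerm = term(VarSet, Term),
            term_io.write_term(VarSet, Term, !IO),
            io.nl(!IO),
            read_terms_loop(FileName, FileString, MaxOffset, Posn, !IO)
        ).

        % This wrapper just pins the term type down to the library's
        % generic term type, to avoid unresolved polymorphism above.
        %
    :- pred read_one_term(string::in, string::in, int::in, posn::in,
        posn::out, read_term(generic)::out) is det.

    read_one_term(FileName, FileString, MaxOffset, Posn0, Posn, ReadTerm) :-
        parser.read_term_from_substring(FileName, FileString, MaxOffset,
            Posn0, Posn, ReadTerm).

parse_module.m obviously does a lot more than this, but the overall shape
(read the file once, then thread positions through the parser and lexer)
is the same.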

Julien.

