[m-rev.] for post-commit review: stop using .*opt.tmp files

Zoltan Somogyi zoltan.somogyi at runbox.com
Sun Feb 18 14:54:22 AEDT 2024


On 2024-02-18 14:31 +11:00 AEDT, "Julien Fischer" <jfischer at opturion.com> wrote:
> It did occur to me that there is an issue with this change (and the one for .int
> files) on systems that use CR-LF line endings. This is because the string containing
> the existing version of any .int / .opt files would presumably contain the CR-LF
> line endings, while those constructed by string builders presumably wouldn't.

Whether the existing versions contain CR-LF depends on

- what the string output primitives we call do, when printing whole strings, and on
- what value the line_ending field of the mercury file structure contains, when
  printing individual characters (as e.g. io.nl does).

Having two separate mechanisms itself seems to raise the possibility of inconsistency.

As I mentioned in earlier emails, I intend to change this code to read in files
as byte arrays, and write them out as byte arrays, with the conversions between
byte arrays and strings handling the translation between UTF-8 and UTF-16.
These conversions could also take care of the line ending conversion, if required.
And they could do so while testing the line ending setting just once per file,
instead of doing so when processing every \n char.
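
As a rough sketch of that idea (in Python rather than Mercury, purely for
illustration): the names decode_text, encode_text, file_is_up_to_date and the
crlf_on_disk flag are all hypothetical, and the sketch assumes that in-memory
strings only ever contain bare \n. The point is just that the line ending
setting is tested once per file, and that the comparison against the existing
file happens on normalized text.

```python
def decode_text(data: bytes, crlf_on_disk: bool) -> str:
    """Convert file bytes to an in-memory string that uses bare '\n'."""
    text = data.decode("utf-8")
    # The line ending setting is tested once per file, not once per character.
    if crlf_on_disk:
        text = text.replace("\r\n", "\n")
    return text


def encode_text(text: str, crlf_on_disk: bool) -> bytes:
    """Convert an in-memory string (assumed to use bare '\n') to file bytes."""
    if crlf_on_disk:
        text = text.replace("\n", "\r\n")
    return text.encode("utf-8")


def file_is_up_to_date(existing: bytes, new_text: str, crlf_on_disk: bool) -> bool:
    """Compare the existing file contents with newly built contents.

    Both sides are compared in normalized form, so a CR-LF file on disk
    still matches a string constructed with bare '\n'.
    """
    return decode_text(existing, crlf_on_disk) == new_text
```

With this arrangement, an existing file holding b"a\r\nb\r\n" compares equal
to a freshly built "a\nb\n" when crlf_on_disk is set, which is the case the
review comment was worried about.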

> This doesn't seem to be the case (at least for the C grades) for a small test
> program I wrote on Windows. Some of our I/O read primitives convert CR-LF to
> '\n', (e.g. io.primitives_read.read_char_code/6) but I'm not certain about some
> of the others, particularly on some of the non-C backends.
> 
> In short, there isn't an obvious correctness argument for this change on
> systems using CR-LF endings, even though it appears to work. The real problem
> is, arguably, that the user-facing documentation of the io module does not
> describe what happens with CR-LF line endings when reading from text file input
> streams.

Agreed, that is a problem. Unfortunately, fixing it requires knowing what
the C# and Java primitives that we invoke from our foreign_procs do in this
regard. That puts me out of the running for doing the fix.

> (Also, does anyone have any objection to doing s/CR-LF/CRLF/ in the
> comments in the library? I was searching for the latter, which is more
> usual, before realising we used the former.)

No objection from me.

Thanks for the review.

(If I were god-emperor of the universe, I would pass a law that says that
every technical team that has a conversation on this topic could submit
an invoice to Bill Gates, which he would have to pay, for all the costs they incur
in dealing with this issue, which was created by his lack of foresight, and of taste,
in the early days of Microsoft.)

Zoltan.

