[m-rev.] io.{read,write}_binary
Julien Fischer
jfischer at opturion.com
Tue Apr 25 12:53:24 AEST 2023
Hi Zoltan,
On Tue, 25 Apr 2023, Zoltan Somogyi wrote:
> These two predicates are implemented using a form of type
> punning: they take the text_{in,out}put_stream wrapper off a stream
> and put a binary_{in,out}put_stream wrapper around it.
> This works because at the moment, these wrappers all wrap
> the same type, but this won't be the case soon. We therefore
> need a new way to implement these two predicates.
>
> The approach I propose is that
>
> - write_binary should call string.string to convert the term
> to be written to a string,
>
> - it should count how many code units (UTF-8 or UTF-16, depending
> on the target) the string has,
>
> - it should write out the length as a binary integer, followed by
> the code units of the string, again as binary data.
>
> read_binary would then reverse the process.
>
> This should work. It should even work for Java, which the
> comments on these predicates say the current code does not.
>
> Opinions? Objections?
IMO, the above predicates should be removed from the standard library
entirely.
That said, if they remain and keep using the term-to-string approach, we
should always write the string to the binary stream in UTF-8 encoding,
regardless of the backend (e.g. using io.write_binary_string_utf8/{3,4}).
(Although we don't really have a convenient mechanism for going the
other way around yet.)
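
For concreteness, here is a rough, untested sketch of what the writing
side could look like under that scheme, always emitting UTF-8 as
suggested above. The module and predicate names are hypothetical; it
assumes the existing string.string, string.count_utf8_code_units,
int64.from_int, io.write_binary_int64_le and
io.write_binary_string_utf8 predicates/functions from recent versions
of the standard library:

    :- module write_binary_sketch.
    :- interface.

    :- import_module io.

        % Hypothetical sketch: write Term to the binary stream as a
        % 64-bit length prefix followed by the UTF-8 code units of
        % the term's string form.
        %
    :- pred sketch_write_binary(io.binary_output_stream::in, T::in,
        io::di, io::uo) is det.

    :- implementation.

    :- import_module int64.
    :- import_module string.

    sketch_write_binary(Stream, Term, !IO) :-
        % Convert the term to its string form.
        String = string.string(Term),
        % Count UTF-8 code units, independent of the backend's
        % native string encoding.
        NumCodeUnits = string.count_utf8_code_units(String),
        % Write the length as a 64-bit binary integer with a fixed
        % (here little-endian) byte order, so the format is portable.
        io.write_binary_int64_le(Stream,
            int64.from_int(NumCodeUnits), !IO),
        % Then write the code units themselves, always as UTF-8.
        io.write_binary_string_utf8(Stream, String, !IO).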
> Note that I think the obvious size of the length prefix is 64 bits.
> 32 bits would work in virtually all cases, but it is not
> future-proof, and the savings are as tiny as the chance that anyone
> will ever want to invoke these preds on a >4 GB term.
That's fine. Both C# and Java impose maximum lengths on strings, but we
can check whether the size exceeds those limits. There's no point in
artificially limiting the C backend here.
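
For illustration, such a check might look something like the following
hypothetical fragment; 2^31 - 1 is (roughly) the maximum string length
on both the JVM and the CLR:

    :- import_module int64.

        % Hypothetical guard on the length prefix read back from a
        % binary stream: Java and C# strings cannot exceed 2^31 - 1
        % code units, so a larger prefix cannot denote a valid string
        % on those backends. The C backend needs no such check.
        %
    :- pred valid_string_length_prefix(int64::in) is semidet.

    valid_string_length_prefix(Len64) :-
        Len64 >= 0i64,
        Len64 =< 2147483647i64.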
Julien.