[m-rev.] for review: add write_binary_utf8_string

Julien Fischer jfischer at opturion.com
Thu Apr 7 11:50:57 AEST 2022


Hi Peter,

On Thu, 7 Apr 2022, Peter Wang wrote:

> On Thu, 07 Apr 2022 10:38:02 +1000 Julien Fischer <jfischer at opturion.com> wrote:
>>
>> +%---------------------%
>> +
>> +    % Write the UTF-8 encoding of a string to the current binary output stream
>> +    % or the specified binary output stream. If the given string is not
>> +    % well-formed, then the behaviour is implementation dependent.
>> +    %
>> +:- pred write_binary_utf8_string(string::in, io::di, io::uo) is det.
>> +:- pred write_binary_utf8_string(io.binary_output_stream::in, string::in,
>> +    io::di, io::uo) is det.
>> +
>
> Or write_binary_string_utf8, in line with the pattern
> write_binary_<TYPE><SUFFIX>?

Done. That will work better if we ever add support for writing strings
using other encodings.

>> @@ -1147,6 +1150,50 @@ do_write_float(Stream, Float, Error, !IO) :-
>>   ").
>>
>>   %---------------------------------------------------------------------------%
>> +
>> +:- pragma foreign_proc("C",
>> +    do_write_binary_utf8_string(Stream::in, String::in, Error::out,
>> +        _IO0::di, _IO::uo),
>> +    [will_not_call_mercury, promise_pure, thread_safe, tabled_for_io],
>> +"
>> +    size_t len = strlen(String);
>> +    if (MR_WRITE(*Stream, (unsigned char *) String, len)) {
>
> Check the result is != len.

Done.
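
For reference, here is a sketch of the corrected check in plain C. It
assumes that MR_WRITE follows fwrite's convention of returning the
number of bytes written rather than an error flag, which is what the
comment above implies. Under that assumption, the condition in the patch
as quoted treats a successful write as a failure. To keep the sketch
self-contained, fwrite stands in for MR_WRITE. The name
write_utf8_bytes is hypothetical and is not part of the Mercury runtime.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the bytes of a NUL-terminated (assumed UTF-8)
 * string to fp. Returns 0 on success, or an error code on a short write.
 */
int write_utf8_bytes(FILE *fp, const char *str)
{
    size_t len = strlen(str);

    errno = 0;
    /*
     * With an item size of 1, fwrite returns the number of bytes written,
     * so anything other than len means the write failed.
     */
    if (fwrite(str, 1, len, fp) != len) {
        /*
         * ISO C does not require fwrite to set errno, so fall back to
         * EIO (a POSIX error code) if errno was left at zero.
         */
        return errno != 0 ? errno : EIO;
    }
    return 0;
}
```

Comparing against len also handles the empty string correctly: fwrite
returns 0, which equals len, so that case counts as success.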

>> +        Error = errno;
>> +    } else {
>> +        Error = 0;
>> +    }
>> +").
>> +
>> +:- pragma foreign_proc("C#",
>> +    do_write_binary_utf8_string(Stream::in, String::in, Error::out,
>> +        _IO0::di, _IO::uo),
>> +    [will_not_call_mercury, promise_pure, thread_safe, tabled_for_io],
>> +"
>> +    byte[] bytes = mercury.io__stream_ops.text_encoding.GetBytes(String);
>> +    try {
>> +        Stream.stream.Write(bytes, 0, bytes.Length);
>> +        Error = null;
>> +    } catch (System.Exception e) {
>> +        Error = e;
>> +    }
>> +").
>> +
>> +:- pragma foreign_proc("Java",
>> +    do_write_binary_utf8_string(Stream::in, String::in, Error::out,
>> +        _IO0::di, _IO::uo),
>> +    [will_not_call_mercury, promise_pure, thread_safe, tabled_for_io],
>> +"
>> +    byte[] bytes = String.getBytes(java.nio.charset.StandardCharsets.UTF_8);
>> +    try {
>> +        ((jmercury.io__stream_ops.MR_BinaryOutputFile) Stream).write(
>> +            bytes, 0, bytes.length);
>> +        Error = null;
>> +    } catch (java.io.IOException e) {
>> +        Error = e;
>> +    }
>> +").
>> +
>
> That looks fine. (I had a quick look around for something more efficient
> but nothing obvious came up.)

For Java, one alternative is for every MR_BinaryOutputFile to also
create a UTF-8 PrintWriter and use that. I haven't checked, but that
should avoid having to create the array of bytes. That approach has
potential problems, however: managing multiple views of the same stream,
end-of-line behaviour, etc. There is also the fact that binary output
streams that never have strings written to them would needlessly create
a PrintWriter.

Unless there is a serious performance issue or people are writing
gigantic strings to binary output streams in either the C# or Java
grades, I don't intend to look at it further now.

>> diff --git a/tests/hard_coded/write_binary_utf8.m b/tests/hard_coded/write_binary_utf8.m
>> index e69de29..0ca6e27 100644
>> --- a/tests/hard_coded/write_binary_utf8.m
>> +++ b/tests/hard_coded/write_binary_utf8.m
> ...
>> +%---------------------------------------------------------------------------%
>> +
>> +main(!IO) :-
>> +    io.open_binary_output(test_file, OpenOutResult, !IO),
>> +    (
>> +        OpenOutResult = ok(Out),
>> +        output_test_strings(Out, !IO),
>> +        io.close_binary_output(Out, !IO),
>> +        read_and_print_bytes(!IO),
>
> Minor: make read_and_print_bytes take the file name as an argument.

Done.
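
As an illustration of the revised shape only, here is a C analogue of
the helper with the file name passed in rather than hard-coded. The real
helper is Mercury code in tests/hard_coded/write_binary_utf8.m; the C
function, its output format (one lowercase hex byte per line) and its -1
error convention are assumptions made for this sketch, not what the test
actually prints.

```c
#include <stdio.h>

/*
 * Hypothetical C analogue of read_and_print_bytes: print each byte of
 * the named file to out as two hex digits, one per line.
 * Returns 0 on success, -1 if the file cannot be opened or a read fails.
 */
int read_and_print_bytes(const char *file_name, FILE *out)
{
    FILE *in = fopen(file_name, "rb");
    if (in == NULL) {
        return -1;
    }

    int c;
    while ((c = fgetc(in)) != EOF) {
        fprintf(out, "%02x\n", (unsigned) c);
    }

    /* fgetc returns EOF on both end-of-file and error; tell them apart. */
    int failed = ferror(in);
    fclose(in);
    return failed ? -1 : 0;
}
```

Passing the name in means the writer and the reader can share a single
definition of the test file name. It also lets the reader be reused on
other files.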

Thanks for that.

Julien.

