[m-dev.] for discussion: reading multibyte integers from binary streams

Julien Fischer jfischer at opturion.com
Wed Jan 2 14:24:18 AEDT 2019


Hi,

A while ago, I added predicates to the io module for writing multibyte integers
to binary file streams (e.g. write_binary_uint32_le).

We also need to add predicates for reading multibyte integers from binary file
streams, for example:

    :- pred read_binary_uint32(?::out, io::di, io::uo) is det.
    :- pred read_binary_uint32(io.binary_input_stream::in, ?::out,
        io::di, io::uo) is det.
    :- pred read_binary_uint32_le(?::out, io::di, io::uo) is det.
    :- pred read_binary_uint32_le(io.binary_input_stream::in, ?::out,
        io::di, io::uo) is det.
    :- pred read_binary_uint32_be(?::out, io::di, io::uo) is det.
    :- pred read_binary_uint32_be(io.binary_input_stream::in, ?::out,
        io::di, io::uo) is det.

etc etc.

The result types for these predicates (the '?' above) will need to account for
an extra possibility: the final multibyte integer in the stream may be
incomplete (e.g. when reading a uint32 but only 3 bytes remain in the
stream).
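To make the extra case concrete, here is a minimal C sketch of how a 4-byte read
can be classified into the four outcomes. It mirrors the logic of the C
foreign_proc in the diff below, but uses plain stdio fread/ferror instead of the
Mercury runtime's MR_READ/MR_FERROR, and the names read_u32_le and
compound_code are hypothetical, not part of io.m.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical C analogue of the proposed compound_result_code. */
enum compound_code { CC_OK, CC_EOF, CC_INCOMPLETE, CC_ERROR };

/* Read a little-endian uint32 from fp, classifying short reads. */
static enum compound_code read_u32_le(FILE *fp, uint32_t *value)
{
    unsigned char buf[4];
    size_t nread = fread(buf, 1, sizeof buf, fp);
    assert(nread <= sizeof buf);

    *value = 0;
    if (nread < sizeof buf) {
        if (ferror(fp)) {
            return CC_ERROR;        /* an I/O error occurred */
        } else if (nread > 0) {
            return CC_INCOMPLETE;   /* 1 to 3 bytes left at end of stream */
        } else {
            return CC_EOF;          /* nothing left to read */
        }
    }
    /* Byte 0 is the least significant byte in little-endian order. */
    *value = (uint32_t) buf[0]
           | (uint32_t) buf[1] << 8
           | (uint32_t) buf[2] << 16
           | (uint32_t) buf[3] << 24;
    return CC_OK;
}
```

Reading a 7-byte stream this way yields ok, then incomplete (3 trailing bytes),
then eof; proposal 1 below simply reports that middle case without saying how
many bytes were seen.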

I would like feedback on (1) the name of this new result type and (2) how much
information it should actually return.  Here are a couple of proposals:

1.

  :- type compound_result(T)
      --->    ok(T)
      ;       eof
      ;       incomplete
      ;       error(io.error).

Here we just flag that the final value on the stream was incomplete and
leave it at that.  The diff below, which I do not intend to commit, implements
read_binary_uint32_{le,be} using this type.

2.

    :- type compound_result(T, U)
        --->    ok(T)
        ;       eof
        ;       incomplete(U)   % U holds the bytes read for the incomplete value
        ;       error(io.error).

Alternatively, we could return the bytes already read for an incomplete value.
(The argument type U is polymorphic because returning a list of bytes for a
16-bit value, where at most one byte can be left over, seems a bit silly.)
My suspicion is that this approach will be overkill for most applications.

Another alternative would be to return the number of bytes read for the
incomplete value.
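For comparison, a rough C sketch of what the richer variants would carry: the
count of bytes consumed (the "number of bytes read" alternative) plus the
partial bytes themselves (proposal 2). The struct u32_result and
read_u32_le_detailed names are hypothetical and only illustrate the extra
payload; this is not a proposed io.m interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum compound_code { CC_OK, CC_EOF, CC_INCOMPLETE, CC_ERROR };

/* Hypothetical richer result for reading a little-endian uint32. */
struct u32_result {
    enum compound_code code;
    uint32_t value;              /* meaningful only when code == CC_OK */
    size_t nread;                /* bytes consumed from the stream */
    unsigned char partial[4];    /* bytes read; only the first nread count */
};

static struct u32_result read_u32_le_detailed(FILE *fp)
{
    struct u32_result r = { CC_OK, 0, 0, {0} };
    r.nread = fread(r.partial, 1, sizeof r.partial, fp);
    assert(r.nread <= sizeof r.partial);

    if (r.nread < sizeof r.partial) {
        if (ferror(fp)) {
            r.code = CC_ERROR;
        } else {
            /* Some but not all bytes: incomplete; none at all: eof. */
            r.code = (r.nread > 0) ? CC_INCOMPLETE : CC_EOF;
        }
        return r;
    }
    r.value = (uint32_t) r.partial[0]
            | (uint32_t) r.partial[1] << 8
            | (uint32_t) r.partial[2] << 16
            | (uint32_t) r.partial[3] << 24;
    return r;
}
```

Every caller pays for the count and buffer even if it only wants to know that
the stream ended mid-value, which is the overhead the suspicion above is about.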

Comments and suggestions?

Julien.

-------------------------


diff --git a/library/io.m b/library/io.m
index d407c09..5be2ed4 100644
--- a/library/io.m
+++ b/library/io.m
@@ -108,6 +108,16 @@
      ;       eof
      ;       error(io.error).

+    % compound_result is used where the result is composed from smaller
+    % elements read from the stream and it is possible for the final element
+    % to be incomplete.
+    %
+:- type compound_result(T)
+    --->    ok(T)
+    ;       eof
+    ;       incomplete
+    ;       error(io.error).
+
  :- type read_result(T)
      --->    ok(T)
      ;       eof
@@ -872,6 +882,16 @@
  :- pred read_binary_uint8(io.binary_input_stream::in, io.result(uint8)::out,
      io::di, io::uo) is det.

+:- pred read_binary_uint32_le(io.compound_result(uint32)::out,
+    io::di, io::uo) is det.
+:- pred read_binary_uint32_le(io.binary_input_stream::in,
+    io.compound_result(uint32)::out, io::di, io::uo) is det.
+
+:- pred read_binary_uint32_be(io.compound_result(uint32)::out,
+    io::di, io::uo) is det.
+:- pred read_binary_uint32_be(io.binary_input_stream::in,
+    io.compound_result(uint32)::out, io::di, io::uo) is det.
+
      % Fill a bitmap from the current binary input stream
      % or from the specified binary input stream.
      % Return the number of bytes read. On end-of-file, the number of
@@ -2132,6 +2152,19 @@ using System.Security.Principal;
  :- pragma foreign_export_enum("Java", result_code/0,
      [prefix("ML_RESULT_CODE_"), uppercase]).

+:- type compound_result_code
+    --->    ok
+    ;       eof
+    ;       incomplete
+    ;       error.
+
+:- pragma foreign_export_enum("C", compound_result_code/0,
+    [prefix("ML_COMPOUND_RESULT_CODE_"), uppercase]).
+:- pragma foreign_export_enum("C#", compound_result_code/0,
+    [prefix("ML_COMPOUND_RESULT_CODE_"), uppercase]).
+:- pragma foreign_export_enum("Java", compound_result_code/0,
+    [prefix("ML_COMPOUND_RESULT_CODE_"), uppercase]).
+
      % Reads a character (code point) from specified stream. This may
      % involve converting external character encodings into Mercury's internal
      % character representation and (for text streams) converting OS line
@@ -2383,6 +2416,174 @@ read_binary_uint8(binary_input_stream(Stream), Result, !IO) :-
          Result = error(io_error(Msg))
      ).

+%---------------------%
+
+read_binary_uint32_le(Result, !IO) :-
+    binary_input_stream(Stream, !IO),
+    read_binary_uint32_le(Stream, Result, !IO).
+
+read_binary_uint32_le(binary_input_stream(Stream), Result, !IO) :-
+    do_read_uint32_le(Stream, Result0, UInt32, Error, !IO),
+    (
+        Result0 = ok,
+        Result = ok(UInt32)
+    ;
+        Result0 = eof,
+        Result = eof
+    ;
+        Result0 = incomplete,
+        Result = incomplete
+    ;
+        Result0 = error,
+        make_err_msg(Error, "read failed: ", Msg),
+        Result = error(io_error(Msg))
+    ).
+
+:- pred do_read_uint32_le(stream::in, compound_result_code::out, uint32::out,
+    system_error::out, io::di, io::uo) is det.
+
+:- pragma foreign_proc("C",
+    do_read_uint32_le(Stream::in, Result::out, UInt32::out, Error::out,
+        _IO0::di, _IO::uo),
+    [will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail],
+"
+    unsigned char buffer[4];
+    size_t nread = MR_READ(*Stream, buffer, 4);
+
+    if (nread < 4) {
+        UInt32 = 0;
+        if (MR_FERROR(*Stream)) {
+            Result = ML_COMPOUND_RESULT_CODE_ERROR;
+            Error = errno;
+        } else if (nread > 0) {
+            Result = ML_COMPOUND_RESULT_CODE_INCOMPLETE;
+            Error = 0;
+        } else {
+            Result = ML_COMPOUND_RESULT_CODE_EOF;
+            Error = 0;
+        }
+    } else {
+        Result = ML_COMPOUND_RESULT_CODE_OK;
+        #if defined(MR_BIG_ENDIAN)
+            ((unsigned char *) &UInt32)[0] = buffer[3];
+            ((unsigned char *) &UInt32)[1] = buffer[2];
+            ((unsigned char *) &UInt32)[2] = buffer[1];
+            ((unsigned char *) &UInt32)[3] = buffer[0];
+        #else
+            memcpy(&UInt32, buffer, sizeof(UInt32));
+        #endif
+        Error = 0;
+    }
+").
+
+:- pragma foreign_proc("C#",
+    do_read_uint32_le(Stream::in, Result::out, UInt32::out, Error::out,
+        _IO0::di, _IO::uo),
+    [will_not_call_mercury, promise_pure, thread_safe],
+"
+    byte[] buffer = new byte[4];
+    io.MR_MercuryFileStruct mf = Stream;
+    UInt32 = 0;
+
+    int nread = 0;
+
+    if (mf.putback != -1) {
+        buffer[nread] = (byte) mf.putback;
+        nread++;
+        mf.putback = -1;
+    }
+
+    try {
+        for ( ; nread < 4; nread++) {
+            int b = mf.stream.ReadByte();
+            if (b == -1) {
+                break;
+            }
+            buffer[nread] = (byte) b;
+        }
+        if (nread < 4) {
+            if (nread > 0) {
+                Result = io.ML_COMPOUND_RESULT_CODE_INCOMPLETE;
+            } else {
+                Result = io.ML_COMPOUND_RESULT_CODE_EOF;
+            }
+        } else {
+            Result = io.ML_COMPOUND_RESULT_CODE_OK;
+            UInt32 = (uint) (buffer[3] << 24 | buffer[2] << 16 |
+                buffer[1] << 8 | buffer[0]);
+        }
+        Error = null;
+    } catch (System.Exception e) {
+        Result = io.ML_COMPOUND_RESULT_CODE_ERROR;
+        Error = e;
+    }
+").
+
+:- pragma foreign_proc("Java",
+    do_read_uint32_le(Stream::in, Result::out, UInt32::out, Error::out,
+        _IO0::di, _IO::uo),
+    [will_not_call_mercury, promise_pure, thread_safe],
+"
+    byte[] buffer = new byte[4];
+    MR_BinaryInputFile mf = (MR_BinaryInputFile) Stream;
+    UInt32 = 0;
+
+    try {
+        int nread;
+        for (nread = 0; nread < 4; nread++) {
+            int next = mf.read_byte();
+            if (next == -1) {
+                break;
+            }
+            buffer[nread] = (byte) next;
+        }
+        if (nread < 4) {
+            if (nread > 0) {
+                Result = io.ML_COMPOUND_RESULT_CODE_INCOMPLETE;
+            } else {
+                Result = io.ML_COMPOUND_RESULT_CODE_EOF;
+            }
+        } else {
+            Result = io.ML_COMPOUND_RESULT_CODE_OK;
+            UInt32 =
+                (buffer[3] & 0xff) << 24 |
+                (buffer[2] & 0xff) << 16 |
+                (buffer[1] & 0xff) << 8  |
+                (buffer[0] & 0xff);
+        }
+        Error = null;
+    } catch (java.lang.Exception e) {
+        Result = io.ML_COMPOUND_RESULT_CODE_ERROR;
+        Error = e;
+    }
+").
+
+%---------------------%
+
+read_binary_uint32_be(Result, !IO) :-
+    binary_input_stream(Stream, !IO),
+    read_binary_uint32_be(Stream, Result, !IO).
+
+read_binary_uint32_be(binary_input_stream(Stream), Result, !IO) :-
+    do_read_uint32_le(Stream, Result0, UInt32LE, Error, !IO),
+    (
+        Result0 = ok,
+        UInt32BE = uint32.reverse_bytes(UInt32LE),
+        Result = ok(UInt32BE)
+    ;
+        Result0 = eof,
+        Result = eof
+    ;
+        Result0 = incomplete,
+        Result = incomplete
+    ;
+        Result0 = error,
+        make_err_msg(Error, "read failed: ", Msg),
+        Result = error(io_error(Msg))
+    ).
+
+%---------------------%
+
  read_bitmap(!Bitmap, BytesRead, Result, !IO) :-
      binary_input_stream(Stream, !IO),
      read_bitmap(Stream, !Bitmap, BytesRead, Result, !IO).
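One possible simplification for the C foreign_proc above, which chooses how to
copy the bytes based on MR_BIG_ENDIAN: assembling the value with shifts gives
the same result on any host, so no endianness conditional is needed. A small
sketch follows; decode_u32_le, decode_u32_be and reverse_bytes_u32 are
hypothetical helper names, the last standing in for the uint32.reverse_bytes
call that read_binary_uint32_be uses.

```c
#include <stdint.h>

/* Assemble a uint32 from 4 little-endian bytes, on any host byte order. */
static uint32_t decode_u32_le(const unsigned char b[4])
{
    return (uint32_t) b[0]
         | (uint32_t) b[1] << 8
         | (uint32_t) b[2] << 16
         | (uint32_t) b[3] << 24;
}

/* Assemble a uint32 from 4 big-endian bytes. */
static uint32_t decode_u32_be(const unsigned char b[4])
{
    return (uint32_t) b[0] << 24
         | (uint32_t) b[1] << 16
         | (uint32_t) b[2] << 8
         | (uint32_t) b[3];
}

/* Swap the byte order of a uint32 (analogue of uint32.reverse_bytes). */
static uint32_t reverse_bytes_u32(uint32_t x)
{
    return (x & 0x000000FFu) << 24
         | (x & 0x0000FF00u) << 8
         | (x & 0x00FF0000u) >> 8
         | (x & 0xFF000000u) >> 24;
}
```

Reversing the little-endian decoding of a buffer equals its big-endian decoding,
which is why read_binary_uint32_be can reuse do_read_uint32_le as the diff does.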