[m-rev.] for review: add 64-bit versions of seek and offset on binary streams

Julien Fischer jfischer at opturion.com
Fri Oct 4 17:09:19 AEST 2019


For review by anyone.

I'll update the NEWS file separately after I've committed other related
changes -- I want to commit this now as it is required by those changes.
My intention is that we will eventually mark the existing versions of
seek and offset as obsolete and then after a time change their arguments
to be int64s (i.e. in the long run we won't need to have two versions of
seek and offset).

I would like opinions on what we should do on platforms with 32-bit ints
where binary_input_stream_offset, for example, returns a value that
doesn't fit in a 32-bit int; currently we just truncate the returned
offset, I think we should change it throw an exception if that happens.

------------------------------------------------------------------------------

Add 64-bit versions of seek and offset on binary streams.

The current versions of the seek and offset operations on binary streams use an
int to represent the offset.  On systems where Mercury's int type is 32-bit
this effectively limits the sizes of file we can handle with these operations
to approx. 2GB.  Additionally, the underlying operations are broken on 64-bit
Windows because fseek() and ftell() on Windows only use 32-bit values.  (We
should be using _fseeki64() and _ftelli64() on Windows -- that will be a
separate change however.)

This diff introduces versions of the seek and offset operations that use the
int64 type to represent the stream offset.  It also changes the underlying
infrastructure (e.g. the Java implementation of Mercury binary file streams) to
represent offsets using only a 64-bit type.  The existing stream and offset
operations are now implemented by _casting_ the int64 offsets to ints -- this
is wrong on some platforms, but it is the existing behaviour.

library.io.m:
      Add the predicates seek64_binary_{input,output}/5 and
      binary_{input,output}_stream_offset64/4.

      Shift the input stream versions of the above operations into the correct
      section; for some reason they were listed with the predicates that
      operate on binary *output* streams.

      Add XXXs about related issues that need to fixed.

      Use 64-bit integers only for seek and offset operations on binary
      streams.

Julien.

diff --git a/library/io.m b/library/io.m
index 995ddd7..a1077e8 100644
--- a/library/io.m
+++ b/library/io.m
@@ -1355,31 +1355,27 @@
      io::di, io::uo) is det.

      % Seek to an offset relative to Whence (documented above)
-    % on a specified binary input stream. Attempting to seek on a pipe
-    % or tty results in implementation dependent behaviour.
-    %
-    % A successful seek undoes any effects of putback_byte on the stream.
-    %
-:- pred seek_binary_input(io.binary_input_stream::in, io.whence::in,
-    int::in, io::di, io::uo) is det.
-
-    % Seek to an offset relative to Whence (documented above)
      % on a specified binary output stream. Attempting to seek on a pipe
      % or tty results in implementation dependent behaviour.
      %
  :- pred seek_binary_output(io.binary_output_stream::in, io.whence::in,
      int::in, io::di, io::uo) is det.

-    % Returns the offset (in bytes) into the specified binary input stream.
+    % As above, but the offset is always a 64-bit value.
      %
-:- pred binary_input_stream_offset(io.binary_input_stream::in, int::out,
-    io::di, io::uo) is det.
+:- pred seek64_binary_output(io.binary_output_stream::in, io.whence::in,
+    int64::in, io::di, io::uo) is det.

      % Returns the offset (in bytes) into the specified binary output stream.
      %
  :- pred binary_output_stream_offset(io.binary_output_stream::in, int::out,
      io::di, io::uo) is det.

+    % As above, but the offset is always a 64-bit value.
+    %
+:- pred binary_output_stream_offset64(io.binary_output_stream::in, int64::out,
+    io::di, io::uo) is det.
+
  %---------------------------------------------------------------------------%
  %
  % Binary input stream predicates.
@@ -1438,6 +1434,30 @@
  :- pred binary_input_stream_name(io.binary_input_stream::in, string::out,
      io::di, io::uo) is det.

+    % Seek to an offset relative to Whence (documented above)
+    % on a specified binary input stream. Attempting to seek on a pipe
+    % or tty results in implementation dependent behaviour.
+    %
+    % A successful seek undoes any effects of putback_byte on the stream.
+    %
+:- pred seek_binary_input(io.binary_input_stream::in, io.whence::in,
+    int::in, io::di, io::uo) is det.
+
+    % As above, but the offset is always a 64-bit value.
+    %
+:- pred seek64_binary_input(io.binary_input_stream::in, io.whence::in,
+    int64::in, io::di, io::uo) is det.
+
+    % Returns the offset (in bytes) into the specified binary input stream.
+    %
+:- pred binary_input_stream_offset(io.binary_input_stream::in, int::out,
+    io::di, io::uo) is det.
+
+    % As above, but the offset is always a 64-bit value.
+    %
+:- pred binary_input_stream_offset64(io.binary_input_stream::in, int64::out,
+    io::di, io::uo) is det.
+
  %---------------------------------------------------------------------------%
  %
  % Binary output stream predicates.
@@ -4142,23 +4162,27 @@ input_stream_foldl2_io_maybe_stop(Stream, Pred, T0, Res, !IO) :-

  %---------------------------------------------------------------------------%

+    % XXX FIXME this should return an int64.
  :- pred input_stream_file_size(io.input_stream::in, int::out,
      io::di, io::uo) is det.

  input_stream_file_size(input_stream(Stream), Size, !IO) :-
-    stream_file_size(Stream, Size, !IO).
+    stream_file_size(Stream, Size64, !IO),
+    Size = int64.cast_to_int(Size64).

+    % XXX FIXME this should return an int64.
  :- pred binary_input_stream_file_size(io.binary_input_stream::in, int::out,
      io::di, io::uo) is det.

  binary_input_stream_file_size(binary_input_stream(Stream), Size, !IO) :-
-    stream_file_size(Stream, Size, !IO).
+    stream_file_size(Stream, Size64, !IO),
+    Size = int64.cast_to_int(Size64).

      % stream_file_size(Stream, Size):
      % If Stream is a regular file, then Size is its size (in bytes),
      % otherwise Size is -1.
      %
-:- pred stream_file_size(stream::in, int::out, io::di, io::uo) is det.
+:- pred stream_file_size(stream::in, int64::out, io::di, io::uo) is det.

  :- pragma foreign_decl("C", "
  #ifdef MR_HAVE_UNISTD_H
@@ -4199,7 +4223,7 @@ binary_input_stream_file_size(binary_input_stream(Stream), Size, !IO) :-
      [will_not_call_mercury, promise_pure, thread_safe],
  "{
      if (Stream.stream.CanSeek) {
-        Size = (int) Stream.stream.Length;
+        Size = Stream.stream.Length;
      } else {
          Size = -1;
      }
@@ -7075,22 +7099,22 @@ int             ML_fprintf(MercuryFilePtr mf, const char *format, ...);
          // size(): [Java]
          //
          // Returns the length of a file.
-        public int size()
+        public long size()
              throws java.io.IOException
          {
-            return (int) channel.size();
+            return channel.size();
          }

          // getOffset():
          //
          // Returns the current position in a binary file.
-        abstract public int getOffset() throws java.io.IOException;
+        abstract public long getOffset() throws java.io.IOException;

          // seek(): [Java]
          //
          // Seek relative to start, current position or end depending on the
          // flag.
-        public void seek_binary(int flag, int offset)
+        public void seek_binary(int flag, long offset)
              throws java.io.IOException
          {
              long position;
@@ -7167,14 +7191,14 @@ int             ML_fprintf(MercuryFilePtr mf, const char *format, ...);
          }

          @Override
-        public int getOffset()
+        public long getOffset()
              throws java.io.IOException
          {
-            return (int) channel.position() - pushback.size();
+            return channel.position() - pushback.size();
          }

          @Override
-        public void seek_binary(int flag, int offset)
+        public void seek_binary(int flag, long offset)
              throws java.io.IOException
          {
              super.seek_binary(flag, offset);
@@ -7229,7 +7253,7 @@ int             ML_fprintf(MercuryFilePtr mf, const char *format, ...);
          }

          @Override
-        public int getOffset()
+        public long getOffset()
              throws java.io.IOException
          {
              return (int) channel.position();
@@ -9359,15 +9383,25 @@ whence_to_int(end, 2).

  seek_binary_input(binary_input_stream(Stream), Whence, Offset, !IO) :-
      whence_to_int(Whence, Flag),
-    seek_binary_2(Stream, Flag, Offset, Error, !IO),
+    seek_binary_2(Stream, Flag, int64.from_int(Offset), Error, !IO),
      throw_on_error(Error, "error seeking in file: ", !IO).

  seek_binary_output(binary_output_stream(Stream), Whence, Offset, !IO) :-
      whence_to_int(Whence, Flag),
+    seek_binary_2(Stream, Flag, int64.from_int(Offset), Error, !IO),
+    throw_on_error(Error, "error seeking in file: ", !IO).
+
+seek64_binary_input(binary_input_stream(Stream), Whence, Offset, !IO) :-
+    whence_to_int(Whence, Flag),
      seek_binary_2(Stream, Flag, Offset, Error, !IO),
      throw_on_error(Error, "error seeking in file: ", !IO).

-:- pred seek_binary_2(stream::in, int::in, int::in, system_error::out,
+seek64_binary_output(binary_output_stream(Stream), Whence, Offset, !IO) :-
+    whence_to_int(Whence, Flag),
+    seek_binary_2(Stream, Flag, Offset, Error, !IO),
+    throw_on_error(Error, "error seeking in file: ", !IO).
+
+:- pred seek_binary_2(stream::in, int::in, int64::in, system_error::out,
      io::di, io::uo) is det.

  :- pragma foreign_proc("C",
@@ -9427,14 +9461,24 @@ seek_binary_output(binary_output_stream(Stream), Whence, Offset, !IO) :-
  %---------------------%

  binary_input_stream_offset(binary_input_stream(Stream), Offset, !IO) :-
+    binary_stream_offset_2(Stream, Offset64, Error, !IO),
+    throw_on_error(Error, "error getting file offset: ", !IO),
+    Offset = int64.cast_to_int(Offset64).
+
+binary_output_stream_offset(binary_output_stream(Stream), Offset, !IO) :-
+    binary_stream_offset_2(Stream, Offset64, Error, !IO),
+    throw_on_error(Error, "error getting file offset: ", !IO),
+    Offset = int64.cast_to_int(Offset64).
+
+binary_input_stream_offset64(binary_input_stream(Stream), Offset, !IO) :-
      binary_stream_offset_2(Stream, Offset, Error, !IO),
      throw_on_error(Error, "error getting file offset: ", !IO).

-binary_output_stream_offset(binary_output_stream(Stream), Offset, !IO) :-
+binary_output_stream_offset64(binary_output_stream(Stream), Offset, !IO) :-
      binary_stream_offset_2(Stream, Offset, Error, !IO),
      throw_on_error(Error, "error getting file offset: ", !IO).

-:- pred binary_stream_offset_2(stream::in, int::out, system_error::out,
+:- pred binary_stream_offset_2(stream::in, int64::out, system_error::out,
      io::di, io::uo) is det.

  :- pragma foreign_proc("C",


More information about the reviews mailing list