[m-rev.] for review: more specific errors when file sizes exceed maximum buffer sizes
Peter Wang
novalazy at gmail.com
Mon Oct 28 11:57:50 AEDT 2019
On Fri, 25 Oct 2019 16:02:28 +1100 (AEDT), Julien Fischer <jfischer at opturion.com> wrote:
>
> More specific errors when file sizes exceed maximum buffer sizes.
>
> Predicates such as read_file_as_string and read_binary_input_as_bitmap, use
> buffers that are backend by Mercury arrays. Since Mercury arrays are indexed
> using the int type, it is possible to cause the buffer index to overflow when
> reading large files on 32-bit platforms. Detect this situation and throw an
> exception that is specific to it. Currently, the behaviour we get depends on
> how much the overflowing index wraps around.
>
> As part of the above, fix the XXXs where we were casting 64-bit stream file
> sizes to ints; file stream files sizes are now always returned as int64s.
>
> Julien.
>
> diff --git a/library/io.m b/library/io.m
> index 91c1a72..7b63ed4 100644
> --- a/library/io.m
> +++ b/library/io.m
> @@ -200,6 +200,9 @@
> % Returns an error if the file contains a null character, because
> % null characters are not allowed in Mercury strings.
> %
> + % Throws an exception if the number of characters in the stream exceeds the
> + % maximum index of an array on this platform.
> + %
Number of characters may be interpreted as code points, and maximum
index of an array seems like an implementation detail.
Maybe just write the limits directly?
> % WARNING: the returned string is NOT guaranteed to be valid UTF-8
> % or UTF-16.
> %
> @@ -1053,8 +1056,10 @@
> bitmap::bitmap_di, bitmap::bitmap_uo, num_bytes::out, io.res::out,
> io::di, io::uo) is det.
>
> - % Reads all the bytes until eof or error from the current binary input
> - % stream or from the specified binary input stream into a bitmap.
> + % Reads all the bytes until eof on error from the current binary input
or error
> + % stream or the specified binary input stream into a bitmap. Throws an
I think it was easier to read with "or from".
> + % exception if the number of bytes to be read exceeds the maximum number
> + % of bytes that can be stored in a bitmap on this platform.
> %
> :- pred read_binary_file_as_bitmap(
> io.res(bitmap)::out, io::di, io::uo) is det.
> @@ -3626,9 +3631,18 @@ read_binary_file_as_bitmap(Stream, Result, !IO) :-
> % according to the size of the file. Otherwise, just use a default buffer
> % size of 4k minus a bit (to give malloc some room).
> binary_input_stream_file_size(Stream, FileSize, !IO),
> - ( if FileSize >= 0 then
> - binary_input_stream_offset(Stream, CurrentOffset, !IO),
> - RemainingSize = FileSize - CurrentOffset,
> + ( if FileSize >= 0i64 then
> + binary_input_stream_offset64(Stream, CurrentOffset, !IO),
> + RemainingSizeInt64 = FileSize - CurrentOffset,
> + ( if
> + int.bits_per_int = 32,
> + RemainingSizeInt64 > from_int(int.max_int)
> + then
> + error("io.read_binary_file_as_bitmap",
> + "file size exceeds maximum buffer size")
> + else
> + RemainingSize = cast_to_int(RemainingSizeInt64)
> + ),
> some [!BM] (
> !:BM = bitmap.init(RemainingSize * bits_per_byte),
> ( if RemainingSize = 0 then
> @@ -4043,18 +4057,30 @@ read_file_as_string_2(Stream, String, NumCUs, Error, NullCharError, !IO) :-
> % according to the size of the file. Otherwise, just use a default buffer
> % size of 4k minus a bit (to give malloc some room).
> input_stream_file_size(input_stream(Stream), FileSize, !IO),
> - ( if FileSize >= 0 then
> - BufferSize0 = FileSize + 1
> + ( if FileSize >= 0i64 then
> + BufferSize0 = FileSize + 1i64
> + else
> + BufferSize0 = 4000i64
> + ),
> +
> + % In grades with 32-bit ints, buffer indexes will be 32-bit, so abort if we
> + % need to read in more than int.int_max bytes.
int.max_int
Otherwise, it looks fine.
Peter
More information about the reviews
mailing list