[m-rev.] for review: more specific errors when file sizes exceed maximum buffer sizes

Peter Wang novalazy at gmail.com
Mon Oct 28 11:57:50 AEDT 2019


On Fri, 25 Oct 2019 16:02:28 +1100 (AEDT), Julien Fischer <jfischer at opturion.com> wrote:
> 
> More specific errors when file sizes exceed maximum buffer sizes.
> 
> Predicates such as read_file_as_string and read_binary_input_as_bitmap, use
> buffers that are backend by Mercury arrays.  Since Mercury arrays are indexed
> using the int type, it is possible to cause the buffer index to overflow when
> reading large files on 32-bit platforms.  Detect this situation and throw an
> exception that is specific to it.  Currently, the behaviour we get depends on
> how much the overflowing index wraps around.
> 
> As part of the above, fix the XXXs where we were casting 64-bit stream file
> sizes to ints; file stream files sizes are now always returned as int64s.
> 
> Julien.
> 
> diff --git a/library/io.m b/library/io.m
> index 91c1a72..7b63ed4 100644
> --- a/library/io.m
> +++ b/library/io.m
> @@ -200,6 +200,9 @@
>       % Returns an error if the file contains a null character, because
>       % null characters are not allowed in Mercury strings.
>       %
> +    % Throws an exception if the number of characters in the stream exceeds the
> +    % maximum index of an array on this platform.
> +    %

Number of characters may be interpreted as code points, and maximum
index of an array seems like an implementation detail.
Maybe just write the limits directly?

>       % WARNING: the returned string is NOT guaranteed to be valid UTF-8
>       % or UTF-16.
>       %
> @@ -1053,8 +1056,10 @@
>       bitmap::bitmap_di, bitmap::bitmap_uo, num_bytes::out, io.res::out,
>       io::di, io::uo) is det.
> 
> -    % Reads all the bytes until eof or error from the current binary input
> -    % stream or from the specified binary input stream into a bitmap.
> +    % Reads all the bytes until eof on error from the current binary input

or error

> +    % stream or the specified binary input stream into a bitmap. Throws an

I think it was easier to read with "or from".

> +    % exception if the number of bytes to be read exceeds the maximum number
> +    % of bytes that can be stored in a bitmap on this platform.
>       %
>   :- pred read_binary_file_as_bitmap(
>       io.res(bitmap)::out, io::di, io::uo) is det.
> @@ -3626,9 +3631,18 @@ read_binary_file_as_bitmap(Stream, Result, !IO) :-
>       % according to the size of the file. Otherwise, just use a default buffer
>       % size of 4k minus a bit (to give malloc some room).
>       binary_input_stream_file_size(Stream, FileSize, !IO),
> -    ( if FileSize >= 0 then
> -        binary_input_stream_offset(Stream, CurrentOffset, !IO),
> -        RemainingSize = FileSize - CurrentOffset,
> +    ( if FileSize >= 0i64 then
> +        binary_input_stream_offset64(Stream, CurrentOffset, !IO),
> +        RemainingSizeInt64 = FileSize - CurrentOffset,
> +        ( if
> +            int.bits_per_int = 32,
> +            RemainingSizeInt64 > from_int(int.max_int)
> +        then
> +            error("io.read_binary_file_as_bitmap",
> +                "file size exceeds maximum buffer size")
> +        else
> +            RemainingSize = cast_to_int(RemainingSizeInt64)
> +        ),
>           some [!BM] (
>               !:BM = bitmap.init(RemainingSize * bits_per_byte),
>               ( if RemainingSize = 0 then
> @@ -4043,18 +4057,30 @@ read_file_as_string_2(Stream, String, NumCUs, Error, NullCharError, !IO) :-
>       % according to the size of the file. Otherwise, just use a default buffer
>       % size of 4k minus a bit (to give malloc some room).
>       input_stream_file_size(input_stream(Stream), FileSize, !IO),
> -    ( if FileSize >= 0 then
> -        BufferSize0 = FileSize + 1
> +    ( if FileSize >= 0i64 then
> +        BufferSize0 = FileSize + 1i64
> +    else
> +        BufferSize0 = 4000i64
> +    ),
> +
> +    % In grades with 32-bit ints, buffer indexes will be 32-bit, so abort if we
> +    % need to read in more than int.int_max bytes.

int.max_int

Otherwise, it looks fine.

Peter


More information about the reviews mailing list