[m-rev.] [m-dev.] io.read_named_file_as_*
Peter Wang
novalazy at gmail.com
Fri Apr 21 17:57:10 AEST 2023
On Fri, 21 Apr 2023 17:09:45 +1000 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
>
> 2023-04-21 16:59 GMT+10:00 "Peter Wang" <novalazy at gmail.com>:
> >> WHAT DOES StreamReader return for invalid UTF-8, or invalid UTF-16?
> >> The following official-looking doc is silent on the issue:
> >> https://learn.microsoft.com/en-us/dotnet/api/system.io.streamreader.read?view=net-8.0#system-io-streamreader-read
> >
> > It depends on the encoding that the StreamReader was instantiated with.
> >
> > https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-encoding#Replacement
> >
> > The default behaviour is to return the replacement character.
> > It's possible to create an Encoding instance that throws an exception
> > instead.
> >
> > https://learn.microsoft.com/en-us/dotnet/api/system.text.utf8encoding.-ctor?view=net-7.0#system-text-utf8encoding-ctor(system-boolean-system-boolean)
>
> Thanks for digging that up.
>
> The replace-or-throw choice seems to be a utf8-specific decision;
> I expect there will be a corresponding utf16-specific decision.
> But as far as I can tell, io.m's open stream predicates don't even
> allow users to specify utf8 vs 16, so it looks like that should be
> fixed first.
>
I don't think it's a high priority to read/write UTF-16 files, as they
are uncommon as far as I know.
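The replace-or-throw choice discussed above is not peculiar to .NET. As a rough analogy only (Python's codec error handlers, not the .NET API), the sketch below decodes an invalid UTF-8 byte and an unpaired UTF-16 surrogate either by substituting U+FFFD, the replacement character, or by raising an exception. The helpers `decode` and `raises_decode_error` are hypothetical names used for illustration. On the .NET side, the UTF-16 class `UnicodeEncoding` has a constructor with a `throwOnInvalidBytes` flag, mirroring the one on `UTF8Encoding`.

```python
# Analogy only: Python's codec error handlers ("replace" vs "strict")
# mirror .NET's replacement fallback vs exception fallback. This is not
# the .NET API; decode() and raises_decode_error() are hypothetical helpers.

# 0xFF never occurs in well-formed UTF-8.
INVALID_UTF8 = b"abc\xffdef"

# UTF-16-LE: a high surrogate (U+D800) followed by "A" instead of a
# low surrogate, so the surrogate is unpaired and therefore invalid.
INVALID_UTF16 = b"\x00\xd8A\x00"


def decode(data: bytes, encoding: str, throw_on_invalid: bool) -> str:
    """Decode data, raising on invalid input or substituting U+FFFD."""
    errors = "strict" if throw_on_invalid else "replace"
    return data.decode(encoding, errors=errors)


def raises_decode_error(data: bytes, encoding: str) -> bool:
    """Return True if strict decoding of data fails."""
    try:
        decode(data, encoding, throw_on_invalid=True)
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    for data, enc in [(INVALID_UTF8, "utf-8"), (INVALID_UTF16, "utf-16-le")]:
        print(enc, repr(decode(data, enc, throw_on_invalid=False)),
              "throws:", raises_decode_error(data, enc))
```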
> I think this also would require changing our current approach
> of leaving the reader and writer slots null at the end of the open
> operation, and initializing them on demand, because the utf8 vs 16
> and replace vs throw info would be available only during the open
> operation. Do you know what considerations went into choosing
> the current approach in the first place?
I don't know what those considerations were. The approach seems to date
from the days of the original .NET backend.
Peter