[m-rev.] [m-dev.] io.read_named_file_as_*

Peter Wang novalazy at gmail.com
Fri Apr 21 17:57:10 AEST 2023


On Fri, 21 Apr 2023 17:09:45 +1000 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> 
> 2023-04-21 16:59 GMT+10:00 "Peter Wang" <novalazy at gmail.com>:
> >>         WHAT DOES StreamReader return for invalid UTF-8, or invalid UTF-16?
> >>         the following official-looking doc is silent on the issue
> >>         https://learn.microsoft.com/en-us/dotnet/api/system.io.streamreader.read?view=net-8.0#system-io-streamreader-read
> > 
> > It depends on the encoding that the StreamReader was instantiated with.
> > 
> > https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-encoding#Replacement
> > 
> > The default behaviour is to return the replacement character.
> > It's possible to create an Encoding instance that throws an exception
> > instead.
> > 
> > https://learn.microsoft.com/en-us/dotnet/api/system.text.utf8encoding.-ctor?view=net-7.0#system-text-utf8encoding-ctor(system-boolean-system-boolean)
> 
> Thanks for digging that up.
> 
> The replace-or-throw seems to be a utf8-specific decision;
> I expect there will be a corresponding utf16-specific decision.
> But as far as I can tell, io.m's open stream predicates don't even
> allow users to specify utf8 vs 16, so it looks like that should be
> fixed first.
> 

I don't think it's a high priority to read/write UTF-16 files as they
are uncommon as far as I know.

> I think this also would require changing our current approach
> of leaving the reader and writer slots null at the end of the open
> operation, and initializing them on demand, because the utf8 vs 16
> and replace vs throw info would be available only during the open
> operation. Do you know what considerations went into choosing
> the current approach in the first place?

I don't know what they were. The approach seems to date from the days
of the original .NET backend.
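
For reference, the replace-versus-throw choice discussed above can be sketched in C# roughly as follows. This is only an illustration, not what the Mercury runtime does: the class name DecodeDemo, the helper ReadAll and the sample bytes are made up for the example. The only library pieces relied on are the UTF8Encoding(bool, bool) constructor linked above, StreamReader(Stream, Encoding) and DecoderFallbackException.

```csharp
using System;
using System.IO;
using System.Text;

public static class DecodeDemo
{
    // Hypothetical helper: decode all of `bytes` through a StreamReader
    // that was instantiated with the given encoding.
    public static string ReadAll(byte[] bytes, Encoding enc)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), enc))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // 'a', then 0xFF (a byte that never occurs in valid UTF-8), then 'b'.
        byte[] bad = { 0x61, 0xFF, 0x62 };

        // throwOnInvalidBytes = false: the invalid byte is replaced by U+FFFD.
        string replaced = ReadAll(bad, new UTF8Encoding(false, false));
        Console.WriteLine(replaced.Contains("\uFFFD"));

        // throwOnInvalidBytes = true: decoding raises an exception instead.
        try
        {
            ReadAll(bad, new UTF8Encoding(false, true));
            Console.WriteLine("no exception");
        }
        catch (DecoderFallbackException)
        {
            Console.WriteLine("DecoderFallbackException");
        }
    }
}
```

Either way the behaviour is fixed by the Encoding object passed to the StreamReader constructor, so whoever creates the reader needs to know which variant is wanted.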

Peter

