[m-rev.] [m-dev.] io.read_named_file_as_*

Zoltan Somogyi zoltan.somogyi at runbox.com
Fri Apr 21 17:09:45 AEST 2023


2023-04-21 16:59 GMT+10:00 "Peter Wang" <novalazy at gmail.com>:
>>         WHAT DOES StreamReader return for invalid UTF-8, or invalid UTF-16?
>>         the following official-looking doc is silent on the issue
>>         https://learn.microsoft.com/en-us/dotnet/api/system.io.streamreader.read?view=net-8.0#system-io-streamreader-read
> 
> It depends on the encoding that the StreamReader was instantiated with.
> 
> https://learn.microsoft.com/en-us/dotnet/standard/base-types/character-encoding#Replacement
> 
> The default behaviour is to return the replacement character.
> It's possible to create an Encoding instance that throws an exception
> instead.
> 
> https://learn.microsoft.com/en-us/dotnet/api/system.text.utf8encoding.-ctor?view=net-7.0#system-text-utf8encoding-ctor(system-boolean-system-boolean)

Thanks for digging that up.

The replace-or-throw choice seems to be a utf8-specific decision;
I expect there will be a corresponding utf16-specific decision.
But as far as I can tell, io.m's open stream predicates don't even
allow users to specify utf8 vs 16, so it looks like that should be
fixed first.

I think this would also require changing our current approach
of leaving the reader and writer slots null at the end of the open
operation and initializing them on demand, because the utf8 vs 16
and replace vs throw info would be available only during the open
operation. Do you know what considerations went into choosing
the current approach in the first place?

>> -----------------
>>     java
>>         io.stream_ops: Java MR_TextInputFile.read_char
>> 
>>         returns the 0xFFFD replacement character for invalid UTF-16
>>         (that UTF-16 could have been created by conversion from UTF-8)
>> 
> 
> That is the default behaviour. As with C#, it's possible to make the
> stream reader throw an exception on an ill-formed sequence.
> (See the following program, thanks to ChatGPT.)

It looks like whatever approach we end up choosing for C# will work
for Java as well.
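For illustration only, here is a minimal Java sketch of the replace-versus-throw choice, written independently (it is not the program mentioned above). It assumes the standard `java.nio.charset` API: a `CharsetDecoder` configured with `CodingErrorAction.REPLACE` yields U+FFFD for ill-formed input, while `CodingErrorAction.REPORT` makes the reader throw a `CharacterCodingException`. The class and method names (`DecodeErrorPolicy`, `openReader`, `readAll`) are hypothetical and do not describe how io.m's Java backend currently builds its readers. Note that the policy is baked into the decoder when the reader is constructed, which is why the open operation would need to know it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeErrorPolicy {

    // Hypothetical helper: the error policy must be chosen when the decoder
    // (and thus the reader) is created; the Reader interface cannot change it later.
    static Reader openReader(byte[] bytes, Charset charset, boolean throwOnError) {
        CodingErrorAction action =
            throwOnError ? CodingErrorAction.REPORT : CodingErrorAction.REPLACE;
        CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(action)
            .onUnmappableCharacter(action);
        return new InputStreamReader(new ByteArrayInputStream(bytes), decoder);
    }

    // Hypothetical helper: reads every char one at a time (like read_char),
    // returning the decoded text, or a marker string if decoding failed.
    static String readAll(byte[] bytes, Charset charset, boolean throwOnError) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = openReader(bytes, charset, throwOnError)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (CharacterCodingException e) {
            return "error: " + e.getClass().getSimpleName();
        } catch (IOException e) {
            return "io error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // 0xFF never occurs in well-formed UTF-8.
        byte[] badUtf8 = {'a', (byte) 0xFF, 'b'};
        // UTF-16BE: 'a' followed by a lone low surrogate (0xDC00).
        byte[] badUtf16 = {0x00, 'a', (byte) 0xDC, 0x00};

        System.out.println(readAll(badUtf8, StandardCharsets.UTF_8, false));
        System.out.println(readAll(badUtf8, StandardCharsets.UTF_8, true));
        System.out.println(readAll(badUtf16, StandardCharsets.UTF_16BE, false));
        System.out.println(readAll(badUtf16, StandardCharsets.UTF_16BE, true));
    }
}
```

The same two-way choice applies to both charsets, which matches the expectation that the utf8 and utf16 cases each need their own replace-or-throw decision.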

Zoltan.

