[m-dev.] Proposal: Support reading/writing either DOS or UNIX line endings on any platform

Peter Wang novalazy at gmail.com
Tue Nov 12 13:03:10 AEDT 2013

On Tue, 12 Nov 2013 12:06:57 +1100, Paul Bone <paul at bone.id.au> wrote:
> io.open_input/4 and io.open_output/4 open files in text mode.  This means
> that line endings are automatically translated between platforms.  On a
> UNIX-style OS the file is assumed to be in the UNIX format; on a
> DOS-style OS the file is assumed to be in the DOS format.
> However, these assumptions aren't always correct.
>
> I share files with developers who use Windows (while I use Linux).  If a
> Mercury program such as the error program opens a file created by a
> developer who uses Windows, it incorrectly assumes the file uses UNIX
> line endings, and after the file is modified and saved it contains a mix
> of both endings.
>
> Rather than awkwardly patching the error program, and every other program
> that should behave like this, it'd be better to add support for this to
> Mercury's standard library.  I propose that:
>     + An input stream's line ending style is automatically detected from the
>       file's contents, not from the host OS.
>     + An output stream's line ending style can be specified by a new
>       io.open_output/5 predicate.  io.open_output/4 should redirect to this
>       and provide a default behaviour (use the host OS's preferred line
>       ending).
>     + A new pair of predicates, input_stream_line_ending_style/2 and
>       output_stream_line_ending_style/2, be created to retrieve the line
>       ending style of a stream.  This can be used to open an output
>       file in the same style as an input file.
>
> :- type line_ending_style
>     --->    unix_line_endings
>     ;       dos_line_endings.
>
> :- type maybe_line_ending_style
>     --->    host_os_line_endings
>     ;       unix_line_endings
>     ;       dos_line_endings.
>
>     % Maybe these should take the IO state pair?
>     %
> :- pred input_stream_line_ending_style(input_stream::in,
>     line_ending_style::out) is det.
> :- pred output_stream_line_ending_style(output_stream::in,
>     line_ending_style::out) is det.
>
>     % open_output(Filename, LineEndingStyle, Result, !IO):
>     %
> :- pred open_output(string::in, maybe_line_ending_style::in,
>     res(output_stream)::out, io::di, io::uo) is det.
>
> Thoughts?
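
For concreteness, the contents-based detection in the proposal might look roughly like this. It is a minimal sketch, assuming detection only needs to look at the first line terminator in an in-memory sample of the text; detect_line_ending_style/1 is a hypothetical helper, not a standard library function.

```mercury
:- module detect_line_endings.
:- interface.

:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

:- import_module char.
:- import_module list.
:- import_module string.

    % Same type as in the proposal above.
    %
:- type line_ending_style
    --->    unix_line_endings
    ;       dos_line_endings.

    % detect_line_ending_style(Sample) = Style:
    %
    % Hypothetical helper (not in the standard library): Style is
    % dos_line_endings if the first line terminator in Sample is CR LF,
    % and unix_line_endings if it is a bare LF or if Sample contains
    % no line terminator at all.
    %
:- func detect_line_ending_style(string) = line_ending_style.

detect_line_ending_style(Sample) =
    detect_in_chars(string.to_char_list(Sample)).

:- func detect_in_chars(list(char)) = line_ending_style.

detect_in_chars([]) = unix_line_endings.
detect_in_chars([C | Cs]) = Style :-
    ( if C = '\r', Cs = ['\n' | _] then
        % CR immediately followed by LF: DOS style.
        Style = dos_line_endings
    else if C = '\n' then
        % A bare LF came first: UNIX style.
        Style = unix_line_endings
    else
        Style = detect_in_chars(Cs)
    ).

main(!IO) :-
    io.write(detect_line_ending_style("one\r\ntwo\r\n"), !IO),
    io.nl(!IO),
    io.write(detect_line_ending_style("one\ntwo\n"), !IO),
    io.nl(!IO).
```

A file whose first line ends in LF but whose later lines end in CR LF would be reported as UNIX style by this sketch, which shows how fragile guessing from a sample can be.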

I think you should specify whether you want CRLF->LF translation for
input streams, and LF->CRLF translation for output streams (defaulting to,
e.g., "yes" for input and "host" for output).  I'm not keen on the automatic
detection.  If it's limited to reporting whether, and what type of, newline
translation has been required in the input stream *so far*, then that's
probably OK.  output_stream_line_ending_style seems pointless.
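
The narrower alternative (explicit translation, plus a record of whether any translation has been needed so far) could be sketched as follows. This assumes whole-string translation for simplicity; translate_input/3, translate_output/1 and their helpers are hypothetical names, and a real implementation would sit inside the stream layer and work incrementally.

```mercury
:- module crlf_translation.
:- interface.

:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

:- import_module bool.
:- import_module char.
:- import_module list.
:- import_module string.

    % translate_input(Raw, Translated, SawCRLF):
    %
    % Hypothetical input-side translation: every CR LF pair in Raw becomes
    % a single LF in Translated.  SawCRLF is yes if at least one pair was
    % translated, i.e. it records whether translation was needed so far.
    % A lone CR that is not followed by LF is passed through unchanged.
    %
:- pred translate_input(string::in, string::out, bool::out) is det.

translate_input(Raw, Translated, SawCRLF) :-
    crlf_to_lf(string.to_char_list(Raw), Chars, no, SawCRLF),
    Translated = string.from_char_list(Chars).

:- pred crlf_to_lf(list(char)::in, list(char)::out,
    bool::in, bool::out) is det.

crlf_to_lf([], [], !SawCRLF).
crlf_to_lf([C | Cs0], Chars, !SawCRLF) :-
    ( if C = '\r', Cs0 = ['\n' | Rest0] then
        !:SawCRLF = yes,
        crlf_to_lf(Rest0, Rest, !SawCRLF),
        Chars = ['\n' | Rest]
    else
        crlf_to_lf(Cs0, Rest, !SawCRLF),
        Chars = [C | Rest]
    ).

    % translate_output(Text) = Raw:
    %
    % Hypothetical output-side translation: every LF becomes CR LF.
    %
:- func translate_output(string) = string.

translate_output(Text) =
    string.from_char_list(lf_to_crlf(string.to_char_list(Text))).

:- func lf_to_crlf(list(char)) = list(char).

lf_to_crlf([]) = [].
lf_to_crlf([C | Cs]) = Chars :-
    ( if C = '\n' then
        Chars = ['\r', '\n' | lf_to_crlf(Cs)]
    else
        Chars = [C | lf_to_crlf(Cs)]
    ).

main(!IO) :-
    translate_input("one\r\ntwo\r\n", Text, SawCRLF),
    io.write(Text, !IO),
    io.nl(!IO),
    io.write(SawCRLF, !IO),
    io.nl(!IO),
    io.write(translate_output(Text), !IO),
    io.nl(!IO).
```

After reading, a program could consult the SawCRLF flag to decide, for example, whether to write its output through translate_output, which covers the use case the proposal's input_stream_line_ending_style/2 was meant to serve.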

A related problem is file encodings, and the "UTF-8 BOM".
