[m-dev.] Proposal: Support reading/writing either dos/unix file endings on any platform
Peter Wang
novalazy at gmail.com
Tue Nov 12 13:03:10 AEDT 2013
On Tue, 12 Nov 2013 12:06:57 +1100, Paul Bone <paul at bone.id.au> wrote:
>
> io.open_input/4 and io.open_output/4 open files in text mode. This means
> that line endings are automatically translated between platforms. If a
> UNIX-style OS is used the file is assumed to be in the UNIX format, if a
> DOS-style OS is used then the file is assumed to be in the DOS format.
> However these assumptions aren't always correct.
>
> I share files with developers who use Windows (while I use Linux). If a
> Mercury program such as the error program opens a file created by a
> developer who uses Windows, then it incorrectly assumes the file uses
> UNIX line endings, and after the file is modified and saved it contains a
> mix of both endings.
>
> Rather than awkwardly patch the error program, and every other program that
> should behave like this, it'd be better to add support for this in Mercury's
> standard library. I propose that:
>
> + An input stream's line ending style is automatically detected from the
>   file's contents, not from the host OS.
> + An output stream's line ending style can be specified by a new
>   io.open_output/5 predicate. io.open_output/4 should redirect to this
>   and provide a default behaviour (use the host OS's preferred line
>   ending).
> + A new pair of predicates input_stream_line_ending_style/2 and
>   output_stream_line_ending_style/2 be created to retrieve the line
>   ending style of a file handle. This can be used to open an output
>   file in the same style as an input file.
>
> :- type line_ending_style
>     --->    unix_line_endings
>     ;       dos_line_endings.
>
> :- type maybe_line_ending_style
>     --->    host_os_line_endings
>     ;       unix_line_endings
>     ;       dos_line_endings.
>
>     % Maybe these should take the IO state pair?
>     %
> :- pred input_stream_line_ending_style(input_stream::in,
>     line_ending_style::out) is det.
>
> :- pred output_stream_line_ending_style(output_stream::in,
>     line_ending_style::out) is det.
>
>     % open_output(Filename, LineEndingStyle, Result, !IO),
>     %
> :- pred open_output(string::in, maybe_line_ending_style::in,
>     res(output_stream)::out, io::di, io::uo) is det.
>
> Thoughts?
I think you should specify whether you want CRLF->LF translation for
input streams, and LF->CRLF for output streams (defaulting to, e.g.
"yes" for input and "host" for output). I'm not keen on the automatic
detection. If it's limited to checking if/what type of newline
translation was required in the input stream *so far* then that's
probably ok. output_stream_line_ending_style seems pointless.
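To make the "so far" idea concrete, here is a rough sketch rather than anything from the library. It classifies the newlines seen in the text read up to some point, instead of guessing a style for the whole file up front. The type seen_so_far and the functions classify, scan and merge are hypothetical names made up for illustration, and the sketch works on a plain string rather than on a real io.input_stream.

```mercury
:- module eol_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module bool.
:- import_module char.
:- import_module list.
:- import_module string.

    % Hypothetical type: what kind of newline has been seen so far.
    %
:- type seen_so_far
    --->    no_newline_yet
    ;       only_lf
    ;       only_crlf
    ;       mixed.

    % classify(Text) reports the newline styles found in Text, which
    % stands in for "the part of the stream that has been read so far".
    %
:- func classify(string) = seen_so_far.

classify(Text) = scan(string.to_char_list(Text), no, no_newline_yet).

    % scan(Chars, PrevWasCR, Seen0) walks the characters, remembering
    % whether the previous character was a carriage return.
    %
:- func scan(list(char), bool, seen_so_far) = seen_so_far.

scan([], _, Seen) = Seen.
scan([C | Cs], PrevWasCR, Seen0) = Seen :-
    ( if C = '\n' then
        (
            PrevWasCR = yes,
            This = only_crlf
        ;
            PrevWasCR = no,
            This = only_lf
        ),
        Seen1 = merge(Seen0, This)
    else
        Seen1 = Seen0
    ),
    ( if C = '\r' then IsCR = yes else IsCR = no ),
    Seen = scan(Cs, IsCR, Seen1).

    % Combine the running result with the style of the newest newline.
    %
:- func merge(seen_so_far, seen_so_far) = seen_so_far.

merge(Old, New) =
    ( if Old = no_newline_yet then
        New
    else if Old = New then
        Old
    else
        mixed
    ).

main(!IO) :-
    io.write(classify("a\r\nb\r\n"), !IO),   % only_crlf
    io.nl(!IO),
    io.write(classify("a\nb\r\n"), !IO),     % mixed
    io.nl(!IO).
```

A query like this can only answer for the data already consumed, so a program that wants to write output in the same style as its input would need to read at least one line before opening the output file.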
A related problem is file encodings, and the "UTF-8 BOM".
Peter