[m-rev.] for review: specialise file copying in the compiler

Julien Fischer jfischer at opturion.com
Sat Apr 23 22:38:46 AEST 2022


Hi Zoltan,

On Sat, 23 Apr 2022, Zoltan Somogyi wrote:

>
> 2022-04-23 21:38 GMT+10:00 "Julien Fischer" <jfischer at opturion.com>:
>> Specialise file copying in the compiler.
>>
>> Specialise the Mercury implementation of file copying in the compiler by
>> avoiding the use of io.binary_input_stream_foldl_io/5. This allows us to avoid
>> a higher-order call each time a byte is written and it allows us to use the
>> unboxed version of the predicate that reads bytes. The other reason for this
>> change is that we are planning to deprecate (and eventually remove)
>> io.binary_input_stream_foldl_io/5.
>
> Avoiding the use of binary_input_stream_foldl_io is a good idea,
> and we agreed on it a while ago. But copying the input one byte
> at a time isn't the right approach. The right approach would be to
> read the entire file to be copied into a buffer, and then write out the buffer.
> You could then use stat to preallocate exactly the buffer size you need,
> and read the entire file with one system call, at least for files that (a) have
> a known size, unlike streams, and (b) have a size that fits into main memory.

I agree, and will do that once the necessary machinery exists (see below).
I will add a comment about that for now.
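
To make the suggested approach concrete, here is a rough sketch in Java
(not Mercury, since the byte_array types do not exist yet). It sizes the
buffer from the file's reported length, fills it, and writes it out in one
call. The class and method names are made up for illustration, and the
fallback for files too large for a single array is an assumption of this
sketch, not something settled in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of the Mercury compiler or library.
final class WholeFileCopy {
    // Largest array size that is safe to request on common JVMs.
    private static final long MAX_BUFFER = Integer.MAX_VALUE - 8;

    // Copies source to target and returns the number of bytes written.
    static long copyViaBuffer(Path source, Path target) throws IOException {
        // Plays the role of stat(): ask for the size up front.
        long size = Files.size(source);

        if (size > MAX_BUFFER) {
            // Too big for one buffer: fall back to a chunked stream copy.
            try (InputStream in = Files.newInputStream(source);
                 OutputStream out = Files.newOutputStream(target)) {
                return in.transferTo(out);
            }
        }

        // Preallocate exactly the reported size.
        byte[] buffer = new byte[(int) size];
        int filled = 0;
        try (InputStream in = Files.newInputStream(source)) {
            // A single read() may return fewer bytes than requested,
            // so loop until the buffer is full or we hit end of file.
            while (filled < buffer.length) {
                int n = in.read(buffer, filled, buffer.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        }

        // Write out the whole buffer with one call.
        try (OutputStream out = Files.newOutputStream(target)) {
            out.write(buffer, 0, filled);
        }
        return filled;
    }
}
```

Note that this only covers inputs with a known size; streams would still
need an incremental copy.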

Actually, there are a few things that should happen here:

1. If the OS or platform provides a file copy operation (e.g. CopyFile()
on Windows or java.nio.file.Files.copy() in Java), we should use that in
preference to either calling an external command or using the Mercury
implementation.

2. We should add a way to tell the compiler not to even try copying a
file using an external command. (This would be useful on Windows, where
the copy command is a poor substitute for cp.)

3. We should add some sort of copy_file predicate to the standard
library.  This is slightly iffy since file copy operations in most
programming language standard libraries are really "copy the file and
preserve relevant metadata where possible". On balance, I think it would
be a worthwhile addition and we can make it work well enough on the
platforms we support.
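
As an illustration of points 1 and 3, this is roughly what delegating to
the platform's copy operation looks like in Java. The wrapper class and
method names are hypothetical; Files.copy() and the StandardCopyOption
values are the standard java.nio.file API. COPY_ATTRIBUTES is the "preserve
relevant metadata where possible" part: which attributes are actually
copied depends on the platform and file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical wrapper; only Files.copy() itself is a real library call.
final class PlatformCopy {
    // Copies source to target using the platform-provided operation,
    // overwriting target if it exists and keeping file attributes
    // where the platform supports doing so.
    static void copyFile(Path source, Path target) throws IOException {
        Files.copy(source, target,
            StandardCopyOption.REPLACE_EXISTING,
            StandardCopyOption.COPY_ATTRIBUTES);
    }
}
```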

> The only reason I did not do that when eliminating the other uses of stream_foldl_io
> in the compiler is that we don't yet have a byte_array type for the buffer.
> We even had a conversation about what to name it, in both read-only
> and read-write versions, a month or two ago. I thought, maybe mistakenly,
> that you said you would implement those types. (I don't know Java and C#
> well enough to do this for them, or I would do it myself.)

There was the small matter of the 22.01 release between then and now ;-)
I am still intending to implement the byte_array types.

Do you have any objections to committing this with the added comment?
I would like to go ahead and deprecate the stream folds in the io
module.

Another question: should the stream foldls in the stream module be
modified to reduce stack usage in debug grades (as some of the ones
in the io module have been)?

Julien.

