[m-users.] C foreign language interface question

fabrice nicol fabrnicol at gmail.com
Thu Apr 22 02:57:14 AEST 2021


Thank you for your very clear replies.

I will be able to go ahead and come back when the code is about done 
with (hopefully, good) results.

As I would like to test whether Mercury is a viable option for processing 
large data, I am trying to avoid copying data as much as possible. So my 
practical design constraints are a bit different from those of the core 
team (which I understand quite well).

I'm doing this for obvious efficiency reasons: copying large R databases 
(exported to C as lists of arrays by the R FFI) to Mercury lists would 
come at a cost. It also looks sub-optimal, since Mercury can manage 
array-like constructs. Several Mercury libs, like mercury_json or 
mercury_cairo, do this, but their data are GC'd within Mercury and they 
seem better adapted to small/medium-sized data.
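
A minimal sketch of the no-copy idea, under stated assumptions: the type 
foreign_view and the functions view_wrap, view_get and view_release are 
hypothetical names, not part of the R or Mercury APIs. The release callback 
stands in for whatever the owning runtime requires to give up a buffer 
(plain free() for a malloc'd block in a test, something else entirely for 
memory owned by R).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical no-copy view over a buffer owned by a foreign allocator
 * (standing in here for one column of an R data set). */
typedef struct {
    double *data;               /* borrowed pointer: the elements are never copied */
    size_t  len;                /* number of elements */
    void  (*release)(void *);   /* deallocator matching the owner's allocator */
} foreign_view;

/* Wrap an existing buffer without copying it. */
foreign_view view_wrap(double *data, size_t len, void (*release)(void *))
{
    foreign_view v = { data, len, release };
    return v;
}

/* Bounds-checked read through the view. */
double view_get(const foreign_view *v, size_t i)
{
    assert(v->data != NULL && i < v->len);
    return v->data[i];
}

/* Hand the buffer back to its owner; safe to call more than once. */
void view_release(foreign_view *v)
{
    if (v->data != NULL && v->release != NULL) {
        v->release(v->data);
    }
    v->data = NULL;
    v->len = 0;
}
```

The point of carrying the deallocator inside the view is that the block is 
always returned through the same allocator that produced it, whichever 
collector happens to be running on the other side.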

So I'll proceed with a few memory management options, see what turns 
out, and report.

Best,

Fabrice

On 21/04/2021 at 16:12, Zoltan Somogyi wrote:
>
> On Wed, 21 Apr 2021 15:17:52 +0200, fabrice nicol <fabrnicol at gmail.com> wrote:
>> 1.  The 'sharing' FFI foreign_proc annotation
>>
>> In chapter 15.1., this annotation is not documented, perhaps because it
>> is not advised to use it.
> It was added to support experimental work on compile-time garbage collection.
> Unfortunately, the experiment ended when the person working on it finished
> her PhD. At the moment, such annotations are accepted but unused.
>
>> Is this annotation necessary to allow Mercury to access C arrays?
> No.
>
>> 2. R has its own GC, which is fired by any call to the C FFI of R. To
>> protect recently allocated memory chunks from the R GC, the dedicated C
>> FFI has a useful macro named PROTECT, and another one to unprotect the
>> said chunks, leaving them open to R garbage collection. A
>> quick-and-dirty reference is here:
>> https://github.com/hadley/r-internals/blob/master/gc-rc.md
> Sorry, but in my experience, quick-and-dirty guides are far too likely
> to give their readers only enough info to get them into trouble,
> but not enough to get them out of trouble :-( So I am answering your
> questions without reading that link.
>
>> The question is simple. It is either (a) or (b):
>>
>> (a) if C memory chunks allocated by the R C FFI **are not** protected
>> from the Mercury GC, while the R-C FFI code is running:
> I am assuming that by "protected from Mercury gc" you mean "Mercury gc
> won't try to free memory that it did not allocate".
>
>> (b) if C memory chunks allocated by the R-C FFI **are** protected from
>> the Mercury GC when the R-C FFI code is running:
> In the sense above, yes, they are protected, i.e. the gc system used by Mercury,
> the Boehm conservative garbage collector for C, collects only memory blocks
> it allocates itself. A search for "Boehm collector" should tell you what
> you need to know about it.
>
>> --> when the R server has been shut down, how to free these chunks
>> initially allocated by the R-C FFI code:
> I don't know what "these chunks" you are referring to.
>
>> - by calling the Mercury GC?
> You never need to invoke Mercury gc. It is invoked automatically
> when an allocation finds no memory on the free list. (That is the rough
> logic, the details are more complex.)
>
>> I.e. by calling 'MR_GC_register_finalizer' as in file cairo.m of the
>> mercury_cairo 'extras' library, whose design I overall follow. Is this
>> the right way?
> I don't understand your requirements anywhere near well enough to answer
> that question.
>
>> If I understand correctly, 'MR_GC_register_finalizer' may call a specific
>> C memory cleanup procedure for non (Mercury) GC-managed memory chunks
>> coming from foreign interfaces?
> No. MR_GC_register_finalizer() asks the Boehm gc to call the specified function
> when a chunk of memory *managed by Boehm* is collected.
>
>> - or by bluntly calling 'free' in due time with a dedicated Mercury-C
>> FFI call, independently of any Mercury GC intervention ?
> I have no idea what the right thing to do is to free memory allocated by R,
> but don't call free() on a memory block allocated by Boehm. Such blocks
> are not allocated by malloc(), so calling free() on them will result in
> data corruption.
>
>> Perhaps a few words on such issues would be useful in the FLI sections
>> of the Reference manual. Incidentally, perhaps this manual could
>> document MR_ArrayPtr in 15.3.1 (used in array.m). It may be of interest
>> when Mercury arrays are passed along to the C FFI (as in ML_resize_array
>> for example).
> The Mercury reference manual documents how to pass atomic data
> between Mercury and target languages such as C. It deliberately does not
> document how to pass compound data structures, because we did not want
> to be locked into supporting design decisions that we may need to change.
> (We do document how to pass lists between languages, precisely because
> we are pretty sure that representation is set in stone.)
>
> The idea is that if you have e.g. a Mercury array, and you want to
> give access to it to C code, then you can export to C some functions
> that implement basic operations (such as lookup and update) on the array.
> In these operations, the arguments holding the arrays will be MR_Words,
> effectively black boxes that the C code is not supposed to understand
> or change.
>
> If you want to teach C code about how Mercury arrays are implemented,
> then (a) congratulations, you are now a Mercury implementor, and (b)
> commiserations, you will need to update your code whenever the internals
> of Mercury arrays are changed by anybody. They have been stable
> for a long time, but they have changed in the past, and we can't promise
> they won't change again.
>
> One person who could possibly help is Richard O'Keefe, since he knows
> R, Mercury and gc well, and I believe he reads this mailing list.
> Richard, do you have any useful info for Fabrice?
>
> Zoltan.
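
The black-box approach described in the quoted reply can be sketched in 
plain C, under stated assumptions. Everything here is hypothetical: 
opaque_word merely stands in for MR_Word (assumed to be an integer wide 
enough to hold a pointer), and arr_init, arr_lookup, arr_update, arr_size 
and arr_destroy stand in for operations that would, in a real program, be 
written in Mercury and exported to C. This is not the layout of Mercury 
arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for MR_Word: an integer wide enough to hold a pointer. */
typedef uintptr_t opaque_word;

/* ---- Implementor side: the only code that knows the layout. ---- */
struct int_array {
    size_t size;
    int   *elems;
};

/* Create an array of `size` elements, each set to `fill`; returns 0 on failure. */
opaque_word arr_init(size_t size, int fill)
{
    struct int_array *a = malloc(sizeof *a);
    if (a == NULL) {
        return 0;
    }
    a->size = size;
    a->elems = malloc((size > 0 ? size : 1) * sizeof *a->elems);
    if (a->elems == NULL) {
        free(a);
        return 0;
    }
    for (size_t i = 0; i < size; i++) {
        a->elems[i] = fill;
    }
    return (opaque_word) a;
}

size_t arr_size(opaque_word h)
{
    return ((struct int_array *) h)->size;
}

int arr_lookup(opaque_word h, size_t i)
{
    struct int_array *a = (struct int_array *) h;
    assert(i < a->size);
    return a->elems[i];
}

void arr_update(opaque_word h, size_t i, int value)
{
    struct int_array *a = (struct int_array *) h;
    assert(i < a->size);
    a->elems[i] = value;
}

void arr_destroy(opaque_word h)
{
    struct int_array *a = (struct int_array *) h;
    free(a->elems);
    free(a);
}

/* ---- Client side: sees only the handle and the exported operations. ---- */
long arr_sum(opaque_word h)
{
    long total = 0;
    for (size_t i = 0; i < arr_size(h); i++) {
        total += arr_lookup(h, i);
    }
    return total;
}
```

Because arr_sum never dereferences the handle itself, the representation 
behind it can change without touching client code, which is the property 
the quoted reply is after.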

