[m-rev.] for review: add examples of the C data passing conventions

Zoltan Somogyi zoltan.somogyi at runbox.com
Mon Aug 22 15:11:29 AEST 2022


2022-08-22 13:59 GMT+10:00 "Julien Fischer" <jfischer at opturion.com>:
> +% Mercury's int type corresponds to the C type MR_Integer.
> +% MR_Integer is a typedef defined by the Mercury runtime for a signed
> +% word-sized integral type.

"for" seems strange here (and in later copies).
I would go with something like "that expands to", and would also mention
that the expanded-to C type is autoconfigured, and may differ between
target systems.
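
A rough sketch of the "expands to" point in plain C. Demo_Integer is a
hypothetical stand-in, not the runtime's actual definition of MR_Integer,
and the use of intptr_t is only an assumption about what "signed and
word-sized" amounts to on a typical target.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

// Hypothetical stand-in for MR_Integer: a signed integral type as wide as
// a pointer. The real typedef is selected when Mercury is configured.
typedef intptr_t Demo_Integer;

// Number of bits in the stand-in type. This differs between targets,
// which is the reason the documentation should not promise a fixed width.
int demo_integer_bits(void)
{
    return (int) (sizeof(Demo_Integer) * CHAR_BIT);
}
```

On a typical 64-bit target this reports 64 bits, and on a typical 32-bit
target 32 bits.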

> +    % int_add(A, B) = C:
> +    %
> +    % This function computes the sum of two Mercury ints using C code.
> +    %
> +:- func int_add(int, int) = int.
> +:- pragma foreign_proc("C",
> +    int_add(A::in, B::in) = (C::out),
> +    [promise_pure, will_not_call_mercury, thread_safe],
> +"
> +    C = A + B;
> +").

I don't think this example, and similar ones later on, are as effective
as you would want them to be, because they don't demonstrate
the use of the C types corresponding to Mercury types.

I would go with C code such as

 MR_Integer AB = A + B;
 D = AB + C;

where the predicate computes D from A, B and C.

I would also explain, in this first example, what the promise_pure,
will_not_call_mercury, and thread_safe annotations mean.
Without that, the kinds of readers who need this introduction
can take them as "incantations to appease the great god mmc",
and include them even in foreign_procs in which they are not appropriate.

> +    % Return the largest uint64 value.
> +    %
> +:- func big_uint64 = uint64.
> +:- pragma foreign_proc("C",
> +    big_uint64 = (A::out),
> +    [promise_pure, will_not_call_mercury, thread_safe],
> +"
> +    // We could also write: A = UINT64_MAX;
> +    A = UINT64_C(18446744073709551615);
> +").

Given that you mentioned special requirements for 64 bit code above,
I would have used an example that illuminates those requirements.
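
The requirements in question are not quoted in this message, so the
following is only a guess at what such an example could cover: in
standard C, 64-bit literals and printf conversions need the macros from
<stdint.h> and <inttypes.h> to be portable. The helper name format_uint64
is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Format a 64-bit unsigned value portably. PRIu64 expands to whichever
// printf conversion matches uint64_t on the target.
int format_uint64(char *buf, size_t size, uint64_t value)
{
    return snprintf(buf, size, "%" PRIu64, value);
}

// UINT64_C attaches whichever suffix the literal needs on the target.
uint64_t big_uint64(void)
{
    return UINT64_C(18446744073709551615);
}
```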

> +%----------------------------------------------------------------------------%
> +%
> +% Floats.
> +%
> +
> +% Mercury's float type corresponds to the C type MR_Float.
> +% MR_Float is a typedef defined by the Mercury runtime.
> +% In spf (single-precision) float grades, it is a typedef for C's float type.
> +% In other grades, it is a typedef for C's double type.

The parentheses are in the wrong place: this should be
"spf (single-precision float) grades".

> +% C code can test whether the macro MR_USE_SINGLE_PREC_FLOAT is defined to
> +% check if an spf grade is being used.
> +
> +    % add_floats(A, B) = C:
> +    % This function computes the sum of two Mercury floats using C code.
> +    %
> +:- func add_floats(float, float) = float.
> +:- pragma foreign_proc("C",
> +    add_floats(A::in, B::in) = (C::out),
> +    [promise_pure, will_not_call_mercury, thread_safe],
> +"
> +    C = A + B;
> +").

I would use an operation that does not make sense for integers,
such as sqrt.

> +%----------------------------------------------------------------------------%
> +%
> +% Characters.
> +%
> +
> +% Mercury's char type corresponds to the C type MR_Char.
> +% MR_Char is a typedef defined by the Mercury runtime for a signed 32-bit
> +% integral type.
> +%
> +% A Mercury char represents a Unicode code point and valid values must be in
> +% the range [0, 0x10ffff]. Mercury's foreign language interface does *not*
> +% check that characters passed back to Mercury are within this range.

Add a comma after "point". An explanation for the absence of this check
would also be useful.

> +% Mercury's string type corresponds to the C type MR_String.
> +% MR_String is a typedef defined by the Mercury runtime for a pointer to char
> +% (i.e. char *).

Again, "for".

> +% Mercury's list.list/1 type corresponds to the C type MR_Word.

In llds grades it does, but I thought in mlds grades we usually call it MR_Box.

> +% MR_Word is a typedef declared in the Mercury runtime for an unsigned integral
> +% type whose size is the same size as a pointer.
> +%
> +% The Mercury runtime defines the following function-like macros for
> +% manipulating Mercury lists in C code:
> +%
> +%     MR_bool MR_list_is_empty(MR_Word list);
> +%     MR_Word MR_list_head(MR_Word list);
> +%     MR_Word MR_list_tail(MR_Word tail);
> +%     MR_Word MR_list_empty(void);
> +%     MR_Word MR_list_cons(MR_Word head, MR_Word tail);
> +%
> +% When an element is extracted from a list using the MR_list_head() macro, that
> +% element will also have the type MR_Word. How you convert that MR_Word value
> +% to the actual element type depends on what the element type is. For most
> +% element types you can insert a cast to the appropriate type. If the element
> +% type is float, int64 or uint64 you might need to arrange for the element to
> +% be unboxed -- see the following two sections for further details.

This won't make sense to readers who don't know what "boxing" means in this
context.

> In the
> +% following examples, we have lists of int, so adding a cast to MR_Integer will
> +% suffice.

And this won't make sense either, unless you tell readers that data types
whose sizes are one word or less are never boxed.
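
A sketch of why a plain cast is enough for one-word element types. This
is a simplified model in plain C, not the runtime's representation:
Demo_Word, Demo_Integer, Demo_Cell and the demo_list_* functions are all
hypothetical stand-ins for MR_Word, MR_Integer and the MR_list_* macros,
and real Mercury lists live on the Mercury heap rather than in malloc'd
memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t Demo_Word;     // hypothetical stand-in for MR_Word
typedef intptr_t  Demo_Integer;  // hypothetical stand-in for MR_Integer

// One cons cell: every field is exactly one word wide.
typedef struct {
    Demo_Word head;
    Demo_Word tail;
} Demo_Cell;

Demo_Word demo_list_empty(void) { return (Demo_Word) 0; }

int demo_list_is_empty(Demo_Word list) { return list == (Demo_Word) 0; }

Demo_Word demo_list_cons(Demo_Word head, Demo_Word tail)
{
    Demo_Cell *cell = malloc(sizeof *cell);
    if (cell == NULL) abort();
    cell->head = head;
    cell->tail = tail;
    return (Demo_Word) cell;
}

Demo_Word demo_list_head(Demo_Word list) { return ((Demo_Cell *) list)->head; }

Demo_Word demo_list_tail(Demo_Word list) { return ((Demo_Cell *) list)->tail; }

// An int fits in one word, so it is stored directly in the head field
// and a cast recovers it. No pointer needs to be followed.
Demo_Integer sum_int_list(Demo_Word list)
{
    Demo_Integer sum = 0;
    while (!demo_list_is_empty(list)) {
        sum += (Demo_Integer) demo_list_head(list);
        list = demo_list_tail(list);
    }
    return sum;
}
```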

> +% Because the size of a Mercury float might exceed a word, floats contained in
> +% Mercury data structures might be boxed. That is, they are passed around as a
> +% pointer to a slot on the heap where the actual float is stored.

I wouldn't say "might exceed": I would specify exactly when it would exceed
a word, and when it wouldn't. It is not too complicated for users.

> +% When manipulating Mercury data structures that contain floats in C code,
> +% you must account for the possibility that floats are boxed.

"that your code will be compiled on 32 bit machines in grades in which
floats are boxed".
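
For readers who do not know what "boxing" means, something like the
following model may help. It is a sketch in plain C under stated
assumptions: Demo_Word and Demo_Float are hypothetical stand-ins for
MR_Word and MR_Float (in a non-spf grade), the demo_* functions are
invented for illustration, and malloc stands in for the Mercury heap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t Demo_Word;   // hypothetical stand-in for MR_Word
typedef double    Demo_Float;  // hypothetical stand-in for MR_Float

// A float is boxed exactly when it does not fit in one word, for example
// an 8-byte double on a target whose words are 4 bytes.
int demo_float_is_boxed(void)
{
    return sizeof(Demo_Float) > sizeof(Demo_Word);
}

// Store a float in a word-sized slot: directly if it fits, otherwise as
// a pointer to a separately allocated copy (the "box").
Demo_Word demo_float_to_word(Demo_Float f)
{
    Demo_Word w = 0;
    if (demo_float_is_boxed()) {
        Demo_Float *slot = malloc(sizeof *slot);
        if (slot == NULL) abort();
        *slot = f;
        w = (Demo_Word) slot;
    } else {
        memcpy(&w, &f, sizeof f);
    }
    return w;
}

// Recover the float: follow the pointer if boxed, else copy the bits back.
Demo_Float demo_word_to_float(Demo_Word w)
{
    Demo_Float f;
    if (demo_float_is_boxed()) {
        f = *(Demo_Float *) w;
    } else {
        memcpy(&f, &w, sizeof f);
    }
    return f;
}
```

A plain cast from the word to Demo_Float would be wrong in both cases,
which is the point the documentation needs to get across.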


> +% Data structures containing 64-bit integers.

Same comments here.

> +% Foreign types.
> +%
> +
> +% In this section we illustrate how to use a type defined in C from Mercury.
> +
> +    % Here is a C type that we wish to use in Mercury.
> +    %
> +:- pragma foreign_decl("C", "
> +
> +    // A C structure representing a vector in 3-dimensional space.
> +    //
> +    typedef struct {
> +        double i;
> +        double j;
> +        double k;
> +    } c_vector;
> +").

Why not x, y, z?

> +    % A declaration for the vector/0 type.
> +    %
> +    % The foreign_type pragma we use below does not act as a type declaration,
> +    % so the following abstract type declaration serves that purpose.

That explains things from a compiler writer's point of view, which users
don't care about.

> +    % When using foreign types we must provide this even if the type is not
> +    % exported from its defining module.
> +    %
> +:- type vector.

I would be more direct: say that the declaration must be in the interface
section iff the type is exported, but the foreign_type pragma must be in the
implementation section regardless of whether the type is exported.

> +    // The macro MR_GC_NEW() is used to allocate memory using the garbage
> +    // collector.

I know why gc is involved in allocation, but most readers who need this
won't know that. The point you want to get across is that memory allocated
via MR_GC_NEW will be deallocated automatically by the Mercury runtime,
and that it need not and *should* not be deallocated manually.

> It allocates space sufficient for an object of the type named
> +    // by the argument.

by ITS argument

> +% The type io.state/0 is what is known as "dummy type". The Mercury compiler
> +% does not generate code that passes around values of dummy types.
> +% Nevertheless, foreign_proc arguments of type io.state/0 are manifested in the
> +% foreign_proc bodies as local variables of type MR_Word.
> +%
> +% Because the Mercury compiler will emit warnings for foreign_proc arguments
> +% that not referred to by the body of the foreign_proc, you must use one of
> +% the following approaches to handling io.state/0 arguments.
> +
> +    % This example illustrates the first (and preferred) approach to dealing
> +    % with arguments of type io.state/0 in foreign_procs: ignore them.
> +    % The Mercury compiler does not require that foreign_proc arguments whose
> +    % name begins with an underscore be referred to in the foreign_proc body.
> +    %
> +:- pred say_hello(io::di, io::uo) is det.
> +:- pragma foreign_proc("C",
> +    say_hello(_IO0::di, _IO::uo),
> +    [promise_pure, will_not_call_mercury],
> +"
> +    puts(\"Hello!\\n\");
> +").
> +
> +    % If foreign_proc arguments of type io.state/0 are not ignored then they

Add comma after "ignored".

> +    % will manifest in the foreign_proc body as local variables of the same
> +    % names as the arguments.
> +    %
> +    % Any value assigned to the variable IO in the following block of code will
> +    % be ignored.

... by the Mercury code that invoked the foreign_proc.

> The convention is to assign the initial io.state/0 argument
> +    % to the final io.state/0 argument at the end of the foreign_proc body.
> +    %

This convention does not make sense unless you state that a foreign_proc
whose body does not mention one of its arguments will get a warning,
*unless* that argument's name starts with _. In other words, move the last
two lines from the previous example here, in suitably mutated form.

> +% In Mercury, an enumeration is a discriminated union type where none of the
> +% data constructors has any arguments. Mercury enumeration types correspond
> +% to the C type MR_Integer.

This is true only for loose senses of the word "correspond". The straightforward
corresponding C type is a C enum. You want to say that Mercury passes values
of Mercury enums to C code as values of type MR_Integer.
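
A sketch of the distinction, in plain C. Demo_Integer and Demo_Colour are
hypothetical stand-ins, and the sketch assumes the constructors are
numbered from zero in declaration order, which is exactly the kind of
assumption a foreign_enum pragma exists to make unnecessary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef intptr_t Demo_Integer;  // hypothetical stand-in for MR_Integer

// Hypothetical C enum mirroring a Mercury type such as
//   :- type colour ---> red ; green ; blue.
typedef enum { DEMO_RED, DEMO_GREEN, DEMO_BLUE } Demo_Colour;

// The value arrives as a word-sized integer, not as a Demo_Colour, so
// the C code converts it explicitly before switching on it.
const char *demo_colour_name(Demo_Integer value)
{
    switch ((Demo_Colour) value) {
        case DEMO_RED:   return "red";
        case DEMO_GREEN: return "green";
        case DEMO_BLUE:  return "blue";
    }
    return "unknown";
}
```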

> +% This example illustrates how to use Mercury's foreign_enum pragma to assign
> +% the values by which each constructor of a Mercury enumeration is represented
> +% in C code. 

I don't think this says what you want to say. The point of foreign_enum pragmas
is that Mercury conforms to the name->representation mapping set by C,
unlike foreign_export_enum pragmas, which do the opposite.

> +% The MR_ArrayType structure has two fields. The first is named "size" and
> +% has type MR_Integer. Its value gives the number of elements in the array.
> +% The second is named "elements" and is the underlying array of elements.
> +% (The actual definition of this second field varies depending on whether a C
> +% compiler that supports variable-length arrays is being used or not.)

I would use "whether the configured C compiler ..."
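
A sketch of the two-field layout being described, in plain C. Demo_Array
and the demo_array_* functions are hypothetical and are not the runtime's
MR_ArrayType; the sketch assumes a C99 compiler, so the second field can
be a flexible array member, and it uses malloc where real code would
allocate on the Mercury heap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t Demo_Word;     // hypothetical stand-in for MR_Word
typedef intptr_t  Demo_Integer;  // hypothetical stand-in for MR_Integer

typedef struct {
    Demo_Integer size;        // number of elements
    Demo_Word    elements[];  // C99 flexible array member
} Demo_Array;

// Allocate an array of `size` elements (size must be non-negative),
// each initialised to `fill`.
Demo_Array *demo_array_new(Demo_Integer size, Demo_Word fill)
{
    Demo_Array *array =
        malloc(sizeof *array + (size_t) size * sizeof(Demo_Word));
    if (array == NULL) abort();
    array->size = size;
    for (Demo_Integer i = 0; i < size; i++) {
        array->elements[i] = fill;
    }
    return array;
}

// Sum the elements, treating each one as an unboxed integer.
Demo_Integer demo_array_sum(const Demo_Array *array)
{
    Demo_Integer sum = 0;
    for (Demo_Integer i = 0; i < array->size; i++) {
        sum += (Demo_Integer) array->elements[i];
    }
    return sum;
}
```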

Other than all of that nitpicking, the diff is fine :-)

Zoltan.

