[m-dev.] foreign type syntax
Tyson Dowd
trd at cs.mu.OZ.AU
Tue Oct 30 17:03:23 AEDT 2001
This reply wanders a bit, but basically I'd like to ask -- what
are you disagreeing with, and why? I've tried to give my views on why I
like the syntax I proposed.
On 30-Oct-2001, Fergus Henderson <fjh at cs.mu.OZ.AU> wrote:
> On 30-Oct-2001, Tyson Dowd <trd at cs.mu.OZ.AU> wrote:
> > On 29-Oct-2001, Fergus Henderson <fjh at cs.mu.OZ.AU> wrote:
> > > On 29-Oct-2001, Tyson Dowd <trd at cs.mu.OZ.AU> wrote:
> > > > If you specify the types using C syntax, you are going to have a bit of
> > > > a hard time marshalling them to and from asm (you won't know how to
> > > > generate type specifications in gcc's tree representation without
> > > > parsing the type specifications).
> > >
> > > What's wrong with just using `gcc__ptr_type_node' for all such types, and
> > > handling the marshalling needed by generating appropriate C code in the
> > > C wrapper function that we generate for C `pragma foreign_proc' procedures?
> >
> > As I said:
> >
> > > > I wouldn't be shocked if you could hack around this by using lots of
> > > > hacky casts and some sneaky interfacing code, but it might be nice if
> > > > you don't have to...
> >
> > I think this solution falls into the hacky casts category.
>
> What exactly do you think is "hacky" or "sneaky" about it?
Hacky:
It's hard to maintain and hard to read, and gdb will make a mess of it.
Compare it to the alternative, which has no casts to generic types, uses
real type names on arguments, and doesn't have to switch on sizeof.
Sneaky:
You allocate memory behind the programmer's back -- the user may find
that allocating their own heap objects is in fact faster than using
"structs" in some cases (consider the code that will be generated for
the identity function -- a copy will be made).
I guess you might not find that sneaky, but I suspect if you didn't know
much about compiler implementation you might.
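To make the identity-function point concrete, here is a minimal C++ sketch of
the boxing scheme quoted further down. It is an illustration under stated
assumptions, not what the compiler generates: MR_Box is taken to be one
pointer-sized word, counted_alloc is a hypothetical stand-in for the runtime
allocator, and Big, identity_impl and identity_wrapper are made-up names.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void *MR_Box;            // assumption: a box is one machine word

static int allocations = 0;      // counts the hidden heap allocations

// Hypothetical stand-in for the runtime allocator. Nothing is freed here;
// the real runtime would rely on its garbage collector.
static void *counted_alloc(size_t n) {
    ++allocations;
    return malloc(n);
}

template <class T>
MR_Box maybe_box(T value) {
    if (sizeof(T) > sizeof(MR_Box)) {
        T *p = static_cast<T *>(counted_alloc(sizeof(T)));
        *p = value;              // the copy made behind the caller's back
        return p;
    }
    MR_Box b = NULL;
    // The size is 0 in the (untaken) branch for oversized T.
    memcpy(&b, &value, sizeof(T) <= sizeof(MR_Box) ? sizeof(T) : 0);
    return b;
}

template <class T>
T maybe_unbox(MR_Box b) {
    if (sizeof(T) > sizeof(MR_Box)) {
        return *static_cast<T *>(b);
    }
    T value = T();
    memcpy(&value, &b, sizeof(T) <= sizeof(MR_Box) ? sizeof(T) : 0);
    return value;
}

// Hypothetical foreign type that is wider than a word.
struct Big { long a, b, c, d; };

// Stand-in for the compiled identity function, which only sees boxes.
static MR_Box identity_impl(MR_Box x) { return x; }

// Wrapper of the kind the proposal would generate around it.
template <class T>
T identity_wrapper(T x) {
    return maybe_unbox<T>(identity_impl(maybe_box<T>(x)));
}
```

Calling identity_wrapper<Big> bumps the allocation counter once even though
the function does nothing, while identity_wrapper<char> does not allocate.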
I'm not saying the code isn't correct -- this is C code after all. It
can be both hacky and perfectly legal ANSI C.
And I'm not saying it's a bad solution given the constraints.
Just that it would be nice to avoid all of this someday, if possible.
> > The problems are that it forces the user to box non-word sized values,
>
> Who said the *user* has to do it?
Sorry, this was just wrong; the compiler can of course do it.
I should have said it forces non-word sized values to be boxed.
>
> I was thinking about having the Mercury compiler do it,
> by generating appropriate C code to box non-word sized values
> in the C wrapper function that we generate for C
> `pragma foreign_proc' and `pragma export' procedures.
>
> E.g. for
>
> :- type foo.
> :- pragma foreign_type(foo, c("Bar")).
>
> :- func example(foo) = foo.
> :- pragma export(example/1, "my_example").
>
> then as well as compiling the code for example/1 to assembler,
> it could generate some C wrapper code like the following:
>
> extern MR_Box example_2_f_0(MR_Box); /* defined in asm */
> template <class T> MR_Box MR_MAYBE_BOX(T); /* see below */
> template <class T> T MR_MAYBE_UNBOX(MR_Box); /* see below */
>
> /* C wrapper function for pragma export procedure */
> Bar my_example(Bar x) {
>     MR_Box boxed_x = MR_MAYBE_BOX<Bar>(x);
>     MR_Box y;
>     y = example_2_f_0(boxed_x);
>     return MR_MAYBE_UNBOX<Bar>(y);
> }
>
> This is very similar to the C wrapper code that we already generate
> when handling foreign language imports/exports for the asm backend.
> The only difference is that I'm using slightly different boxing/unboxing
> functions. Here MR_MAYBE_BOX and MR_MAYBE_UNBOX are function templates,
> which in C++ could be defined like this:
>
> template <class T>
> MR_Box
> MR_MAYBE_BOX(T value) {
>     if (sizeof(T) > sizeof(MR_Box)) {
>         T *p = MR_new_object(T, sizeof(T), name_of_T);
>         *p = value;
>         return p;
>     } else {
>         MR_Box b = 0;
>         memcpy(&b, &value, sizeof(T));
>         return b;
>     }
> }
>
> template <class T>
> T
> MR_MAYBE_UNBOX(MR_Box b) {
>     if (sizeof(T) > sizeof(MR_Box)) {
>         return *(T *)b;
>     } else {
>         T value;
>         memcpy(&value, &b, sizeof(T));
>         return value;
>     }
> }
>
> Of course since we're generating C rather than C++, we couldn't use
> templates; instead, we would have to have the Mercury compiler expand
> out calls to these when generating the C code, or we could just use
> macros rather than templates. But you get the idea...
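For reference, the macro variant mentioned above might look roughly like the
following sketch. It is only an illustration: MR_Box is assumed to be one
pointer-sized word, malloc stands in for MR_new_object, and MR_INLINE_BYTES,
Bar's layout and the round_trip_* helpers are hypothetical names. The macros
are statements rather than expressions, so they take the destination as an
argument.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void *MR_Box;   /* assumption: one machine word */

/* Bytes of T stored inline in the box; 0 when T is too wide, so the
   memcpy in the untaken branch never uses an out-of-range size. */
#define MR_INLINE_BYTES(T) (sizeof(T) <= sizeof(MR_Box) ? sizeof(T) : 0)

/* Box the lvalue `value` of type T into the MR_Box lvalue `box`.
   malloc stands in for the runtime allocator; nothing is freed here. */
#define MR_MAYBE_BOX(T, value, box)                              \
    do {                                                         \
        if (sizeof(T) > sizeof(MR_Box)) {                        \
            T *mr_p_ = (T *) malloc(sizeof(T));                  \
            *mr_p_ = (value);                                    \
            (box) = (MR_Box) mr_p_;                              \
        } else {                                                 \
            (box) = NULL;                                        \
            memcpy(&(box), &(value), MR_INLINE_BYTES(T));        \
        }                                                        \
    } while (0)

/* Unbox the MR_Box lvalue `box` into the lvalue `value` of type T. */
#define MR_MAYBE_UNBOX(T, box, value)                            \
    do {                                                         \
        if (sizeof(T) > sizeof(MR_Box)) {                        \
            (value) = *(T *) (box);                              \
        } else {                                                 \
            memcpy(&(value), &(box), MR_INLINE_BYTES(T));        \
        }                                                        \
    } while (0)

/* Hypothetical foreign type that does not fit in a word. */
typedef struct { double x, y, z; } Bar;

static Bar round_trip_bar(Bar in) {
    MR_Box b;
    Bar out;
    MR_MAYBE_BOX(Bar, in, b);      /* takes the heap-allocating branch */
    MR_MAYBE_UNBOX(Bar, b, out);
    return out;
}

static short round_trip_short(short in) {
    MR_Box b;
    short out;
    MR_MAYBE_BOX(short, in, b);    /* stored inline in the word */
    MR_MAYBE_UNBOX(short, b, out);
    return out;
}
```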
Yes, but...
What are you trying to convince me of? I'm starting to lose the point
of this discussion.
My understanding was you disagreed with the idea of having an encoding
of the type name in Mercury, and so far your reasoning seems to be "you
don't need it if you do things this way".
Is that right? If so I'm happy to allow that as an option (exactly the
syntax you propose is fine), but I'd also like to experiment in parallel
with a term syntax for specifying types, so that the compiler can possibly
do a better job.
It's very easy to turn term syntax into a string to use it as you have
said. It's much harder to extract information from strings.
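As a rough illustration of that asymmetry, here is a small C++ sketch. The
CTypeTerm structure and both functions are hypothetical, invented for this
example only; they are not a proposed Mercury syntax or any existing compiler
data structure.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string>

// Hypothetical structured description of a foreign C type.
struct CTypeTerm {
    std::string base;       // e.g. "Bar" or "struct point"
    size_t pointer_depth;   // number of trailing '*'
};

// Term -> string: a trivial traversal, enough to paste into generated C.
std::string to_c_syntax(const CTypeTerm &t) {
    std::string s = t.base;
    if (t.pointer_depth > 0) {
        s += ' ';
        s.append(t.pointer_depth, '*');
    }
    return s;
}

// With a term, questions about the type are field lookups. Answering the
// same question from the string "Bar **" would need a C type parser.
bool is_pointer(const CTypeTerm &t) {
    return t.pointer_depth > 0;
}
```

Going the other way, from an arbitrary C declarator string back to this kind
of structure, is the part that needs real parsing.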
I'm trying to choose the path that cuts off the fewest options -- although
I will only support a few possibilities at first, this will not be because
the chosen syntax restricts us, but just because the rest are not yet
implemented.
> > and constrains the implementation such that it cannot carry the exact
> > type, even if it otherwise would be able to.
>
> Well, I don't think that is a problem for the asm backend,
> or for untyped backends in general.
I think that's what I was getting to in the next statement.
>
> > For the C backend it isn't such a big deal because everybody does hacky
> > casts all the time, it often costs you nothing, and having the exact
> > type around doesn't often help the compiler at all.
>
> Hang on, I thought we were talking about the asm backend. For the C
> backend, the issue needn't arise at all, because you could just use the
> C type name directly.
>
> (Well, I guess you might want to use the same approach as in the asm
> backend, just to preserve binary compatibility between the C and asm
> backends. But that would work too.)
That was a thinko; I meant to write "asm backend with C code
interfacing".
>
> > But for other
> > backends (.NET, Java) it does, because casts are checked.
>
> For the strongly-typed backends, the user can specify the appropriate type
> name for the target language in the `pragma foreign_type' declaration,
> and the Mercury compiler can just emit that type name in the generated
> target code.
IL uses a different syntax from C#, which in turn has a different syntax
from MC++, and so on.
Is your suggestion that the user provide one type name for each
foreign language?
This would be somewhat difficult if we ever decided to use a
programmatic interface on the strongly typed backends (e.g.
.NET's Reflection.Emit or a Java compiler's backend).
I'd prefer to specify types in terms of the underlying type system if
possible. For asm and hlc and LLDS grades, it's conceptually the C type
system. For il and ilc grades it's the .NET type system. For java it's
the Java type system.
I find that cut-and-paste style foreign interface generation becomes
very baroque, and in some cases means you spend more time coming up with
clever macros and delving into obscure syntax possibilities than
actually working.
In C we are spoiled by having a very powerful macro system, but many
other languages have no such system and it can be quite difficult to
generate syntactically correct code by just pasting the user's strings
into certain places.
> My suggestion about how to make things work for the asm backend
> only applies to untyped backends.
--
Tyson Dowd #
# Surreal humour isn't everyone's cup of fur.
trd at cs.mu.oz.au #
http://www.cs.mu.oz.au/~trd #
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to: mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions: mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------