[m-users.] Fwd: Documenting univ data passing to and from the C FFI

Fabrice Nicol fabrnicol at gmail.com
Thu May 20 18:48:35 AEST 2021

Thanks Julien.
I think this is quite an interesting debate actually.

I am taking the point of view of a more-or-less-advanced user in the course
of writing an interface to an external C library.
There are two independent sets of constraints to consider: Mercury language
design constraints and performance issues of the interface when they are
relevant. Actually the former may run contrary to the latter sometimes. So
the question is what must yield first in such cases, and how to strike a
reasonable balance.

> "Why do you need to pass univs across the Mercury-C boundary?
> Why do you need to unpack them in C code?"

Performance issues.
Consider an experimental user with a 50 GB database and a reasonably good
platform.
User has been performing 'second-wave AI' jobs (say ML neural network stuff
for concreteness), say using R (but it might be TensorFlow and Python).
User now wants to process results from a 'third-wave AI' perspective, in
other words perform tasks:
- using logic/functional programming techniques
- using expert system methods modernized from good ol' 'first-wave AI'
(mostly symbolic rule-based analysis expressing some domain-specific
knowledge).

User now has to churn big data loads and wants to give Mercury a
well-deserved chance.

This is the context. It is not that experimental either, despite some
appearances. It is even quite a trendy thing in some advanced AI curricula,
with other tools/languages like Haskell. (Check out the open-access AI
courses at ETH Zurich for example).

User now has to minimize Mercury/C interface overhead in a big data
context. After some tests, s/he needs to thread his/her big data loads to
and from R using the C FFI.

Understandably, the data from R (and Python is no better) will **not** be
type-safe. R coerces types aggressively and sometimes unpredictably, and
User has to neutralize this by 'boxing' data flows into some universal data
type. 'univ' looks like a good candidate (at least a bit better than
falling back on strings).
User now has a choice: export some Mercury code to C to process the
univ-typed 50 GB data, or use the runtime RTTI macros to do the job.
Understandably s/he will prefer the latter option, even in the face of
possible interface changes in the future: either the software's C
interface code will be fixed to reflect those changes, or the Mercury tools
will be frozen at user level for as long as it takes. This is mostly a team
resource management issue, not a language design issue (from a user
viewpoint).
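
To make the 'boxing' idea concrete, here is a minimal, self-contained C
sketch of a tagged box with checked unboxing. It is only an illustration
of the general technique: it is not Mercury's univ representation and not
the RMercury code, and every name in it (box_t, box_tag, box_int,
unbox_int and so on) is hypothetical.

```c
/* Hypothetical tags for the scalar kinds a loosely typed source may send. */
typedef enum { BOX_INT, BOX_DOUBLE, BOX_STRING } box_tag;

/* A boxed value: the tag records which union member is currently valid. */
typedef struct {
    box_tag tag;
    union {
        int         i;
        double      d;
        const char *s;   /* borrowed pointer, the string is not copied */
    } as;
} box_t;

static box_t box_int(int i)
{
    box_t b;
    b.tag = BOX_INT;
    b.as.i = i;
    return b;
}

static box_t box_double(double d)
{
    box_t b;
    b.tag = BOX_DOUBLE;
    b.as.d = d;
    return b;
}

static box_t box_string(const char *s)
{
    box_t b;
    b.tag = BOX_STRING;
    b.as.s = s;
    return b;
}

/* Checked unboxing: returns 1 and writes *out when the tag matches,
   otherwise returns 0 and leaves *out untouched. */
static int unbox_int(box_t b, int *out)
{
    if (b.tag != BOX_INT) {
        return 0;
    }
    *out = b.as.i;
    return 1;
}

static int unbox_double(box_t b, double *out)
{
    if (b.tag != BOX_DOUBLE) {
        return 0;
    }
    *out = b.as.d;
    return 1;
}

/* Human-readable name of the boxed type, e.g. for diagnostics. */
static const char *box_type_name(box_t b)
{
    switch (b.tag) {
    case BOX_INT:    return "int";
    case BOX_DOUBLE: return "double";
    case BOX_STRING: return "string";
    }
    return "unknown";
}
```

The point of the checked accessors is that a value whose type was changed
behind User's back is rejected at the boundary instead of being
reinterpreted silently.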

> "The details of the RTTI system are deliberately not documented at the
> target language level since they are (and have been) subject to change."

Yes :-(
Understandably so from a Mercury language developer's viewpoint.
Perhaps a bit less so from a Mercury user's viewpoint, as outlined above.

> If you have to pack values in univs like that you would be far better
> implementing test/3 as a Mercury predicate and exporting that to C,
> e.g.
>
> (...)

Great code chunk. Thanks again Julien. I would suggest adding some version
of it to Janet's nice crash course.
Unfortunately, I tested it against a realistic database and the CPU time
penalty incurred is rather stiff compared with the C RTTI macro
alternative (link below).

> "I assume that you are attempting something a bit more general in
> practice?"

Sure. As you understood, this was a simplified minimal example for
demonstration purposes.
The real code is here (it runs, yet I must warn that it is still very
unstable/experimental):
https://www.github.com/fabnicol/RMercury/tree/library/ri.m,
lines 4100 and further down.

I'm following a mixed approach there: using 'MR_unravel_univ' at the C
level yet performing type analysis in Mercury code. This is OK while you
have few columns. But this might have to be changed for transposed matrices
with millions of columns (and few rows), as in this case the Mercury type
analysis code would be called millions of times.
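
The scaling concern can be illustrated with a small, self-contained C
sketch. It assumes the table is walked column by column with one type
analysis call per column. Here analyse_column is a hypothetical stand-in
for the analysis that the real code performs in Mercury; it is not part of
RMercury or of the Mercury runtime, and it does nothing except count how
often it is called.

```c
#include <stddef.h>

typedef enum { COL_INT, COL_DOUBLE } col_type;

/* Number of times the (stand-in) type analysis has been invoked. */
static size_t analysis_calls = 0;

/* Hypothetical stand-in for a per-column type analysis that would cross
   the C/Mercury boundary in the real program. */
static col_type analyse_column(size_t col)
{
    analysis_calls++;
    return (col % 2 == 0) ? COL_INT : COL_DOUBLE;
}

/* Walk an n_rows x n_cols table column by column: one analysis call per
   column, then a plain C loop over that column's rows.
   Returns the number of cells visited. */
static size_t process_table(size_t n_rows, size_t n_cols)
{
    size_t cells = 0;
    for (size_t c = 0; c < n_cols; c++) {
        col_type t = analyse_column(c);
        for (size_t r = 0; r < n_rows; r++) {
            (void) t;   /* a real loop would dispatch on t here */
            cells++;
        }
    }
    return cells;
}
```

With a million rows and three columns the analysis runs three times; with
the table transposed (three rows, a million columns) the same number of
cells costs a million analysis calls, which is the case where the
per-call overhead starts to dominate.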

Fabrice