[mercury-users] How to make huge db in Mercury

Ralph Becket rafe at cs.mu.OZ.AU
Tue May 7 18:28:23 AEST 2002


Fergus Henderson, Tuesday,  7 May 2002:
> On 27-Apr-2002, Ondrej Bojar <oboj7042 at ss1000.ms.mff.cuni.cz> wrote:
> > As far as I know, Mercury has to actually compile the database in your
> > program. (And there is a special technique of so-called "fact tables" just
> > for this purpose.) Naturally, you can compile in any number of database
> > files you wish, but you cannot load them later, when the program is
> > already compiled and linked.
> > 
> > If you need to load (and forget) data on runtime, you can use predicates
> > term_io__read_term/3 and term__try_term_to_type/2, to load the database in
> > a data structure (such as a list of facts).
> 
> Or just io__read/3.
> 
> > Later you can search this data
> > structure with simple predicates such as member/2 or anything else you
> > write.
> 
> Yes.  In many cases it would be best to use one of the data structures from
> the Mercury standard library, such as a set, map, bimap, or hash_table.

This brings up an important point: Mercury (and, indeed, many, if not
all, statically typed declarative programming language compilers) has a
real problem with large static data structures.

This bit me a few years back when I tried to compile something like the
following:

	:- pragma memo(foo/0).

	:- func foo = array(int).

	foo = array([... 1000s of int constants ...]).

It turns out that the problem arises because the Mercury compiler
first converts this to SHNF:

	foo = V1 :-
		V1 = array(V2),
		V2 = [V3 | V4],
		V3 = <int constant #1>,
		V4 = [V5 | V6],
		V5 = <int constant #2>,
		...

and this takes a heavy toll on the analyses performed by the compiler,
not all of which are linear in the number of variables in a clause.

Trying an alternative representation, such as

	:- func foo(int) = int.

	foo(0) = <int constant #1>.
	foo(1) = <int constant #2>.
	foo(2) = <int constant #3>.
	...

will compile very quickly, but even using fact tables will not (I think)
lead to something as efficient as the original array representation.

The right(ish) thing to do is, indeed, use io__read to slurp in the data
from an external file (unfortunate because your executable is now two
files) or (and this may leave a strange taste in the mouth) use
io__read_from_string on a string constant.

The genuinely correct thing to do is use io__read_binary, but at the
moment binary value IO is implemented via the ordinary readable value
IO, and hence still incurrs a larger checking penalty.  One day this
will be fixed, but it hasn't been a serious problem.

- Ralph
--------------------------------------------------------------------------
mercury-users mailing list
post:  mercury-users at cs.mu.oz.au
administrative address: owner-mercury-users at cs.mu.oz.au
unsubscribe: Address: mercury-users-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-users-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the users mailing list