[mercury-users] Converting large tables from prolog

Fergus Henderson fjh at cs.mu.OZ.AU
Fri Nov 26 04:39:44 AEDT 1999


On 25-Nov-1999, Richard Tucker <rit at cam.sri.com> wrote:
> 
> I have a prolog program involving a large table of around 50000
> facts of the form: foo(string, list(pair(tag, float))). In prolog
> this is no problem: I just consult a file with them in, and if
> that takes too long I can save an image, which is then quite quick
> to load.

One possible solution, in addition to the ones already mentioned
by other posters, is to code this table in C, and use Mercury's
C interface.

Another alternative is that it may also be possible to reduce
compilation time and memory usage to acceptable values by splitting the
predicate into smaller sub-predicates, e.g.

	foo(S, L) :-
		string__first_char(S, C),
		( C = 'a', foo_a(S, L)
		; C = 'b', foo_b(S, L)
		...
		; C = 'z', foo_z(S, L)
		).
	
	foo_a("abba", ...).
	foo_a("ac/dc", ...).
	...
	foo_b("bananarama", ...).
	foo_b("bob's your uncle", ...).

If you have a lot of data, the simplest way to do that kind of split
may be to write a small program to read in the data file and
generate code in that form.

> I wanted to try converting this to Mercury. Once the table is built,
> I don't need to add or remove entries, and I only ever want to use
> the mode foo(in, out), so it seems a map would be the obvious choice.
> I tried having the program build a map at by reading in all the facts
> from a file, but this is very slow -- it takes around 25 seconds on
> an ultrasparc, and it has to do it every time it starts up.

It would be interesting to examine that in the profiler to see
where the time is being spent.

How exactly are you constructing the map?  Are you inserting each
element at a time?  That would be O(N * log(N)).  It could be more
efficient to first store all the elements in a list, which is O(N), and
then use map__from_sorted_list to convert the already-sorted list
into map, which can also be done in O(N), although currently it
isn't.  Converting a sorted list into a map would be O(N) if
map__from_sorted_list were implemented with an O(N) algorithm,
like binary_tree__from_sorted_assoc_list is, but the current naive
implementation is O(N * log(N)).  It would be great if someone were
to fix this and send us the patch.

But I suspect it is probably more likely that the overhead
may be related to the I/O code and the RTTI and dynamic typing
inside io__read, rather than to actually inserting stuff
in the map.

> So, what's the recommended thing to do?

In the short term, I suggest using the fact table support,
limited though it may be, or using the C interface,
or splitting it into smaller predicates as describe above.

In the medium term, I think the recommended approach will
be somewhat similar to what you do with Prolog:

- at build time, read the terms in and write them out in
  binary representation;

- at run time, read in the binary representation.

However, for this to work efficiently requires a few things:

- Someone needs to implement more efficient versions of
  io__write_binary and io__read_binary (the current versions
  actually just call io__write and io__read).
  Tom Conway and I did a fair bit of work on this at one point,
  but it never quite got finished.

- We need to modify the compiler so that it specializes
  the RTTI operations used by io__read and io__write if the type is
  known.  This will allow io__read_binary to be polymorphic and yet
  still work efficiently when you call it with a particular type.
  Specialization of the RTTI operations would be fairly easy
  and is desirable for other purposes too.

- As noted above, map__from_sorted_list needs to be implemented using
  an O(N) algorithm.

In the long term, a nicer solution would be for someone to implement a
version of the fact table support that did not have its current
limitations, and/or to improve the efficiency of the compiler for these
cases.  (These two may essentially amount to the same thing, because it
would then probably not be too hard for the compiler to detect when a
predicate contained only facts and contained a large number of them and
to invoke the fact table compilation algorithm automatically rather
than requiring an explicit `pragma fact_table' declaration as we
currently do.)

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>  |  of excellence is a lethal habit"
PGP: finger fjh at 128.250.37.3        |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-users mailing list
post:  mercury-users at cs.mu.oz.au
administrative address: owner-mercury-users at cs.mu.oz.au
unsubscribe: Address: mercury-users-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-users-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the users mailing list