[mercury-users] ... Still compiling fact module ... is assoc-list access fast?

Fergus Henderson fjh at cs.mu.OZ.AU
Thu Feb 20 17:46:20 AEDT 2003


On 20-Feb-2003, Douglas Auclair <dauclair at msn.com> wrote:
> I'm working on a system that uses large amounts of data (data sets are 
> usually 1 meg but can be 20 meg stored as line-delimited text).  Since 
> scanning all this data into memory does have a start-up cost (on the order 
> of seconds) and the system I'm designing is stateless (so it will be 
> started up many times -- once for each inquiry), I've thought of compiling 
> the data into code (it's static data sets).
> 
> This works fine for deterministic data sets[0], but for nondeterministic 
> data the compiler dies (it runs out of memory rather quickly).  So I 
> converted the nondeterministic data into semideterministic data (by 
> returning lists instead of creating multiple facts), but it appears that 
> the amount of data is so large that the compiler is still compiling the 
> code after 7 hours.
> 
> Ouch!  Does this group see alternatives I'm missing?

Well, this sort of thing is what the "fact table" support is for.

> I'm next going to convert the large set of facts to an association list 
> (alist) and do look-ups that way (I tried compiling an alist before ... it 
> took a while, as well).  This begs the question:  which is faster at 
> runtime, a semideterminstic fact lookup or an alist search (lookup)?[1]

It depends on the types and values of the input arguments.
The compiler will apply its indexing techniques to a semidet fact lookup.
In general a semidet fact lookup should at least as fast as an
assoc_list search, but in some cases it can be considerably faster.
If, for example, the clauses are switching on a string, the compiler
will build a hash table at compile time and each lookup will be a hash
table search.  If the clauses are switching on a enumeration or a dense
integer range, the compiler may use a jump table or even just an array
of results.  In such cases, the indexed semidet fact lookup is O(1)
whereas an assoc_list search would be O(n) in the number of facts.

> I must say I'm a bit surprised at the prolonged compile time.  There are 
> only ~6500 semideterministic facts for one predicate declaration (~6500 
> nondeterministic facts crashed the compiler) that have an int in parameter 
> and a list out parameter (the list is a list of a type that is composed of 
> a float, a string, and a float).  The total size of the source code of that 
> module is "only" ~850K.  Am I doing something terribly wrong?

Some parts of the compiler take time and/or space O(something nasty) in
the number of sub-terms per clause (or in some cases, per procedure).
So if you have a single very big clause, or a procedure with a very large
number of clauses, this can cause trouble.

Did you try using the fact_tables support for this?
Since the data values in your clauses are all ints, floats, and strings
(the list can be eliminated by going back to a nondet procedure),
you should be able to use a fact table for this.

	:- pred foo(int::in, float::out, string::out, float::out) is nondet.
	:- pragma fact_table(foo/4, "foo.data").

> On the other hand, the compiler is very fast with a fact_table:  I like how 
> the fact_table pragma can very quickly make 10 megs of facts accessible to 
> the system.  What is the level of difficulty in extending it for other 
> types, like lists and tuples?

Well, one of the reasons why the compiler is very fast with fact tables
is that it skips the normal type and mode checking phases, and instead
uses a special type-checking phase that is hard-coded to just handle
ints, chars, strings, and floats.

It would not be too hard to extend the fact table type-checking to
handle a few specific extra types, such as lists and tuples. 
But extending it to handle all types would be more difficult --
that would require changing to to run the normal type checker.
Running the normal type checking on large fact tables would probably
perform OK, I think, since the type checker checks each clause individually
(and thus should only blow up for very large clauses, not for very
procedures that have a very large number of small clauses).
In fact that was how I recommended that the fact table stuff be implemented
originally.  However, the original programmer decided to do it differently.

> Is there something in the compiler that can be improved to speed up the
> compile times for very large lists, alists, and  facts?

Undoubtably yes, but it is not an easy task.

-- 
Fergus Henderson <fjh at cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.
--------------------------------------------------------------------------
mercury-users mailing list
post:  mercury-users at cs.mu.oz.au
administrative address: owner-mercury-users at cs.mu.oz.au
unsubscribe: Address: mercury-users-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-users-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the users mailing list