[m-users.] explaining index use

Daniels, Marcus G mdaniels at lanl.gov
Sat Jul 30 02:00:42 AEST 2016


Tomas & Dirk, thanks. I implemented what Tomas suggested, and the load takes about 5 minutes, even at 160 million records. The indexing is more painful, though, at about 30 minutes. I tried io.read_binary and io.write_binary for both the data and the indices, but the files end up at about 50 GB, loading them takes hours, and there seems to be quite a bit of overhead.

What would really be nice in this situation is some way to attach to a fixed virtual memory address via an mmap'ed region: the effect of a fact table in a module in a shared library, but without the expensive setup costs. I haven't used logic databases before, but I imagine the latency per lookup goes up quite a bit compared to an in-memory 2-3-4 tree?
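For illustration only, here is a minimal C sketch of the mmap idea. It is not a Mercury feature and not what was implemented above. It swaps the fixed-address requirement for a simpler technique: the file holds a flat array of records sorted by key and contains no pointers, so the mapping can land at any address and lookups use binary search. The record layout and the names table_write, table_attach and table_lookup are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical fixed-size record; the real fact layout is not given here. */
typedef struct {
    uint64_t key;
    uint64_t value;
} record;

/* Write n records, already sorted by key, to path. Returns 0 on success. */
static int table_write(const char *path, const record *recs, size_t n)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    const char *p = (const char *)recs;
    size_t bytes = n * sizeof(record);
    while (bytes > 0) {
        ssize_t w = write(fd, p, bytes);
        if (w <= 0) { close(fd); return -1; }
        p += w;
        bytes -= (size_t)w;
    }
    return close(fd);
}

/* Map the file read-only. Pages are faulted in on demand, so there is
 * no bulk load or index-building step at startup. Returns NULL on error. */
static const record *table_attach(const char *path, size_t *n_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return NULL; }
    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (base == MAP_FAILED) return NULL;
    *n_out = (size_t)st.st_size / sizeof(record);
    return (const record *)base;
}

/* Binary search over the mapped array; returns NULL if key is absent. */
static const record *table_lookup(const record *recs, size_t n, uint64_t key)
{
    assert(n == 0 || recs != NULL);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (recs[mid].key < key) lo = mid + 1;
        else hi = mid;
    }
    return (lo < n && recs[lo].key == key) ? &recs[lo] : NULL;
}
```

A program would call table_write once, offline, and then call table_attach at each startup followed by any number of table_lookup calls. A tree with embedded pointers would need either a fixed mapping address or offsets in place of pointers; the sorted array avoids both at the cost of making inserts expensive.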

-----Original Message-----
From: Dirk Ziegemeyer [mailto:dirk at ziegemeyer.de] 
Sent: Friday, July 22, 2016 2:42 AM
To: Daniels, Marcus G <mdaniels at lanl.gov>
Cc: Tomas By <tomas at basun.net>; users at lists.mercurylang.org
Subject: Re: [m-users.] explaining index use

Hi Marcus,

>> Tomas wrote:
>> 
>> "As nobody else has replied, I offer my suggestion: forget about fact tables and write your own indexing code."
>> 
> Ok, thanks for the reality check. I thought I must be doing something wrong with the fact table feature. Also, besides the C compiler time and memory cost (which I could absorb), having many-way indexing of the fact table (many modes) seems to get slower in a worse than linear way. It doesn't even get to producing the C code for many hours.

My experience with large fact tables is that they should be read at runtime with io.read instead of being compiled into the application. Tomas provided an example in this thread:
http://lists.mercurylang.org/archives/users/2015-February/007861.html

If you still want to read the fact table at compile time, Mercury offers the ":- pragma fact_table" feature, which reduces compilation time significantly but has other drawbacks and, as far as I remember, is planned to be marked as deprecated.

Dirk

