[m-users.] Log parsing benchmark

Julian Fondren jfondren at hostgator.com
Thu Jun 20 18:17:57 AEST 2019


Howdy,

I've added Mercury to a benchmark at
https://github.com/jrfondren/topsender-bench , the task: run a regex over a
400 MB logfile, count email senders (as matched by the regex) and report
the top 5 senders with their counts. I don't have a good sanitized logfile
to use so this test isn't reproducible at the moment unless you have your
own exim logs, sorry.
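
For anyone who wants the shape of the task without the repo, here is a minimal Python sketch of it (not the Mercury code). The pattern and the sample lines are assumptions made up for illustration: exim marks an arriving message with "<=" followed by the envelope sender, but the regex the benchmark actually uses may well differ.

```python
import re
from collections import Counter

# Hypothetical pattern: in an exim main log, "<=" marks an arriving message
# and is followed by the envelope sender. The benchmark's regex may differ.
SENDER_RE = re.compile(r" <= (\S+@\S+)")


def top_senders(lines, n=5):
    """Return the n most frequent senders as (address, count) pairs."""
    counts = Counter()
    for line in lines:
        m = SENDER_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(n)


# Made-up log lines, for illustration only.
sample = [
    "2019-06-20 18:00:01 1hdA-01 <= alice@example.com H=(mail) [192.0.2.1]",
    "2019-06-20 18:00:02 1hdA-01 => bob@example.org R=dnslookup",
    "2019-06-20 18:00:03 1hdB-02 <= alice@example.com H=(mail) [192.0.2.1]",
    "2019-06-20 18:00:04 1hdC-03 <= carol@example.net H=(mail) [192.0.2.2]",
    "2019-06-20 18:00:05 1hdC-03 Completed",
]

for sender, count in top_senders(sample):
    print(count, sender)
```

Against a real log, the same function can be fed an open file object, since it only needs an iterable of lines.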

The Mercury code I've written uses PCRE for the actual regex matching
(by linking pcre_d_shim.o, the same shim I wrote earlier to speed up the D
candidates for this benchmark), the io module to read lines from the log,
and the hash_table module to count senders.

On this benchmark, Mercury's about 5x as slow as the reference C version,
but only 1.4x as slow as the Python3 version. Mercury's memory usage is
pretty disappointing; I experimented with adding some manual calls to the
GC on every Nth match, and that could shave off dozens of MB, but at a
significant runtime cost.

The need for `with_type` in the following code really surprised me, but...
as of writing this email, I finally get it: ordering() is so generic that
it could be asked to compare two curried snd() predicates, rather than the
two ints that the func snd() returns.

  :- func cmp(pair(string, int), pair(string, int)) = comparison_result.
  cmp(A, B) = ordering(snd(B) `with_type` int, snd(A) `with_type` int).

This is all with asm_fast.gc. I get a segfault when I compile with hlc.gc.
ltrace (library trace, not strace) shows that the SIGSEGV happens at

mercury__hash_table_search_3_p_0(0x608ce0, 0x608b40, 0x440000002d,
0x7f638d2c1fe0 <no return ....>
--- SIGSEGV (Segmentation fault)

*** Mercury runtime: caught segmentation violation ***
cause: address not mapped to object
address involved: 0x4400000045

which is the very first call to hash_table.search in the program. I'd like
to troubleshoot this further, but I don't have any debugging hlc grades.
Right now I only have the default libgrades installed, but I still have the
compilation environment from the install hanging around. Is there an easy
way to go back there and say "compile and install this one additional
libgrade" without rebuilding everything?


If I were to iterate further on the Mercury candidate, I'd start with more
efficient file handling, say with an "Applies the given closure to each
regex match against lines from the input file" routine that could delay
allocating memory for strings until the regex succeeds.
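
To make that idea concrete, here is a rough Python sketch of such a routine (not Mercury, and not code from the benchmark). The names for_each_match and count_senders, and the pattern, are hypothetical. It memory-maps the file and scans the mapped bytes directly, so no per-line string is built and a string is only created for a capture once the regex has matched.

```python
import mmap
import re


def for_each_match(path, pattern, callback):
    """Call callback(match) for each match of the bytes regex in the file."""
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # finditer scans the mapped bytes directly; no per-line string
            # is built, and only the callback decides what to copy out.
            for m in regex.finditer(mm):
                callback(m)


def count_senders(path):
    """Count senders, decoding a capture only after a successful match."""
    counts = {}

    def bump(m):
        sender = m.group(1).decode("utf-8", "replace")
        counts[sender] = counts.get(sender, 0) + 1

    # Hypothetical pattern; \S+ cannot cross a newline, so each match
    # stays within a single log line.
    for_each_match(path, rb" <= (\S+@\S+)", bump)
    return counts
```

The trade-off is that the pattern itself has to guarantee a match cannot span lines, and that an empty file cannot be mapped this way and would need a separate check.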

Cheers.