[m-users.] Announcement (aggregates module) + questions (window functions)
Mark Clements
mark.clements at ki.se
Sun Mar 5 02:26:31 AEDT 2023
Zoltan and Richard,
I drafted a response a few months ago that I never sent: given that my main field is not computer science, I deferred to you.
As a follow-up: when should one use non-determinism and when should one use Mercury? To motivate my questions: I came to Mercury because I was looking for a typed "Prolog" that allowed for elegant joins between relationships (essentially to replace SQL:). Zoltan's implementation for the CSV example is canonical and efficient code - but it is also comparatively long and uses several data structures that could be replaced by less efficient non-determinism. When is the elegance of a non-deterministic solution acceptable given its inefficiency?
Sincerely, Mark.
________________________________
From: Richard O'Keefe <raoknz at gmail.com>
Sent: 31 December 2022 04:57
To: zoltan.somogyi at runbox.com <zoltan.somogyi at runbox.com>
Cc: Mark Clements <mark.clements at ki.se>; users <users at lists.mercurylang.org>
Subject: Re: [m-users.] Announcement (aggregates module) + questions (window functions)
You don't often get email from raoknz at gmail.com. Learn why this is important<https://aka.ms/LearnAboutSenderIdentification>
I'd like to second Zoltan's point.
I've solved that particular Rosetta Code problem in two different
programming languages, using an imperative style with hash tables,
and modulo the usual hand-waving about hash-tables being "O(1)"
-- which is fair enough in this case -- the code is obviously
linear in the size of the two CSV files the program has to read.
I'm reminded of my experience with XPath, XQuery, and XSLT, that
sometimes it is simpler and more readable NOT to use a specialised
query language but to use plain high level code in a language with
decent facilities. (Such as Mercury.) Not coincidentally, XPath,
XQuery, and XSLT hide practically everything you need to understand
performance.
On Sat, 31 Dec 2022 at 08:35, Zoltan Somogyi <zoltan.somogyi at runbox.com<mailto:zoltan.somogyi at runbox.com>> wrote:
2022-12-30 00:27 GMT+11:00 "Mark Clements" <mark.clements at ki.se<mailto:mark.clements at ki.se>>:
> %% patient and visit are nondet predicates
> main(!IO) :-
> print_line("{Id, RowNumber, Date, Score, CumScore}", !IO),
> aggregate((pred({Id,RowNumber,Datei,Scorei,CumScorei}::out) is nondet :-
> patient(Id,_),
> Combined = (pred(Date::out,Score::out) is nondet :- visit(Id,Date,Score)),
> Combined(Datei,Scorei),
> bag_cum_sum(Combined)(Datei,CumScorei),
> Dates = (pred(Date::out) is nondet :- Combined(Date,_)),
> bag_row_number(Dates)(Datei,RowNumber)),
> print_line,
> !IO).
>
> Third, I have sought to stay with nondet predicates, with the implementation internally using lists -- is there a better approach?
I think that getting your data from nondet predicates is fundamentally a bad idea.
The reason is simple: it bakes the data into the program. If you want to run
the same task on a different data set, you have to modify the program and recompile it.
This is much less convenient than a program that you can run on a different data set
simply by invoking it with different file names.
The attached code is my solution to the same task on rosettacode.org<https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Frosettacode.org%2F&data=05%7C01%7Cmark.clements%40ki.se%7C585dbab45ec34f7f25ef08daeae32777%7Cbff7eef1cf4b4f32be3da1dda043c05d%7C0%7C0%7C638080559250227740%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7C&sdata=tHDpCagECzIIv%2FsnHGSobrwGJS4wgMOfxMwkUswQ%2BBo%3D&reserved=0>.
You will note that its operative part is longer than your code above,
but it is also much simpler, and therefore easier to read and to
understand. It is also easier to reason about its performance.
For example,
- the main operation loops over all visits,
- the non-constant-time part of each iteration consists of lookup and update operations
on its main data structure, VisitDataMap,
- VisitDataMap is a map, and is therefore implemented using balanced trees.
This makes it clear that its complexity is O(N log N), where N is the number of visits.
(Technically, it is O(N log M), where M is the number of unique patients, but the
difference is negligible.) By contrast, I cannot tell anything about the performance
of the aggregate-using code above, because all the relevant details are hidden
behind abstraction boundaries, whose documentation is silent about performance.
Note that in the SQL programs from which you draw your inspiration, selecting
the right set of indexes for each relation is usually an important design concern.
Zoltan._______________________________________________
users mailing list
users at lists.mercurylang.org<mailto:users at lists.mercurylang.org>
https://lists.mercurylang.org/listinfo/users<https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.mercurylang.org%2Flistinfo%2Fusers&data=05%7C01%7Cmark.clements%40ki.se%7C585dbab45ec34f7f25ef08daeae32777%7Cbff7eef1cf4b4f32be3da1dda043c05d%7C0%7C0%7C638080559250227740%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7C&sdata=ifg5Y1A2bX92Y5sjcVVwesIHjFXY42k0fKIydxoUXZs%3D&reserved=0>
N?r du skickar e-post till Karolinska Institutet (KI) inneb?r detta att KI kommer att behandla dina personuppgifter. H?r finns information om hur KI behandlar personuppgifter<https://ki.se/medarbetare/integritetsskyddspolicy>.
Sending email to Karolinska Institutet (KI) will result in KI processing your personal data. You can read more about KI's processing of personal data here<https://ki.se/en/staff/data-protection-policy>.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurylang.org/archives/users/attachments/20230304/acfd8107/attachment-0001.html>
More information about the users
mailing list