[m-users.] Announcement (aggregates module) + questions (window functions)

Zoltan Somogyi zoltan.somogyi at runbox.com
Sat Dec 31 06:35:14 AEDT 2022


2022-12-30 00:27 GMT+11:00 "Mark Clements" <mark.clements at ki.se>:
> %% patient and visit are nondet predicates
> main(!IO) :-
>     print_line("{Id, RowNumber, Date, Score, CumScore}", !IO),
>     aggregate((pred({Id,RowNumber,Datei,Scorei,CumScorei}::out) is nondet :-
>                    patient(Id,_),
>                    Combined = (pred(Date::out,Score::out) is nondet :- visit(Id,Date,Score)),
>                    Combined(Datei,Scorei),
>                    bag_cum_sum(Combined)(Datei,CumScorei),
>                    Dates = (pred(Date::out) is nondet :- Combined(Date,_)),
>                    bag_row_number(Dates)(Datei,RowNumber)),
>               print_line,
>               !IO).
> 
> Third, I have sought to stay with nondet predicates, with the implementation internally using lists -- is there a better approach?

I think that getting your data from nondet predicates is fundamentally a bad idea.
The reason is simple: it bakes the data into the program. If you want to run
the same task on a different data set, you have to modify the program and recompile it.
This is much less convenient than a program that you can run on a different data set
simply by invoking it with different file names.
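
As a minimal sketch of that alternative (not taken from the attached program),
a Mercury main predicate can take the data file names from the command line.
The module name file_args_sketch and the predicate report_file are hypothetical,
and the sketch only reports the size of each file instead of parsing CSV.

```mercury
:- module file_args_sketch.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module string.

main(!IO) :-
    % The data set is chosen at run time, not compiled into the program.
    io.command_line_arguments(Args, !IO),
    ( if Args = [PatientsFile, VisitsFile] then
        report_file(PatientsFile, !IO),
        report_file(VisitsFile, !IO)
    else
        io.write_string(
            "usage: file_args_sketch patients.csv visits.csv\n", !IO),
        io.set_exit_status(1, !IO)
    ).

    % Hypothetical helper: read a whole file and print its length.
    % A real program would parse the contents here.
:- pred report_file(string::in, io::di, io::uo) is det.

report_file(FileName, !IO) :-
    io.open_input(FileName, OpenResult, !IO),
    (
        OpenResult = ok(Stream),
        io.read_file_as_string(Stream, ReadResult, !IO),
        io.close_input(Stream, !IO),
        (
            ReadResult = ok(Contents),
            io.format("%s: read a string of length %d\n",
                [s(FileName), i(string.length(Contents))], !IO)
        ;
            ReadResult = error(_Partial, ReadError),
            io.format("%s: %s\n",
                [s(FileName), s(io.error_message(ReadError))], !IO)
        )
    ;
        OpenResult = error(OpenError),
        io.format("%s: %s\n",
            [s(FileName), s(io.error_message(OpenError))], !IO)
    ).
```

Running the same executable on another data set then needs only different
arguments, with no recompilation.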

The attached code is my solution to the same task on rosettacode.org.
You will note that its operative part is longer than your code above,
but it is also much simpler, and therefore easier to read and to
understand. It is also easier to reason about its performance.
For example,

- the main operation loops over all visits,
- the non-constant-time part of each iteration consists of lookup and update operations
  on its main data structure, VisitDataMap,
- VisitDataMap is a map, and is therefore implemented using balanced trees.

Together, these make it clear that the code's complexity is O(N log N), where N is the number of visits.
(Technically, it is O(N log M), where M is the number of unique patients, but the
difference is negligible.) By contrast, I cannot tell anything about the performance
of the aggregate-using code above, because all the relevant details are hidden
behind abstraction boundaries, whose documentation is silent about performance.
Note that in the SQL programs from which you draw your inspiration, selecting
the right set of indexes for each relation is usually an important design concern.
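
The loop described in the list above can be sketched as follows. This is an
illustration under assumptions, not the contents of rosetta_zs.m: the visit and
visit_data types, their fields, the add_visit predicate and the sample visits
are all made up, and the visits are inlined only to keep the sketch
self-contained (a real program would read them from a file).

```mercury
:- module visit_fold_sketch.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module float.
:- import_module int.
:- import_module list.
:- import_module map.
:- import_module string.

    % Hypothetical types; the attached program may use different ones.
:- type visit
    --->    visit(
                v_patient_id    :: string,
                v_date          :: string,
                v_score         :: float
            ).

:- type visit_data
    --->    visit_data(
                vd_num_visits   :: int,
                vd_score_sum    :: float
            ).

:- type visit_data_map == map(string, visit_data).

main(!IO) :-
    % Made-up sample data, for illustration only.
    Visits = [
        visit("p1", "2022-01-10", 2.5),
        visit("p2", "2022-01-11", 4.0),
        visit("p1", "2022-02-03", 1.5)
    ],
    % One pass over the N visits.
    list.foldl(add_visit, Visits, map.init, VisitDataMap),
    map.foldl(print_patient, VisitDataMap, !IO).

    % Each call does one search and one insert or update on the map,
    % each costing O(log M) for M distinct patients.
:- pred add_visit(visit::in, visit_data_map::in, visit_data_map::out) is det.

add_visit(visit(Id, _Date, Score), !VisitDataMap) :-
    ( if map.search(!.VisitDataMap, Id, Data0) then
        Data0 = visit_data(NumVisits0, ScoreSum0),
        Data = visit_data(NumVisits0 + 1, ScoreSum0 + Score),
        map.det_update(Id, Data, !VisitDataMap)
    else
        map.det_insert(Id, visit_data(1, Score), !VisitDataMap)
    ).

:- pred print_patient(string::in, visit_data::in, io::di, io::uo) is det.

print_patient(Id, visit_data(NumVisits, ScoreSum), !IO) :-
    io.format("%s: %d visit(s), score sum %.1f\n",
        [s(Id), i(NumVisits), f(ScoreSum)], !IO).
```

Because every per-visit step is a lookup followed by an insert or update on
the map, the O(N log M) bound can be read directly off the code.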

Zoltan.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patients.csv
Type: text/csv
Size: 59 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/users/attachments/20221231/74d83a11/attachment.csv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: visits.csv
Type: text/csv
Size: 144 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/users/attachments/20221231/74d83a11/attachment-0001.csv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rosetta_zs.m
Type: application/octet-stream
Size: 8760 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/users/attachments/20221231/74d83a11/attachment.obj>

