[m-users.] Announcement (aggregates module) + questions (window functions)

Richard O'Keefe raoknz at gmail.com
Sat Dec 31 14:57:20 AEDT 2022


I'd like to second Zoltan's point.
I've solved that particular Rosetta Code problem in two different
programming languages, using an imperative style with hash tables,
and modulo the usual hand-waving about hash-tables being "O(1)"
-- which is fair enough in this case -- the code is obviously
linear in the size of the two CSV files the program has to read.

I'm reminded of my experience with XPath, XQuery, and XSLT, that
sometimes it is simpler and more readable NOT to use a specialised
query language but to use plain high level code in a language with
decent facilities.  (Such as Mercury.)  Not coincidentally, XPath,
XQuery, and XSLT hide practically everything you need to understand
performance.


On Sat, 31 Dec 2022 at 08:35, Zoltan Somogyi <zoltan.somogyi at runbox.com>
wrote:

>
> 2022-12-30 00:27 GMT+11:00 "Mark Clements" <mark.clements at ki.se>:
> > %% patient and visit are nondet predicates
> > main(!IO) :-
> >     print_line("{Id, RowNumber, Date, Score, CumScore}", !IO),
> >     aggregate((pred({Id,RowNumber,Datei,Scorei,CumScorei}::out) is
> nondet :-
> >                    patient(Id,_),
> >                    Combined = (pred(Date::out,Score::out) is nondet :-
> visit(Id,Date,Score)),
> >                    Combined(Datei,Scorei),
> >                    bag_cum_sum(Combined)(Datei,CumScorei),
> >                    Dates = (pred(Date::out) is nondet :-
> Combined(Date,_)),
> >                    bag_row_number(Dates)(Datei,RowNumber)),
> >               print_line,
> >               !IO).
> >
> > Third, I have sought to stay with nondet predicates, with the
> implementation internally using lists -- is there a better approach?
>
> I think that getting your data from nondet predicates is fundamentally a
> bad idea.
> The reason is simple: it bakes the data into the program. If you want to
> run
> the same task on a different data set, you have to modify the program and
> recompile it.
> This is much less convenient than a program that you can run on a
> different data set
> simply by invoking it with different file names.
>
> The attached code is my solution to the same task on rosettacode.org.
> You will note that its operative part is longer than your code above,
> but it is also much simpler, and therefore easier to read and to
> understand. It is also easier to reason about its performance.
> For example,
>
> -  the main operation loops over all visits,
> - the non-constant-time part of each iteration consists of lookup and
> update operations
>   on its main data structure, VisitDataMap,
> - VisitDataMap is a map, and is therefore implemented using balanced trees.
>
> This makes it clear that its complexity is O(N log N), where N is the
> number of visits.
> (Technically, it is O(N log M), where M is the number of unique patients,
> but the
> difference is negligible.) By contrast, I cannot tell anything about the
> performance
> of the aggregate-using code above, because all the relevant details are
> hidden
> behind abstraction boundaries, whose documentation is silent about
> performance.
> Note that in the SQL programs from which you draw your inspiration,
> selecting
> the right set of indexes for each relation is usually an important design
> concern.
>
> Zoltan._______________________________________________
> users mailing list
> users at lists.mercurylang.org
> https://lists.mercurylang.org/listinfo/users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mercurylang.org/archives/users/attachments/20221231/970ca04b/attachment.html>


More information about the users mailing list