[m-rev.] for post-commit review: fix a perf regression for --generate-dependencies
Zoltan Somogyi
zoltan.somogyi at runbox.com
Fri Aug 15 18:36:12 AEST 2025
On Fri, 15 Aug 2025 12:08:29 +1000, Peter Wang <novalazy at gmail.com> wrote:
> > For some of these, I could propose better names, but for the global versions
> > of dfs and dfsrev, I can't, because the documentation does not really tell me
> > what they are for :-(. I can read the code, but that does not tell me *why*
> > it does what it does. Can anyone reading this email answer that question?
>
> Their documentation seems *fairly* clear: return a list of the nodes in
> the graph, such that "parent" nodes appear before "child" nodes,
> or the other way around. Unlike return_vertices_in_from_to_order or
> return_vertices_in_to_from_order, they don't fail on cyclic graphs.
The two halves of that sentence contradict each other. When dfs returns
a list for a cyclic graph, that list *has* to contain two elements that
are both reachable from the other. That violates the rule about
parent nodes coming before child nodes. A more correct description
would have to say something like "if there is a cycle, the edges of it
encountered first in a depth-first traversal will obey the "parent before
child" rule; while the edges encountered later may violate it".
Hell, given a from-to edge, the "depth-first sorting" nomenclature does not even
explicitly tell readers whether it is the "from" node or the "to" node that
it considers to be "deeper" :-(
And the documentation for the global dfs/dfsrev does not tell me
the answers to these questions:
- If the graph contains two or more components disconnected from each other,
the output will consist of the output of dfs invocations on the connected
components, concatenated together in some order, but in what order?
(I get the "concatenated" from the code; the documentation would seem
to also permit "interleaved".)
- How is the first element of the list selected? If the list of cliques that a graph
can be decomposed into has a top clique that is not a singleton, any member
of that clique can be considered the "top" node; how does the algorithm
choose between the candidates?
- The output of dfs/dfsrev is related to the ordered lists of cliques that some
of the other predicates in the module return, but it (a) imposes an order
on the elements of each clique, and then (b) erases the boundaries between
cliques. What are some of the applications that require (a) but don't mind (b)?
Because if there are none, then there is no point in exporting those operations;
they could be a private implementation detail of the operations that return
the lists of cliques.
> profiler/propagate.m uses dfsrev/2 in identify_cycles.
Why not use predicates that tell you about cycles directly,
such as the one that returns cliques? Other than that these
may not have existed early enough.
> I couldn't tell you the difference between cliques/2 and the
> return_sccs_* predicates, other than one returns a set,
> and the others return a list.
I couldn't either. Hooray for vague documentation :-(
Zoltan.
More information about the reviews
mailing list