[m-rev.] for post-commit review: fix a perf regression for --generate-dependencies

Zoltan Somogyi zoltan.somogyi at runbox.com
Fri Aug 15 18:36:12 AEST 2025



On Fri, 15 Aug 2025 12:08:29 +1000, Peter Wang <novalazy at gmail.com> wrote:
> > For some of these, I could propose better names, but for the global versions
> > of dfs and dfsrev, I can't, because the documentation does not really tell me
> > what they are for :-(. I can  read the code, but that does not tell me *why*
> > it does what it does. Can anyone reading this email answer that question?
> 
> Their documentation seems *fairly* clear: return a list of the nodes in
> the graph, such that "parent" nodes appear before "child" nodes,
> or the other way around. Unlike return_vertices_in_from_to_order or
> return_vertices_in_to_from_order, they don't fail on cyclic graphs.

The two halves of that sentence contradict each other. When dfs returns
a list for a cyclic graph, that list *has* to contain two elements that
are both reachable from the other. That violates the rule about
parent nodes coming before child nodes. A more correct description
would have to say something like "if there is a cycle, the edges  of it
encountered first in a depth-first traversal will obey the "parent before
child" rule; while the edges encountered later may violate it".

Hell, given a from-to edge, the "depth-first sorting" nomenclature does not even
explicitly tell readers whether it is the "from" node or the "to" node that
it considers to be "deeper" :-(

And the documentation for the global dfs/dfsrev does not tell me
the answers to these questions:

- If the graph contains two or more components disconnected from each other,
  the output will consist of the output of dfs invocations on the connected
  components, concatenated together in some order, but in what order?
  (I get the "concatenated" from the code; the documentation would seem
  to also permit "interleaved".)

- How is the first element of the list selected? If the list of cliques that a graph
  can be decomposed into has a top clique that is not a singleton, any member
  of that clique can be considered the "top" node; how does the algorithm
  choose between the candidates?

- The output of dfs/dfsrev is related to the ordered lists of cliques that some
  of the other predicates in the module return, but it (a) imposes an order
  on the elements of each clique, and then (b) erases the boundaries between
  cliques. What are some of the applications that require (a) but don't mind (b)?
  Because if there are none, then there is no point in exporting those operations;
  they could be a private implementation detail of the operations that return
  the lists of cliques.

> profiler/propagate.m uses dfsrev/2 in identify_cycles.

Why not use predicates that tell you about cycles directly,
such as the one that returns cliques? Other than that these
may not have existed early enough.

> I couldn't tell you the difference between cliques/2 and the
> return_sccs_* predicates, other than one returns a set,
> and the others return a list.

I couldn't either. Hooray for vague documentation :-(

Zoltan.





More information about the reviews mailing list