[m-dev.] detail profiling

Zoltan Somogyi zs at cs.mu.OZ.AU
Fri Jun 11 13:21:32 AEST 1999


I am in the process of implementing a detail profiler, which is analogous
to basic block profilers for imperative languages. It inserts counters into
the generated code in several places: at entries to procedures, at entries
to switch arms, disjunction arms, and the then and else parts of
if-then-elses, and at the exits
from primitive (non-compound) goals whose determinism is not det, cc_multi,
erroneous or failure. (For goals with those determinisms, the number of times
the program point after the goal is reached is uniquely determined by the
determinism and by the number of times the program point before the goal
is reached; this is not true for other determinisms.)
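The reasoning above can be sketched as follows. This is an illustrative
Python fragment, not the actual implementation; the determinism names are
Mercury's, but the function names are mine:

```python
# For a goal whose determinism fixes the number of successes per forward
# entry -- exactly one (det, cc_multi) or zero (erroneous, failure) -- the
# count at the program point after the goal follows from the count before
# it, so no exit counter needs to be inserted there.

def exits_per_entry(determinism):
    """Successes per forward entry if fixed by the determinism, else None."""
    if determinism in ("det", "cc_multi"):
        return 1
    if determinism in ("erroneous", "failure"):
        return 0
    return None  # semidet, multi, nondet, cc_nondet: exit must be counted

def count_after(determinism, count_before, measured_exit_count=None):
    """Derive the count after a primitive goal, or fall back to a counter."""
    k = exits_per_entry(determinism)
    if k is not None:
        return count_before * k
    return measured_exit_count  # value of the inserted exit counter
```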

I selected those sites because they form a set that is close to minimal,
yet allows a reasonably simple tool to compute the number of times
each program point is reached.

You can specify the --detail-profiling flag only for a selected set of modules,
or for every module in a program. When the generated executable exits,
it will write the values of all the counters to a file. A tool I am writing
will take the counter values from every module compiled with detail profiling
and put them together with a HLDS dump of those modules that was created during
the compilation process (I put a timestamp into both files, so the tool can
check for potential mismatches).
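The pairing step might look something like the following sketch. The data
layout here is purely illustrative (the real file formats are not described
above); the point is only the timestamp consistency check:

```python
# Hypothetical sketch: each counter file and each HLDS dump is assumed to
# carry the timestamp of the compilation that produced it, so the tool can
# refuse to combine counters with a stale dump of the same module.

def pair_module_data(counter_files, hlds_dumps):
    """Both arguments: dicts mapping module name -> (timestamp, payload)."""
    paired = {}
    for module, (c_stamp, counts) in counter_files.items():
        if module not in hlds_dumps:
            raise ValueError("no HLDS dump for module " + module)
        h_stamp, hlds = hlds_dumps[module]
        if c_stamp != h_stamp:
            raise ValueError("timestamp mismatch for module " + module)
        paired[module] = (counts, hlds)
    return paired
```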

Such a tool can be used for two sorts of purposes: test coverage and
performance engineering.

For test coverage analysis, the tool should report the name of any procedure
that contains either

- any program point that is never executed and is not just after
  a goal whose determinism says it cannot succeed

- any program point that is never executed and is not just after
  or just before a goal whose determinism says it cannot succeed

depending on invocation options. The reason for the second option is to
filter out program points just before a call to error. If the call to
error is there because determinism analysis is not smart enough to detect
that a piece of semidet code should in fact always succeed, the programmer
may not want it reported as untested code that should be tested.
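The two filters can be sketched like this. The representation of a program
point is invented for the example (a count plus the determinisms of the
goals on either side of the point); it is not the tool's actual data
structure:

```python
# Determinisms that guarantee the goal cannot succeed.
CANNOT_SUCCEED = ("erroneous", "failure")

def reportable_points(points, also_filter_before):
    """points: list of (count, det_of_goal_before, det_of_goal_after).

    Report unexecuted points, excusing points just after a goal that
    cannot succeed; with also_filter_before, also excuse points just
    before such a goal (e.g. just before a call to error).
    """
    report = []
    for count, det_before, det_after in points:
        if count > 0:
            continue                      # executed: nothing to report
        if det_before in CANNOT_SUCCEED:
            continue                      # just after a cannot-succeed goal
        if also_filter_before and det_after in CANNOT_SUCCEED:
            continue                      # just before a cannot-succeed goal
        report.append((count, det_before, det_after))
    return report
```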

For performance engineering, the tool should order the procedures in
profiled modules on

- the number of times the procedure is called

- the number of times the most frequently reached program point
  in the procedure was executed

- the bisection width of the procedure, which is computed as follows:

  - for a sequence of conjoined goals, the bisection width is the
    maximum number of times any program point in the sequence is reached

  - for a set of disjoined goals, the bisection width is the
    sum of the bisection widths of the alternatives

again depending on invocation options. Two other possibilities, which would
be a bit more difficult to arrange, are ordering on

- the number or total size of memory cells allocated in the procedure

- the number or total size of memory cells allocated at the most expensive
  primitive goal in the procedure
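The bisection width defined above is a simple recursion over the goal tree.
A minimal sketch, using an invented representation in which a compound goal
is a ("conj", ...) or ("disj", ...) tuple and a primitive goal is reduced
to the reach count of its program point:

```python
def bisection_width(goal):
    """Compute the bisection width of a goal tree.

    A plain integer stands for the count at a single program point;
    ("conj", subgoals) and ("disj", subgoals) stand for compound goals.
    """
    if not isinstance(goal, tuple):
        return goal                      # count at one program point
    kind, subgoals = goal
    widths = [bisection_width(g) for g in subgoals]
    if kind == "conj":
        return max(widths)               # maximum over conjoined goals
    if kind == "disj":
        return sum(widths)               # sum over disjoined alternatives
    raise ValueError("unknown goal kind: " + kind)
```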

It would also be possible to envisage equivalents of these two that use
estimated execution time instead of allocated memory. Estimates can be
based on such things as the number of memory accesses required to fill in
a cell, the number of call parameters that must be moved into position,
etc., but I think such estimates would be quite misleading if they ignore
cache behavior, as they will have to. Since the estimation code would also
be complex and require tedious validation (e.g. by curve-fitting on many
runs), I don't propose to do it.

The tool should of course display the code of the reported procedures as well,
together with execution counts. The code will be HLDS code, not source, since
some locations of interest do not correspond to any source location (e.g.
entries to switches introduced by factoring a four-way disjunction into two
two-way switches). There ought to be a way to have execution counts displayed
at the same set of locations the counters occupy, at those locations plus
the points after branched goals (with the tool doing the sums for the user),
or at all locations, depending on invocation options.

I am seeking feedback on two points.

- Are the alternative outputs I propose sufficient? Are there others I left
  out that would be sufficiently useful to justify their implementation cost?

- What form should the output take? One possibility is to just list the names
  and then the definitions of the top N procedures (or of all procedures with
  non-executed parts, for coverage analysis) in plain ASCII. Another is to
  generate a list of procedure names in HTML, with each name being a link
  to the corresponding definition in an HTML file of definitions. However,
  other, better possibilities probably exist.

Zoltan.
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------


