[m-dev.] proposed command set for the internal trace based debugger

Zoltan Somogyi zs at cs.mu.OZ.AU
Tue Jun 2 15:28:13 AEST 1998


On Mon, 1 Jun 1998, Zoltan Somogyi wrote:
> The user may associate a break point with a specific goal path within
> a predicate. Break points have the same states and the same effects as spy
> points. XXX should an empty goal path cause a break on all of the
> CALL, EXIT and FAIL ports (the interface ports), or only on the CALL port?

Peter Schachte wrote:
> I'm not sure I understand the difference between break and spy
> points.  Is a break point just a spy point on a particular (lexical)
> atom?  Or by "goal path" do you mean something like "the second call
> to foo/4 in the fourth clause of bar/2 when called from the first call
> to baz/5 in the first clause of zip/2 if it was called from the third
> call to zap/2 in the eighth clause for zop/2...."

Sorry, I was assuming too much background knowledge. A goal path is the
position of a subgoal within the body of a procedure, e.g. take the
second disjunct of the top-level disjunction, (that goal being a conjunction)
take the third conjunct, (that goal being a switch) take the fourth
switch arm. If you view the body of the procedure as a tree of subgoals,
a goal path is a path in this tree down from the root. It denotes a subgoal,
which may or may not be atomic.

Note that the tree is the form of the procedure body at code generation,
which may differ substantially from the source form, due to mode reordering,
switch detection, superhomogeneous form expansion, etc. Therefore it is
unlikely that most users can specify goal paths in advance, though they
should be able to recognize what a goal path corresponds to once they have
arrived at its port.

A spy point is on a predicate; it affects all events within the predicate.
A break point is on the event at a given goal path within the predicate.
Not all goal paths have events, just those that are necessary to reconstruct
which path execution takes through the procedure (entry into switch arms,
disjuncts, then parts and else parts).

> It would be very useful to be able to (separately) turn on and off stopping
> and printing based on port. 

If by "based on port" you mean events at a particular goal path, then
that's what break points are for. Lumping e.g. entry into all switch arms
does not seem useful.

> It would also be good to have a way to turn off printing (and maybe to turn
> it on, as well) for all subgoals of a certain goal, or subgoals of a certain
> predicate.  Ideally, there would be a stateless (just zip through this goal
> now without stopping or printing anything:  skip in Prolog debuggers) and a
> stateful (until I say otherwise, always zip through calls to this predicate
> (or this lexical call to this predicate)) without stopping or printing
> anything.

If you are at predicate p, and you want to skip to the end of p, you
can do a "finish" with printlevel none. If what you mean is that you
are in predicate q, and you want to say "pay attention to break points
unless they are within a call to p", then you are out of luck, both
with the proposed debugger and with all other debuggers I know of,
although in debuggers in which you can attach code to execute to
trace points, you can laboriously synthesize this effect.

> > 	An empty command line is interpreted as "skip 1".
> > 	XXX or maybe an empty command should do nothing?
> 
> How about allowing users to alias "" to do whatever they want?

That's a good idea, especially since Tyson will implement it, not me :-)

> > redo
> 
> Since this command, as far as I know, only exists until now in Prolog
> debuggers, and Prolog debuggers call this "retry," and use "redo" to
> mean something completely different, how about calling this "retry?"
> You'll cause less confusion.

Ok.

> Wouldn't it be possible to have a debug version of the library that
> makes file opening and closing commands "trail" their effects so they
> can be redone?  Ideally it would be possible for writers of libraries
> containing destructive modes to also write mostly destructive versions and
> have the debugger optionally use those instead in a systematic way, so they
> could be retried.  Note that when code is being "skipped" over (in Prolog
> terminology), you can just use the destructive version, for full speed and
> lower memory usage.

In fact we have a better scheme in mind (based on *all* I/O actions
generating a log record, and being able to roll the log forward/backward
at will), but that requires a significant amount of extra work. For the
time being, we won't change the I/O library. (You could also apply the
logging technique to other libraries, but automating the technique
does not look feasible.)

> One other command that would be useful (if it's implementable) would
> be to replace the goal at the current port with one the user types in.
> The user's goal would be able to include the variables appearing in
> the goal at the current port.  This would be useful when you've found
> a bug, and you'd like to keep going and find the next, so you just
> want to supply the correct result for a goal and keep going.

Being able to compile code (a query, a condition on a break point)
at debug time is also something that we will address later, but not now.
However, *replacing* part of the code of the running program is not something
that I envisage the trace based debugger will *ever* do, because it requires
working at the object code level, which is something we have avoided
assidiously since the start of the project. A bytecode based debugger
could do it, but even then I would be queasy about doing it myself;
too much chance of introducing another bug in the "fix".

> > stack
> > 	Prints the names of the ancestors of the call specified by the
> > 	current event. If two or more ancestor calls are for the same
> > 	predicate, the predicate name will be printed once with the
> > 	appropriate multiplicity annotation.
> 
> What about for mutually recursive predicates?  For example, the case where
> the stack has 4 calls to foo/3, then a few calls to other predicates, then 8
> more calls to foo/3?  I presume this just shows up as 4 and then 8 calls to
> foo/3, not as 12 calls? 

Basically all the debugger is doing is run-length encoding, in which the
repeated element must be a single call, and cannot be a sequence of two or
more calls. If the call stack is "p, q, q, p, p, p, q, p, q, p, q, main",
the printout will be

	p
	q * 2
	p * 3
	q
	p
	q
	p
	q
	main

If you allow the repeated element to be a sequence of calls, you can get
shorter stack dumps, but two problems arise: the identity of the "right"
output becomes unclear, and the implementation becomes much harder (partly
as a result of the unclear spec, and partly because it requires a large
buffer of calls to match against). To see the ambiguity problem,
consider the alternatives for the above trace:

	p
	q * 2
	p * 2
	(p, q) * 3
	main

or

	p
	q * 2
	p * 3
	(q, p) * 2
	q
	main

The first is arguably better, but in general it cannot be produced without
a global analysis of the entire call stack, probably by an algorithm whose
complexity is more than linear. Given that call stacks can be several thousand
deep (I have personally seen a stack 4500 calls deep while debugging the
debugger :-(), this is unacceptable. The second can be produced by a greedy
algorithm, but it helps if there is a limit on the lengths of the sequences
you are willing to look for.

In any case, I don't think you can do anything very helpful when you have
an SCC with two or more procedures in which at each level you have an equal
or near-equal chance of calling any of the procedures in the SCC; in that
case, the printout will look chaotic no matter how you try to compress it.
This will be the case with tree234, whose SCCs (insert2/insert3, set2/set3 etc)
do have this near-random behavior, as do the SCCs in the code generator.
I can only think of one SCC with size > 1 with potentially deep calls whose
stack fragments have a regular pattern, and that one may not be around much
longer, since the recursive calls are tail recursive module constructor
application (this is in the parser or the lexer, I forget which).

> It would be useful to have this print a number for each stack frame, and to
> have a command like up and down (maybe "show" or "goto"?) that takes one of
> these numbers and makes that the current frame directly. 

That's a good idea. I suppose for a line such as p * 3, we could print
the numbers 4-6. Or just the number 4, with the number 7 for the next line.

> > spy <module name> <predicate name>
> 
> I think it would be better to use the standard module:name/arity
> syntax to spy predicates.  As a convenience to users, it'd be good to
> allow everything but name to be omitted, treating the omitted items as
> wildcards, and to then spy every matching predicate.

By "everything but name", I presume you mean everything but the predicate
or function name.

Allowing parts of the procedure identification to be omitted is a good
idea, although it requires linking all the proc layout structures together
into one searchable structure. (This is SMOP for the debug grade; for the
non-debug grades, i.e. when only part of the program has debugging information,
it may be trickier.)

However, instead of spying on all matches, I think it is better to ask
the user which one(s) he/she means.

> Also it's nice to be able to spy multiple predicates in one command.

I don't think so. At most we should treat a ";" as a newline and thus
allow more than one command on a line. That would be more generally useful, 
but since this is readline()'s territory, it sounds like Tyson's job :-)

> I think one wants the following abilities regarding spy/break points:
> 
> 	1.  Stop at some or all ports of calls to a particular
> 	    predicate.

If you want all, use a spy point; it you want some, use one or more break
points.

> 	2.  Stop at some or all ports of a particular (lexical) call
> 	    to a particular predicate.  By this I mean something like
> 	    "the third call to foo/2 in the fourth clause of bar/3."

We do not propose being able to do this for the time being, although
it would be nice.

> 	3.  Print the goal but don't stop, and maybe stop but don't
> 	    print the goal (just some indication of where you are) in
> 	    any of those contexts.

"Print the goal but don't stop": this is what a spy/break point with state
"print" does. "stop but don't print the goal": the proposed debugger *never*
prints the goal. Before it stops, it always prints a line like this:

8:      5  4 DISJ NON   queens:qdelete/3-0 c2;d1;

explanation:

^ event number (sequential counter)
        ^ call number (sequential counter)
	   ^ call depth
	     ^ port type: entering a disjunct
	          ^ predicate determinism is nondet
		        ^module:pred/arity-mode
			                   ^goal path:
					   disjunct #1 in conjunct #2

If you want the values of the variables, e.g. the input arguments,
you can issue the print command. At the moment that will give you:

Headvar_1	value of Headvar_1
Headvar_2	value of Headvar_2

Probably a command that will print the head as an atom, the way Prolog
systems do it, would also be useful. Should it be restricted to the
interface ports? At internal ports, it does not give the full story,
since at internal ports you may have variables that are not arguments.
(This is also true for the exit port, but at that port you have much more
of a reason to restrict yourself to just wanting to know about the
arguments.)

> 	4.  Ideally, some of the capabilities pretty much standard in
> 	    C debuggers like "only stop/print if this condition holds,"
> 	    and "execute this command whenever this condition holds,
> 	    then print/don't print and stop/don't stop."

We will consider these later, but not now. BTW, it took about a decade
for this capability to appear in the C debuggers I had access to.

> > 	named module. (XXX Should it be on all the interface ports, i.e.
> > 	CALL, EXIT and FAIL?)
> 
> You seem to have omitted the REDO port.  That one's pretty important.

We did not omit it by oversight. The Mercury execution model does not
permit a redo port without requiring the compiler to perform a pessimization.
This pessimization is:

	When compiling a nondet predicate, just before the code that
	performs a successful exit, insert code that creates a new nondet
	stack frame on top of the nondet stack. The redoip slot of this
	frame should point to a new code label in the runtime, say
	"simulate_redo", which would perform a failure. This label
	would therefore have the exact same effect as the current label
	do_fail, but the debugger would know that any stack frame
	that has this redoip represents a redo port, not an ordinary
	disjunct (which may have a do_fail redoip).

It also requires undoing the optimization that removes nondet stack frames
on last success (roughly parallel to "trust" in the WAM), although this
is not a big deal.

The pessimization is not yet workable, since it guarantees that at
the exit port the predicate will have two or more nondet stack frames.
With the current code generator and stack layout structure, I don't think
the debugger can find the bottom stack frame (which has all the variables)
from the top stack frame. The changes to failure handling in the new code
generator should fix this, RSN.

How strongly do people feel a REDO port is needed? I myself expect using it 
very rarely, or not at all.

> > here
> > 	Puts a break point on the predicate referred to by the current event
> > 	at the goal path named by the current event.
> 
> I think it would be much better to have a syntax for specifying "here"
> as a predicate spec in spy and break commands.

Would you care to propose a syntax? It should not shadow any predicate names
the user may use.

> Also, does all of this work exactly the same for functions?  Maybe
> there should be a syntax for spying functions, in case there is a
> function and predicate with the same name and arity?

At the moment the matching works by comparing the module name and pred/func
name in MR_trace with the module and pred/func name in the spy points table.
This design does not allow you to distinguish between

	a function and a predicate with the same module and pred/func names
	two predicates with the same module and pred names but different arity
	two procedures of the same predicate or function

Switching to a spy table that stores pointers to proc layouts not only allows
us to make all of the above distinctions, but will make the matching faster,
and will permit a larger spy table. The break point table will then store
pointers to label layouts, which means that one break point cannot be
on all of the interface ports. Probably we need a third concept:

	- all ports at all goal paths in the procedure (now a spy point)
	- all interface ports in the procedure
	- a given port at a given goal path in the procedure (now a break point)

Maybe we should call the top two both spy points, and associate a "scope"
with each spy point; either interface ports or all ports.

Zoltan.



More information about the developers mailing list