[mercury-users] XML / DOM

Richard A. O'Keefe ok at atlas.otago.ac.nz
Tue Dec 19 12:15:13 AEDT 2000


Michael Day asked:
    > Has anyone tried a DOM implementation in Mercury? Presumably it would be
    > easier to implement one in a Mercury style on top of the existing XML
    > parser than wrap a C/C++ DOM library, given the style of the interface?
	
Thomas Conway replied:
    I haven't, but it would be fairly simple to implement one on top of
    the XML document representation used in my XML parser.
	
I must challenge this.  Early this year I set out to implement the DOM
in Smalltalk.  Smalltalk was a real joy to use; it takes the "Oh NO"
out of "O-O".  But the DOM was a major pain.  I started to write a
detailed critique of what is wrong with the DOM, but couldn't think of
anyone who'd publish it.

Basically:
    The Level 1 DOM is woolly.  It's entirely typical of W3C recommendations
    in general:  lots of (somewhat inexpert) attention to interfaces (the
    DOM is desperately in need of refactoring) and no real thought given to
    semantics.  Here are some typical examples.

	1.  Any time you ask the DOM for a string, it is allowed to say
	"sorry, that's too big for me to give you".  But there is no way
	to ask "how big a string *can* I have, then?" and no guaranteed
	minimum.

	2.  If you create a comment node whose content includes "--" as
	a substring, or a processing instruction node whose content
	includes "?>" as a substring, the interface is NOT ALLOWED TO
	COMPLAIN, even though the result can never be legal XML.
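Point 2 can be seen in any DOM binding that follows the recommendation
closely.  Here is a minimal sketch using Python's xml.dom.minidom, which
is only one implementation and is my choice for illustration, not
something from the discussion above.  The element name "root" and the
node contents are made up.

```python
from xml.dom.minidom import getDOMImplementation

# Build an empty document with a single <root> element.
doc = getDOMImplementation().createDocument(None, "root", None)

# Neither factory method inspects its argument, so both calls succeed
# even though the content could never appear in well-formed XML.
comment = doc.createComment("not -- allowed in a comment")
pi = doc.createProcessingInstruction("target", "data ?> more data")

doc.documentElement.appendChild(comment)
doc.documentElement.appendChild(pi)

print(repr(comment.data))
print(repr(pi.data))
```

The tree now holds two nodes that cannot be written out as legal XML,
and nothing in the node-creation interface warned about it.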

    The Level 2 DOM adds a huge amount of complexity to the Level 1 DOM,
    without fixing any of the basic problems.  Here are three problems in
    the Level 2 DOM:
	
	3.  Having comment nodes, and splitting CDATA section nodes out
	from other character data, is great if you are writing an XML
	editor, but lousy if you are doing almost anything else with XML.
	That is, from an SGML point of view,
	    <p>Hello, world!</p>
	and
	    <p>Hello<![CDATA[, ]]>world<![CDATA[!]]></p>
	are the *same* grove, with the <p> element having a *single*
	child.  But in the DOM, the first one has one child, and the
	second one has four.  You may never intend this to happen, but
	the possibility is enough to complicate your application code
	like you wouldn't believe (that, or simply have it wrong...)
	The Level 2 DOM provides a way to filter various things out
	in a traversal, but no way to avoid building them in the first
	place.
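The child counts in point 3 are easy to check.  The sketch below again
assumes Python's xml.dom.minidom as a stand-in DOM; text_of is a
hypothetical helper written for this example, not part of any DOM
interface.

```python
from xml.dom.minidom import parseString

plain = parseString("<p>Hello, world!</p>").documentElement
split = parseString(
    "<p>Hello<![CDATA[, ]]>world<![CDATA[!]]></p>"
).documentElement

def text_of(element):
    # Concatenate the character data of the direct children.
    # Text and CDATASection nodes both expose it as .data.
    return "".join(child.data for child in element.childNodes)

# Same character content, different tree shape.
print(len(plain.childNodes), len(split.childNodes))
print(text_of(plain) == text_of(split))
```

Any application code that assumes "the text of <p> is its first child"
works on the first document and silently breaks on the second.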

	4.  The specification of traversal (iterators and such) is made
	enormously complicated by the question "What happens to an iterator
	if the position it refers to disappears?"  The answer is complex,
	to me counter-intuitive, and difficult to implement correctly.
	I stopped at this point, because I didn't see the point of working
	hard to implement something I couldn't imagine any sane programmer
	wanting.

	5.  The central design aspect of the DOM, in both Level 1 and
	Level 2, is "Thou shalt not share structure".  This not only makes
	editing (the apparent primary purpose of the DOM) rather more
	expensive than it should have been, it makes even single-level
	UNDO hard to provide, let alone multi-level UNDO.
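To make point 5 concrete, here is a minimal sketch of the alternative
the DOM rules out: an immutable tree in which each edit returns a new
root that shares every unchanged subtree with the old one.  The Node
type, append_child, and the history list are all hypothetical names
invented for this illustration; none of this is DOM API.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class Node:
    name: str
    children: Tuple["Node", ...] = ()

def append_child(parent, child):
    # Return a new parent; the existing children are shared, not copied.
    return replace(parent, children=parent.children + (child,))

# Every version of the document is just a reference to a root.
history = [Node("doc")]
history.append(append_child(history[-1], Node("p")))
history.append(append_child(history[-1], Node("q")))

# Multi-level undo is stepping back through the list of old roots.
after_one_undo = history[-2]
after_two_undos = history[-3]

# The <p> node is the same object in both versions that contain it.
print(history[2].children[0] is history[1].children[0])
```

With shared structure, keeping old versions costs one new node per
level on the edited path, and undo costs nothing beyond retaining the
old root.  A mutable tree in which every node has exactly one parent
has to copy or record inverse operations to get the same effect.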

I note that since the Level 2 DOM specification runs to 469 pages, there
is no way it could possibly be "fairly simple" to implement it, even if
the design were lucid perfection.