[mercury-users] XML Parsing

Ralph Becket rbeck at microsoft.com
Tue Jun 5 02:46:56 AEST 2001


> From: Michael Day [mailto:mikeday at corplink.com.au]
> Sent: 02 June 2001 02:32
> 
> I tried going the other way and writing an XML Schema -> DTD converter,
> which hit a snag due to the fact that the XML parser in mercury-extras
> takes 42 seconds to parse a simple XML file that takes under a second to
> parse with any other XML parser.

I haven't tried profiling the XML parser, but looking at the source it
seems that the real sticking point is the production for spotting
letters, which is an enormous disjunction of Unicode character ranges.

(Thomas mentions this in reference to compilation times in
http://www.cs.mu.oz.au/research/mercury/mailing-lists/mercury-users/mercury-users.0012/0054.html)

If this is indeed a major bottleneck, three solutions come to mind:

1. (Prepare for a scream from R.A.O'K :o) Rewrite those parts of the
   parser that are Unicode-aware to be ASCII-only, and use bitmaps to
   identify character subsets.

2. Rewrite xml.parse.chars to avoid the use of range and lit tokens
   in favour of lookups into 64K-entry bitmaps.

3. Try coding the XML parser up using moose.
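Option 2 might look roughly like the following. This is a minimal Python sketch, not Mercury and not the mercury-extras code: the range list is only a truncated excerpt of the XML 1.0 BaseChar/Ideographic productions, and `build_bitmap` and `is_letter` are hypothetical names. The table is built once; after that each membership test is a shift, a mask and an index instead of a walk through the disjunction.

```python
# Hypothetical sketch: a 64K-entry bitmap for XML "Letter" membership.
# LETTER_RANGES is a truncated excerpt of the XML 1.0 tables, not the full set.
LETTER_RANGES = [
    (0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x00D6),
    (0x00D8, 0x00F6), (0x00F8, 0x00FF), (0x0100, 0x0131),
    (0x3007, 0x3007), (0x3021, 0x3029), (0x4E00, 0x9FA5),
]


def build_bitmap(ranges, size=0x10000):
    """Pack membership for code points 0..size-1 into a bytearray, one bit each."""
    bits = bytearray(size // 8)  # 8 KB covers the whole 16-bit plane
    for lo, hi in ranges:
        for cp in range(lo, hi + 1):
            bits[cp >> 3] |= 1 << (cp & 7)
    return bits


LETTER_BITS = build_bitmap(LETTER_RANGES)


def is_letter(ch):
    """Constant-time lookup: no per-range comparisons at parse time."""
    cp = ord(ch)
    if cp >= 0x10000:
        return False  # outside the 16-bit plane the table covers
    return bool(LETTER_BITS[cp >> 3] & (1 << (cp & 7)))
```

Storing one bit per code point costs 8 KB per character class; a byte per entry (64 KB) would make indexing slightly simpler at eight times the memory.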

> Aside from the astonishingly poor performance, the parser seems well
> written and full featured. It would be a shame not to be able to use it.
> Does anyone have even the most tentative estimate of when the parser
> combinator style of programming will be compiled into something
> approaching efficiency?

I'm not sure it's the combinator style that's the problem here. Every
char processed has to go through a 250-odd-way disjunction (`baseChar'
has about 180 subranges and `ideographic' has about 70 more). How many
of those cases have to be checked on average, I have no idea.
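One way to get a feel for that average is to instrument a linear scan over the ranges, as a left-to-right disjunction would do. This is a Python sketch under stated assumptions: the disjuncts are assumed to be tried in XML 1.0 spec order, the range list is a truncated excerpt rather than the full table, and `checks_needed`/`average_checks` are hypothetical helpers.

```python
# Truncated excerpt of the XML 1.0 Letter ranges in spec order
# (BaseChar first, then Ideographic). The real table is far longer.
LETTER_RANGES = [
    (0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x00D6),
    (0x00D8, 0x00F6), (0x00F8, 0x00FF), (0x0100, 0x0131),
    (0x4E00, 0x9FA5), (0x3007, 0x3007), (0x3021, 0x3029),
]


def checks_needed(ch, ranges=LETTER_RANGES):
    """Scan ranges left to right; return (matched, number of ranges tested)."""
    cp = ord(ch)
    for i, (lo, hi) in enumerate(ranges, start=1):
        if lo <= cp <= hi:
            return True, i
    return False, len(ranges)  # a non-letter is tested against every range


def average_checks(text, ranges=LETTER_RANGES):
    """Mean number of range tests per character of text."""
    if not text:
        return 0.0
    return sum(checks_needed(ch, ranges)[1] for ch in text) / len(text)
```

If the ordering assumption holds, ASCII letters are cheap (they match in the first two ranges), but every character that is not a letter, such as '<', '=' or a digit, falls through the entire table before failing. So markup-heavy input would pay the full 250-odd comparisons for each such character the letter production is asked about.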

Perhaps some kind of preprocessing step prior to parsing might help to
optimize the common case (assuming ASCII chars are the common case)?
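One reading of that preprocessing idea, sketched in Python under my own assumptions (the names are hypothetical and this is not how mercury-extras is structured): scan the input once, and if it is pure ASCII, hand the parser a two-comparison letter test instead of the range scan. That shortcut is safe because the only ASCII characters in XML 1.0's Letter production are A-Z and a-z. `str.isascii` needs Python 3.7 or later.

```python
# Truncated excerpt of the XML 1.0 Letter ranges (hypothetical stand-in
# for the full table).
LETTER_RANGES = [
    (0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x00D6),
    (0x00D8, 0x00F6), (0x00F8, 0x00FF), (0x4E00, 0x9FA5),
]


def is_ascii_letter(ch):
    """Cheap test; correct whenever ch is known to be ASCII."""
    return 'A' <= ch <= 'Z' or 'a' <= ch <= 'z'


def is_letter_general(ch):
    """Slow path: scan the Unicode ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in LETTER_RANGES)


def choose_letter_test(text):
    """Preprocessing pass: one scan of the input picks the classifier."""
    return is_ascii_letter if text.isascii() else is_letter_general
```

The pre-scan costs one linear pass over the document, after which an all-ASCII file never touches the range table at all.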

- Ralph