[m-rev.] fix to improve the performance of the XML parser.

Thomas Conway conway at cs.mu.OZ.AU
Sat Jun 16 15:45:51 AEST 2001


On Wed, Jun 13, 2001 at 07:36:36PM EST, Ralph Becket wrote:
> > From: Thomas Conway [mailto:conway at cs.mu.OZ.AU]
> > Sent: 13 June 2001 01:36
> >
> > Improve the performance of the XML parser.
> > 
> > extras/xml/xml.parse.chars.m:
> > 	Change baseChar and friends so that the big combinator
> > 	expressions in them are static constants.
> 
> A comment in the code explaining the problem and solution
> would be handy.  In fact, a comment about this in
> extras/xml/parsing.m would be good, too.

Here's a revised diff.
Since it's all pretty peripheral, I'll commit this and I/you can
make any changes later.

-- 
  Thomas Conway )O+
 <conway at cs.mu.oz.au>       499 User error! Replace user, and press any key.


Improve the performance of the XML parser.

extras/xml/parsing.m:
	Put some comments in explaining the performance problem and
	its solution.

extras/xml/xml.parse.chars.m:
	Change baseChar and friends so that the big combinator
	expressions in them are static constants.

Index: parsing.m
===================================================================
RCS file: /home/staff/zs/imp/mercury/extras/xml/parsing.m,v
retrieving revision 1.1
diff -u -r1.1 parsing.m
--- parsing.m	2000/09/05 22:33:57	1.1
+++ parsing.m	2001/06/16 05:42:11
@@ -6,6 +6,31 @@
 %
 % Main author: conway at cs.mu.oz.au.
 %
+% This module provides a bunch of parsing combinators directed towards
+% parsing text (in some encoding or bunch of encodings). The parsing state
+% that gets threaded through is polymorphic in the type of the result
+% stored in it. This can cause problems if you construct a big combinator
+% expression (particularly using the "or" combinator) where the type
+% of this result in the initial parsing state is unbound and is inherited
+% from its context. In this case, the combinator expression cannot be made
+% into a static ground term (the typeinfo arguments which must come first
+% are not known until runtime), so it gets constructed every time through.
+% (See e.g. xml.parse.chars.m for some examples.)
+% A useful way to avoid this problem, at least in some cases, is to
+% bind the type variable by setting a dummy result value.
+% e.g. instead of
+%     parseChar -->
+%         a or b or c or d or e or ....
+% you can write
+%     :- type dummy ---> dummy.
+%     parseChar -->
+%         return(dummy),
+%         a or b or c or d or e or ....
+%
+% This does have a slight runtime cost (doing the return), but it has
+% the benefit that it makes that great big combinator expression a
+% constant - a big win.
+%
 %---------------------------------------------------------------------------%
 :- module parsing.
 
Index: xml.parse.chars.m
===================================================================
RCS file: /home/staff/zs/imp/mercury/extras/xml/xml.parse.chars.m,v
retrieving revision 1.1
diff -u -r1.1 xml.parse.chars.m
--- xml.parse.chars.m	2000/09/05 22:34:00	1.1
+++ xml.parse.chars.m	2001/06/13 00:24:25
@@ -104,7 +104,10 @@
 %   | [#x212A-#x212B] | #x212E | [#x2180-#x2182] | [#x3041-#x3094]
 %   | [#x30A1-#x30FA] | [#x3105-#x312C] | [#xAC00-#xD7A3]
 
+:- type nil ---> nil.
+
 baseChar -->
+    return(nil),
     (0x0041-0x005A) or (0x0061-0x007A) or (0x00C0-0x00D6)
     or (0x00D8-0x00F6) or (0x00F8-0x00FF) or (0x0100-0x0131)
     or (0x0134-0x013E) or (0x0141-0x0148) or (0x014A-0x017E)
@@ -171,6 +174,7 @@
 %   [86]  Ideographic ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]
 
 ideographic -->
+    return(nil),
     (0x4E00-0x9FA5) or lit1(0x3007) or (0x3021-0x3029).
 
 %   [87]  CombiningChar ::= [#x0300-#x0345] | [#x0360-#x0361]
@@ -201,6 +205,7 @@
 %   | [#x302A-#x302F] | #x3099 | #x309A
 
 combiningChar -->
+    return(nil),
     (0x0300-0x0345) or (0x0360-0x0361)
     or (0x0483-0x0486) or (0x0591-0x05A1) or (0x05A3-0x05B9)
     or (0x05BB-0x05BD) or lit1(0x05BF) or (0x05C1-0x05C2) or lit1(0x05C4)
@@ -237,6 +242,7 @@
 %   | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]
 
 digit -->
+    return(nil),
     (0x0030-0x0039) or (0x0660-0x0669) or (0x06f0-0x06f9)
     or (0x0966-0x096f) or (0x09e6-0x09ef) or (0x0a66-0x0a6f)
     or (0x0ae6-0x0aef) or (0x0b66-0x0b6f) or (0x0be7-0x0bef)
@@ -248,6 +254,7 @@
 %   | [#x30fc-#x30fe]
 
 extender -->
+    return(nil),
     lit1(0x00b7) or lit1(0x02d0) or lit1(0x02d1) or lit1(0x0387)
     or lit1(0x0640) or lit1(0x0e46)
     or lit1(0x0ec6) or lit1(0x3005) or (0x3031-0x3035) or (0x309d-0x309e)
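
For readers without the extras/xml sources to hand, here is a minimal,
self-contained sketch of the trick described in the parsing.m comment
above. It is not the real library code: the module name, the pstate
representation, range, letter and letter_poly are all hypothetical
stand-ins, and return/or are cut-down versions written only to show where
the type variable gets bound. Whether a given compiler version and grade
actually emits the combinator expression as a static term is not something
this sketch demonstrates.

```mercury
:- module static_or_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int, list, string.

    % Hypothetical cut-down parsing state: the remaining code points plus
    % a result whose type T is the polymorphic part.
:- type pstate(T)
    --->    pstate(list(int), presult(T)).

:- type presult(T)
    --->    matched(T)
    ;       no_match.

:- type parser(T1, T2) == pred(pstate(T1), pstate(T2)).
:- inst parser == (pred(in, out) is det).

:- type nil ---> nil.

    % Replace the stored result; this is what binds the result type.
:- pred return(T::in, pstate(T0)::in, pstate(T)::out) is det.
return(X, pstate(Cs, _), pstate(Cs, matched(X))).

    % Accept one code point in the inclusive range Lo..Hi.
:- pred range(int::in, int::in, pstate(T)::in, pstate(int)::out) is det.
range(Lo, Hi, pstate(Cs0, _), S) :-
    ( Cs0 = [C | Cs], Lo =< C, C =< Hi ->
        S = pstate(Cs, matched(C))
    ;
        S = pstate(Cs0, no_match)
    ).

    % Try P; if it does not match, try Q on the original state.
:- pred or(parser(T1, T2)::in(parser), parser(T1, T2)::in(parser),
    pstate(T1)::in, pstate(T2)::out) is det.
or(P, Q, S0, S) :-
    P(S0, S1),
    ( S1 = pstate(_, matched(_)) ->
        S = S1
    ;
        Q(S0, S)
    ).

    % Here T1 of the closures is the caller's T, so the closures depend on
    % a type_info that is only known at runtime.
:- pred letter_poly(pstate(T)::in, pstate(int)::out) is det.
letter_poly -->
    range(0x41, 0x5A) or range(0x61, 0x7A).

    % After return(nil) the incoming state has type pstate(nil), so every
    % type in the combinator expression is known at compile time.
:- pred letter(pstate(T)::in, pstate(int)::out) is det.
letter -->
    return(nil),
    range(0x41, 0x5A) or range(0x61, 0x7A).

main(!IO) :-
    S0 = pstate([0x62], matched(nil)),
    letter(S0, S),
    ( S = pstate(_, matched(C)) ->
        io.format("matched %d\n", [i(C)], !IO)
    ;
        io.write_string("no match\n", !IO)
    ).
```

The two predicates accept the same input; the only difference is that
letter discards whatever result the caller had stored before trying the
alternatives, which is the "slight runtime cost" the comment mentions.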


