[m-rev.] fix to improve the performance of the XML parser.
Thomas Conway
conway at cs.mu.OZ.AU
Sat Jun 16 15:45:51 AEST 2001
On Wed, Jun 13, 2001 at 07:36:36PM EST, Ralph Becket wrote:
> > From: Thomas Conway [mailto:conway at cs.mu.OZ.AU]
> > Sent: 13 June 2001 01:36
> >
> > Improve the performance of the XML parser.
> >
> > extras/xml/xml.parse.chars.m:
> > Change baseChar and friends so that the big combinator
> > expressions in them are static constants.
>
> A comment in the code explaining the problem and solution
> would be handy. In fact, a comment about this in
> extras/xml/parsing.m would be good, too.
Here's a revised diff.
Since it's all pretty peripheral, I'll commit this and I/you can
make any changes later.
--
Thomas Conway )O+
<conway at cs.mu.oz.au> 499 User error! Replace user, and press any key.
Improve the performance of the XML parser.

extras/xml/parsing.m:
    Put some comments in explaining the performance problem and
    its solution.

extras/xml/xml.parse.chars.m:
    Change baseChar and friends so that the big combinator
    expressions in them are static constants.
Index: parsing.m
===================================================================
RCS file: /home/staff/zs/imp/mercury/extras/xml/parsing.m,v
retrieving revision 1.1
diff -u -r1.1 parsing.m
--- parsing.m 2000/09/05 22:33:57 1.1
+++ parsing.m 2001/06/16 05:42:11
@@ -6,6 +6,31 @@
%
% Main author: conway at cs.mu.oz.au.
%
+% This module provides a bunch of parsing combinators directed towards
+% parsing text (in some encoding or bunch of encodings). The parsing state
+% that gets threaded through is polymorphic in the type of the result
+% stored in it. This can cause problems if you construct a big combinator
+% expression (particularly using the "or" combinator) where the type
+% of this result in the initial parsing state is unbound and is inherited
+% from its context. In this case, the combinator expression cannot be made
+% into a static ground term (the typeinfo arguments which must come first
+% are not known until runtime), so it gets constructed every time through.
+% (See e.g. xml.parse.chars.m for some examples.)
+% A useful way to avoid this problem, at least in some cases, is to
+% bind the type variable by setting a dummy result value.
+% e.g. instead of
+%     parseChar -->
+%         a or b or c or d or e or ....
+% you can write
+%     :- type dummy ---> dummy.
+%     parseChar -->
+%         return(dummy),
+%         a or b or c or d or e or ....
+%
+% This does have a slight runtime cost (doing the return), but it has
+% the benefit that it makes that great big combinator expression a
+% constant - a big win.
+%
%---------------------------------------------------------------------------%
:- module parsing.
Index: xml.parse.chars.m
===================================================================
RCS file: /home/staff/zs/imp/mercury/extras/xml/xml.parse.chars.m,v
retrieving revision 1.1
diff -u -r1.1 xml.parse.chars.m
--- xml.parse.chars.m 2000/09/05 22:34:00 1.1
+++ xml.parse.chars.m 2001/06/13 00:24:25
@@ -104,7 +104,10 @@
% | [#x212A-#x212B] | #x212E | [#x2180-#x2182] | [#x3041-#x3094]
% | [#x30A1-#x30FA] | [#x3105-#x312C] | [#xAC00-#xD7A3]
+:- type nil ---> nil.
+
baseChar -->
+ return(nil),
(0x0041-0x005A) or (0x0061-0x007A) or (0x00C0-0x00D6)
or (0x00D8-0x00F6) or (0x00F8-0x00FF) or (0x0100-0x0131)
or (0x0134-0x013E) or (0x0141-0x0148) or (0x014A-0x017E)
@@ -171,6 +174,7 @@
% [86] Ideographic ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]
ideographic -->
+ return(nil),
(0x4E00-0x9FA5) or lit1(0x3007) or (0x3021-0x3029).
% [87] CombiningChar ::= [#x0300-#x0345] | [#x0360-#x0361]
@@ -201,6 +205,7 @@
% | [#x302A-#x302F] | #x3099 | #x309A
combiningChar -->
+ return(nil),
(0x0300-0x0345) or (0x0360-0x0361)
or (0x0483-0x0486) or (0x0591-0x05A1) or (0x05A3-0x05B9)
or (0x05BB-0x05BD) or lit1(0x05BF) or (0x05C1-0x05C2) or lit1(0x05C4)
@@ -237,6 +242,7 @@
% | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]
digit -->
+ return(nil),
(0x0030-0x0039) or (0x0660-0x0669) or (0x06f0-0x06f9)
or (0x0966-0x096f) or (0x09e6-0x09ef) or (0x0a66-0x0a6f)
or (0x0ae6-0x0aef) or (0x0b66-0x0b6f) or (0x0be7-0x0bef)
@@ -248,6 +254,7 @@
% | [#x30fc-#x30fe]
extender -->
+ return(nil),
lit1(0x00b7) or lit1(0x02d0) or lit1(0x02d1) or lit1(0x0387)
or lit1(0x0640) or lit1(0x0e46)
or lit1(0x0ec6) or lit1(0x3005) or (0x3031-0x3035) or (0x309d-0x309e)