[m-rev.] For review: Convert Mercury terms to XML
Ian MacLarty
maclarty at cs.mu.OZ.AU
Mon Dec 6 14:59:42 AEDT 2004
On Sun, Dec 05, 2004 at 07:34:58PM +1100, Julien Fischer wrote:
>
> On Sat, 4 Dec 2004, Ian MacLarty wrote:
>
> > For review by anyone.
> >
> > Estimated hours taken: 60
> > Branches: main
> >
> > Add library module to convert Mercury terms to XML documents and generate DTDs
> > for Mercury types.
> >
> I think that the log message could include more detail than that.
>
I've added the following:
There is a natural mapping from Mercury terms to XML documents since both are
tree structures. Each functor of a Mercury type is mapped to an element name
in the XML document. Extra information such as the original functor name, type
and field name are stored as attribute values.
> > extras/xml_stylesheets/mercury_term.xsl
> > extras/xml_stylesheets/xul_tree.xsl
> > Some example stylesheets. One to convert XML generate with the
> > to_xml library to a Mercury term and one to generate a XUL
> > term browser for viewing with Mozilla or Firefox.
> >
> You should also have a README file that explains how to use these stylesheets.
>
Index: extras/xml_stylesheets/README
===================================================================
RCS file: extras/xml_stylesheets/README
diff -N extras/xml_stylesheets/README
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ extras/xml_stylesheets/README 6 Dec 2004 03:40:10 -0000
@@ -0,0 +1,17 @@
+This directory contains some example stylesheets for use with the term_to_xml
+library module.
+
+You can apply a stylesheet to an XML document using the program
+xsltproc (available from http://xmlsoft.org/XSLT) or any other XSLT tool. You
+can also include a reference to the stylesheet in the XML document and have
+your browser apply the stylesheet, although this only works with some browsers.
+
+mercury_term.xsl
+ Stylesheet to convert an XML document generated with
+ term.write_xml_doc/6 back to a Mercury term suitable for reading with
+ io.read/3.
+
+xul_tree.xsl
+ Stylesheet to convert an XML document generated with
+ term.write_xml_doc/6 to a XUL tree structure suitable for browsing
+ with Mozilla or Firefox.
> > library/library.m
> > Add to_xml.
> >
> > library/to_xml.m
> > The to_xml module with predicates for generating XML for
> > Mercury terms and DTDs for Mercury types.
> >
> I'm not really keen on the name `to_xml'. How about either
> `xml' or `xml_util'?
>
Everyone seems to agree on term_to_xml, so I'll use that.
> > tests/hard_coded/Mmakefile
> > tests/hard_coded/write_xml.exp
> > tests/hard_coded/write_xml.m
> > Test to_xml predicates.
> >
> > Index: extras/xml_stylesheets/mercury_term.xsl
> > ===================================================================
> > RCS file: extras/xml_stylesheets/mercury_term.xsl
> > diff -N extras/xml_stylesheets/mercury_term.xsl
> > --- /dev/null 1 Jan 1970 00:00:00 -0000
> > +++ extras/xml_stylesheets/mercury_term.xsl 4 Dec 2004 07:03:37 -0000
> > @@ -0,0 +1,65 @@
> > +<?xml version="1.0"?>
> > +<xsl:stylesheet version="1.0"
> > +xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> > +xmlns="http://www.w3.org/TR/xhtml1/strict">
> > +<!--
> > + This template produces a Mercury term from an xml document
> > + generated using write_xml.write_xml_doc/3. The term is suitable for
> > + reading into a Mercury program using io.read/3.
> > +-->
> Shouldn't that be to_xml.write_xml_doc/3 in this comment?
> ...
Yep. As you can see I also struggled with a name for this library...
>
> > Index: library/library.m
> > ===================================================================
> > RCS file: /home/mercury1/repository/mercury/library/library.m,v
> > retrieving revision 1.75
> > diff -u -r1.75 library.m
> > --- library/library.m 16 Nov 2004 00:45:12 -0000 1.75
> > +++ library/library.m 3 Dec 2004 02:51:13 -0000
> > @@ -108,6 +108,7 @@
> > :- import_module version_hash_table.
> > :- import_module version_store.
> > :- import_module version_types.
> > +:- import_module to_xml.
> >
> > % The modules intended for Mercury system implementors.
> > :- import_module private_builtin.
> > @@ -226,6 +227,7 @@
> > mercury_std_library_module("version_hash_table").
> > mercury_std_library_module("version_store").
> > mercury_std_library_module("version_types").
> > +mercury_std_library_module("to_xml").
> >
> These lists of modules should be in alphabetical order.
>
Fixed.
> > %---------------------------------------------------------------------------%
> > %---------------------------------------------------------------------------%
> > Index: library/to_xml.m
> > ===================================================================
> > RCS file: library/to_xml.m
> > diff -N library/to_xml.m
> > --- /dev/null 1 Jan 1970 00:00:00 -0000
> > +++ library/to_xml.m 4 Dec 2004 10:46:28 -0000
> > @@ -0,0 +1,1046 @@
> > +%-----------------------------------------------------------------------------r
> > +% Copyright (C) 1993-2004 The University of Melbourne.
> > +% This file may only be copied under the terms of the GNU Library General
> > +% Public License - see the file COPYING.LIB in the Mercury distribution.
> > +%-----------------------------------------------------------------------------%
> > +%
> > +% File: xml_out.m.
> It's now to_xml.m
>
Fixed.
> > +% Main author: maclarty.
> > +% Stability: low.
> > +%
> > +% A Mercury term to XML converter.
> > +%
> > +% This module contains predicates to write arbitrary Mercury terms to
> > +% an output stream as XML.
> > +%
> I suggest "This module contains predicates that write arbitrary ..."
>
Okay.
> > +% Each functor in a term is given a corresponding well-formed element name
> > +% in the XML document.
> > +%
> > +% The element names `String', `Int', `Char' and `Float' are reserved for the
> > +% corresponding Mercury builtin types. The values for these elements are
> > +% stored as parsed characater data inside the element.
> > +%
> > +% Elements for discriminated unions get their name from the functor name and
> > +% the type name. Array element names consist of the string "Array--" followed
> > +% by the type name of the elements of the array. All other types are assigned
> > +% the reserved `Unrecognised' element name.
> > +%
> > +% Each element (including `Unrecognised') may also have certain attributes set:
> > +%
> > +% functor - the original functor name as returned by
> > +% deconstruct.deconstruct/5. This attribute will be present for
> > +% every element except elements for builtin types.
> > +%
> > +% typename - the type name of the Mercury type the element represents.
> > +% This attribute will also always be present for all elements except
> > +% elements for builtin types.
> > +%
> > +% field - discriminated union functor arguments (including those with a
> > +% builtin type) that have a field name will have this attribute set.
> > +%
> > +% The XML document can also be annotated with a style sheet reference. Once a
> > +% Mercury term is in XML it can be converted to many different formats using
> > +% the appropriate stylesheet. For example in the extras/xml_stylesheets
> > +% distribution there are stylesheets to convert XML documents generated with
> > +% this library back to Mercury terms that can be read with io.read/3. There is
> > +% also a stylesheet in extras to browse a Mercury term with Mozilla using XUL.
> > +%
> > +% To support third parties generating XML which is compatible with the XML
> > +% generated by this library, a DTD for a Mercury type can also be generated.
> > +%
> > +% The generated DTD is also a good reference when creating a stylesheet as
> > +% it contains comments describing the mapping from functors to elements.
> > +%
> > +%-----------------------------------------------------------------------------%
> > +%-----------------------------------------------------------------------------%
> > +
> > +:- module to_xml.
> > +:- interface.
> > +
> > +:- import_module io, int, deconstruct, std_util.
> > +
> > +%-----------------------------------------------------------------------------%
> > +
> > +:- type maybe_stylesheet
> > + ---> with_stylesheet(string, string) % stylesheet type and href.
> > + ; no_stylesheet.
> > +
> > + % Values of this type specifys the DOCTYPE of an XML document when
> > + % the DOCTYPE is defined by an external DTD.
> s/specifys/specify/
>
Fixed.
> > + %
> > +:- type doctype
> > + ---> public(string) % FPI
> > + ; public(string, string) % FPI, URL
> > + ; system(string). % URL
> > +
> > + % Values of this type specify whether a DTD should be included in
> > + % a generated XML document and if so how.
> > + %
> > +:- type maybe_dtd
> > + % Embed the entire DTD in the document.
> > + ---> embed
> > + % Included a reference to an external DTD.
> > + ; external(doctype)
> > + % Do not include any DOCTYPE information.
> > + ; no_dtd.
> > +
> > + % Values of this type indicate whether a DTD was successfully
> > + % generated or not. A DTD cannot be generated for a type with more
> > + % than one top-level functor since only one root element can be
> > + % specified by a DTD.
> > + %
> > +:- type dtd_generation_result
> > + ---> ok
> > + ; multiple_functors_for_root.
> > +
> > + % write_xml_doc(Term, MaybeStyleSheet, MaybeDTD, DTDResult, !IO).
> > + % Write Term to the current output stream as an XML document.
> > + % MaybeStyleSheet and MaybeDTD specify whether or not a stylesheet
> > + % reference and/or a DTD should be included. Any non-canonical terms
> > + % will be canonicalized. If an embedded DTD was requested and the type
> > + % has more than one top level functor then multiplt_functors_for_root
> > + % will be returned in DTDResult and nothing will be written.
> > + %
> s/multiplt_functors_for_root/multiple_functor_for_root/
>
Fixed.
> > +:- pred write_xml_doc(T::in, maybe_stylesheet::in, maybe_dtd::in,
> > + dtd_generation_result::out, io::di, io::uo) is det.
> > +
> > + % write_xml_doc(Stream, Term, MaybeStyleSheet, MaybeDTD, DTDResult,
> > + % !IO).
> > + % Same as write_xml_doc/5 except write the XML doc to the given
> > + % output stream.
> > + %
> > +:- pred write_xml_doc(io.output_stream::in, T::in, maybe_stylesheet::in,
> > + maybe_dtd::in, dtd_generation_result::out, io::di, io::uo) is det.
> > +
> > + % write_xml_doc_cc(Term, MaybeStyleSheet, MaybeDTD, DTDResult, !IO).
> > + % Write Term to the current output stream as an XML document.
> > + % MaybeStyleSheet and MaybeDTD specify whether or not a stylesheet
> > + % reference and/or a DTD should be included. Any non-canonical terms
> > + % will be be written out in full. If an embedded DTD was requested and
> > + % the type has more than one top level functor then
> > + % multiplt_functors_for_root will be returned in DTDResult and nothing
> > + % will be written.
> > + %
> s/multiplt_functors_for_root/multiple_functors_for_root/
>
Fixed.
> > +:- pred write_xml_doc_cc(T::in, maybe_stylesheet::in, maybe_dtd::in,
> > + dtd_generation_result::out, io::di, io::uo) is cc_multi.
> > +
> > + % write_xml_doc_cc(Stream, Term, MaybeStyleSheet, MaybeDTD, DTDResult,
> > + % !IO).
> > + % Same as write_xml_doc/5 except write the XML doc to the given
> > + % output stream.
> > + %
> > +:- pred write_xml_doc_cc(io.output_stream::in, T::in, maybe_stylesheet::in,
> > + maybe_dtd::in, dtd_generation_result::out, io::di, io::uo) is cc_multi.
> > +
> > + % True if the given type doesn't have multiple top level functors.
> > + %
> > +:- pred ok_to_generate_dtd(type_desc::in) is semidet.
> > +
> > + % write_dtd(Term, DTDResult, !IO).
> > + % Write a DTD for the given term to the current output stream. If the
> > + % type of Term has more than one top level functor then
> > + % multiple_functors_for_root will be returned in DTDResult and
> > + % nothing will be written, otherwise ok is returned in DTDResult.
> > + %
> > +:- pred write_dtd(T::unused, dtd_generation_result::out, io::di, io::uo)
> > + is det.
> > +
> > + % write_dtd(Stream, Term, DTDResult, !IO).
> > + % Write a DTD for the given term to the specified output stream. If the
> > + % type of Term has more than one top level functor then
> > + % multiple_functors_for_root will be returned in DTDResult and
> > + % nothing will be written, otherwise ok is returned in DTDResult.
> > + %
> > +:- pred write_dtd(io.output_stream::in, T::unused, dtd_generation_result::out,
> > + io::di, io::uo) is det.
> > +
> > + % write_dtd_for_type(Type, DTDResult, !IO).
> > + % Write a DTD for the given type to the current output stream. If the
> > + % type has more than one top level functor then
> > + % multiple_functors_for_root will be returned in DTDResult and nothing
> > + % will be written, otherwise ok is returned in DTDResult.
> > + %
> > +:- pred write_dtd_from_type(type_desc::in, dtd_generation_result::out,
> > + io::di, io::uo) is det.
> > +
> > + % write_dtd_for_type(Stream, Type, DTDResult, !IO).
> > + % Write a DTD for the given type to the given output stream. If the
> > + % type has more than one top level functor then
> > + % multiple_functors_for_root will be returned in DTDResult and nothing
> > + % will be written, otherwise ok is returned in DTDResult.
> > + %
> > +:- pred write_dtd_from_type(io.output_stream::in, type_desc::in,
> > + dtd_generation_result::out, io::di, io::uo) is det.
> > +
> > + % write_xml_element(NonCanon, IndentLevel, Term, !IO).
> > + % Write XML elements for the given term and all it's descendents,
> s/it's/its/
>
Fixed.
> > + % using IndentLevel as the initial indentation level (each
> > + % indentation level is one space character). No <?xml ... ?>
> > + % header will be written. Non canonical terms will be handled
> s/Non canonical/Non-canonical/
>
Fixed.
>
> > +:- pred write_xml_element(deconstruct.noncanon_handling, int, T, io, io).
> > +:- mode write_xml_element(in(do_not_allow), in, in, di, uo) is det.
> > +:- mode write_xml_element(in(canonicalize), in, in, di, uo) is det.
> > +:- mode write_xml_element(in(include_details_cc), in, in, di, uo) is cc_multi.
> > +:- mode write_xml_element(in, in, in, di, uo) is cc_multi.
> > +
> > +%-----------------------------------------------------------------------------%
> > +%-----------------------------------------------------------------------------%
> > +
> > +:- implementation.
> > +
> > +:- import_module string, list, char, exception, bool, array.
> > +:- import_module exception, map.
> > +
> You've imported the exception module twice there.
>
Fixed.
> > +%-----------------------------------------------------------------------------%
> > +
> > +write_xml_doc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> > + (
> > + ( MaybeDTD \= embed ; ok_to_generate_dtd(type_of(X)) )
> > + ->
> > + DTDResult = ok,
> > + write_xml_header(no, !IO),
> > + write_stylesheet_ref(MaybeStyleSheet, !IO),
> > + write_doctype(canonicalize, X, MaybeDTD, _, !IO),
> > + write_xml_element(canonicalize, 0, X, !IO)
> > + ;
> > + DTDResult = multiple_functors_for_root
> > + ).
> > +
> > +write_xml_doc(Stream, X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> > + io.set_output_stream(Stream, OrigStream, !IO),
> > + write_xml_doc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO),
> > + io.set_output_stream(OrigStream, _, !IO).
> > +
> > +write_xml_doc_cc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> > + (
> > + ( MaybeDTD = embed ; ok_to_generate_dtd(type_of(X)) )
> > + ->
> > + DTDResult = ok,
> > + write_xml_header(no, !IO),
> > + write_stylesheet_ref(MaybeStyleSheet, !IO),
> > + write_doctype(include_details_cc, X, MaybeDTD, _, !IO),
> > + write_xml_element(include_details_cc, 0, X, !IO)
> > + ;
> > + DTDResult = multiple_functors_for_root
> > + ).
> > +
> > +write_xml_doc_cc(Stream, X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> > + io.set_output_stream(Stream, OrigStream, !IO),
> > + write_xml_doc_cc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO),
> > + io.set_output_stream(OrigStream, _, !IO).
> > +
> > +write_xml_element(NonCanon, IndentLevel, X, !IO) :-
> > + type_to_univ(X, Univ),
> > + write_xml_element_univ(NonCanon, IndentLevel, Univ, [], _, !IO).
> > +
> > +write_dtd(Term, DTDResult, !IO) :-
> > + type_of(Term) = TypeDesc,
> > + write_dtd_from_type(TypeDesc, DTDResult, !IO).
> > +
> > +write_dtd(Stream, Term, DTDResult, !IO) :-
> > + io.set_output_stream(Stream, OrigStream, !IO),
> > + write_dtd(Term, DTDResult, !IO),
> > + io.set_output_stream(OrigStream, _, !IO).
> > +
> > +write_dtd_from_type(Stream, TypeDesc, DTDResult, !IO) :-
> > + io.set_output_stream(Stream, OrigStream, !IO),
> > + write_dtd_from_type(TypeDesc, DTDResult, !IO),
> > + io.set_output_stream(OrigStream, _, !IO).
> > +
> > +:- pred write_xml_header(maybe(string)::in, io::di, io::uo) is det.
> > +
> > +write_xml_header(MaybeEncoding, !IO) :-
> > + io.write_string("<?xml version=""1.0""", !IO),
> > + (
> > + MaybeEncoding = yes(Encoding),
> > + io.write_string(" encoding=""", !IO),
> > + io.write_string(Encoding, !IO),
> > + io.write_string("""?>\n", !IO)
> > + ;
> > + MaybeEncoding = no,
> > + io.write_string("?>\n", !IO)
> > + ).
> > +
> > +:- pred write_stylesheet_ref(maybe_stylesheet::in, io::di, io::uo) is det.
> > +
> > +write_stylesheet_ref(no_stylesheet, !IO).
> > +write_stylesheet_ref(with_stylesheet(Type, Href), !IO) :-
> > + io.write_string("<?xml-stylesheet type=""", !IO),
> > + io.write_string(Type, !IO),
> > + io.write_string(""" href=""", !IO),
> > + io.write_string(Href, !IO),
> > + io.write_string("""?>\n", !IO).
> > +
> > +:- pred write_doctype(deconstruct.noncanon_handling, T, maybe_dtd,
> > + dtd_generation_result, io, io).
> > +:- mode write_doctype(in(canonicalize), in, in, out, di, uo) is det.
> > +:- mode write_doctype(in(do_not_allow), in, in, out, di, uo) is det.
> > +:- mode write_doctype(in(include_details_cc), in, in, out, di, uo) is cc_multi.
> > +:- mode write_doctype(in, in, in, out, di, uo) is cc_multi.
> > +
> > +write_doctype(_, _, no_dtd, ok, !IO).
> > +write_doctype(_, T, embed, DTDResult, !IO) :-
> > + write_dtd(T, DTDResult, !IO),
> > + io.nl(!IO).
> > +write_doctype(NonCanon, T, external(DocType), ok, !IO) :-
> > + deconstruct.deconstruct(T, NonCanon, Functor, _, _),
> > + Root = get_element(type_of(T), from_functor_name(Functor)),
> > + io.write_string("<!DOCTYPE ", !IO),
> > + io.write_string(Root, !IO),
> > + (
> > + DocType = public(PUBLIC),
> > + io.write_string(" PUBLIC """, !IO),
> > + io.write_string(PUBLIC, !IO)
> > + ;
> > + DocType = public(PUBLIC, SYSTEM),
> > + io.write_string(" PUBLIC """, !IO),
> > + io.write_string(PUBLIC, !IO),
> > + io.write_string(""" """, !IO),
> > + io.write_string(SYSTEM, !IO)
> > + ;
> > + DocType = system(SYSTEM),
> > + io.write_string(" SYSTEM """, !IO),
> > + io.write_string(SYSTEM, !IO)
> > + ),
> > + io.write_string(""">\n", !IO).
> > +
> > +%-----------------------------------------------------------------------------%
> > +%
> > +% Some reserved element names. Reserved element names all start with a
> > +% capital letter so as not to conflict with a mangled element name.
> > +%
> > +
> > +:- func reserved_prefix = string.
> > +
> > + % A prefix for functors that start with a capital letter or
> > + % a non-letter.
> > + %
> > +reserved_prefix = "Tag_".
> > +
> > +:- pred common_mercury_functor(string, string).
> > +:- mode common_mercury_functor(in, out) is semidet.
> > +:- mode common_mercury_functor(out, in) is semidet.
> > +
> > + % These should all start with a capital letter so as not to
> > + % conflict with a mangled name.
> > + %
> > +common_mercury_functor("[|]", "List").
> > +common_mercury_functor("[]", "Nil").
> > +common_mercury_functor("{}", "Tuple").
> > +
> > + % A general element for types whos structure we do not generate
> > + % DTD rules for.
> > + %
> > +:- func unrecognized_element = string.
> > +
> > +unrecognized_element = "Unrecognized".
> > +
> > +:- func array_element = string.
> > +
> > +array_element = "Array".
> > +
> > +:- pred is_primitive_type(type_desc::in, string::out) is semidet.
> > +
> > +is_primitive_type(TypeDesc, Element) :-
> > + (
> > + type_of("") = TypeDesc
> > + ->
> > + Element = "String"
> > + ;
> > + type_of('c') = TypeDesc
> > + ->
> > + Element = "Char"
> > + ;
> > + type_of(1) = TypeDesc
> > + ->
> > + Element = "Int"
> > + ;
> > + type_of(1.0) = TypeDesc,
> > + Element = "Float"
> > + ).
> > +
> > +%-----------------------------------------------------------------------------%
> > +%
> > +% Mangling functions.
> > +%
> > +% We use the following mangling scheme to create well formed element names
> > +% that do not begin with a capital letter (capitals are used for reserved
> > +% elements).
> > +%
> > +% If the string to be mangled begins with a capital letter then we prefix it
> > +% with another string reserved for this purpose. Then we replace all
> > +% characters which aren't alpha numeric or underscores with '-' followed by
> > +% the character code.
> > +%
> > +
> > +:- func mangle(string) = string.
> > +
> > +mangle(Functor) = Element :-
> > + string.split(Functor, 1, Head, Tail),
> > + (
> > + string.is_alpha(Head),
> > + string.to_lower(Head) = Head
> > + ->
> > + First = Head,
> > + Rest = Tail
> > + ;
> > + First = reserved_prefix,
> > + Rest = Head ++ Tail
> > + ),
> > + string.foldl(mangle_char, Rest, [], ElementChrs),
> > + Element = First ++ string.from_char_list(ElementChrs).
> > +
> > +:- pred mangle_char(char::in, list(char)::in, list(char)::out)
> > + is det.
> > +
> > + % XXX This is system dependent since char.to_int is system dependent.
> > + %
> > +mangle_char(Chr, PrevChrs, list.append(PrevChrs, Chrs)) :-
> > + (
> > + char.is_alnum_or_underscore(Chr)
> > + ->
> > + Chrs = [Chr]
> > + ;
> > + Chrs = ['-' | string.to_char_list(string.int_to_string(
> > + char.to_int(Chr)))]
> > + ).
> > +
> > +%-----------------------------------------------------------------------------%
> > +
> > +:- type element_request
> > + ---> from_functor_name(string)
> > + ; from_functor_num(int).
> > +
> > +:- pragma memo(get_element/2).
> > +
> Using memoing in the standard library is going to mean that it no
> longer compiles for those grades that don't support it, e.g. the java
> and il grades.
>
> If it is really essential that this be memoed then one way around
> this might be to add an option `--ignore-tabling-in-unsupported-grades'
> to the compiler that disables the tabling transformation for those
> grades that don't support it and then modify library/Mercury.options
> appropriately.
>
Is the consensus then that I should take out the pragma memos?
Ian.
--------------------------------------------------------------------------
mercury-reviews mailing list
post: mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------
More information about the reviews
mailing list