[m-rev.] For review: Convert Mercury terms to XML
Ralph Becket
rafe at cs.mu.OZ.AU
Mon Dec 6 11:32:19 AEDT 2004
Julien Fischer, Sunday, 5 December 2004:
>
> I'm not really keen on the name `to_xml'. How about either
> `xml' or `xml_util'?
I'm unhappy with all three suggestions.
term_to_xml would describe what it does.
> > Index: library/to_xml.m
I don't think this is of sufficiently general utility to belong in the
standard library. It should go in extras.
> > ===================================================================
> > RCS file: library/to_xml.m
> > diff -N library/to_xml.m
> > --- /dev/null 1 Jan 1970 00:00:00 -0000
> > +++ library/to_xml.m 4 Dec 2004 10:46:28 -0000
> > @@ -0,0 +1,1046 @@
> > +%-----------------------------------------------------------------------------r
> > +% Copyright (C) 1993-2004 The University of Melbourne.
> > +% This file may only be copied under the terms of the GNU Library General
> > +% Public License - see the file COPYING.LIB in the Mercury distribution.
> > +%-----------------------------------------------------------------------------%
> > +%
> > +% File: xml_out.m.
> It's now to_xml.m
>
> > +% Main author: maclarty.
> > +% Stability: low.
> > +%
> > +% A Mercury term to XML converter.
> > +%
> > +% This module contains predicates to write arbitrary Mercury terms to
> > +% an output stream as XML.
> > +%
> I suggest "This module contains predicates that write arbitrary ..."
>
> > +% Each functor in a term is given a corresponding well-formed element name
> > +% in the XML document.
> > +%
> > +% The element names `String', `Int', `Char' and `Float' are reserved for the
> > +% corresponding Mercury builtin types. The values for these elements are
> > +% stored as parsed characater data inside the element.
> > +%
> > +% Elements for discriminated unions get their name from the functor name and
> > +% the type name. Array element names consist of the string "Array--" followed
> > +% by the type name of the elements of the array. All other types are assigned
> > +% the reserved `Unrecognised' element name.
> > +%
> > +% Each element (including `Unrecognised') may also have certain attributes set:
> > +%
> > +% functor - the original functor name as returned by
> > +% deconstruct.deconstruct/5. This attribute will be present for
> > +% every element except elements for builtin types.
> > +%
> > +% typename - the type name of the Mercury type the element represents.
> > +% This attribute will also always be present for all elements except
> > +% elements for builtin types.
> > +%
> > +% field - discriminated union functor arguments (including those with a
> > +% builtin type) that have a field name will have this attribute set.
> > +%
> > +% The XML document can also be annotated with a style sheet reference. Once a
> > +% Mercury term is in XML it can be converted to many different formats using
> > +% the appropriate stylesheet. For example in the extras/xml_stylesheets
> > +% distribution there are stylesheets to convert XML documents generated with
> > +% this library back to Mercury terms that can be read with io.read/3. There is
> > +% also a stylesheet in extras to browse a Mercury term with Mozilla using XUL.
> > +%
> > +% To support third parties generating XML which is compatible with the XML
> > +% generated by this library, a DTD for a Mercury type can also be generated.
> > +%
> > +% The generated DTD is also a good reference when creating a stylesheet as
> > +% it contains comments describing the mapping from functors to elements.
> > +%
> > +%-----------------------------------------------------------------------------%
> > +%-----------------------------------------------------------------------------%
> > +
> > +:- module to_xml.
> > +:- interface.
> > +
> > +:- import_module io, int, deconstruct, std_util.
> > +
> > +%-----------------------------------------------------------------------------%
> > +
> > +:- type maybe_stylesheet
> > + ---> with_stylesheet(string, string) % stylesheet type and href.
> > + ; no_stylesheet.
> > +
> > + % Values of this type specifys the DOCTYPE of an XML document when
> > + % the DOCTYPE is defined by an external DTD.
> s/specifys/specify/
>
> > + %
> > +:- type doctype
> > + ---> public(string) % FPI
> > + ; public(string, string) % FPI, URL
> > + ; system(string). % URL
> > +
> > + % Values of this type specify whether a DTD should be included in
> > + % a generated XML document and if so how.
> > + %
> > +:- type maybe_dtd
> > + % Embed the entire DTD in the document.
> > + ---> embed
> > + % Included a reference to an external DTD.
> > + ; external(doctype)
> > + % Do not include any DOCTYPE information.
> > + ; no_dtd.
> > +
> > + % Values of this type indicate whether a DTD was successfully
> > + % generated or not. A DTD cannot be generated for a type with more
> > + % than one top-level functor since only one root element can be
> > + % specified by a DTD.
> > + %
> > +:- type dtd_generation_result
> > + ---> ok
> > + ; multiple_functors_for_root.
> > +
> > + % write_xml_doc(Term, MaybeStyleSheet, MaybeDTD, DTDResult, !IO).
> > + % Write Term to the current output stream as an XML document.
> > + % MaybeStyleSheet and MaybeDTD specify whether or not a stylesheet
> > + % reference and/or a DTD should be included. Any non-canonical terms
> > + % will be canonicalized. If an embedded DTD was requested and the type
> > + % has more than one top level functor then multiplt_functors_for_root
> > + % will be returned in DTDResult and nothing will be written.
> > + %
> s/multiplt_functors_for_root/multiple_functor_for_root/
>
> > +:- pred write_xml_doc(T::in, maybe_stylesheet::in, maybe_dtd::in,
> > + dtd_generation_result::out, io::di, io::uo) is det.
> > +
> > + % write_xml_doc(Stream, Term, MaybeStyleSheet, MaybeDTD, DTDResult,
> > + % !IO).
> > + % Same as write_xml_doc/5 except write the XML doc to the given
> > + % output stream.
> > + %
> > +:- pred write_xml_doc(io.output_stream::in, T::in, maybe_stylesheet::in,
> > + maybe_dtd::in, dtd_generation_result::out, io::di, io::uo) is det.
> > +
> > + % write_xml_doc_cc(Term, MaybeStyleSheet, MaybeDTD, DTDResult, !IO).
> > + % Write Term to the current output stream as an XML document.
> > + % MaybeStyleSheet and MaybeDTD specify whether or not a stylesheet
> > + % reference and/or a DTD should be included. Any non-canonical terms
> > + % will be be written out in full. If an embedded DTD was requested and
> > + % the type has more than one top level functor then
> > + % multiplt_functors_for_root will be returned in DTDResult and nothing
> > + % will be written.
> > + %
> s/multiplt_functors_for_root/multiple_functors_for_root/
>
> > +:- pred write_xml_doc_cc(T::in, maybe_stylesheet::in, maybe_dtd::in,
> > + dtd_generation_result::out, io::di, io::uo) is cc_multi.
> > +
> > + % write_xml_doc_cc(Stream, Term, MaybeStyleSheet, MaybeDTD, DTDResult,
> > + % !IO).
> > + % Same as write_xml_doc/5 except write the XML doc to the given
> > + % output stream.
> > + %
> > +:- pred write_xml_doc_cc(io.output_stream::in, T::in, maybe_stylesheet::in,
> > + maybe_dtd::in, dtd_generation_result::out, io::di, io::uo) is cc_multi.
> > +
> > + % True if the given type doesn't have multiple top level functors.
> > + %
> > +:- pred ok_to_generate_dtd(type_desc::in) is semidet.
> > +
> > + % write_dtd(Term, DTDResult, !IO).
> > + % Write a DTD for the given term to the current output stream. If the
> > + % type of Term has more than one top level functor then
> > + % multiple_functors_for_root will be returned in DTDResult and
> > + % nothing will be written, otherwise ok is returned in DTDResult.
> > + %
> > +:- pred write_dtd(T::unused, dtd_generation_result::out, io::di, io::uo)
> > + is det.
> > +
> > + % write_dtd(Stream, Term, DTDResult, !IO).
> > + % Write a DTD for the given term to the specified output stream. If the
> > + % type of Term has more than one top level functor then
> > + % multiple_functors_for_root will be returned in DTDResult and
> > + % nothing will be written, otherwise ok is returned in DTDResult.
> > + %
> > +:- pred write_dtd(io.output_stream::in, T::unused, dtd_generation_result::out,
> > + io::di, io::uo) is det.
> > +
> > + % write_dtd_for_type(Type, DTDResult, !IO).
> > + % Write a DTD for the given type to the current output stream. If the
> > + % type has more than one top level functor then
> > + % multiple_functors_for_root will be returned in DTDResult and nothing
> > + % will be written, otherwise ok is returned in DTDResult.
> > + %
> > +:- pred write_dtd_from_type(type_desc::in, dtd_generation_result::out,
> > + io::di, io::uo) is det.
> > +
> > + % write_dtd_for_type(Stream, Type, DTDResult, !IO).
> > + % Write a DTD for the given type to the given output stream. If the
> > + % type has more than one top level functor then
> > + % multiple_functors_for_root will be returned in DTDResult and nothing
> > + % will be written, otherwise ok is returned in DTDResult.
> > + %
> > +:- pred write_dtd_from_type(io.output_stream::in, type_desc::in,
> > + dtd_generation_result::out, io::di, io::uo) is det.
> > +
> > + % write_xml_element(NonCanon, IndentLevel, Term, !IO).
> > + % Write XML elements for the given term and all it's descendents,
> s/it's/its/
>
> > + % using IndentLevel as the initial indentation level (each
> > + % indentation level is one space character). No <?xml ... ?>
> > + % header will be written. Non canonical terms will be handled
> s/Non canonical/Non-canonical/
>
> > + % according to the value of NonCanon. See the deconstruct
> > + % library for more information on this argument.
> > + %
>
>
> > +:- pred write_xml_element(deconstruct.noncanon_handling, int, T, io, io).
> > +:- mode write_xml_element(in(do_not_allow), in, in, di, uo) is det.
> > +:- mode write_xml_element(in(canonicalize), in, in, di, uo) is det.
> > +:- mode write_xml_element(in(include_details_cc), in, in, di, uo) is cc_multi.
> > +:- mode write_xml_element(in, in, in, di, uo) is cc_multi.
> > +
> > +%-----------------------------------------------------------------------------%
> > +%-----------------------------------------------------------------------------%
> > +
> > +:- implementation.
> > +
> > +:- import_module string, list, char, exception, bool, array.
> > +:- import_module exception, map.
> > +
> You've imported the exception module twice there.
>
> > +%-----------------------------------------------------------------------------%
> > +
> > +write_xml_doc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> > + (
> > + ( MaybeDTD \= embed ; ok_to_generate_dtd(type_of(X)) )
> > + ->
> > + DTDResult = ok,
> > + write_xml_header(no, !IO),
> > + write_stylesheet_ref(MaybeStyleSheet, !IO),
> > + write_doctype(canonicalize, X, MaybeDTD, _, !IO),
> > + write_xml_element(canonicalize, 0, X, !IO)
> > + ;
> > + DTDResult = multiple_functors_for_root
> > + ).
> > +
> > +write_xml_doc(Stream, X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> > + io.set_output_stream(Stream, OrigStream, !IO),
> > + write_xml_doc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO),
> > + io.set_output_stream(OrigStream, _, !IO).
> > +
> > +write_xml_doc_cc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> > + (
> > + ( MaybeDTD = embed ; ok_to_generate_dtd(type_of(X)) )
> > + ->
> > + DTDResult = ok,
> > + write_xml_header(no, !IO),
> > + write_stylesheet_ref(MaybeStyleSheet, !IO),
> > + write_doctype(include_details_cc, X, MaybeDTD, _, !IO),
> > + write_xml_element(include_details_cc, 0, X, !IO)
> > + ;
> > + DTDResult = multiple_functors_for_root
> > + ).
> > +
> > +write_xml_doc_cc(Stream, X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> > + io.set_output_stream(Stream, OrigStream, !IO),
> > + write_xml_doc_cc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO),
> > + io.set_output_stream(OrigStream, _, !IO).
> > +
> > +write_xml_element(NonCanon, IndentLevel, X, !IO) :-
> > + type_to_univ(X, Univ),
> > + write_xml_element_univ(NonCanon, IndentLevel, Univ, [], _, !IO).
> > +
> > +write_dtd(Term, DTDResult, !IO) :-
> > + type_of(Term) = TypeDesc,
> > + write_dtd_from_type(TypeDesc, DTDResult, !IO).
> > +
> > +write_dtd(Stream, Term, DTDResult, !IO) :-
> > + io.set_output_stream(Stream, OrigStream, !IO),
> > + write_dtd(Term, DTDResult, !IO),
> > + io.set_output_stream(OrigStream, _, !IO).
> > +
> > +write_dtd_from_type(Stream, TypeDesc, DTDResult, !IO) :-
> > + io.set_output_stream(Stream, OrigStream, !IO),
> > + write_dtd_from_type(TypeDesc, DTDResult, !IO),
> > + io.set_output_stream(OrigStream, _, !IO).
> > +
> > +:- pred write_xml_header(maybe(string)::in, io::di, io::uo) is det.
> > +
> > +write_xml_header(MaybeEncoding, !IO) :-
> > + io.write_string("<?xml version=""1.0""", !IO),
> > + (
> > + MaybeEncoding = yes(Encoding),
> > + io.write_string(" encoding=""", !IO),
> > + io.write_string(Encoding, !IO),
> > + io.write_string("""?>\n", !IO)
> > + ;
> > + MaybeEncoding = no,
> > + io.write_string("?>\n", !IO)
> > + ).
> > +
> > +:- pred write_stylesheet_ref(maybe_stylesheet::in, io::di, io::uo) is det.
> > +
> > +write_stylesheet_ref(no_stylesheet, !IO).
> > +write_stylesheet_ref(with_stylesheet(Type, Href), !IO) :-
> > + io.write_string("<?xml-stylesheet type=""", !IO),
> > + io.write_string(Type, !IO),
> > + io.write_string(""" href=""", !IO),
> > + io.write_string(Href, !IO),
> > + io.write_string("""?>\n", !IO).
> > +
> > +:- pred write_doctype(deconstruct.noncanon_handling, T, maybe_dtd,
> > + dtd_generation_result, io, io).
> > +:- mode write_doctype(in(canonicalize), in, in, out, di, uo) is det.
> > +:- mode write_doctype(in(do_not_allow), in, in, out, di, uo) is det.
> > +:- mode write_doctype(in(include_details_cc), in, in, out, di, uo) is cc_multi.
> > +:- mode write_doctype(in, in, in, out, di, uo) is cc_multi.
> > +
> > +write_doctype(_, _, no_dtd, ok, !IO).
> > +write_doctype(_, T, embed, DTDResult, !IO) :-
> > + write_dtd(T, DTDResult, !IO),
> > + io.nl(!IO).
> > +write_doctype(NonCanon, T, external(DocType), ok, !IO) :-
> > + deconstruct.deconstruct(T, NonCanon, Functor, _, _),
> > + Root = get_element(type_of(T), from_functor_name(Functor)),
> > + io.write_string("<!DOCTYPE ", !IO),
> > + io.write_string(Root, !IO),
> > + (
> > + DocType = public(PUBLIC),
> > + io.write_string(" PUBLIC """, !IO),
> > + io.write_string(PUBLIC, !IO)
> > + ;
> > + DocType = public(PUBLIC, SYSTEM),
> > + io.write_string(" PUBLIC """, !IO),
> > + io.write_string(PUBLIC, !IO),
> > + io.write_string(""" """, !IO),
> > + io.write_string(SYSTEM, !IO)
> > + ;
> > + DocType = system(SYSTEM),
> > + io.write_string(" SYSTEM """, !IO),
> > + io.write_string(SYSTEM, !IO)
> > + ),
> > + io.write_string(""">\n", !IO).
> > +
> > +%-----------------------------------------------------------------------------%
> > +%
> > +% Some reserved element names. Reserved element names all start with a
> > +% capital letter so as not to conflict with a mangled element name.
> > +%
> > +
> > +:- func reserved_prefix = string.
> > +
> > + % A prefix for functors that start with a capital letter or
> > + % a non-letter.
> > + %
> > +reserved_prefix = "Tag_".
> > +
> > +:- pred common_mercury_functor(string, string).
> > +:- mode common_mercury_functor(in, out) is semidet.
> > +:- mode common_mercury_functor(out, in) is semidet.
> > +
> > + % These should all start with a capital letter so as not to
> > + % conflict with a mangled name.
> > + %
> > +common_mercury_functor("[|]", "List").
> > +common_mercury_functor("[]", "Nil").
> > +common_mercury_functor("{}", "Tuple").
> > +
> > + % A general element for types whos structure we do not generate
> > + % DTD rules for.
> > + %
> > +:- func unrecognized_element = string.
> > +
> > +unrecognized_element = "Unrecognized".
> > +
> > +:- func array_element = string.
> > +
> > +array_element = "Array".
> > +
> > +:- pred is_primitive_type(type_desc::in, string::out) is semidet.
> > +
> > +is_primitive_type(TypeDesc, Element) :-
> > + (
> > + type_of("") = TypeDesc
> > + ->
> > + Element = "String"
> > + ;
> > + type_of('c') = TypeDesc
> > + ->
> > + Element = "Char"
> > + ;
> > + type_of(1) = TypeDesc
> > + ->
> > + Element = "Int"
> > + ;
> > + type_of(1.0) = TypeDesc,
> > + Element = "Float"
> > + ).
> > +
> > +%-----------------------------------------------------------------------------%
> > +%
> > +% Mangling functions.
> > +%
> > +% We use the following mangling scheme to create well formed element names
> > +% that do not begin with a capital letter (capitals are used for reserved
> > +% elements).
> > +%
> > +% If the string to be mangled begins with a capital letter then we prefix it
> > +% with another string reserved for this purpose. Then we replace all
> > +% characters which aren't alpha numeric or underscores with '-' followed by
> > +% the character code.
> > +%
> > +
> > +:- func mangle(string) = string.
> > +
> > +mangle(Functor) = Element :-
> > + string.split(Functor, 1, Head, Tail),
> > + (
> > + string.is_alpha(Head),
> > + string.to_lower(Head) = Head
> > + ->
> > + First = Head,
> > + Rest = Tail
> > + ;
> > + First = reserved_prefix,
> > + Rest = Head ++ Tail
> > + ),
> > + string.foldl(mangle_char, Rest, [], ElementChrs),
> > + Element = First ++ string.from_char_list(ElementChrs).
> > +
> > +:- pred mangle_char(char::in, list(char)::in, list(char)::out)
> > + is det.
> > +
> > + % XXX This is system dependent since char.to_int is system dependent.
> > + %
> > +mangle_char(Chr, PrevChrs, list.append(PrevChrs, Chrs)) :-
> > + (
> > + char.is_alnum_or_underscore(Chr)
> > + ->
> > + Chrs = [Chr]
> > + ;
> > + Chrs = ['-' | string.to_char_list(string.int_to_string(
> > + char.to_int(Chr)))]
> > + ).
> > +
> > +%-----------------------------------------------------------------------------%
> > +
> > +:- type element_request
> > + ---> from_functor_name(string)
> > + ; from_functor_num(int).
> > +
> > +:- pragma memo(get_element/2).
> > +
> Using memoing in the standard library is going to mean that it no
> longer compiles for those grades that don't support it, e.g. the java
> and il grades.
I agree. It's not a good idea. For a start, memo tables persist
through GC, so they are likely to be a type of space leak for
long-running applications (this is a hole in the memoing API IMHO).
I'd leave it to the user to add memoed wrapper predicates if they need
them.
-- Ralph
--------------------------------------------------------------------------
mercury-reviews mailing list
post: mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------
More information about the reviews
mailing list