[m-rev.] For review: Convert Mercury terms to XML

Julien Fischer juliensf at cs.mu.OZ.AU
Sun Dec 5 19:34:58 AEDT 2004


On Sat, 4 Dec 2004, Ian MacLarty wrote:

> For review by anyone.
>
> Estimated hours taken: 60
> Branches: main
>
> Add library module to convert Mercury terms to XML documents and generate DTDs
> for Mercury types.
>
I think that the log message could include more detail than that.

> extras/xml_stylesheets/mercury_term.xsl
> extras/xml_stylesheets/xul_tree.xsl
> 	Some example stylesheets.  One to convert XML generate with the
> 	to_xml library to a Mercury term and one to generate a XUL
> 	term browser for viewing with Mozilla or Firefox.
>
You should also have a README file that explains how to use these stylesheets.

> library/library.m
> 	Add to_xml.
>
> library/to_xml.m
> 	The to_xml module with predicates for generating XML for
> 	Mercury terms and DTDs for Mercury types.
>
I'm not really keen on the name `to_xml'.  How about either
`xml' or `xml_util'?

> tests/hard_coded/Mmakefile
> tests/hard_coded/write_xml.exp
> tests/hard_coded/write_xml.m
> 	Test to_xml predicates.
>
> Index: extras/xml_stylesheets/mercury_term.xsl
> ===================================================================
> RCS file: extras/xml_stylesheets/mercury_term.xsl
> diff -N extras/xml_stylesheets/mercury_term.xsl
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ extras/xml_stylesheets/mercury_term.xsl	4 Dec 2004 07:03:37 -0000
> @@ -0,0 +1,65 @@
> +<?xml version="1.0"?>
> +<xsl:stylesheet version="1.0"
> +xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> +xmlns="http://www.w3.org/TR/xhtml1/strict">
> +<!--
> +	This template produces a Mercury term from an xml document
> +	generated using write_xml.write_xml_doc/3.  The term is suitable for
> +	reading into a Mercury program using io.read/3.
> +-->
Shouldn't that be to_xml.write_xml_doc/3 in this comment?
...

> Index: library/library.m
> ===================================================================
> RCS file: /home/mercury1/repository/mercury/library/library.m,v
> retrieving revision 1.75
> diff -u -r1.75 library.m
> --- library/library.m	16 Nov 2004 00:45:12 -0000	1.75
> +++ library/library.m	3 Dec 2004 02:51:13 -0000
> @@ -108,6 +108,7 @@
>  :- import_module version_hash_table.
>  :- import_module version_store.
>  :- import_module version_types.
> +:- import_module to_xml.
>
>  % The modules intended for Mercury system implementors.
>  :- import_module private_builtin.
> @@ -226,6 +227,7 @@
>  mercury_std_library_module("version_hash_table").
>  mercury_std_library_module("version_store").
>  mercury_std_library_module("version_types").
> +mercury_std_library_module("to_xml").
>
These lists of modules should be in alphabetical order.

>  %---------------------------------------------------------------------------%
>  %---------------------------------------------------------------------------%
> Index: library/to_xml.m
> ===================================================================
> RCS file: library/to_xml.m
> diff -N library/to_xml.m
> --- /dev/null	1 Jan 1970 00:00:00 -0000
> +++ library/to_xml.m	4 Dec 2004 10:46:28 -0000
> @@ -0,0 +1,1046 @@
> +%-----------------------------------------------------------------------------r
> +% Copyright (C) 1993-2004 The University of Melbourne.
> +% This file may only be copied under the terms of the GNU Library General
> +% Public License - see the file COPYING.LIB in the Mercury distribution.
> +%-----------------------------------------------------------------------------%
> +%
> +% File: xml_out.m.
It's now to_xml.m

> +% Main author: maclarty.
> +% Stability: low.
> +%
> +% A Mercury term to XML converter.
> +%
> +% This module contains predicates to write arbitrary Mercury terms to
> +% an output stream as XML.
> +%
I suggest "This module contains predicates that write arbitrary ..."

> +% Each functor in a term is given a corresponding well-formed element name
> +% in the XML document.
> +%
> +% The element names `String', `Int', `Char' and `Float' are reserved for the
> +% corresponding Mercury builtin types.  The values for these elements are
> +% stored as parsed characater data inside the element.
> +%
> +% Elements for discriminated unions get their name from the functor name and
> +% the type name.  Array element names consist of the string "Array--" followed
> +% by the type name of the elements of the array.  All other types are assigned
> +% the reserved `Unrecognised' element name.
> +%
> +% Each element (including `Unrecognised') may also have certain attributes set:
> +%
> +% functor - the original functor name as returned by
> +% 	deconstruct.deconstruct/5.  This attribute will be present for
> +%	every element except elements for builtin types.
> +%
> +% typename - the type name of the Mercury type the element represents.
> +%	This attribute will also always be present for all elements except
> +%	elements for builtin types.
> +%
> +% field - discriminated union functor arguments (including those with a
> +%	builtin type) that have a field name will have this attribute set.
> +%
> +% The XML document can also be annotated with a style sheet reference.  Once a
> +% Mercury term is in XML it can be converted to many different formats using
> +% the appropriate stylesheet.  For example in the extras/xml_stylesheets
> +% distribution there are stylesheets to convert XML documents generated with
> +% this library back to Mercury terms that can be read with io.read/3.  There is
> +% also a stylesheet in extras to browse a Mercury term with Mozilla using XUL.
> +%
> +% To support third parties generating XML which is compatible with the XML
> +% generated by this library, a DTD for a Mercury type can also be generated.
> +%
> +% The generated DTD is also a good reference when creating a stylesheet as
> +% it contains comments describing the mapping from functors to elements.
> +%
> +%-----------------------------------------------------------------------------%
> +%-----------------------------------------------------------------------------%
> +
> +:- module to_xml.
> +:- interface.
> +
> +:- import_module io, int, deconstruct, std_util.
> +
> +%-----------------------------------------------------------------------------%
> +
> +:- type maybe_stylesheet
> +	--->	with_stylesheet(string, string)	% stylesheet type and href.
> +	;	no_stylesheet.
> +
> +	% Values of this type specifys the DOCTYPE of an XML document when
> +	% the DOCTYPE is defined by an external DTD.
s/specifys/specify/

> +	%
> +:- type doctype
> +	--->	public(string)			% FPI
> +	;	public(string, string)		% FPI, URL
> +	;	system(string).			% URL
> +
> +	% Values of this type specify whether a DTD should be included in
> +	% a generated XML document and if so how.
> +	%
> +:- type maybe_dtd
> +			% Embed the entire DTD in the document.
> +	--->	embed
> +			% Included a reference to an external DTD.
> +	;	external(doctype)
> +			% Do not include any DOCTYPE information.
> +	;	no_dtd.
> +
> +	% Values of this type indicate whether a DTD was successfully
> +	% generated or not.  A DTD cannot be generated for a type with more
> +	% than one top-level functor since only one root element can be
> +	% specified by a DTD.
> +	%
> +:- type dtd_generation_result
> +	--->	ok
> +	;	multiple_functors_for_root.
> +
> +	% write_xml_doc(Term, MaybeStyleSheet, MaybeDTD, DTDResult, !IO).
> +	% Write Term to the current output stream as an XML document.
> +	% MaybeStyleSheet and MaybeDTD specify whether or not a stylesheet
> +	% reference and/or a DTD should be included.  Any non-canonical terms
> +	% will be canonicalized.  If an embedded DTD was requested and the type
> +	% has more than one top level functor then multiplt_functors_for_root
> +	% will be returned in DTDResult and nothing will be written.
> +	%
s/multiplt_functors_for_root/multiple_functor_for_root/

> +:- pred write_xml_doc(T::in, maybe_stylesheet::in, maybe_dtd::in,
> +	dtd_generation_result::out, io::di, io::uo) is det.
> +
> +	% write_xml_doc(Stream, Term, MaybeStyleSheet, MaybeDTD, DTDResult,
> +	%	!IO).
> +	% Same as write_xml_doc/5 except write the XML doc to the given
> +	% output stream.
> +	%
> +:- pred write_xml_doc(io.output_stream::in, T::in, maybe_stylesheet::in,
> +	maybe_dtd::in, dtd_generation_result::out, io::di, io::uo) is det.
> +
> +	% write_xml_doc_cc(Term, MaybeStyleSheet, MaybeDTD, DTDResult, !IO).
> +	% Write Term to the current output stream as an XML document.
> +	% MaybeStyleSheet and MaybeDTD specify whether or not a stylesheet
> +	% reference and/or a DTD should be included.  Any non-canonical terms
> +	% will be be written out in full. If an embedded DTD was requested and
> +	% the type has more than one top level functor then
> +	% multiplt_functors_for_root will be returned in DTDResult and nothing
> +	% will be written.
> +	%
s/multiplt_functors_for_root/multiple_functors_for_root/

> +:- pred write_xml_doc_cc(T::in, maybe_stylesheet::in, maybe_dtd::in,
> +	dtd_generation_result::out, io::di, io::uo) is cc_multi.
> +
> +	% write_xml_doc_cc(Stream, Term, MaybeStyleSheet, MaybeDTD, DTDResult,
> +	%	!IO).
> +	% Same as write_xml_doc/5 except write the XML doc to the given
> +	% output stream.
> +	%
> +:- pred write_xml_doc_cc(io.output_stream::in, T::in, maybe_stylesheet::in,
> +	maybe_dtd::in, dtd_generation_result::out, io::di, io::uo) is cc_multi.
> +
> +	% True if the given type doesn't have multiple top level functors.
> +	%
> +:- pred ok_to_generate_dtd(type_desc::in) is semidet.
> +
> +	% write_dtd(Term, DTDResult, !IO).
> +	% Write a DTD for the given term to the current output stream.  If the
> +	% type of Term has more than one top level functor then
> +	% multiple_functors_for_root will be returned in DTDResult and
> +	% nothing will be written, otherwise ok is returned in DTDResult.
> +	%
> +:- pred write_dtd(T::unused, dtd_generation_result::out, io::di, io::uo)
> +	is det.
> +
> +	% write_dtd(Stream, Term, DTDResult, !IO).
> +	% Write a DTD for the given term to the specified output stream. If the
> +	% type of Term has more than one top level functor then
> +	% multiple_functors_for_root will be returned in DTDResult and
> +	% nothing will be written, otherwise ok is returned in DTDResult.
> +	%
> +:- pred write_dtd(io.output_stream::in, T::unused, dtd_generation_result::out,
> +	io::di, io::uo) is det.
> +
> +	% write_dtd_for_type(Type, DTDResult, !IO).
> +	% Write a DTD for the given type to the current output stream. If the
> +	% type has more than one top level functor then
> +	% multiple_functors_for_root will be returned in DTDResult and nothing
> +	% will be written, otherwise ok is returned in DTDResult.
> +	%
> +:- pred write_dtd_from_type(type_desc::in, dtd_generation_result::out,
> +	io::di, io::uo) is det.
> +
> +	% write_dtd_for_type(Stream, Type, DTDResult, !IO).
> +	% Write a DTD for the given type to the given output stream. If the
> +	% type has more than one top level functor then
> +	% multiple_functors_for_root will be returned in DTDResult and nothing
> +	% will be written, otherwise ok is returned in DTDResult.
> +	%
> +:- pred write_dtd_from_type(io.output_stream::in, type_desc::in,
> +	dtd_generation_result::out, io::di, io::uo) is det.
> +
> +	% write_xml_element(NonCanon, IndentLevel, Term, !IO).
> +	% Write XML elements for the given term and all it's descendents,
s/it's/its/

> +	% using IndentLevel as the initial indentation level (each
> +	% indentation level is one space character).  No <?xml ... ?>
> +	% header will be written.  Non canonical terms will be handled
s/Non canonical/Non-canonical/

> +	% according to the value of NonCanon.  See the deconstruct
> +	% library for more information on this argument.
> +	%


> +:- pred write_xml_element(deconstruct.noncanon_handling, int, T, io, io).
> +:- mode write_xml_element(in(do_not_allow), in, in, di, uo) is det.
> +:- mode write_xml_element(in(canonicalize), in, in,  di, uo) is det.
> +:- mode write_xml_element(in(include_details_cc), in, in, di, uo) is cc_multi.
> +:- mode write_xml_element(in, in, in, di, uo) is cc_multi.
> +
> +%-----------------------------------------------------------------------------%
> +%-----------------------------------------------------------------------------%
> +
> +:- implementation.
> +
> +:- import_module string, list, char, exception, bool, array.
> +:- import_module exception, map.
> +
You've imported the exception module twice there.

> +%-----------------------------------------------------------------------------%
> +
> +write_xml_doc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> +	(
> +		( MaybeDTD \= embed ; ok_to_generate_dtd(type_of(X)) )
> +	->
> +		DTDResult = ok,
> +		write_xml_header(no, !IO),
> +		write_stylesheet_ref(MaybeStyleSheet, !IO),
> +		write_doctype(canonicalize, X, MaybeDTD, _, !IO),
> +		write_xml_element(canonicalize, 0, X, !IO)
> +	;
> +		DTDResult = multiple_functors_for_root
> +	).
> +
> +write_xml_doc(Stream, X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> +	io.set_output_stream(Stream, OrigStream, !IO),
> +	write_xml_doc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO),
> +	io.set_output_stream(OrigStream, _, !IO).
> +
> +write_xml_doc_cc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> +	(
> +		( MaybeDTD = embed ; ok_to_generate_dtd(type_of(X)) )
> +	->
> +		DTDResult = ok,
> +		write_xml_header(no, !IO),
> +		write_stylesheet_ref(MaybeStyleSheet, !IO),
> +		write_doctype(include_details_cc, X, MaybeDTD, _, !IO),
> +		write_xml_element(include_details_cc, 0, X, !IO)
> +	;
> +		DTDResult = multiple_functors_for_root
> +	).
> +
> +write_xml_doc_cc(Stream, X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO) :-
> +	io.set_output_stream(Stream, OrigStream, !IO),
> +	write_xml_doc_cc(X, MaybeStyleSheet, MaybeDTD, DTDResult, !IO),
> +	io.set_output_stream(OrigStream, _, !IO).
> +
> +write_xml_element(NonCanon, IndentLevel, X, !IO) :-
> +	type_to_univ(X, Univ),
> +	write_xml_element_univ(NonCanon, IndentLevel, Univ, [], _, !IO).
> +
> +write_dtd(Term, DTDResult, !IO) :-
> +	type_of(Term) = TypeDesc,
> +	write_dtd_from_type(TypeDesc, DTDResult, !IO).
> +
> +write_dtd(Stream, Term, DTDResult, !IO) :-
> +	io.set_output_stream(Stream, OrigStream, !IO),
> +	write_dtd(Term, DTDResult, !IO),
> +	io.set_output_stream(OrigStream, _, !IO).
> +
> +write_dtd_from_type(Stream, TypeDesc, DTDResult, !IO) :-
> +	io.set_output_stream(Stream, OrigStream, !IO),
> +	write_dtd_from_type(TypeDesc, DTDResult, !IO),
> +	io.set_output_stream(OrigStream, _, !IO).
> +
> +:- pred write_xml_header(maybe(string)::in, io::di, io::uo) is det.
> +
> +write_xml_header(MaybeEncoding, !IO) :-
> +	io.write_string("<?xml version=""1.0""", !IO),
> +	(
> +		MaybeEncoding = yes(Encoding),
> +		io.write_string(" encoding=""", !IO),
> +		io.write_string(Encoding, !IO),
> +		io.write_string("""?>\n", !IO)
> +	;
> +		MaybeEncoding = no,
> +		io.write_string("?>\n", !IO)
> +	).
> +
> +:- pred write_stylesheet_ref(maybe_stylesheet::in, io::di, io::uo) is det.
> +
> +write_stylesheet_ref(no_stylesheet, !IO).
> +write_stylesheet_ref(with_stylesheet(Type, Href), !IO) :-
> +	io.write_string("<?xml-stylesheet type=""", !IO),
> +	io.write_string(Type, !IO),
> +	io.write_string(""" href=""", !IO),
> +	io.write_string(Href, !IO),
> +	io.write_string("""?>\n", !IO).
> +
> +:- pred write_doctype(deconstruct.noncanon_handling, T, maybe_dtd,
> +	dtd_generation_result, io, io).
> +:- mode write_doctype(in(canonicalize), in, in, out, di, uo) is det.
> +:- mode write_doctype(in(do_not_allow), in, in, out, di, uo) is det.
> +:- mode write_doctype(in(include_details_cc), in, in, out, di, uo) is cc_multi.
> +:- mode write_doctype(in, in, in, out, di, uo) is cc_multi.
> +
> +write_doctype(_, _, no_dtd, ok, !IO).
> +write_doctype(_, T, embed, DTDResult, !IO) :-
> +	write_dtd(T, DTDResult, !IO),
> +	io.nl(!IO).
> +write_doctype(NonCanon, T, external(DocType), ok, !IO) :-
> +	deconstruct.deconstruct(T, NonCanon, Functor, _, _),
> +	Root = get_element(type_of(T), from_functor_name(Functor)),
> +	io.write_string("<!DOCTYPE ", !IO),
> +	io.write_string(Root, !IO),
> +	(
> +		DocType = public(PUBLIC),
> +		io.write_string(" PUBLIC """, !IO),
> +		io.write_string(PUBLIC, !IO)
> +	;
> +		DocType = public(PUBLIC, SYSTEM),
> +		io.write_string(" PUBLIC """, !IO),
> +		io.write_string(PUBLIC, !IO),
> +		io.write_string(""" """, !IO),
> +		io.write_string(SYSTEM, !IO)
> +	;
> +		DocType = system(SYSTEM),
> +		io.write_string(" SYSTEM """, !IO),
> +		io.write_string(SYSTEM, !IO)
> +	),
> +	io.write_string(""">\n", !IO).
> +
> +%-----------------------------------------------------------------------------%
> +%
> +% Some reserved element names.  Reserved element names all start with a
> +% capital letter so as not to conflict with a mangled element name.
> +%
> +
> +:- func reserved_prefix = string.
> +
> +	% A prefix for functors that start with a capital letter or
> +	% a non-letter.
> +	%
> +reserved_prefix = "Tag_".
> +
> +:- pred common_mercury_functor(string, string).
> +:- mode common_mercury_functor(in, out) is semidet.
> +:- mode common_mercury_functor(out, in) is semidet.
> +
> +	% These should all start with a capital letter so as not to
> +	% conflict with a mangled name.
> +	%
> +common_mercury_functor("[|]", "List").
> +common_mercury_functor("[]", "Nil").
> +common_mercury_functor("{}", "Tuple").
> +
> +	% A general element for types whos structure we do not generate
> +	% DTD rules for.
> +	%
> +:- func unrecognized_element = string.
> +
> +unrecognized_element = "Unrecognized".
> +
> +:- func array_element = string.
> +
> +array_element = "Array".
> +
> +:- pred is_primitive_type(type_desc::in, string::out) is semidet.
> +
> +is_primitive_type(TypeDesc, Element) :-
> +	(
> +		type_of("") = TypeDesc
> +	->
> +		Element = "String"
> +	;
> +		type_of('c') = TypeDesc
> +	->
> +		Element = "Char"
> +	;
> +		type_of(1) = TypeDesc
> +	->
> +		Element = "Int"
> +	;
> +		type_of(1.0) = TypeDesc,
> +		Element = "Float"
> +	).
> +
> +%-----------------------------------------------------------------------------%
> +%
> +% Mangling functions.
> +%
> +% We use the following mangling scheme to create well formed element names
> +% that do not begin with a capital letter (capitals are used for reserved
> +% elements).
> +%
> +% If the string to be mangled begins with a capital letter then we prefix it
> +% with another string reserved for this purpose.  Then we replace all
> +% characters which aren't alpha numeric or underscores with '-' followed by
> +% the character code.
> +%
> +
> +:- func mangle(string) = string.
> +
> +mangle(Functor) = Element :-
> +	string.split(Functor, 1, Head, Tail),
> +	(
> +		string.is_alpha(Head),
> +		string.to_lower(Head) = Head
> +	->
> +		First = Head,
> +		Rest = Tail
> +	;
> +		First = reserved_prefix,
> +		Rest = Head ++ Tail
> +	),
> +	string.foldl(mangle_char, Rest, [], ElementChrs),
> +	Element = First ++ string.from_char_list(ElementChrs).
> +
> +:- pred mangle_char(char::in, list(char)::in, list(char)::out)
> +	is det.
> +
> +	% XXX This is system dependent since char.to_int is system dependent.
> +	%
> +mangle_char(Chr, PrevChrs, list.append(PrevChrs, Chrs)) :-
> +	(
> +		char.is_alnum_or_underscore(Chr)
> +	->
> +		Chrs = [Chr]
> +	;
> +		Chrs = ['-' | string.to_char_list(string.int_to_string(
> +			char.to_int(Chr)))]
> +	).
> +
> +%-----------------------------------------------------------------------------%
> +
> +:- type element_request
> +	--->	from_functor_name(string)
> +	;	from_functor_num(int).
> +
> +:- pragma memo(get_element/2).
> +
Using memoing in the standard library is going to mean that it no
longer compiles for those grades that don't support it, e.g. the java
and il grades.

If it is really essential that this be memoed then one way around
this might be to add an option `--ignore-tabling-in-unsupported-grades'
to the compiler that disables the tabling transformation for those
grades that don't support it and then modify library/Mercury.options
appropriately.

More later...

Cheers,
Julien.
--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the reviews mailing list