[m-rev.] for review: extend coverage of builtin types in the reference manual

Zoltan Somogyi zoltan.somogyi at runbox.com
Tue Sep 18 17:28:45 AEST 2018



On Mon, 17 Sep 2018 05:46:38 +0000 (UTC), Julien Fischer <jfischer at opturion.com> wrote:

> 
> For review by anyone.
> 
> This one probably requires a few sets of eyes; it changes the language in
> a number of ways.  Most of this has been discussed on the mailing lists
> before and there has been no disagreement.
> 
> There has been some discussion in the past about allowing float to be
> a binary128 IEEE (i.e. quad-precision) number; this is *not* what the
> change below allows for.  The reasons for this are that hardware support
> for that format is pretty rare and it isn't generally accessible in the
> majority of environments in which Mercury runs.  My preferred direction
> for floating-point numbers in Mercury is that we transition to having
> multiple floating-point types and that the float type would eventually
> be restricted to be a binary64 value.
> 
> Julien.
> 
> ---------------------------------------
> 
> Extend coverage of builtin types in the reference manual.
> 
> Replace the table of builtin types with separate subsections that cover
> the primitive and other builtin types.
> 
> Add documentation for the fixed size integer types.
> 
> Tighten up the specification of builtin types in various ways, namely:
> 
>   - require that int and uint be at least 32 bits in width.
>   - require that int and uint be the same width.
>   - require that float be either a 32- or 64-bit IEEE floating-point value.
>   - specify that char corresponds to a Unicode code point.
>   - restrict the allowed encoding for Mercury strings to be either UTF-8 or UTF-16.
> 
> doc/reference_manual.texi:
>      As above.
> 
> 
> diff --git a/doc/reference_manual.texi b/doc/reference_manual.texi
> index f9862f2..be29fc5 100644
> --- a/doc/reference_manual.texi
> +++ b/doc/reference_manual.texi
> @@ -1999,46 +1999,155 @@ type classes (@pxref{Type classes}), and existentially quantified types
>   @node Builtin types
>   @section Builtin types
> 
> -Certain special types are builtin, or are defined in the Mercury library:
> +This section describes the special types that are builtin into the Mercury
> +implementation or defined in the standard library.

"are built into"

", or are defined"

> + at menu
> +* Primitive types::
> +* Other builtin types::
> + at end menu
> +
> + at node Primitive types
> + at subsection Primitive types
> +
> +There is a special syntax for constants of all primitive types except
> + at code{char}.
> +(For @code{char}, the standard syntax suffices.)

I would actually prefer special syntax for char constants, precisely because
I disagree with "the standard syntax suffices" if that statement is taken
at a deeper level of meaning. But that is a separate discussion.

> + at menu
> +* Signed integer types::
> +* Unsigned integer types::
> +* Floating-point type::
> +* Character type::
> +* String type::
> + at end menu
> +
> + at node Signed integer types
> + at subsubsection Signed integer types
> +There are five primitive signed integer types: @code{int}, @code{int8},
> + at code{int16}, @code{int32} and @code{int64}.
> +
> +Except for @code{int}, the width of each of these is given by the numeric
> +suffix in its name.
> +
> +The width of @code{int} is implementation defined, but must be at least 32-bits

I would end the sentence here, ...

> +and must be equal to the width of the type @code{uint}.

... and move this to the discussion of the unsigned types, since it is not about
the *signed* int types.

> +All signed integer types use two's-complement representation.
> +
> +Values of the type @code{int8} must be in the range @math{-128} to @math{127},
> +inclusive.

Here and everywhere else, I would replace "inclusive" with "both inclusive".
Likewise, after every decimal int constant, I would put the binary expression
that mandates its use as a limit (i.e. -2^(n-1) and 2^(n-1) - 1 for n-bit signed ints).
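For example, for int8 (n = 8) those expressions give -2^7 = -128 and
2^7 - 1 = 127, which match the decimal limits above.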

> +Values of the type @code{int16} must be in the range @math{-32768} to @math{32767},
> +inclusive.
> +
> +Values of the type @code{int32} must be in the range @math{-2147483648} to
> + at math{2147483647}, inclusive.
> +
> +Values of the type @code{int64} must be in the range @math{-9223372036854775808}
> +to @math{9223372036854775807}, inclusive.
> +
> +Values of the type @code{int} must be in the range @math{-(2^{N - 1})} to
> + at math{2^{N - 1} - 1}, inclusive; @math{N} being the width of @code{int}.

And then the above line should not be needed.

> + at node Unsigned integer types
> + at subsubsection Unsigned integer types
> +There are five primitive unsigned integer types: @code{uint}, @code{uint8},
> + at code{uint16}, @code{uint32} and @code{uint64}.
> +
> +Except for @code{uint}, the width of each of these types is given by the numeric
> +suffix in its name.

Agree with Peter: "width in bits".

> +The width of @code{uint} is implementation defined, but must be at least
> +32-bits and must be equal to the width of the type @code{int}.

I would generalize this to apply also to *every* uint type: their widths must *all* be
equal to the widths of the corresponding signed types.

> +It is represented using either the 32-bit single-precision IEEE 754 format or
> +the 64-bit double-precision IEEE 754 format.
> +
> +The choice between the two formats is implementation dependent.

Wouldn't it be more useful to say that it is *grade* dependent,
being 32 bit in spf grades and 64 bit in every other grade?
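(For instance, a grade such as hlc.gc.spf would then use the 32-bit format,
while plain hlc.gc would use the 64-bit one.)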

> + at node Character type
> + at subsubsection Character type
> +There is one character type: @code{char}.
> +
> +Values of this type represent Unicode code points.
> +
> + at node String type
> + at subsubsection String type
> +There is one string type: @code{string}.
> +
> +A string is a sequence of characters encoded using either the UTF-8 or UTF-16
> +encoding of Unicode.
> +
> +The choice between the two encodings is implementation dependent.

Again, isn't this grade dependent? I thought all C grades are UTF-8, and
the Java grade is UTF-16 (no idea about C# or Erlang). The more specific
we can be, the better, since I don't think we will ever want to change away
from UTF-8 for C, nor will we be able to change away from UTF-16 for Java,
unless Java itself changes.

> + at noindent
> +The function types are  @code{(func) = T}, @code{func(T1) = T}, @code{func(T1, T2) = T}, @dots{}
> +
> +Higher-order predicate and function types are used to pass procedure addresses
> +and closures to other predicates and functions.  @xref{Higher-order}.

Why mention "procedure addresses" here? They are a special case of closures,
and an implementation detail. And an xref should not be a sentence by itself.
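
On that note, a small example of a closure being passed to a higher-order
function might be worth adding there. An untested sketch, using list.map from
the standard list module (the module name is only for illustration):

    :- module ho_example.
    :- interface.
    :- import_module io.
    :- pred main(io::di, io::uo) is det.
    :- implementation.
    :- import_module int.
    :- import_module list.

    main(!IO) :-
        % Build a closure (an anonymous function of type func(int) = int)
        % and pass it to the higher-order function list.map.
        Doubled = list.map((func(X) = 2 * X), [1, 2, 3]),
        % Doubled is now bound to [2, 4, 6].
        io.write(Doubled, !IO),
        io.nl(!IO).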

> - at item The universal type: @code{univ}.
> + at node The universal type
> + at subsubsection The universal type
>   The type @code{univ} is defined in the standard library module @code{univ},
>   along with the predicates @code{type_to_univ/2} and @code{univ_to_type/2}.
>   With those predicates, any type can be converted to the universal type
>   and back again.

"VALUES OF any type can be converted ..."

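It might also be worth showing that round trip in a short example.
A minimal, untested sketch (the module name is only for illustration):

    :- module univ_example.
    :- interface.
    :- import_module io.
    :- pred main(io::di, io::uo) is det.
    :- implementation.
    :- import_module univ.

    main(!IO) :-
        % Convert a value (here an int) to the universal type ...
        type_to_univ(42, Univ),
        % ... and try to convert it back again. univ_to_type/2 fails if
        % the requested result type does not match the type stored in Univ.
        ( if univ_to_type(Univ, N) then
            io.write_int(N, !IO),
            io.nl(!IO)
        else
            io.write_string("type mismatch\n", !IO)
        ).
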
> - at item The ``state-of-the-world'' type: @code{io.state}.
> + at node The ``state-of-the-world'' type
> + at subsubsection The ``state-of-the-world'' type
>   The type @code{io.state} is defined in the standard library module @code{io},
>   and represents the state of the world.
> -Predicates which perform I/O are passed the old state of the world
> -and produce a new state of the world.
> +Predicates which perform I/O are passed the old state of the world and produce
> +a new state of the world.
>   In this way, we can give a declarative semantics to code that performs I/O.

We may want to expand this to say "are passed the last reference to the old state
of the world, and produce a unique reference to the new state ..."
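
If we do expand it, a small example of the threading could accompany it.
The usual hello-world sketch, in which the state variable !IO stands for
the pair of di/uo io.state arguments:

    :- module hello.
    :- interface.
    :- import_module io.
    :- pred main(io::di, io::uo) is det.
    :- implementation.

    main(!IO) :-
        % Each I/O operation consumes the current (unique) state of the
        % world and produces a new one; !IO threads those states through.
        io.write_string("Hello, world!\n", !IO).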

Zoltan.

