[m-rev.] for review: extend coverage of builtin types in the reference manual

Julien Fischer jfischer at opturion.com
Wed Sep 19 13:19:00 AEST 2018
Hi Zoltan,

On Tue, 18 Sep 2018, Zoltan Somogyi wrote:

>> diff --git a/doc/reference_manual.texi b/doc/reference_manual.texi
>> index f9862f2..be29fc5 100644
>> --- a/doc/reference_manual.texi
>> +++ b/doc/reference_manual.texi
>> @@ -1999,46 +1999,155 @@ type classes (@pxref{Type classes}), and existentially quantified types
>>   @node Builtin types
>>   @section Builtin types
>>
>> -Certain special types are builtin, or are defined in the Mercury library:
>> +This section describes the special types that are builtin into the Mercury
>> +implementation or defined in the standard library.
>
> "are built into"

Done.

> ", or are defined"

Done.

>> +@menu
>> +* Primitive types::
>> +* Other builtin types::
>> +@end menu
>> +
>> +@node Primitive types
>> +@subsection Primitive types
>> +
>> +There is a special syntax for constants of all primitive types except
>> +@code{char}.
>> +(For @code{char}, the standard syntax suffixes.)
>
> I would actually prefer special syntax for char constants, precisely because
> I disagree with "the standard syntax suffices" if that statement is taken
> at a deeper level of meaning. But that is a separate discussion.

That was the existing wording (i.e. the one that's been there for
over 20 years).

And yes, we do need a special syntax for chars, but as you say, that's
a separate discussion.

>> +@menu
>> +* Signed integer types::
>> +* Unsigned integer types::
>> +* Floating-point type::
>> +* Character type::
>> +* String type::
>> +@end menu
>> +
>> +@node Signed integer types
>> +@subsubsection Signed integer types
>> +There are five primitive signed integer types: @code{int}, @code{int8},
>> +@code{int16}, @code{int32} and @code{int64}.
>> +
>> +Except for @code{int}, the width of each of these is given by the numeric
>> +suffix in its name.
>> +
>> +The width of @code{int} is implementation defined, but must be at least 32-bits
>
> I would end the sentence here, ...

Done.

>> +and must be equal to the width of the type @code{uint}.
>
> ... and move this to the discussion of the unsigned types, since it is not about
> the *signed* int types.

It's a constraint on the width of the type int; IMO it does belong here.

>> +All signed integer types use two's-complement representation.
>> +
>> +Values of the type @code{int8} must be in the range @math{-128} to @math{127},
>> +inclusive.
>
> Here and everywhere else, I would replace "inclusive" with "both inclusive".

Done.

> Likewise, after every decimal int constant, I would put the binary expression
> that mandates its use as a limit (i.e. -2^(n-1) and 2^(n-1) - 1 for n-bit signed ints).

Done.

>> +Values of the type @code{int16} must be in the range @math{-32768} to @math{32767},
>> +inclusive.
>> +
>> +Values of the type @code{int32} must be in the range @math{-2147483648} to
>> +@math{2147483647}, inclusive.
>> +
>> +Values of the type @code{int64} must be in the range @math{-9223372036854775808}
>> +to @math{9223372036854775807}, inclusive.
>> +
>> +Values of the type @code{int} must be in the range to @math{-(2^{N - 1})} to
>> +@math{2^{N - 1} - 1}, inclusive; @math{N} being the width of @code{int}.
>
> And then the above line should not be needed.

Do you mean something like:

     Values of signed integer types must be in the range @math{-(2^{n - 1})} to
     @math{2^{n - 1} - 1} for @math{n} bit signed integers, both inclusive.

     Values of the type @code{int8} ...

     Values of the type @code{int16} ...

?  Is it still worth listing the ranges of the fixed-size types separately then?
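
As a cross-check of the decimal limits that appear in the diff, here is
a minimal Python sketch.  It is not part of the patch, and signed_range
and unsigned_range are hypothetical helper names used only for this
illustration; the formulas are the ones discussed above, for an n-bit
two's-complement type and an n-bit unsigned type respectively.

```python
# Illustration only: these helpers are hypothetical and are not part of
# Mercury or of the patch under review.

def signed_range(n):
    """Limits of an n-bit two's-complement integer: -(2^(n-1)) .. 2^(n-1) - 1."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1


def unsigned_range(n):
    """Limits of an n-bit unsigned integer: 0 .. 2^n - 1."""
    return 0, 2 ** n - 1


if __name__ == "__main__":
    # Print the limits for each fixed-width type named in the diff.
    for n in (8, 16, 32, 64):
        lo, hi = signed_range(n)
        print(f"int{n}:  {lo} to {hi}")
        lo, hi = unsigned_range(n)
        print(f"uint{n}: {lo} to {hi}")
```

The printed values can be compared directly against the constants
listed for the fixed-width types.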

>> +@node Unsigned integer types
>> +@subsubsection Unsigned integer types
>> +There are five primitive unsigned integer types: @code{uint}, @code{uint8},
>> +@code{uint16}, @code{uint32} and @code{uint64}.
>> +
>> +Except for @code{uint}, the width of each of these types is given by the numeric
>> +suffix in its name.
>
> Agree with Peter: "width in bits".

Added.

>> +The width of @code{uint} is implementation defined, but must be at least
>> +32-bits and must be equal to the width of the type @code{int}.
>
> I would generalize this to apply also to *every* uint type: their widths must *all* be
> equal to the corresponding signed type.

Why?  That follows for all of them anyway, except uint, since their widths
are fixed.

>> +It is represented using either the 32-bit single-precision IEEE 754 format or
>> +the 64-bit double-precision IEEE 754 format.
>> +
>> +The choice between the two formats is implementation dependent.
>
> Wouldn't it be more useful to say that it is *grade* dependent,
> being 32 bit in spf grades and 64 bit in every other grade?

The reference manual (mostly) treats grades as an implementation detail
of the Melbourne Mercury compiler.  The wording here is supposed to describe
the choices that an arbitrary Mercury implementation is allowed to make.
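
To make the difference between the two permitted formats concrete, the
following Python sketch round-trips a value through each IEEE 754
format using the standard struct module.  round_trip, SINGLE and DOUBLE
are hypothetical names chosen for this illustration, and nothing here
says which format a particular Mercury implementation picks.

```python
import struct

SINGLE = "<f"   # 32-bit IEEE 754 single precision
DOUBLE = "<d"   # 64-bit IEEE 754 double precision


def round_trip(x, fmt):
    """Store x in the given struct format and read it back as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


if __name__ == "__main__":
    for fmt, name in ((SINGLE, "single"), (DOUBLE, "double")):
        bits = struct.calcsize(fmt) * 8
        print(f"{name} ({bits} bits): {round_trip(0.1, fmt)!r}")
```

A value such as 0.1 survives the 64-bit round trip unchanged but not
the 32-bit one, so code that depends on double precision is not
portable across the two choices.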

>> +@node String type
>> +@subsubsection String type
>> +There is one string type: @code{string}.
>> +
>> +A string is a sequence of characters encoded using either the UTF-8 or UTF-16
>> +encoding of Unicode.
>> +
>> +The choice between the two encodings is implementation dependent.
>
> Again, isn't this grade dependent? I thought all C grades are UTF-8, and
> the Java grade is UTF-16 (no idea about C# or Erlang). The more specific
> we can be, the better, since I don't think we will ever want to change away
> from UTF-8 for C, nor will we be able to change away from UTF-16 for Java,
> unless Java itself changes.

For the MMC implementation it's grade dependent.  The current situation is:

    C, Erlang backends: UTF-8
    C#, Java backends: UTF-16

(It's possible that both of the latter may move to UTF-8 in the very long run;
but for the foreseeable future UTF-16 needs to remain a possibility.)

I guess we could put some language in there about the preferred encoding
being UTF-8 (which it certainly is).
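
One consequence of leaving the encoding open is that the number of code
units in the same string can differ between encodings.  The Python
sketch below illustrates this in general terms; code_units is a
hypothetical helper, and the sketch makes no claim about how any
Mercury library predicate reports lengths.

```python
def code_units(s):
    """Return (UTF-8 code units, UTF-16 code units) needed to encode s."""
    utf8_units = len(s.encode("utf-8"))             # one code unit = 1 byte
    utf16_units = len(s.encode("utf-16-le")) // 2   # one code unit = 2 bytes, no BOM
    return utf8_units, utf16_units


if __name__ == "__main__":
    # ASCII text, a Latin-1 letter (U+00E9), and a code point outside the BMP.
    for s in ("abc", "\u00e9", "\U0001F600"):
        print(repr(s), code_units(s))
```

Only for pure ASCII text do the two counts agree; a code point outside
the Basic Multilingual Plane takes four UTF-8 code units but two UTF-16
code units (a surrogate pair).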

>> +@noindent
>> +The function types are  @code{(func) = T}, @code{func(T1) = T}, @code{func(T1, T2) = T}, @dots{}
>> +
>> +Higher-order predicate and function types are used to pass procedure addresses
>> +and closures to other predicates and functions.  @xref{Higher-order}.
>
> Why mention "procedure addresses" here?

You would need to travel back to 1995 and ask Fergus!  (I haven't
modified anything here, just shifted it about.)

> They are a special case of closures, and an implementation detail.

This whole description of higher-order types (such as it is) needs
to be replaced.  That's a separate change though.

> And an xref should not be a sentence by itself.

It will expand to something like:

     See Chapter 8 [Higher-order], page 60.

which seems fine.  (It's also not the only spot we do this.)


>> -@item The universal type: @code{univ}.
>> +@node The universal type
>> +@subsubsection The universal type
>>   The type @code{univ} is defined in the standard library module @code{univ},
>>   along with the predicates @code{type_to_univ/2} and @code{univ_to_type/2}.
>>   With those predicates, any type can be converted to the universal type
>>   and back again.
>
> "VALUES OF any type can be converted ..."

Done.

>> -@item The ``state-of-the-world'' type: @code{io.state}.
>> +@node The ``state-of-the-world'' type
>> +@subsubsection The ``state-of-the-world'' type
>>   The type @code{io.state} is defined in the standard library module @code{io},
>>   and represents the state of the world.
>> -Predicates which perform I/O are passed the old state of the world
>> -and produce a new state of the world.
>> +Predicates which perform I/O are passed the old state of the world and produce
>> +a new state of the world.
>>   In this way, we can give a declarative semantics to code that performs I/O.
>
> We may want to expand this to say "are passed the last reference to the old state
> of the world, and produce a unique reference to the new state ..."

We can expand it in a separate change.

There's an updated version of the diff below.

Julien.

diff --git a/doc/reference_manual.texi b/doc/reference_manual.texi
index f9862f2..90eadf9 100644
--- a/doc/reference_manual.texi
+++ b/doc/reference_manual.texi
@@ -1999,46 +1999,161 @@ type classes (@pxref{Type classes}), and existentially quantified types
  @node Builtin types
  @section Builtin types

-Certain special types are builtin, or are defined in the Mercury library:
+This section describes the special types that are built into the Mercury
+implementation, or are defined in the standard library.
+
+@menu
+* Primitive types::
+* Other builtin types::
+@end menu
+
+@node Primitive types
+@subsection Primitive types
+
+There is a special syntax for constants of all primitive types except
+@code{char}.
+(For @code{char}, the standard syntax suffices.)
+
+@menu
+* Signed integer types::
+* Unsigned integer types::
+* Floating-point type::
+* Character type::
+* String type::
+@end menu
+
+@node Signed integer types
+@subsubsection Signed integer types
+There are five primitive signed integer types: @code{int}, @code{int8},
+@code{int16}, @code{int32} and @code{int64}.
+
+Except for @code{int}, the width in bits of each of these is given by the
+numeric suffix in its name.
+
+The width in bits of @code{int} is implementation defined, but must be at least
+32 bits.
+It must be equal to the width of the type @code{uint}.
+
+All signed integer types use two's-complement representation.
+
+Values of the type @code{int8} must be in the range @math{-128}
+(@math{-(2^{8 - 1})}) to @math{127} (@math{2^{8 - 1} - 1}),
+both inclusive.
+
+Values of the type @code{int16} must be in the range @math{-32768}
+(@math{-(2^{16 - 1})}) to @math{32767} (@math{2^{16 - 1} - 1}),
+both inclusive.
+
+Values of the type @code{int32} must be in the range @math{-2147483648}
+(@math{-(2^{32 - 1})}) to @math{2147483647} (@math{2^{32 - 1} - 1}),
+both inclusive.
+
+Values of the type @code{int64} must be in the range @math{-9223372036854775808}
+(@math{-(2^{64 - 1})}) to @math{9223372036854775807} (@math{2^{64 - 1} - 1}),
+both inclusive.
+
+Values of the type @code{int} must be in the range @math{-(2^{N - 1})} to
+@math{2^{N - 1} - 1}, both inclusive; @math{N} being the width of @code{int} in bits.
+
+@node Unsigned integer types
+@subsubsection Unsigned integer types
+There are five primitive unsigned integer types: @code{uint}, @code{uint8},
+@code{uint16}, @code{uint32} and @code{uint64}.
+
+Except for @code{uint}, the width in bits of each of these types is given by
+the numeric suffix in its name.
+
+The width in bits of @code{uint} is implementation defined, but must be at
+least 32 bits.
+It must be equal to the width of the type @code{int}.
+
+Values of the type @code{uint8} must be in the range @math{0} (@math{2^0 - 1})
+to @math{255} (@math{2^8 - 1}), both inclusive.
+
+Values of the type @code{uint16} must be in the range @math{0} (@math{2^0 - 1})
+to @math{65535} (@math{2^{16} - 1}), both inclusive.
+
+Values of the type @code{uint32} must be in the range @math{0} (@math{2^0 - 1})
+to @math{4294967295} (@math{2^{32} - 1}), both inclusive.
+
+Values of the type @code{uint64} must be in the range @math{0} (@math{2^0 - 1})
+to @math{18446744073709551615} (@math{2^{64} - 1}), both inclusive.
+
+Values of the type @code{uint} must be in the range @math{0} (@math{2^0 - 1}) to
+@math{2^N - 1}, both inclusive; @math{N} being the width of @code{uint} in bits.
+
+@node Floating-point type
+@subsubsection Floating-point type
+There is one floating-point type: @code{float}.
+
+It is represented using either the 32-bit single-precision IEEE 754 format or
+the 64-bit double-precision IEEE 754 format.
+
+The choice between the two formats is implementation dependent.
+
+@node Character type
+@subsubsection Character type
+There is one character type: @code{char}.
+
+Values of this type represent Unicode code points.
+
+@node String type
+@subsubsection String type
+There is one string type: @code{string}.
+
+A string is a sequence of characters encoded using either the UTF-8 or UTF-16
+encoding of Unicode.
+
+The choice between the two encodings is implementation dependent.
+
+@node Other builtin types
+@subsection Other builtin types
+
+@menu
+* Predicate and function types::
+* Tuple types::
+* The universal type::
+* The ``state-of-the-world'' type::
+@end menu
+
+@node Predicate and function types
+@subsubsection Predicate and function types
+The predicate types are @code{pred}, @code{pred(T)}, @code{pred(T1, T2)}, @dots{}
+
+@noindent
+The function types are @code{(func) = T}, @code{func(T1) = T}, @code{func(T1, T2) = T}, @dots{}
+
+Higher-order predicate and function types are used to pass procedure addresses
+and closures to other predicates and functions.  @xref{Higher-order}.
+
+@node Tuple types
+@subsubsection Tuple types
+The tuple types are @code{@{@}}, @code{@{T@}}, @code{@{T1, T2@}}, @dots{}

-@table @asis
-@item Primitive types: @code{char}, @code{int}, @code{int8}, @code{int16},
-@code{int32}, @code{int64}, @code{uint}, @code{uint8}, @code{uint16},
-@code{uint32}, @code{uint64}, @code{float}, @code{string}.
-There is a special syntax for constants for all primitive types except
-@code{char}.  (For @code{char}, the standard syntax suffices.)
-
-@item Predicate types: @code{pred}, @code{pred(T)}, @code{pred(T1, T2)}, @dots{}
-@itemx Function types: @code{(func) = T}, @code{func(T1) = T},
-@itemx @code{func(T1, T2) = T}, @dots{}
-These higher-order function and predicate types are used to pass procedure
-addresses and closures to other predicates.  @xref{Higher-order}.
-
-@item Tuple types: @code{@{@}}, @code{@{T@}}, @code{@{T1, T2@}}, @dots{}.
  A tuple type is equivalent to a discriminated union type
  (@pxref{Discriminated unions}) with declaration
  @example
-:- type @{Arg1, Arg2, @dots{}, ArgN@}
-        --->    @{ @{Arg1, Arg2, @dots{}, ArgN@} @}.
+ :- type @{Arg1, Arg2, @dots{}, ArgN@}
+         --->    @{ @{Arg1, Arg2, @dots{}, ArgN@} @}.
  @end example

-@item The universal type: @code{univ}.
+@node The universal type
+@subsubsection The universal type
  The type @code{univ} is defined in the standard library module @code{univ},
  along with the predicates @code{type_to_univ/2} and @code{univ_to_type/2}.
-With those predicates, any type can be converted to the universal type
-and back again.
-The universal type is useful for situations
-where you need heterogeneous collections.
+With those predicates, values of any type can be converted to the universal
+type and back again.
+The universal type is useful for situations where you need heterogeneous
+collections.

-@item The ``state-of-the-world'' type: @code{io.state}.
+@node The ``state-of-the-world'' type
+@subsubsection The ``state-of-the-world'' type
  The type @code{io.state} is defined in the standard library module @code{io},
  and represents the state of the world.
-Predicates which perform I/O are passed the old state of the world
-and produce a new state of the world.
+Predicates which perform I/O are passed the old state of the world and produce
+a new state of the world.
  In this way, we can give a declarative semantics to code that performs I/O.

-@end table
-
  @node User-defined types
  @section User-defined types