[m-dev.] for discussion: design issue for new integer types
Sebastian Godelet
sebastian.godelet at outlook.com
Fri Oct 28 14:25:31 AEDT 2016
Hi Zoltan,
> From: developers [mailto:developers-bounces at lists.mercurylang.org] On
> Behalf Of Zoltan Somogyi
> Sent: Friday, October 28, 2016 11:06
> To: Julien Fischer <jfischer at opturion.com>
> Cc: developers <developers at lists.mercurylang.org>
> Subject: Re: [m-dev.] for discussion: design issue for new integer types
>
>
>
> On Fri, 28 Oct 2016 11:51:51 +1100 (AEDT), Julien Fischer
> <jfischer at opturion.com> wrote:
> > Just a reminder: please ensure you have updated to rotd-2016-10-26 by
> > Monday; I will be committing part 2 of the uint change then.
>
> I have updated.
>
> > In order to not overload the current type checker, literals for each
> > integer type will need to be lexically distinct. My suggestion is
> > that each integer type have a distinguishing suffix. For the fixed
> > size integer types these would be:
> >
> > i8, i16, i32, i64
> > u8, u16, u32, u64
>
> I think those are fine.
>
> > For int and uint, there are a couple of choices:
> >
> > i, u
> > iw, uw (where w == "word sized")
>
> I prefer i and u.
>
> > The suffix would not be required for literals of type 'int'.
>
> I don't think anyone would argue against that.
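To make the lexical side of this concrete, here is a minimal sketch in Python (used for brevity; it is not the Mercury lexer). It assumes lowercase-only suffixes and the i/u spellings for the word-sized types. It only classifies a literal by its suffix, with no range check, so literals of different integer types are told apart purely lexically. The names `SUFFIXES`, `LITERAL` and `classify` are hypothetical.

```python
import re

# Hypothetical suffix table: (signedness, size in bits).
# A size of None means "word sized" (the proposed i and u suffixes).
SUFFIXES = {
    "i8": ("signed", 8), "i16": ("signed", 16),
    "i32": ("signed", 32), "i64": ("signed", 64),
    "u8": ("unsigned", 8), "u16": ("unsigned", 16),
    "u32": ("unsigned", 32), "u64": ("unsigned", 64),
    "i": ("signed", None), "u": ("unsigned", None),
}

# Decimal digits (underscores kept as written; their placement is checked
# later), then an optional lowercase suffix.
LITERAL = re.compile(r"([0-9][0-9_]*)([iu](?:8|16|32|64)?)?")

def classify(literal):
    """Split a decimal literal into (digits, signedness, size).

    No range check is done here. An unsuffixed literal is a plain int,
    i.e. signed and word sized.
    """
    m = LITERAL.fullmatch(literal)
    if m is None:
        raise ValueError("not an integer literal: " + literal)
    digits, suffix = m.group(1), m.group(2)
    if suffix is None:
        return digits, "signed", None
    signedness, size = SUFFIXES[suffix]
    return digits, signedness, size
```

For example, `classify("300u8")` returns `("300", "unsigned", 8)`. Whether 300 actually fits in a uint8 is left to a later stage.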
>
> > Should the suffixes use only lowercase 'u' and 'i' or would uppercase
> > also be acceptable? Or uppercase only?
>
> I would strongly argue against allowing I as a suffix. I am probably the only
> one on this list who is old enough to have used a typewriter whose numbers
> did not include 0 and 1: you were supposed to type 'o' instead of '0', and 'l'
> instead of '1'. This works for human readers precisely because an 'l' is easily
> visually confusable with '1'. For the typewriter makers, this fact let them save the cost
> of a key. For us, the fact that an uppercase i is also visually confusable with 1
> would be a fruitful source of confusing bugs, where people *think* they are
> looking at 421, but are actually looking at 42 with an I suffix.
>
> On the basis of symmetry, this would argue against U as well.
I know breaking symmetry is not always good, but in this case I think that using only uppercase "L", and not allowing uppercase "i" or lowercase "L", would work better than not allowing any uppercase suffixes at all.
Additionally, maybe "i" shouldn't be used as a suffix at all, since (a) it is the default and (b) it could be reserved for the eventual inclusion of complex number literals.
>
> > As an aside: it is long past time we allowed some form of separator
> > between groups of digits in integer (and float) literals. I propose
> > that we allow '_' between digits as in Java and C#.
>
> I agree that is a good idea.
Yes, +1 on this.
I'm not sure what others' views are on C++-style user-defined literal suffixes; they can be useful for certain kinds of code.
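A sketch of a Java/C#-style separator rule, again in Python. It assumes that one or more underscores may appear only between two digits, never at the start or end; `strip_separators` is a hypothetical helper, not an existing library function.

```python
import re

# Assumed rule, modelled on Java and C#: runs of underscores are allowed
# only between two digits, never leading or trailing.
DIGITS_WITH_SEPARATORS = re.compile(r"[0-9]+(?:_+[0-9]+)*")

def strip_separators(digits):
    """Return the digit string with underscores removed.

    Raises ValueError if an underscore is not between two digits.
    """
    if not DIGITS_WITH_SEPARATORS.fullmatch(digits):
        raise ValueError("misplaced underscore in " + repr(digits))
    return digits.replace("_", "")
```

With this rule, `1_000_000` and `12_34_567` are both accepted, while `_1000` and `1000_` are rejected.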
>
> A follow-up question: should we require that the _s be where Western
> convention puts the digit-grouping commas, i.e. between every third
> digit? I for one would prefer that, but people using the Indian number
> system, which puts commas around groups of *two* digits above the
> thousands, would probably prefer that there not be such a rule (look up
> "lakh" or "crore" on Wikipedia).
The same goes for the Chinese number system, whose users might not group digits in threes either, so this should be flexible as well.
>
> We would need to delete the _s at some point anyway. If we do it in the
> compiler, we can make the code doing the deletion generate a warning if
> the _s are in the "wrong" place, with the notion of "wrong" being selected by
> compiler options such as --warn-misplaced-integer-underscores-
> {western,indian}.
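A rough Python sketch of the placement check behind such warnings. It assumes the Western convention means groups of three, and the Indian convention means a final group of three preceded by groups of two. `groups_ok` is a hypothetical name, and the option names above are only proposals.

```python
def groups_ok(digits, convention):
    """Check underscore placement in a digit string such as "12_34_567".

    convention is "western" (every group has three digits, except that
    the leading group may have one to three) or "indian" (a final group
    of three, preceded by groups of two, with a leading group of one or
    two). Digit strings without underscores are always accepted.
    """
    groups = digits.split("_")
    if len(groups) == 1:
        return True
    first, rest = groups[0], groups[1:]
    if convention == "western":
        return 1 <= len(first) <= 3 and all(len(g) == 3 for g in rest)
    if convention == "indian":
        *middle, last = rest
        return (len(last) == 3 and 1 <= len(first) <= 2
                and all(len(g) == 2 for g in middle))
    raise ValueError("unknown convention: " + convention)
```

Under these assumptions, `1_000_000` passes only the Western check, `12_34_567` passes only the Indian check, and `1_000` passes both.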
>
> When the compiler finds e.g. an i8 suffix on any integer outside the -128 to
> 127 range, we want to generate an error message anyway.
> The _ check could be done at the same time.
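Once the suffix has been mapped to a signedness and a width, the range check itself is simple. A Python sketch, assuming two's complement ranges for the signed types and the type names int8 to uint64; `range_error` is hypothetical.

```python
def range_error(value, signed, bits):
    """Return an error message if value does not fit the type, else None.

    Signed types are assumed to be two's complement:
    -2**(bits-1) .. 2**(bits-1) - 1. Unsigned types: 0 .. 2**bits - 1.
    """
    if signed:
        lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        type_name = f"int{bits}"
    else:
        lo, hi = 0, 2 ** bits - 1
        type_name = f"uint{bits}"
    if lo <= value <= hi:
        return None
    return f"{value} is outside the range of {type_name} ({lo} to {hi})"
```

For instance, `range_error(300, True, 8)` returns `"300 is outside the range of int8 (-128 to 127)"`, and `range_error(255, False, 8)` returns `None`.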
>
> I don't think the *scanner* should do such checks, because it would result in
> suboptimal error messages. The compiler has access to error_util.m; the
> library does not.
>
> > 2. Automatic coercion and promotion.
> >
> > There won't be any in Mercury. If you are converting between integer
> > types then you will be required to say so.
>
> Agreed.
>
> What form would those explicit coercions take? Would we have a specific
> function for each pair of integer types? How about e.g. i16 to float: would
> you have to convert the i16 to int first?
>
> > 3. Representation of new integer types in the term type.
> >
> > How should the new integer types be represented in the
> > term.const/0 type?
> >
> > The obvious way would be:
> >
> > :- type const
> >     --->    atom(string)
> >     ;       integer(int)
> >     ;       big_integer(integer_base, integer)
> >             % An integer that is too big for `int'.
> >
> >     ;       unsigned_integer(uint)
> >     ;       big_unsigned_integer(integer_base, integer)
> >             % An unsigned integer that is too big for `uint'.
> >
> >     ;       string(string)
> >     ;       float(float)
> >     ;       implementation_defined(string)
> >
> >     ;       uint8(uint8)
> >     ;       uint16(uint16)
> >     ;       uint32(uint32)
> >     ;       uint64(uint64)
> >     ;       int8(int8)
> >     ;       int16(int16)
> >     ;       int32(int32)
> >     ;       int64(int64).
>
> I would instead suggest that we keep just the existing integer and
> big_integer functors, and add a new argument to both.
> This argument would say int vs uint, and 8 vs 16 vs 32 vs 64 vs default size,
> *purely on the basis of the suffix, without any check in the scanner*, for
> the reason given above.
>
> To allow the underscore check mentioned above, the existing argument of
> the integer and big_integer functors would need to be a string, with the
> conversion done in the compiler. However, doing that would erase the need
> for the big_integer functor, since the integer functor would then be able to
> represent everything big_integer can.
>
> Two other things. First, some people may be using the library's lexer and
> parser modules for their own purposes (e.g. Prolog interpreters), so if we
> change their basic representation, we should add their old versions to e.g.
> extras under names such as old_{lexer,parser}.m.
> Second, I have a big outstanding change to fact_table.m that would be
> affected by a change to the term type, so please warn me before committing
> such a change.
>
> A question you did not ask was how the representation of integers should
> change in the HLDS, i.e. in the cons_id type. I think I would prefer adding a
> size argument to the int_const and uint_const functors to adding new
> int8_const, int16_const, etc. functors to the type, because most code would
> want to treat all integers the same regardless of size. I would even prefer to
> erase the distinction between int_const and uint_const, but realize that this
> cannot be done, because in the HLDS, we definitely want the constant in
> integer, not string, form, and there is no word sized type that can hold both
> all ints and all uints. However, we could switch to int_const(integer,
> signedness, maybe(size)).
>
> The checks I mentioned above (does e.g. an i8 fit in -128 to 127, are the _s in
> the right place) would naturally fit in the code (in superhomogeneous.m, I
> think) that converts from term consts to cons_ids.
>
> > 4. poly_type and format.
> > 5. Reverse modes of arithmetic operations.
>
> I will comment on these later.
>
> Zoltan.