[m-dev.] for discussion: design issue for new integer types

Peter Wang novalazy at gmail.com
Fri Oct 28 15:29:38 AEDT 2016


On Fri, 28 Oct 2016 14:06:00 +1100 (AEDT), "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> 
> 
> On Fri, 28 Oct 2016 11:51:51 +1100 (AEDT), Julien Fischer <jfischer at opturion.com> wrote:
> > Just a reminder: please ensure you have updated to rotd-2016-10-26
> > by Monday; I will be committing part 2 of the uint change then.
> 
> I have updated.
>  
> > In order to not overload the current type checker, literals for each integer
> > type will need to be lexically distinct.  My suggestion is that each integer
> > type have a distinguishing suffix.  For the fixed size integer types these
> > would be:
> > 
> >        i8, i16, i32, i64
> >        u8, u16, u32, u64
> 
> I think those are fine.
>  
> > For int and uint, there are a couple of choices:
> > 
> >        i, u
> >        iw, uw     (where w == "word sized")
> 
> I prefer i and u.
>  

As do I.

> > The suffix would not be required for literals of type 'int'.
> 
> I don't think anyone would argue against that.
>  
> > Should the suffixes use only lowercase 'u' and 'i' or would uppercase
> > also be acceptable?  Or uppercase only?
> 
> I would strongly argue against allowing I as a suffix. I am probably
> the only one on this list who is old enough to have used a typewriter
> whose numbers did not include 0 and 1: you were supposed to type
> 'o' instead of '0', and 'l' instead of '1'. This works for human readers
> precisely because an 'l' is easily visually confusable with '1'. For them,
> this fact allowed them to save the cost of a key. For us, the fact that
> an uppercase i is also visually confusable with 1 would be
> a fruitful source of confusing bugs, where people *think* they are
> looking at 421, but are actually looking at 42 with an I suffix.
> 
> On the basis of symmetry, this would argue against U as well.

I am fine with only lowercase.

> > As an aside: it's long since time we allowed some form of separator
> > between groups of digits in integer (and float) literals.  I propose
> > that we allow '_' between digits as in Java and C#.
> 
> I agree that is a good idea.
> 
> A followup question: should we require that the _s be where western
> convention dictates the decimal commas should go, i.e. between
> every third digit? I for one would prefer that, but people using the
> Indian number system, which puts commas around groups of *two*
> digits above the thousands, would probably prefer that there
> not be such a rule (look up "lakh" or "crore" on Wikipedia).
> 
> We would need to delete the _s at some point anyway. If we do it
> in the compiler, we can make the code doing the deletion
> generate a warning if the _s are in the "wrong" place, with the
> notion of "wrong" being selected by compiler options such as
> --warn-misplaced-integer-underscores-{western,indian}.
> 
> When the compiler finds e.g. an i8 suffix on any integer outside
> the -128 to 127 range, we want to generate an error message anyway.
> The _ check could be done at the same time.
> 
> I don't think the *scanner* should do such checks, because
> it would result in suboptimal error messages. The compiler
> has access to error_util.m; the library does not.

Underscores will also help the readability of literals in other bases.
For hexadecimal you'd probably want to separate digits into groups of
4 or 8.

When working with some protocol or file format, it may be useful to
group digits according to how information is packed into certain bits
in that format.
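
As a small illustration of grouping by field layout, here is a sketch in
Python, used only because it already accepts '_' between digits. The
16-bit header layout is made up for the example and does not come from
any real format.

```python
# Hypothetical 16-bit header: 4-bit version, 4-bit flags, 8-bit length.
# The underscores follow the field boundaries, not a fixed group size.
HEADER = 0b0001_0101_00101010

version = (HEADER >> 12) & 0xF   # top 4 bits
flags = (HEADER >> 8) & 0xF      # next 4 bits
length = HEADER & 0xFF           # low 8 bits
```

A rule that insisted on uniform groups would reject this literal, even
though the grouping is exactly what makes it readable.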

You could have different rules depending on the base, or only check
decimal literals.  However, in my experience long integer literals are
rare (say, over five digits long), and most of *those* are hexadecimal.
Therefore, I doubt the value of such a check for catching errors.
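
To make the options concrete, here is a rough sketch (in Python, not
Mercury) of a pass that deletes the underscores, range-checks a
fixed-size suffix, and warns about grouping for decimal literals only.
Everything in it is an assumption for illustration: the names
SUFFIXES, split_suffix and check_literal, the 0x/0o/0b prefixes, and
the message wording are not the compiler's. The word-sized i and u
suffixes are left out because their range depends on the target.

```python
# Suffix -> (signed, bits). Illustrative table, not the compiler's.
SUFFIXES = {
    "i8": (True, 8), "i16": (True, 16), "i32": (True, 32), "i64": (True, 64),
    "u8": (False, 8), "u16": (False, 16), "u32": (False, 32), "u64": (False, 64),
}

def split_suffix(text):
    """Split 'NNNu16' into ('NNN', 'u16'); suffix is None if absent."""
    for suffix in SUFFIXES:
        if text.endswith(suffix):
            return text[:-len(suffix)], suffix
    return text, None

def check_literal(text, group=3):
    """Return (value, problems) for a literal such as '1_000u16'."""
    body, suffix = split_suffix(text)
    negative = body.startswith("-")
    digits = body[1:] if negative else body
    base = 10
    for prefix, prefix_base in (("0x", 16), ("0o", 8), ("0b", 2)):
        if digits.startswith(prefix):
            digits, base = digits[2:], prefix_base
    # The underscores have to be deleted before conversion anyway.
    value = int(digits.replace("_", ""), base)
    if negative:
        value = -value
    problems = []
    if suffix is not None:
        signed, bits = SUFFIXES[suffix]
        lo = -(1 << (bits - 1)) if signed else 0
        hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
        if not lo <= value <= hi:
            problems.append(f"error: {text} is outside {lo}..{hi}")
    # Only decimal literals are checked; other bases may group freely.
    if base == 10 and "_" in digits:
        groups = digits.split("_")
        first_ok = 1 <= len(groups[0]) <= group
        rest_ok = all(len(g) == group for g in groups[1:])
        if not (first_ok and rest_ok):
            problems.append(f"warning: misplaced underscores in {text}")
    return value, problems
```

With this sketch, check_literal("-129i8") reports a range error,
check_literal("10_00") reports a grouping warning, and
check_literal("0xdead_beefu32") reports nothing, since the grouping
check is skipped outside base 10.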

Peter

