[m-rev.] request for opinions: string.format_table_max output

Mon May 22 17:15:07 AEST 2023

Hi Zoltan,

On Sun, 21 May 2023, Zoltan Somogyi wrote:

> This diff adds the first test in the test suite of format_table_max.
> For input for which format_table output
>
> aaa|áaȧ|1111111|¹½⅓¼⅕⅙⅛|  1,300,000.00
>
> format_table_max output
>
> aaa|áaȧ|1111111|
>              ¹½⅓¼⅕⅙⅛|
>                    1,300,000.00
>
> It is probably not visible in this email (but is visible in the diff)
> that each table column here does start at the correct offset from
> the start of the line, which is what the current code is trying to do.
> My question is: *should* it try to do this? I can see two alternatives
> that could be better in some respects.
>
> A Don't split lines even if some columns are over their width limit.
>  Instead, just let such fields take more than their usual maximum width,
>  resulting in the same output as if someone did JJ in vim on the above
>  three-line output.

Doesn't that defeat the point of having a limit in the first place?

>  Plus: each conceptual line gets output on one actual line. Minuses:
>  an overlong column will move later columns to the right, and there
>  is no way to enforce an overall line limit.
>
> B If *any* column is overlong, then take as many characters from each column
>  as that column's limit allows, and construct an output line as usual from that,
>  then repeat the process with any leftover characters as needed.
>
>  Pluses: fully enforces column width limits; puts initial parts of each column
>  on same line. Minus: breaks the text of overlong columns, possibly at points
>  that are suboptimal for understanding.
>
> Should we switch to either option A or B? Alternatively, should we have
> different functions for A, B and the status quo?
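
To make option B concrete, here is a minimal sketch of the "take up to
the limit from each column, then repeat with the leftovers" idea. It is
in Python rather than Mercury, it is not the code in library/string.m,
and the name format_row_wrapped and its arguments are hypothetical.
It assumes every column is left-justified and, like the current
documentation, measures widths in code points.

```python
def format_row_wrapped(fields, limits, separator="|"):
    """Split one conceptual row into physical lines (option B sketch).

    fields: the strings of one row, one per column.
    limits: maximum width of each column, in code points (must be > 0).
    Returns a list of output lines.
    """
    if len(fields) != len(limits):
        raise ValueError("need exactly one limit per column")
    if any(limit <= 0 for limit in limits):
        raise ValueError("column limits must be positive")

    remaining = list(fields)
    lines = []
    while True:
        # Take as many code points from each column as its limit allows.
        chunks = [text[:limit] for text, limit in zip(remaining, limits)]
        remaining = [text[limit:] for text, limit in zip(remaining, limits)]
        # Pad every chunk to its column width so later columns stay aligned.
        padded = [chunk.ljust(limit) for chunk, limit in zip(chunks, limits)]
        lines.append(separator.join(padded).rstrip())
        # Repeat only while some column still has leftover characters.
        if not any(remaining):
            break
    return lines


for line in format_row_wrapped(["aaa", "1111111", "x"], [3, 4, 2]):
    print(line)
```

Note that this slices purely by code point count, so it shows the minus
listed for option B: the break can land anywhere inside a field.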

The only current user of format_table_max within the Mercury system
appears to be the mslice tool (via mdbcomp/slice_and_dice.m). While
it's not particularly clear, I think the intention was to truncate
overlong columns. (That's a bit tricky in the presence of full Unicode,
since you don't want to put a split between a combining character and
what it applies to.)
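
As a rough illustration of the combining-character problem (again in
Python, not Mercury, with a hypothetical function name), truncation can
back up so that a base character is never kept without the marks that
follow it. This only looks at the Unicode general category "M" (Mark);
it is an approximation and not full grapheme cluster segmentation.

```python
import unicodedata


def truncate_keep_marks(text, limit):
    """Truncate text to at most `limit` code points (limit >= 0) without
    separating a base character from the combining marks that follow it."""
    if len(text) <= limit:
        return text
    cut = limit
    # If the first dropped code point is a combining mark, the base
    # character it applies to sits just before the cut: drop that too.
    while cut > 0 and unicodedata.category(text[cut]).startswith("M"):
        cut -= 1
    return text[:cut]


# "a" + COMBINING ACUTE ACCENT + "b"
sample = "a\u0301b"
print(len(truncate_keep_marks(sample, 1)))  # the accented "a" does not fit
print(len(truncate_keep_marks(sample, 2)))  # the accented "a" fits whole
```

The result can be shorter than the limit, so a caller that pads columns
would need to pad after truncating.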

I agree with Peter: it's not really something that belongs in the
standard library since doing it properly requires (1) more Unicode
support than the stdlib currently has and (2) knowledge of the font
being used.  (The world was a bit more ASCII when Ian originally wrote
all that.)

One comment on the diff:

> Base string.format_table{,_max} on common code.
> 
> library/string.m:
>     Even though format_table_max is a minor tweak on format_table,
>     its implementation used to be completely separate. Act on an old XXX
>     and make format_table use the same primitive ops as format_table_max.
>
>     Document the operation of format_table a bit better.
>
>     Add an XXX about a problem with the current code of format_table_max
>     (which does not affect format_table).
>
>     Use predmode decls when possible.
> 
> tests/general/string_test.{m,exp}:
>     Add a test of format_table_max, which previously did not have one.

> diff --git a/library/string.m b/library/string.m
> index e60ca4287..83f4c8248 100644
> --- a/library/string.m
> +++ b/library/string.m

...

> @@ -1416,17 +1413,34 @@
>      % format_table(Columns, Separator) = Table:
>      %
>      % This function takes a list of columns and a column separator,
> -    % and returns a formatted table, where each field in each column
> -    % has been aligned and fields are separated with Separator.
> -    % There will be a newline character between each pair of rows.
> -    % Throws an exception if the columns are not all the same length.
> -    % Lengths are currently measured in terms of code points.
> +    % and returns a formatted table, where
>      %
> -    % For example:
> +    % - the Nth line contains the Nth string in each column;

s/Nth/N'th/ (which is what we use almost everywhere else in the library documentation).

Julien.