[m-users.] (cough) "AI"
Richard O'Keefe
raoknz at gmail.com
Sun Oct 26 23:30:56 AEDT 2025
I have tried AI code generators with several problems and several
programming languages.
Perhaps the clearest experiment was "implement the complementary error
function in Fortran taking particular care to make the tail accurate",
where the result was good Fortran but it effectively did
    y = the error function of x
    return 1 - y
which of course flagrantly violates the "make the tail accurate" part.
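To see why 1 - erf(x) loses the tail, here is a minimal sketch in Python
(used only for brevity; the experiment itself was in Fortran). It relies
on the standard library's math.erf and math.erfc; the helper names
erfc_naive and relative_error are made up for this illustration. For
large x, erf(x) rounds to exactly 1.0 in double precision, so the
subtraction cancels to 0 while the true value is tiny but nonzero.

```python
import math


def erfc_naive(x: float) -> float:
    """Mimic what the generated code effectively did: 1 - erf(x)."""
    return 1.0 - math.erf(x)


def relative_error(approx: float, exact: float) -> float:
    """Relative error of approx against a nonzero reference value."""
    return abs(approx - exact) / abs(exact)


# Compare the naive formula with the library's direct erfc as x grows.
for x in (0.5, 3.0, 6.0, 10.0):
    naive = erfc_naive(x)
    direct = math.erfc(x)
    print(f"x={x:5.1f}  1-erf(x)={naive:.6e}  "
          f"erfc(x)={direct:.6e}  rel.err={relative_error(naive, direct):.2e}")
```

Near the origin the two agree closely; far out in the tail the naive
version returns 0 and so has a relative error of 1, i.e. no correct
digits at all.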
I have now done over a dozen experiments and EVERY SINGLE ONE resulted
in code which at best contained subtle errors and in one case was in
entirely the wrong programming language.
I then thought, "perhaps I could use it for testing" and asked for
some test case generation.
You wouldn't believe how many iterations I had to go through (OK, so
it was six) to force the wretched thing to generate tests that were
actually legal tests according to the prompt. And then the couple of
dozen tests I got were all essentially the same test.
To be honest, I was *AMAZED* at how well the AIs I tried did. The
code quality ranged from poor to shocking, but it *was* (usually)
plausible code having *some* connection with the requirements. In
every case it was more effort to fix the AI-generated code than to
write it from scratch.
What's really important here is that the AI programs did much better with
- popular programming languages
- standard algorithms
- no floating-point calculations.
Basically, these programs work by training from large collections of
existing code, and if they don't *have* large collections they go wonky.
Here's an example. I gave the prompt
"write a procedure in Unisys Extended Algol to merge two sorted arrays
of DOUBLE into a third array."
Note that Extended Algol is still in use; this is not a dead language,
just an unpopular one.
The response started out
    PROCEDURE MERGE_SORTED_ARRAYS(A, B, C, NA, NB);
    VALUE NA, NB;
    INTEGER NA, NB;
    DOUBLE ARRAY A[1:NA], B[1:NB], C[1:NA+NB];
which looks quite plausible, but happens to be illegal.
Algol 60 didn't let you say anything about array bounds (or even the
number of subscripts) in the parameters of a procedure. Extended Algol
requires you to specify the lower bounds to be used (with * meaning use
whatever the caller wants) but did not and does not allow you to specify
the upper bounds. So correct code would be
    DOUBLE ARRAY A[1], B[1], C[1];
And then we hit the point that the manual (yes of course I have a
manual!) strongly recommends using a lower bound of 0 for efficiency...
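For reference, the task in the prompt is the standard two-index merge.
What follows is a rough sketch in Python, not Extended Algol, so it says
nothing about the array-bound declarations discussed above. The name
merge_sorted is invented for this illustration, and the arrays are
0-based.

```python
from typing import List


def merge_sorted(a: List[float], b: List[float]) -> List[float]:
    """Merge two ascending lists into a new ascending list."""
    c = [0.0] * (len(a) + len(b))
    i = j = k = 0
    # Repeatedly copy the smaller head element; ties take from a first,
    # which keeps the merge stable.
    while i < len(a) and j < len(b):
        if b[j] < a[i]:
            c[k] = b[j]
            j += 1
        else:
            c[k] = a[i]
            i += 1
        k += 1
    # At most one of the inputs still has elements; copy the remainder.
    while i < len(a):
        c[k] = a[i]
        i += 1
        k += 1
    while j < len(b):
        c[k] = b[j]
        j += 1
        k += 1
    return c
```

Test cases worth having on hand for any generated version include an
empty input on either side, duplicates across the two inputs, and inputs
of very different lengths.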
I hadn't previously tried with Mercury. I was beginning to be optimistic when
an exercise from the Programming Praxis site produced a subtly wrong answer.
The problem is to find out whether a list of integers has two elements
that sum to 0.
    has_zero_sum_pair(List) :-
        list.member(X, List),
        list.member(Y, List),
        X + Y = 0,
        X \= Y.
The subtle error is that [0,0] DOES contain two elements that sum to 0 but
this code says it DOESN'T.
Of course there's the efficiency issue that this takes quadratic time,
while O(N.log N) is doable and even better is possible.
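One way to get both the [0,0] case and the running time right is to
compare positions rather than values, by remembering what has already
been seen. This is a minimal sketch in Python rather than Mercury, with
the same name has_zero_sum_pair reused only for readability. It assumes
"two elements" means two distinct positions in the list, and it runs in
expected linear time because set lookups are hashed.

```python
from typing import Iterable, Set


def has_zero_sum_pair(xs: Iterable[int]) -> bool:
    """True if two elements at different positions sum to 0."""
    seen: Set[int] = set()
    for x in xs:
        # -x was seen at an earlier position, so the pair is distinct
        # by position even when both values are equal (0 and 0).
        if -x in seen:
            return True
        seen.add(x)
    return False
```

The cases that separate a correct version from the generated one are
[0, 0] (a pair exists) and [0] (a single zero is not a pair); [] and a
list such as [1, 2, 3] round out the basics.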
So you WILL have to scrutinize Mercury code generated by an AI with great care
and you had BETTER have some good test cases ready.
On Sun, 26 Oct 2025 at 09:32, Tomas By <tomas at basun.net> wrote:
>
> Hi all,
>
> Just read this on Slashdot:
>
> | I'm a programmer who started out hesitant about AI, and at first I
> | thought all that it could do was auto-complete better.
> | Then I tried Claude Code, and it really is like having your own
> | personal junior dev assisting you're every need. Like a junior, it
> | makes mistakes, but using the *massive* amount of good code that it
> | creates, and fixing what's left, is so much faster than writing it
> | all from scratch yourself.
> https://slashdot.org/story/25/10/25/0324244/meet-the-people-who-dare-to-say-no-to-ai
>
> and wonder if anybody has tried this with Mercury?
>
> I suspect that (1) this person uses C[*]/Java, and (2) the usefulness
> of this "AI" stuff for Mercury will be proportionally less in a
> similar magnitude as code length for same functionality, ie a factor
> ten or so.
>
> Anybody has any experiences?
>
> /Tomas