[m-users.] (cough) "AI"

Mark Brown mark at mercurylang.org
Thu Oct 30 22:37:17 AEDT 2025


Hi,

I've just been experimenting with Claude Pro (paid subscription),
using variations on this query: "Write a Mercury program that, given a
real value y, finds the smallest non-negative value of x such that y =
x * sin(x). If you use numerical methods, be sure to choose good
initial values."

I've also done this with Rust, C and Fortran, and with varying amounts
of hinting as to initial values. It never got close to a working
design. My conclusion so far is that it knows plenty about programming
techniques and coding style, but at this stage it can't do problem
analysis anywhere near the level required of a CS graduate. Needless
to say, it's not a silver bullet.
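
For concreteness, here is a minimal Python sketch of one analysis that does
lead to a working design for the query above. It is not what Claude produced;
the name `smallest_root` and the fixed iteration count are illustrative
assumptions. It relies on x * sin(x) having a single peak on each half-period
[j*pi, (j+1)*pi], with peak magnitudes that grow with j, so the smallest
solution sits on the rising side of the first half-period of the right sign
whose peak reaches |y|.

```python
import math

def smallest_root(y, iters=200):
    """Smallest x >= 0 with x * sin(x) == y, up to floating-point accuracy."""
    if y == 0.0:
        return 0.0
    s = 1.0 if y > 0 else -1.0      # work with h(x) = s * x * sin(x) >= 0
    target = abs(y)

    def h(x):
        return s * x * math.sin(x)

    def dh(x):
        return s * (math.sin(x) + x * math.cos(x))

    # Half-periods with the right sign: j = 0, 2, 4, ... for y > 0,
    # j = 1, 3, 5, ... for y < 0.
    j = 0 if y > 0 else 1
    while True:
        # The peak lies in [j*pi + pi/2, (j+1)*pi], where dh falls from
        # positive to negative; bisect on the sign of dh to locate it.
        lo, hi = j * math.pi + math.pi / 2, (j + 1) * math.pi
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if dh(mid) > 0:
                lo = mid
            else:
                hi = mid
        peak = 0.5 * (lo + hi)
        if h(peak) >= target:
            # h rises monotonically from 0 at j*pi to the peak, so plain
            # bisection on [j*pi, peak] finds the smallest solution.
            a, b = j * math.pi, peak
            for _ in range(iters):
                mid = 0.5 * (a + b)
                if h(mid) < target:
                    a = mid
                else:
                    b = mid
            return 0.5 * (a + b)
        j += 2  # this peak is too low; try the next one of the same sign
```

The bracketing is the "good initial values" part: each bisection starts from
an interval known to contain exactly one sign change. The outer loop is linear
in |y|, which is fine for a sketch but could be skipped ahead for large inputs.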

After this I tried again, except I guided it as if I were walking a
very bright student through a programming assignment, which I think is
more like how Anthropic intends Claude to be used. It has produced
what looks like an excellent result - I'm honestly mind-blown that it
could do this - though in effect I had to do most of the problem
analysis myself. The fact that it was Mercury does not appear to have
impeded Claude. This is definitely an effective tool for Mercury
development in my view, though of course whether you can leverage this
tool on your particular project is always the question.

On Sun, Oct 26, 2025 at 11:31 PM Richard O'Keefe <raoknz at gmail.com> wrote:
>
> I have tried AI code generators with several problems and several
> programming languages.
> Perhaps the clearest experiment was "implement the complementary error
> function in Fortran
> taking particular care to make the tail accurate" where the result was
> good Fortran but it
> effectively did
>    y = the error function of x
>    return 1 - y
> which of course flagrantly violates the "make the tail accurate" part.
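
The loss of accuracy is easy to see in double precision. A small Python
illustration, using the standard library's `math.erf` and `math.erfc` rather
than the Fortran the AI generated (the helper name `naive_erfc` is made up
for this sketch):

```python
import math

def naive_erfc(x):
    # The pattern criticised above: compute erf, then subtract it from 1.
    return 1.0 - math.erf(x)

x = 10.0
naive = naive_erfc(x)   # erf(10) rounds to exactly 1.0, so the tail is lost
direct = math.erfc(x)   # computed directly, the tail is tiny but nonzero
```

Near zero the two agree closely; the subtraction only destroys the result once
erf(x) is within rounding distance of 1.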
>
> I have now done over a dozen experiments and EVERY SINGLE ONE resulted
> in code which at best contained subtle errors and in one case was in
> entirely the wrong programming language.
>
> I then thought, "perhaps I could use it for testing" and asked for
> some test case generation.  You wouldn't believe how many iterations I
> had to go through (OK, so it was six) to force the wretched thing to
> generate tests that were actually legal tests according to the prompt.
> And then the couple of dozen tests I got were all essentially the same
> test.
>
> To be honest, I was *AMAZED* at how well the AIs I tried did.  The
> code quality ranged from poor to shocking, but it *was* (usually)
> plausible code having *some* connection with the requirements.  In
> every case it was more effort to fix the AI-generated code than to
> write it from scratch.
>
> What's really important here is that the AI programs did much better with
> - popular programming languages
> - standard algorithms
> - no floating-point calculations.

In my case, I'm hoping to generate system tests for an HTML->PDF
converter, so the generated code will be HTML/CSS with minimal
JavaScript. I think this fits squarely into your criteria :-)

The tests will be small, self-contained, and each will contain the
minimal logic required to test some combination of CSS properties. I
figure that if AI code generation can do anything at all for our
project then it ought to be able to do this.

Cheers,
Mark

>
> Basically, these programs work by training from large collections of
> existing code, and if
> they don't *have* large collections they go wonky.
>
> Here's an example.  I gave the prompt
> "write a procedure in Unisys Extended Algol to merge two sorted arrays
> of DOUBLE into a third array."
> Note that Extended Algol is still in use; this is not a dead language,
> just an unpopular one.
> The response started out
>
> PROCEDURE MERGE_SORTED_ARRAYS(A, B, C, NA, NB);
>     VALUE NA, NB;
>     INTEGER NA, NB;
>     DOUBLE ARRAY A[1:NA], B[1:NB], C[1:NA+NB];
>
> which looks quite plausible, but happens to be illegal.
> Algol 60 didn't let you say anything about array bounds (or even the
> number of subscripts) in the parameters of a procedure.  Extended Algol
> requires you to specify the lower bounds to be used (with * meaning use
> whatever the caller wants) but did not and does not allow you to
> specify the upper bounds.  So correct code would be
>   DOUBLE ARRAY A[1], B[1], C[1];
> And then we hit the point that the manual (yes of course I have a
> manual!) strongly recommends using a lower bound of 0 for efficiency...
>
> I hadn't previously tried with Mercury.  I was beginning to be optimistic when
> an exercise from the Programming Praxis site produced a subtly wrong answer.
> The problem is to find out whether a list of integers has two elements
> that sum to 0.
>
> has_zero_sum_pair(List) :-
>     list.member(X, List),
>     list.member(Y, List),
>     X + Y = 0,
>     X \= Y.
>
> The subtle error is that [0,0] DOES contain two elements that sum to 0 but
> this code says it DOESN'T.
>
> Of course there's the efficiency issue that this takes quadratic time,
> while O(N.log N) is doable and even better is possible.
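
To make both points concrete, here is a Python sketch rather than Mercury (the
function name simply mirrors the predicate above, and the hash-set approach is
one assumed way of beating the sort-based bound, not the only one). It treats
"two elements" as two positions in the list, so [0,0] succeeds, and it makes a
single pass.

```python
def has_zero_sum_pair(xs):
    """True if two elements at different positions in xs sum to 0."""
    seen = set()            # values at earlier positions
    for x in xs:
        if -x in seen:      # an earlier element cancels this one
            return True
        seen.add(x)
    return False
```

Because each element is only compared against earlier positions, a lone 0 does
not pair with itself, while a second 0 does. Expected running time is linear
in the length of the list, given constant-time set operations on average.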
>
> So you WILL have to scrutinize Mercury code generated by an AI with great care
> and you had BETTER have some good test cases ready.
>
> On Sun, 26 Oct 2025 at 09:32, Tomas By <tomas at basun.net> wrote:
> >
> > Hi all,
> >
> > Just read this on Slashdot:
> >
> > | I'm a programmer who started out hesitant about AI, and at first I
> > | thought all that it could do was auto-complete better.
> > | Then I tried Claude Code, and it really is like having your own
> > | personal junior dev assisting you're every need. Like a junior, it
> > | makes mistakes, but using the *massive* amount of good code that it
> > | creates, and fixing what's left, is so much faster than writing it
> > | all from scratch yourself.
> > https://slashdot.org/story/25/10/25/0324244/meet-the-people-who-dare-to-say-no-to-ai
> >
> > and wonder if anybody has tried this with Mercury?
> >
> > I suspect that (1) this person uses C[*]/Java, and (2) the usefulness
> > of this "AI" stuff for Mercury will be less, roughly in proportion to
> > the difference in code length for the same functionality, i.e. a
> > factor of ten or so.
> >
> > Does anybody have any experience with this?
> >
> > /Tomas
> > _______________________________________________
> > users mailing list
> > users at lists.mercurylang.org
> > https://lists.mercurylang.org/listinfo/users

