@PietroMonticone: AI is increasingly changing ho...
@PietroMonticone
8 views
Apr 07, 2026
Advertisement
2
Let me tell you what happened and why I think it nicely illustrates how formal verification could significantly mitigate the risk of informal hallucination, whether from human or non-human agents.
5
Prompted by Yixin He, Yanyang Li, and Quanyu Tang, GPT 5.4 Pro was able to rediscover the Erdős–Selfridge upper bound f(m) ≤ 2⌈√m⌉.
Inspired by this, Quanyu kept conversing with GPT 5.4 Pro, which claimed it could prove the exact formula f(m) = ⌈2√m⌉ for all m ≥ 4.
Inspired by this, Quanyu kept conversing with GPT 5.4 Pro, which claimed it could prove the exact formula f(m) = ⌈2√m⌉ for all m ≥ 4.
7
As is becoming standard practice, Wouter van Doorn submitted the informal proof to Aristotle to get it autoformalised in @leanprover and Aristotle succeeded.
So far, nothing seemed out of the ordinary.
So far, nothing seemed out of the ordinary.
8
However, while writing up the paper, the authors noticed that one step in the original informal argument was actually wrong.
You might think this should not be possible. After all, Aristotle had already formalised the proof. So it has to be correct, right?
Well, yes... and no!
You might think this should not be possible. After all, Aristotle had already formalised the proof. So it has to be correct, right?
Well, yes... and no!
10
To understand why a wrong definition of φ still led to a verified @leanprover proof, the authors had to closely look at the formalisation.
They discovered that, in fact, Aristotle had not used GPT's proposed φ in this case.
They discovered that, in fact, Aristotle had not used GPT's proposed φ in this case.
12
Aristotle did not merely check the proof.
It autonomously discovered a fix for a subtle edge case that expert human reviewers had skipped trusting that Aristotle's formalisation would take care of it.
That trust turned out to be well placed!
It autonomously discovered a fix for a subtle edge case that expert human reviewers had skipped trusting that Aristotle's formalisation would take care of it.
That trust turned out to be well placed!
13
A thread like this one can only scratch the surface, but if you'd like to dive deeper, here are the two main references:
1. Informal @arxiv preprint: arxiv.org/abs/2603.28636
2. Formal @leanprover script: live.lean-lang.org/#project=mathl…
1. Informal @arxiv preprint: arxiv.org/abs/2603.28636
2. Formal @leanprover script: live.lean-lang.org/#project=mathl…
14
I also tend to see this story as a nice, concrete instance of @davidbessis's "System 3": the productive friction between informal intuition and formal rigour, except externalised across AI models rather than confined to a single mathematician's mind. davidbessis.substack.com/p/thinking-fas…







