@PietroMonticone: AI is increasingly changing ho...

1

AI is increasingly changing how we do mathematics.

Erdős Problem #650, open for over 60 years, was solved a few weeks ago through a collaboration between human mathematicians, an informal reasoning model (GPT 5.4 Pro @OpenAI) and a formal one (Aristotle @HarmonicMath). 🧵

2

Let me tell you what happened and why I think it nicely illustrates how formal verification could significantly mitigate the risk of informal hallucination, whether from human or non-human agents.

3

Fix a positive integer m. Given any set A of m positive integers and any interval I of length 2·max(A), how many disjoint pairs (a,b) with a in A, b in I and a dividing b can you always find?

Call this f(m). Erdős asked: what is f(m)?

4

In 1959, Erdős and Surányi proved f(m) ≥ √m.

About 20 years later, Erdős and Selfridge showed f(m) ≤ 2⌈√m⌉. Two bounds differing by a factor of two.

In 1995, Erdős asked whether the lower bound could be improved.

Nothing of note was proved, until this year.

5

Prompted by Yixin He, Yanyang Li, and Quanyu Tang, GPT 5.4 Pro was able to rediscover the Erdős–Selfridge upper bound f(m) ≤ 2⌈√m⌉.

Inspired by this, Quanyu kept conversing with GPT 5.4 Pro, which claimed it could prove the exact formula f(m) = ⌈2√m⌉ for all m ≥ 4.

6

Thomas Bloom (@thomasfbloom) and Terence Tao reviewed the proof. They checked the main ideas, and provisionally concluded that the proof seemed to check out.

The final result: f(m) = min(m, ⌈2√m⌉) for all m.

7

As is becoming standard practice, Wouter van Doorn submitted the informal proof to Aristotle to get it autoformalised in @leanprover and Aristotle succeeded.

So far, nothing seemed out of the ordinary.

8

However, while writing up the paper, the authors noticed that one step in the original informal argument was actually wrong.

You might think this should not be possible. After all, Aristotle had already formalised the proof. So it has to be correct, right?

Well, yes... and no!

9

Very roughly speaking, the proof splits I into halves Γ₋(S), Γ₊(S) and needs |S| ≤ |Γ₋(S)|·|Γ₊(S)|.

The natural injection φ(a) = (uₐ, uₐ+a) works when aₘ doesn't divide x, but when aₘ divides x it may not even be well-defined and naive fixes break injectivity.

10

To understand why a wrong definition of φ still led to a verified @leanprover proof, the authors had to closely look at the formalisation.

They discovered that, in fact, Aristotle had not used GPT's proposed φ in this case.

11

Aristotle had independently come up with a different function, a three-branch injection that handles all the edge cases, and used it in its formalisation.

Before anyone even realised there was an issue, Aristotle had already spotted the mistake and quietly taken care of it.

12

Aristotle did not merely check the proof.

It autonomously discovered a fix for a subtle edge case that expert human reviewers had skipped trusting that Aristotle's formalisation would take care of it.

That trust turned out to be well placed!

13

A thread like this one can only scratch the surface, but if you'd like to dive deeper, here are the two main references:

1. Informal @arxiv preprint: arxiv.org/abs/2603.28636

2. Formal @leanprover script: live.lean-lang.org/#project=mathl…

14

I also tend to see this story as a nice, concrete instance of @davidbessis's "System 3": the productive friction between informal intuition and formal rigour, except externalised across AI models rather than confined to a single mathematician's mind. davidbessis.substack.com/p/thinking-fas…

@PietroMonticone: AI is increasingly changing ho...

Actions

What You Can Do