Visualize Thread by @PietroMonticone

✨ Visual Editor

palette Canvas & Background

Presets

Custom Colors

Gradient:arrow_forward

Text Color:

Gradient Angle135°

Background Pattern

Grain Texture

Aspect Ratio

style Card Style

Preset

Padding40px

Card Radius16px

Enable Card Shadow

Glassmorphism Effect

Show Watermark AGENCY

Show Timestamps

Show X Logo

text_fields Typography

Font Family

Font Size16px

Pietro Monticone

@PietroMonticone

AI is increasingly changing how we do mathematics.

Erdős Problem #650, open for over 60 years, was solved a few weeks ago through a collaboration between human mathematicians, an informal reasoning model (GPT 5.4 Pro @OpenAI) and a formal one (Aristotle @HarmonicMath). 🧵

Pietro Monticone

@PietroMonticone

Let me tell you what happened and why I think it nicely illustrates how formal verification could significantly mitigate the risk of informal hallucination, whether from human or non-human agents.

Pietro Monticone

@PietroMonticone

Fix a positive integer m. Given any set A of m positive integers and any interval I of length 2·max(A), how many disjoint pairs (a,b) with a in A, b in I and a dividing b can you always find?

Call this f(m). Erdős asked: what is f(m)?

Pietro Monticone

@PietroMonticone

In 1959, Erdős and Surányi proved f(m) ≥ √m.

About 20 years later, Erdős and Selfridge showed f(m) ≤ 2⌈√m⌉. Two bounds differing by a factor of two.

In 1995, Erdős asked whether the lower bound could be improved.

Nothing of note was proved, until this year.

Pietro Monticone

@PietroMonticone

Prompted by Yixin He, Yanyang Li, and Quanyu Tang, GPT 5.4 Pro was able to rediscover the Erdős–Selfridge upper bound f(m) ≤ 2⌈√m⌉.

Inspired by this, Quanyu kept conversing with GPT 5.4 Pro, which claimed it could prove the exact formula f(m) = ⌈2√m⌉ for all m ≥ 4.

Pietro Monticone

@PietroMonticone

Thomas Bloom (@thomasfbloom) and Terence Tao reviewed the proof. They checked the main ideas, and provisionally concluded that the proof seemed to check out.

The final result: f(m) = min(m, ⌈2√m⌉) for all m.

Pietro Monticone

@PietroMonticone

As is becoming standard practice, Wouter van Doorn submitted the informal proof to Aristotle to get it autoformalised in @leanprover and Aristotle succeeded.

So far, nothing seemed out of the ordinary.

Pietro Monticone

@PietroMonticone

However, while writing up the paper, the authors noticed that one step in the original informal argument was actually wrong.

You might think this should not be possible. After all, Aristotle had already formalised the proof. So it has to be correct, right?

Well, yes... and no!

Pietro Monticone

@PietroMonticone

Very roughly speaking, the proof splits I into halves Γ₋(S), Γ₊(S) and needs |S| ≤ |Γ₋(S)|·|Γ₊(S)|.

The natural injection φ(a) = (uₐ, uₐ+a) works when aₘ doesn't divide x, but when aₘ divides x it may not even be well-defined and naive fixes break injectivity.

Pietro Monticone

@PietroMonticone

To understand why a wrong definition of φ still led to a verified @leanprover proof, the authors had to closely look at the formalisation.

They discovered that, in fact, Aristotle had not used GPT's proposed φ in this case.

Pietro Monticone

@PietroMonticone

Aristotle had independently come up with a different function, a three-branch injection that handles all the edge cases, and used it in its formalisation.

Before anyone even realised there was an issue, Aristotle had already spotted the mistake and quietly taken care of it.

Pietro Monticone

@PietroMonticone

Aristotle did not merely check the proof.

It autonomously discovered a fix for a subtle edge case that expert human reviewers had skipped trusting that Aristotle's formalisation would take care of it.

That trust turned out to be well placed!

Pietro Monticone

@PietroMonticone

A thread like this one can only scratch the surface, but if you'd like to dive deeper, here are the two main references:

1. Informal @arxiv preprint: arxiv.org/abs/2603.28636

2. Formal @leanprover script: live.lean-lang.org/#project=mathl…

Pietro Monticone

@PietroMonticone

I also tend to see this story as a nice, concrete instance of @davidbessis's "System 3": the productive friction between informal intuition and formal rigour, except externalised across AI models rather than confined to a single mathematician's mind. davidbessis.substack.com/p/thinking-fas…

Generated by Thread Navigator

100%

view_carousel Carousel Studio NEW

Press ⌘ + S to quick-export