Google’s AI research division DeepMind has achieved a significant milestone in artificial intelligence by solving complex mathematical problems at the 2024 International Mathematical Olympiad (IMO). Two new AI systems, AlphaProof and AlphaGeometry 2, tackled problems that have long been considered too abstract for machines, marking a new chapter in AI reasoning capabilities.
What Makes This Breakthrough Different?
Most AI models today work by predicting the next word or token based on patterns found in large datasets. This approach works well for language tasks like summarising text or answering general questions. However, abstract mathematical reasoning requires a different kind of thinking, one that involves logical steps, formal proofs, and structured problem-solving.
This gap between language prediction and genuine reasoning has been one of the biggest challenges in AI development. Google DeepMind’s latest results suggest that this gap is beginning to close.
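To make the idea of next-word prediction concrete, here is a deliberately tiny sketch in Python. It is an assumption-laden toy, not how Gemini or any production model works: it only counts which word most often follows another in a made-up sentence, and the names `train_bigram` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follow = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follow[current][nxt] += 1
    return follow


def predict_next(follow, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follow.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up training text, used purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))    # "cat" follows "the" twice, "mat" once
print(predict_next(model, "proof"))  # never seen in the corpus, so None
```

The toy can only echo patterns it has already seen. Nothing in it checks whether an answer is logically correct, which is the gap the article describes.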
AlphaProof and AlphaGeometry 2: How They Work
DeepMind introduced two distinct AI systems for this challenge:
- AlphaProof: This system was built by combining Google’s Gemini language model with AlphaZero, the reinforcement learning system that previously taught itself to master chess, shogi, and Go. AlphaProof focuses on formal mathematical proofs and demonstrated strong logical reasoning skills.
- AlphaGeometry 2: This system specialises in geometry problems and works alongside AlphaProof to handle a broader range of mathematical challenges.
Together, these two systems solved four out of six problems presented at the 2024 IMO, a competition widely regarded as one of the most prestigious mathematics contests for high school students globally. This is the highest score ever achieved by an AI system in the history of the Olympiad.
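For readers unfamiliar with formal proofs, the short example below shows what a machine-checkable statement looks like. The article does not name the proof language; DeepMind's public description of AlphaProof says it works in the Lean proof assistant, so the sketch uses Lean 4. The theorem is a trivial illustration chosen for this article, not an IMO problem and not AlphaProof output.

```lean
-- Statement: addition of natural numbers is commutative.
-- `Nat.add_comm` is a lemma from Lean 4's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- A concrete fact that Lean verifies by direct computation.
example : 2 + 2 = 4 := rfl
```

The proof checker accepts a proof only if every step is valid, so a system that produces such a proof cannot bluff its way to an answer.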
Performance at the 2024 International Math Olympiad
The results from the 2024 IMO test were impressive, though not without limitations. Here is a quick look at how the AI systems performed:
| Metric | Details |
|---|---|
| Problems Solved | 4 out of 6 |
| Fastest Solution | A few minutes |
| Longest Time Taken | Up to three days |
| Hardest Problem Solved | Only 5 out of 600+ human participants solved it |
| AI Record | Best-ever AI performance at IMO |
Some problems took the AI up to three days to solve, far longer than the two 4.5-hour sessions that human contestants are given. Google acknowledged this limitation but emphasised that the overall performance still represents a historic achievement for AI in mathematical reasoning.
AlphaProof’s Most Impressive Feat
Among the six problems, AlphaProof successfully solved the hardest one, a problem that only five out of more than 600 human participants managed to crack. This result highlights the system’s ability to go beyond surface-level pattern matching and engage in deep, structured reasoning.
The combination of Gemini’s language understanding with AlphaZero’s strategic thinking appears to be a key factor behind AlphaProof’s success. AlphaZero had previously shown that AI can master complex rule-based games through self-play and reinforcement learning. Applying a similar approach to formal mathematics seems to have paid off.
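One way to picture why a game-playing approach might transfer to formal mathematics is that both have fixed rules and an outcome that can be checked mechanically. The Python sketch below is a toy under stated assumptions, not AlphaProof's algorithm: the "rules" are two made-up arithmetic moves, plain breadth-first search stands in for learned, self-play-trained search, and `MOVES`, `verify`, and `search` are hypothetical names.

```python
from collections import deque

# Hypothetical toy "rules": the only legal moves from a number n.
MOVES = {"double": lambda n: n * 2, "add3": lambda n: n + 3}


def verify(start, target, steps):
    """Mechanically check that applying `steps` to `start` yields `target`."""
    value = start
    for name in steps:
        if name not in MOVES:
            return False
        value = MOVES[name](value)
    return value == target


def search(start, target, max_depth=6):
    """Breadth-first search for a shortest sequence of moves reaching `target`."""
    queue = deque([(start, [])])
    while queue:
        value, steps = queue.popleft()
        if value == target:
            return steps
        if len(steps) < max_depth:
            for name, move in MOVES.items():
                queue.append((move(value), steps + [name]))
    return None  # no sequence found within the depth limit


steps = search(1, 11)
print(steps)                  # ['add3', 'double', 'add3']
print(verify(1, 11, steps))   # True: 1 + 3 = 4, 4 * 2 = 8, 8 + 3 = 11
```

The checker plays the role of a win-or-lose signal: a candidate sequence either verifies or it does not. A learning system can use that signal to decide which moves to try first, where this toy simply tries everything.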
The Broader Race for AI Reasoning
Google DeepMind is not alone in pursuing advanced AI reasoning. In early July, Reuters reported that Microsoft-backed OpenAI was also working on a reasoning-focused AI project known as Strawberry, previously referred to internally as Q*. The project has attracted significant attention for its potential to push AI capabilities further.
However, Strawberry has also sparked debate. Some OpenAI researchers reportedly raised concerns with the company’s board about potential unforeseen risks associated with the technology’s impact on humanity. This reflects a wider conversation happening across the AI industry about how to balance rapid progress with responsible development.
Key points shaping the AI reasoning landscape right now include:
- Multiple major technology companies are investing heavily in reasoning-capable AI systems.
- Mathematical problem-solving is seen as a key benchmark for measuring genuine AI intelligence.
- Safety and ethical concerns are growing alongside technical advancements.
- The education sector and scientific research communities are closely watching these developments.
What This Means Going Forward
The progress made by AlphaProof and AlphaGeometry 2 signals that AI systems are moving closer to handling tasks that require genuine logical thinking rather than just pattern recognition. This has wide-ranging implications, from scientific research and engineering to education and problem-solving in fields like medicine and economics.
At the same time, questions remain about how these systems will be used, who will have access to them, and what safeguards need to be in place. As companies like Google push the boundaries of what AI can do, the conversation around responsible AI development becomes more important than ever.
The 2024 IMO results are a clear signal that AI reasoning is advancing fast. Whether this progress leads to broadly beneficial outcomes will depend as much on policy and ethics as on the technology itself.
Frequently Asked Questions
What are AlphaProof and AlphaGeometry 2?
AlphaProof and AlphaGeometry 2 are two AI systems developed by Google DeepMind. AlphaProof combines Google's Gemini language model with AlphaZero to solve formal mathematical proofs, while AlphaGeometry 2 specialises in geometry problems. Together, they solved four out of six problems at the 2024 International Mathematical Olympiad.
How did Google DeepMind's AI perform at the 2024 IMO?
Google DeepMind's AI systems solved four out of six problems at the 2024 IMO, including the hardest problem that only five out of more than 600 human participants managed to solve. This is the best performance ever recorded by an AI system at the International Mathematical Olympiad.
What is OpenAI's Strawberry project?
OpenAI's Strawberry, previously known internally as Q*, is a project focused on developing AI systems with advanced reasoning capabilities. Like Google DeepMind's work, it aims to push AI beyond simple pattern recognition into deeper logical thinking. However, some OpenAI researchers have raised concerns about potential risks associated with the technology.