GPT-5.5 reportedly has an IQ of 136, Opus 4.7 an IQ of 132. A new site converts AI benchmarks into IQ scores. The problem is that the result doesn’t measure much.
Pinning an IQ score on an artificial intelligence model is the kind of idea that gets an instant reaction: fascinating on paper, dubious as soon as you scratch the surface. A new site called AI IQ has tackled it by compiling the results of 12 public benchmarks (ARC-AGI, FrontierMath, GPQA, among others) and converting them into an estimated IQ score spread across five dimensions: abstraction, mathematical reasoning, programming, critical reasoning and agentic reasoning. The site even offers an emotional intelligence score derived from EQ-Bench 3. As of May 2026, OpenAI’s GPT-5.5 sits at the top with an estimated IQ of 136, followed by Anthropic’s Opus 4.7 at 132, then Google’s Gemini 3.1 Pro and GPT-5.4, both at 131. On a separate scale, TrackingAI’s ranking based on the Mensa Norway test, Grok-4.20 Expert Mode and GPT-5.4 Pro are tied at 145.
Why IQ does not measure the intelligence of an AI
The most striking graph on the site shows the evolution over time. In October 2023, GPT-4-turbo had an estimated IQ of around 75. Some thirty months later, the leading models are flirting with 136. Roughly sixty points of progress in two and a half years: it’s spectacular. Except that the compression at the top tells a different story: the top five models are separated by just 7 points on AI IQ (129 to 136) and by 4 points on the Mensa-based scale (141 to 145).
The fundamental problem is not in the numbers, it is in what we claim to measure. Researcher Alan D. Thompson, who has worked on cognitive assessment of AIs since 2021 and documented the limitations of the exercise in detail, identifies four pitfalls that the AI IQ site does not address. First, IQ tests were designed for human cognition, and their scales become blurry as soon as we apply them to non-human intelligence. Second, these tests are standardized on general human populations, which makes the interpretation of extreme scores (beyond approximately 155) statistically unreliable, even among humans. Third, artificial intelligence is fundamentally different from human intelligence: a model can solve an advanced mathematics problem and fail a common-sense task that a six-year-old has mastered. Finally, AI IQ does not administer any tests. The site compiles results from existing benchmarks and translates them into IQ scores via an in-house algorithm, which amounts to converting kilometers to degrees Celsius: the operation is technically feasible, but the result does not mean what the unit promises.
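The second pitfall, about extreme scores, can be made concrete with a short calculation. The sketch below assumes the conventional IQ scale, a normal distribution with a mean of 100 and a standard deviation of 15; this convention is an assumption added here and is not stated by AI IQ or TrackingAI, and the `rarity` helper is purely illustrative.

```python
from statistics import NormalDist

# Assumption: the conventional IQ scale (normal, mean 100, standard deviation 15).
IQ_SCALE = NormalDist(mu=100, sigma=15)


def rarity(iq: float) -> float:
    """Average number of people to sample to find one scoring at or above `iq`."""
    upper_tail = 1.0 - IQ_SCALE.cdf(iq)  # share of the population at or above `iq`
    return 1.0 / upper_tail


# Scores quoted in the article, plus the 155 threshold mentioned above.
for score in (100, 132, 136, 145, 155):
    print(f"IQ {score}: about 1 person in {rarity(score):,.0f}")
```

Under that assumption, a score near 155 corresponds to roughly one person in several thousand, so a norming sample of ordinary size contains almost nobody in that range, which is why differences between scores at that level carry little statistical weight.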
The VentureBeat article that popularized the site acknowledges this itself: each vendor publishes its own benchmarks, often selected to highlight its strengths, creating a “Tower of Babel where no one measures the same thing in the same way”. And the most demanding benchmarks (ARC-AGI-2, FrontierMath Tier 4, Humanity’s Last Exam) are already starting to saturate, meaning the ceiling of the measurement is approaching faster than the ceiling of the capabilities.
Ranking AI on the human IQ scale has the merit of making progress tangible for the general public. But confusing a score derived from benchmarks with a measure of intelligence is like mistaking the thermometer for the fever.
Source: AI IQ
