We live in a world where AI companies like OpenAI and Google are constantly looking for new ways to pit their models against each other. One of the most recent attempts to measure how top AI models stack up put eight large language models in the same chess competition. The outcome? OpenAI’s o3 model emerged completely undefeated, even defeating Grok 4 in the final showdown. And while plenty of outlets have reported on the result, including the BBC, I couldn’t help but ask myself a single question when reading about it: Why do I care?
It makes sense to question the point of this competition because the computing industry has long used chess as a way to assess the abilities and progress of its machines. It has done so for so long that modern chess engines are effectively unbeatable by humans. But chess isn’t real life.
Considering that newsfeeds are full of posts about how far AI has come, it may seem odd for me to sit here and dismiss the significance of Grok losing to OpenAI at chess. And while it’s easy to read Elon Musk’s X post about the outcome as his typical ego-driven response, his point that Grok’s chess performance is merely a side effect is actually spot-on.
Chess mastery isn’t the point of AI
OpenAI, Google, Anthropic, and all of the other AI companies pumping out new models every few months aren’t building their AI to play chess. They’re building it to automate the routine tasks that you and I do every day as part of our jobs and livelihoods. Some people even see AI as a replacement for human-driven work. As such, how a couple of AI models perform at chess doesn’t really amount to much once you look at what the world expects these machines to do.
OpenAI’s o3 model can seemingly play chess better than any human. Meanwhile, it can’t write the next great American novel. It can’t, no matter what Sam Altman says. And while the release of newer models like GPT-5 is meant to improve that kind of capability, you still lose the human touch and emotion that come with those activities. Plus, we still have to worry about problems that barely matter in chess, like AI hallucinations and chatbots harming the mental health of users.
Ultimately, how GPT, Gemini, Grok, or any other AI model performed across a few games of chess says nothing about the long-term goals AI companies are chasing. Sure, these victories make for catchy headlines, but they don’t answer the real questions surrounding AI, like how much risk it poses to humanity and which jobs it will replace.