The competition between OpenAI and other companies in the field of artificial intelligence (AI) is an all-out war. The launch of ChatGPT in November 2022 was a turning point for the industry, which has become far more competitive. Since then, Google has been trying to regain its leadership, but the firm led by Sam Altman has given it no respite.
In recent years we have watched the search giant and the Microsoft-backed startup test each other's strength launch after launch. This week Google presented "Gemini 2.0 Flash Thinking", a bet that finally seemed to be on par with OpenAI's o1 model. Well, OpenAI's new o3 and o3 mini models have just appeared on the scene.
OpenAI presents its new reasoning models
The latest from the creators of ChatGPT offers a more advanced level of reasoning than the initial version. Like the o1 model introduced in September of this year, the new model spends some time "thinking" about its answer. It will not be as fast as the GPT versions, but in exchange it can solve more complex problems in several steps.
That said, reasoning models are not ideal for everything. In fact, the AI field has grown so much that there are alternatives oriented to specific use cases. For example, if we were looking for a fast-response model to power a customer service chatbot, we wouldn't choose o3, but something like GPT-4o mini. If we are after precision in physics and mathematics, o3 may be the right choice.
An interesting way to gauge a model's scope and possibilities is to look at benchmarks. During the presentation, OpenAI showed two programming benchmarks. As we can see in the images, o3 improves on o1 by 22.8 percentage points in SWE-Bench Verified: it reaches 71.7 points, compared to 48.9 for the previous model.
In Codeforces, o1 achieves a score of 1891, while o3 reaches 2727. As noted, these models are useful for many complex tasks. Turning to mathematics benchmarks, on the 2024 American Invitational Mathematics Exam, o1 records a score of 83.3%; o3, for its part, boasts 96.7%, missing just a single question.
It should be noted that the decision to call the model o3 instead of o2 appears to have nothing to do with a leap in capabilities (or a marketing move). In fact, according to The Information, it comes down to avoiding trademark issues: OpenAI reportedly decided to skip a number because o2 is a registered trademark of a British telecommunications provider.
We will have to wait to access OpenAI's new flagship reasoning model, o3, as well as its smaller, faster sibling, o3 mini. For now, safety researchers can sign up for a waiting list to test the models. The firm plans to launch them to the public later, but it is not clear when they will arrive (or under which subscription tiers).
Images | WorldOfSoftware with DALL·E 3
In WorldOfSoftware | Apple cannot offer ChatGPT in China. More than a problem, that is a blessing