Date: December 23, 2024
The final day of the 12-day event brought the announcement of OpenAI’s o3 model, which many fans claim has achieved AGI-level capabilities.
OpenAI made several notable announcements during its ‘12 Days of Ship-mas’ event. On the last day, it unveiled one of its biggest: a new reasoning model that will succeed the o1 family of AI models. The new model is called o3, and it posted a breakthrough score on the ARC challenge, a prestigious AI reasoning test.
“This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models,” said François Chollet, the creator of the ARC challenge, who has also worked as an engineer at Google.
Even though o3 achieved a breakthrough score in the ARC challenge, it did not win the competition's grand prize. Still, the result suggests that o3 marks a major step-function increase in AI capabilities on the path toward general intelligence.
Compared with o1, o3 performs better on several benchmarks covering tasks such as complex coding, scientific problem-solving, and advanced mathematics. For now, the new model is being rolled out cautiously to researchers for safety testing. The company has introduced two models, o3 and o3-mini, which differ in the computing power they require.
The o3 model scored 71.7% accuracy on the SWE-bench Verified test, while o1 scored 48.9% when given the same computing power. In other words, the new reasoning model is not only more efficient than o1 but can also do more with the same computing power as the top AI models on the market.
Another feather in its cap is Epoch AI's FrontierMath benchmark, on which o3 set a record-high score of 25.2%; no previous AI model had scored above 2%. OpenAI is also building the o3 line with power constraints in mind, and the o3-mini model is meant for lower-end task requirements.
By Arpit Dubey
Arpit is a dreamer, wanderer, and tech nerd who loves to jot down tech musings and updates. With a knack for crafting compelling narratives, Arpit covers everything from Predictive Analytics to Game Development, along with artificial intelligence (AI), Cloud Computing, IoT, SaaS, healthcare, and more. His content is as strategic as it is compelling. With a Logician's mind, he is always chasing sunrises and tech advancements while secretly preparing for the robot uprising.