#News

New Anthropic's Opus 4.5 Beats Human Engineers at Their Own Test

Date: November 25, 2025

New model achieves 80% on coding benchmarks while slashing prices, with Chrome and Excel integrations now available.

Anthropic unveiled Claude Opus 4.5 on Monday, marking the third major model release from the AI startup in just two months and intensifying competition in the rapidly evolving artificial intelligence market.

The San Francisco-based company positions Opus 4.5 as "the best model in the world for coding, agents, and computer use," according to its announcement. The launch comes at a pivotal moment, following Google's Gemini 3 release last week and OpenAI's GPT-5.1 earlier this month, highlighting the breakneck pace of AI development.

Breaking New Ground in Coding Benchmarks

Claude Opus 4.5 achieved a significant milestone by becoming the first model to score over 80 percent on SWE-Bench Verified, a respected coding benchmark that measures software engineering capabilities. The model reached 80.9% accuracy on the test, surpassing both Google's Gemini 3 Pro and OpenAI's GPT-5.1, according to Anthropic's internal testing.

The company put its model through an unusual real-world test: Anthropic tested Claude Opus 4.5 on a difficult take-home exam given to prospective performance engineers, and the model scored higher than any human candidate ever had.

Early testers have praised the model's ability to handle complex, multi-system problems. Testers noted that Claude Opus 4.5 handles ambiguity and reasons about tradeoffs without hand-holding, and when pointed at a complex, multi-system bug, it figures out the fix, Anthropic stated.

Enhanced Memory and Productivity Features

A key innovation in Opus 4.5 involves improved memory management for long-context operations. "Knowing the right details to remember is really important in complement to just having a longer context window," Dianne Na Penn, Anthropic's head of product management for research, told TechCrunch.

This enhancement enabled a long-requested "endless chat" feature for paid users, allowing conversations to continue uninterrupted when the model reaches its context limit. The system automatically compresses context memory without notifying users.

The memory improvements prove particularly valuable for agentic use cases, where Opus might command multiple sub-agents. "Claude needs to be able to explore code bases and large documents, and also know when to backtrack and recheck something," Penn explained.

Chrome and Excel Integrations Expand

Alongside the model release, Anthropic expanded access to two key productivity tools. Claude for Chrome, a browser extension that enables Claude to take action across browser tabs, is now available to all Max users. Previously limited to pilot programs, the extension allows the AI to handle tasks spanning multiple web pages.

Claude for Excel, which assists with spreadsheet tasks, pivot tables, and chart generation, expanded beta access to Max, Team, and Enterprise users. Microsoft highlighted that the model delivers improvements in creating spreadsheets, presentations, and documents with consistency, professional polish, and genuine domain awareness, according to the Microsoft Azure blog.

Aggressive Pricing Strategy

Perhaps most striking is Anthropic's pricing approach. The company slashed Opus-tier pricing to $5 per million input tokens and $25 per million output tokens—approximately two-thirds cheaper than its predecessor, Opus 4.1, which cost $15/$75. While still more expensive than GPT-5.1 ($1.25/$10) and Gemini 3 Pro ($2-4/$12-18), the reduction makes Opus-level capabilities more accessible.

GitHub noted that in early testing, Claude Opus 4.5 surpassed internal coding benchmarks while cutting token usage in half, potentially offsetting some of the cost difference through improved efficiency.

By Arpit Dubey

Arpit is a dreamer, wanderer, and tech nerd who loves to jot down tech musings and updates. With a knack for crafting compelling narratives, Arpit has a sharp specialization in everything: from Predictive Analytics to Game Development, along with artificial intelligence (AI), Cloud Computing, IoT, and let’s not forget SaaS, healthcare, and more. Arpit crafts content that’s as strategic as it is compelling. With a Logician's mind, he is always chasing sunrises and tech advancements while secretly preparing for the robot uprising.

// Recommended

Pinterest Follows Amazon in Layoffs Trend, Shares Fall by 9%

AI-driven restructuring fuels Pinterest layoffs, mirroring Amazon’s strategy, as investors react sharply and question short-term growth and advertising momentum.

Clawdbot Rebrands to "Moltbot" After Anthropic Trademark Pressure: The Viral AI Agent That’s Selling Mac Minis

Clawdbot is now Moltbot. The open-source AI agent was renamed after Anthropic cited trademark concerns regarding its similarity to their Claude models.

Amazon Bungles 'Project Dawn' Layoff Launch With Premature Internal Email Leak

"Project Dawn" leaks trigger widespread panic as an accidental email leaves thousands of Amazon employees bracing for a corporate cull.

OpenAI Launches Prism, an AI-Native Workspace to Shake Up Scientific Research

Prism transforms the scientific workflow by automating LaTeX, citing literature, and turning raw research into publication-ready papers with GPT-5.2 precision.

Have newsworthy information in tech we can share with our community?