Maybe AI agents can be lawyers after all | TechCrunch - iTechsNews

Last month, I wrote about Mercor’s new benchmark that measures the capabilities of AI agents in professional tasks such as law and business analytics. At the time, the results were pretty dismal, with every major lab scoring under 25%, so we concluded that lawyers were safe from AI displacement, at least for now.

But AI capabilities can change a lot in a matter of weeks.

This week’s release of Anthropic’s Opus 4.6 shook up the leaderboard, with the new model scoring just under 30% in one-shot tests and averaging 45% when given multiple attempts at the problem. Notably, the release included a lot of new agent features, including “agent swarms” that could help with this kind of multi-step problem solving.

Regardless, the score is a huge leap from the previous state of the art and a sign that progress on the base models isn’t slowing down. Mercor chief executive Brendan Foody, who was particularly impressed, said: “To jump from 18.4% to 29.8% in a few months is crazy.”

APEX-Agents ranking. Image credit: Mercor (screenshot)

Thirty percent is still a long way from 100%, so it’s not like lawyers need to worry about being replaced by machines next week. But they should be a lot less confident than last month!
