Anthropic says its latest AI bot can beat Gemini and ChatGPT

Content

Anthropic, the AI company started by several former OpenAI employees, says the new Claude 3 family of AI models performs as well as or better than leading models from Google and OpenAI. Unlike earlier versions, Claude 3 is also multimodal, able to understand text and photo inputs.

Anthropic says Claude 3 will answer more questions, understand longer instructions, and be more accurate. Claude 3 can understand more context, meaning it can process more information. There’s Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, with Opus being the largest and “most intelligent model.” Anthropic says Opus and Sonnet are now available on claude.ai and its API. Haiku will be released soon. All three models can be deployed on chatbots, auto-completion, and data extraction tasks.

Previous versions of Claude refused to answer some prompts that were harmless, which the company writes “suggests a lack of contextual understanding.” The new models are less likely to refuse to answer prompts that toe the line of its safety guardrails, similar to rumors about Meta’s plans for Llama 3 when it’s released.

Bar chart showing a significantly lower rate of “refused on harmless prompt” responses by Claude 3 AI models (near or below 10 percent), compared to Claude 2.1 (around 25 percent).

Incorrect refusals on Claude 3 versus Claude 2.1.

Image: Anthropic

Anthropic claims Claude 3 models can give near-instant results even while parsing dense material like a research paper. A blog post says Haiku, the smallest version of Claude 3, is “the fastest and most cost-effective model on the market,” able to read a dense research paper complete with charts and graphs “in less than three seconds.”

Anthropic says Opus outperformed most models in several benchmarking tests. It showed better graduate-level reasoning than OpenAI’s GPT-4, getting 50.4 percent in that test over GPT-4’s 35.7 percent. It also answered math questions, coded, and understood reasoning better.

A list of benchmark scores comparing AI models from Anthropic, OpenAI, and Google, showing Claude 3 (Opus) as the highest scoring model on all of the tests listed.

Claude 3 models compared to GPT-4, GPT-3.5, and Gemini 1.0 Ultra / Pro.

Image: Anthropic

The new models also significantly improve against the previous Claude 2.1 model. Sonnet, the middle ground model, was twice as fast as Claude 2 and Claude 2.1. “It excels in tasks demanding rapid responses, like knowledge retrieval or sales automation,” Anthropic said.

Anthropic trained the Claude 3 models on a mix of nonpublic internal and third-party datasets and publicly available data as of August 2023. The company says in a paper introducing the three models that these were trained using hardware from Amazon’s AWS and Google Cloud. Both companies invested in Anthropic, with Amazon putting $4 billion into the company. Claude 3 will be available on AWS’s model library Bedrock and in Google’s Vertex AI.

Featured Videos From The Verge

TikTok Hearing

TikTok Hearing

Summary
Anthropic's new Claude 3 family of AI models, developed by former OpenAI employees, outperforms leading models from Google and OpenAI. Claude 3 is multimodal, understanding text and photo inputs, and offers improved performance in answering questions, following instructions, and accuracy. The models, including Haiku, Sonnet, and Opus, can be deployed on various tasks. Claude 3 shows reduced incorrect refusals compared to previous versions, with Opus excelling in benchmark tests over GPT-4. The models were trained on a mix of datasets and are available on AWS's Bedrock and Google's Vertex AI. Anthropic claims Claude 3 can provide near-instant results, even with dense material like research papers, and Sonnet is highlighted for its speed in tasks like knowledge retrieval. The models were trained using hardware from Amazon's AWS and Google Cloud, with investments from both companies.