Anthropic claims its new AI chatbot models beat OpenAI’s GPT-4

Anthropic, an AI startup backed by Google and hundreds of millions of dollars in venture capital (with potentially more to come), has unveiled the latest version of its GenAI technology, Claude. The company claims the new AI chatbot outperforms OpenAI’s GPT-4.

The new GenAI, dubbed Claude 3, comprises a family of models — Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, with Opus being the most powerful. According to Anthropic, all of these models demonstrate “enhanced capabilities” in analysis and forecasting, as well as superior performance on specific benchmarks compared to rival models such as GPT-4 and Google’s Gemini 1.0 Ultra (though not Gemini 1.5 Pro).

Significantly, Claude 3 is Anthropic’s inaugural multimodal GenAI, implying it can analyze both text and images — a feature shared with certain versions of GPT-4 and Gemini. Claude 3 can process a variety of visual data, including photos, charts, graphs, and technical diagrams, sourced from PDFs, slideshows, and other document formats.

In an advancement over some GenAI competitors, Claude 3 can analyze multiple images in a single request (up to a limit of 20). This enables it to compare and contrast images, as noted by Anthropic.
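As a rough sketch of what a multi-image comparison request could look like, the snippet below assembles a message payload with several image blocks followed by a text prompt, enforcing the 20-image cap Anthropic describes. The payload shape follows Anthropic’s Messages API conventions, but the exact field names and the model identifier here are assumptions for illustration — no API call is made.

```python
MAX_IMAGES_PER_REQUEST = 20  # per-request image limit cited by Anthropic


def build_compare_request(images_b64, question, model="claude-3-opus"):
    """Assemble a multi-image message payload as plain dicts (no API call).

    images_b64: list of (media_type, base64_data) tuples, e.g.
                [("image/png", "<base64...>"), ...]
    """
    if len(images_b64) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f"Claude 3 accepts at most {MAX_IMAGES_PER_REQUEST} images per request"
        )
    content = [
        {
            "type": "image",
            "source": {"type": "base64", "media_type": mt, "data": data},
        }
        for mt, data in images_b64
    ]
    # The text prompt goes last so it can refer back to all the images.
    content.append({"type": "text", "text": question})
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": content}],
    }
```

Keeping the limit check client-side avoids a round trip just to learn the request was over the cap.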

However, Claude 3’s image processing capabilities do have limitations.

Anthropic has deactivated the models’ ability to identify individuals, likely due to ethical and legal considerations. The company also acknowledges that Claude 3 tends to err with “low-quality” images (those under 200 pixels) and struggles with tasks that require spatial reasoning (such as reading an analog clock face) and object counting (Claude 3 cannot provide exact counts of objects in images).

Claude 3 also cannot create artwork. The models are designed purely for image analysis, at least in their current state.

Anthropic indicates that, whether dealing with text or images, customers can generally anticipate Claude 3 to more effectively follow multi-step instructions, generate structured output in formats such as JSON, and communicate in languages other than English compared to its earlier versions. Thanks to a “more refined comprehension of requests,” Claude 3 is expected to decline answering questions less frequently. Additionally, the models will soon reference the sources of their responses, enabling users to verify the information.
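Even when a model is asked for structured output such as JSON, callers typically still parse the reply defensively, since models sometimes wrap JSON in a markdown code fence. The helper below is a minimal, generic sketch of that pattern — it is not part of any Anthropic SDK.

```python
import json
import re


def parse_json_reply(reply_text):
    """Parse a JSON object from a model reply, tolerating an optional
    surrounding markdown code fence (``` or ```json)."""
    text = reply_text.strip()
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    return json.loads(text)  # raises json.JSONDecodeError if still malformed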

As stated in a support article by Anthropic, “Claude 3 tends to produce more expressive and engaging responses. It’s easier to guide and prompt compared to our previous models. Users should find that they can attain the desired outcomes with briefer and more succinct prompts.”

Some of these enhancements are attributed to Claude 3’s expanded context window.

In the context of a model, the term ‘context window’ refers to the input data (for instance, text) that the model takes into account prior to producing output. Models with smaller context windows often “forget” the content of recent conversations, causing them to deviate from the topic — frequently in troublesome ways. On the other hand, models with larger context windows are better at understanding the narrative flow of the data they process and can generate responses that are richer in context (at least theoretically).

Anthropic reveals that Claude 3 will initially support a context window of 200,000 tokens, roughly equivalent to about 150,000 words. Select customers will have access to a context window of up to 1 million tokens (~700,000 words). This is comparable to Google’s latest GenAI model, Gemini 1.5 Pro, which also provides a context window of up to a million tokens.

However, it’s important to note that despite being an improvement over its predecessors, Claude 3 is not flawless.

In a technical whitepaper, Anthropic concedes that Claude 3 is not exempt from the problems that affect other GenAI models, specifically bias and hallucinations (i.e., fabricating information). Unlike some GenAI models, Claude 3 lacks the ability to search the web; the models can only respond to questions using data available prior to August 2023. While Claude is capable of operating in multiple languages, it is not as proficient in certain “low-resource” languages as it is in English.

Anthropic is committed to delivering regular updates to Claude 3 in the upcoming months.

The company states in a blog post, “We are of the view that the potential of model intelligence is far from being fully realized, and we intend to roll out [improvements] to the Claude 3 model suite in the forthcoming months.”

Currently, Opus and Sonnet are accessible on the web and through Anthropic’s developer console and API, as well as on Amazon’s Bedrock platform and Google’s Vertex AI. Haiku is slated for release later this year.

Here is a summary of the pricing:

  • Opus: $15 for every million input tokens, $75 for every million output tokens
  • Sonnet: $3 for every million input tokens, $15 for every million output tokens
  • Haiku: $0.25 for every million input tokens, $1.25 for every million output tokens
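The price list above makes per-request cost estimation straightforward: multiply each token count by the per-million rate for the chosen model. A minimal calculator, using only the published figures:

```python
# Per-million-token prices (USD) from Anthropic's published list above.
PRICES = {
    "opus":   {"input": 15.00, "output": 75.00},
    "sonnet": {"input":  3.00, "output": 15.00},
    "haiku":  {"input":  0.25, "output":  1.25},
}


def request_cost(model, input_tokens, output_tokens):
    """Estimated cost in USD for a single request to the given model."""
    rates = PRICES[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000
```

For example, a full 1-million-token input plus 1 million output tokens on Opus would run $15 + $75 = $90, versus $1.50 on Haiku.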

That’s a brief overview of Claude 3. But what does the big picture look like?

As we’ve previously noted, Anthropic aspires to develop a next-generation algorithm for “AI self-teaching.” This type of algorithm could be employed to create virtual assistants capable of responding to emails, conducting research, and generating art, books, and more — a glimpse of which we’ve already seen with GPT-4 and other large language models.

In a recent blog post, Anthropic hinted at plans to enhance Claude 3’s initial capabilities by enabling it to interact with other systems, code “interactively,” and offer “advanced agentic capabilities.”

This last point brings to mind OpenAI’s stated goal of developing a software agent to automate complex tasks, such as transferring data from a document to a spreadsheet or automatically completing expense reports and entering them into accounting software. OpenAI already provides an API that enables developers to incorporate “agent-like experiences” into their apps, and it appears that Anthropic is determined to deliver comparable functionality.

Could we expect an image generator from Anthropic next? Honestly, that would be surprising. Image generators are currently a contentious topic, primarily due to copyright and bias issues. Google was recently compelled to disable its image generator after it introduced diversity into images with a comical disregard for historical context. Furthermore, several image generator providers are embroiled in legal disputes with artists who allege that they are profiting from their work by training GenAI on it without providing compensation or credit.

I’m interested in tracking the progress of Anthropic’s method for training GenAI, known as “constitutional AI.” The company asserts that this approach makes the behavior of its GenAI more comprehensible, predictable, and easier to modify as necessary. Constitutional AI is designed to align AI with human intentions by having models respond to queries and carry out tasks based on a straightforward set of guiding principles. For instance, for Claude 3, Anthropic incorporated a principle — guided by feedback from the crowd — that directs the models to be empathetic and accessible to individuals with disabilities.

Regardless of Anthropic’s ultimate objective, it’s committed to a long-term strategy. As per a pitch deck leaked in May of the previous year, the company plans to raise as much as $5 billion over the forthcoming 12 months or so — potentially the minimum requirement to stay competitive with OpenAI. (After all, training models is not inexpensive.) It’s already making significant strides, with $2 billion and $4 billion in committed capital and pledges from Google and Amazon, respectively, and well over a billion collectively from other investors.
