Claude computer use, Claude 3.5 Sonnet and Claude 3.5 Haiku
Anthropic has introduced a new feature in Claude called computer use. The idea is to give the model the computer screen as visual input and let it control the mouse cursor, clicks, and keyboard input. The feature is still experimental but available in public beta.
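For developers, computer use is exposed through the Messages API as a beta tool. The sketch below shows roughly what a request looks like with the official Python SDK. The tool type, beta flag string, and screen dimensions are taken from the beta announcement and may change while the feature is experimental, and the surrounding agent loop (taking screenshots, executing clicks on the machine) is left out.

```python
# Minimal sketch of a computer-use request with the Anthropic Python SDK.
# Tool type and beta flag reflect the public beta at the time of writing.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",  # gives the model screenshot/mouse/keyboard actions
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
        }
    ],
    messages=[{"role": "user", "content": "Open the browser and check the weather."}],
    betas=["computer-use-2024-10-22"],
)

# The model answers with tool_use blocks (e.g. "screenshot", "left_click");
# your own agent loop has to execute them and feed the results back.
print(response.content)
```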
Claude 3.5 Sonnet received some impressive updates across several benchmarks (see the table below), and later this month a new model, Claude 3.5 Haiku, will be released.
Benchmark Category | Claude 3.5 Sonnet (new) | Claude 3.5 Haiku | Claude 3.5 Sonnet | GPT-4o | GPT-4o mini | Gemini 1.5 Pro | Gemini 1.5 Flash |
---|---|---|---|---|---|---|---|
Graduate level reasoning (GPQA Diamond) | 65.0% 0-shot CoT | 41.6% 0-shot CoT | 59.4% 0-shot CoT | 53.6% 0-shot CoT | 40.2% 0-shot CoT | 59.1% 0-shot CoT | 51.0% 0-shot CoT |
Undergraduate level knowledge (MMLU Pro) | 78.0% 0-shot CoT | 65.0% 0-shot CoT | 75.1% 0-shot CoT | — | — | 75.8% 0-shot CoT | 67.3% 0-shot CoT |
Code (HumanEval) | 93.7% 0-shot | 88.1% 0-shot | 92.0% 0-shot | 90.2% 0-shot | 87.2% 0-shot | — | — |
Math problem-solving (MATH) | 78.3% 0-shot CoT | 69.2% 0-shot CoT | 71.1% 0-shot CoT | 76.6% 0-shot CoT | 70.2% 0-shot CoT | 86.5% 4-shot CoT | 77.9% 4-shot CoT |
High school math competition (AIME 2024) | 16.0% 0-shot CoT | 5.3% 0-shot CoT | 9.6% 0-shot CoT | 9.3% 0-shot CoT | — | — | — |
Visual Q/A (MMMU) | 70.4% 0-shot CoT | — | 68.3% 0-shot CoT | 69.1% 0-shot CoT | 59.4% 0-shot CoT | 65.9% 0-shot CoT | 62.3% 0-shot CoT |
Agentic coding (SWE-bench Verified) | 49.0% | 40.6% | 33.4% | — | — | — | — |
Agentic tool use - Retail (TAU-bench) | 69.2% | 51.0% | 62.6% | — | — | — | — |
Agentic tool use - Airline (TAU-bench) | 46.0% | 22.8% | 36.0% | — | — | — | — |
Note: According to Anthropic, OpenAI's o1 models were omitted from the comparison because of their extensive pre-response computation time and the differences in approach between the models.
Claude 3.5 Haiku is positioned as an alternative to GPT-4o mini. While it is not as competitive on price, the benchmarks suggest it should perform better.
Here are the current API prices:
Pricing for Claude 3.5 Sonnet
- $15.00 / 1M output tokens
- $3.00 / 1M input tokens
- $3.75 / 1M prompt caching write tokens
- $0.30 / 1M prompt caching read tokens
Pricing for Claude 3.5 Haiku
- $1.25 / 1M output tokens
- $0.25 / 1M input tokens
- $0.30 / 1M prompt caching write tokens
- $0.03 / 1M prompt caching read tokens
Pricing for Claude 3 Opus
- $75.00 / 1M output tokens
- $15.00 / 1M input tokens
- $18.75 / 1M prompt caching write tokens
- $1.50 / 1M prompt caching read tokens
Notes:
- All models feature a 200K context window
- A 50% discount is available when using the Batches API
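To get a feel for how these rates add up, here is a small back-of-the-envelope calculator using the Claude 3.5 Sonnet prices listed above. The token counts in the example are made-up values, and whether the Batches API discount stacks with the prompt caching rates is not stated here, so the sketch simply applies it to the whole total.

```python
# Rough cost estimate using the Claude 3.5 Sonnet rates listed above.
# Prices are USD per million tokens; token counts below are example values.
INPUT_PRICE = 3.00        # $ / 1M input tokens
OUTPUT_PRICE = 15.00      # $ / 1M output tokens
CACHE_WRITE_PRICE = 3.75  # $ / 1M prompt caching write tokens
CACHE_READ_PRICE = 0.30   # $ / 1M prompt caching read tokens
BATCH_DISCOUNT = 0.5      # 50% off via the Batches API (applied to the total here)

def request_cost(input_tokens, output_tokens, cache_write=0, cache_read=0, batch=False):
    """Estimate the cost of a single request in dollars."""
    cost = (
        input_tokens * INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
        + cache_write * CACHE_WRITE_PRICE
        + cache_read * CACHE_READ_PRICE
    ) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost

# Example: a 10K-token prompt that is cached on the first call and re-read afterwards.
print(request_cost(2_000, 1_000, cache_write=10_000))  # first call:  ~$0.0585
print(request_cost(2_000, 1_000, cache_read=10_000))   # later calls: ~$0.0240
```

The gap between the two numbers is the point of prompt caching: cache reads cost a tenth of regular input tokens, so a large static prompt quickly pays for its one-time cache write.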