Skip to main content

Claude computer use, Claude 3.5 Sonnet and Claude 3.5 Haiku

Anthropic has introduced a new feature in Claude called computer use. The idea behind it is to use the computer screen as visual input and allow the model control over the mouse cursor, buttons and text input. The feature is still experimental but in public beta.

Claude 3.5 Sonnet received some impressive updates across several benchmarks (see the table below) and later this month a new model will be released called Claude 3.5 Haiku.

Benchmark Category Claude 3.5 Sonnet (new) Claude 3.5 Haiku Claude 3.5 Sonnet GPT-4o GPT- mini Gemini 1.5 Pro Gemini 1.5 Flash
Graduate level reasoning
(GPQA Diamond)
65.0%
0-shot CoT
41.6%
0-shot CoT
59.4%
0-shot CoT
53.6%
0-shot CoT
40.2%
0-shot CoT
59.1%
0-shot CoT
51.0%
0-shot CoT
Undergraduate level knowledge
(MMLU Pro)
78.0%
0-shot CoT
65.0%
0-shot CoT
75.1%
0-shot CoT
75.8%
0-shot CoT
67.3%
0-shot CoT
Code
(HumanEval)
93.7%
0-shot
88.1%
0-shot
92.0%
0-shot
90.2%
0-shot
87.2%
0-shot
Math problem-solving
(MATH)
78.3%
0-shot CoT
69.2%
0-shot CoT
71.1%
0-shot CoT
76.6%
0-shot CoT
70.2%
0-shot CoT
86.5%
4-shot CoT
77.9%
4-shot CoT
High school math competition
(AIME 2024)
16.0%
0-shot CoT
5.3%
0-shot CoT
9.6%
0-shot CoT
9.3%
0-shot CoT
Visual Q/A
(MMMU)
70.4%
0-shot CoT
68.3%
0-shot CoT
69.1%
0-shot CoT
59.4%
0-shot CoT
65.9%
0-shot CoT
62.3%
0-shot CoT
Agentic coding
(SWE-bench Verified)
49.0% 40.6% 33.4%
Agentic tool use - Retail
(TAU-bench)
69.2% 51.0% 62.6%
Agentic tool use - Airline
(TAU-bench)
46.0% 22.8% 36.0%

Note: According to Anthropic, the o1 models were omitted due to the extensive pre-response computation time and differences between the model approaches.

Claude 3.5 Haiku is an alternative to GPT-4o Mini. While not as competitive in terms of pricing, according to the benchmarks it should perform better.

Here are the current prices of the API:

Pricing for Claude 3.5 Sonnet

  • $15.00 / 1M output tokens
  • $3.00 / 1M input tokens
  • $3.75 / 1M prompt caching write tokens
  • $0.30 / 1M prompt caching read tokens

Pricing for Claude 3.5 Haiku

  • $1.25 / 1M output tokens
  • $0.25 / 1M input tokens
  • $0.30 / 1M prompt caching write tokens
  • $0.03 / 1M prompt caching read tokens

Pricing for Claude 3 Opus

  • $75.00 / 1M output tokens
  • $15.00 / 1M input tokens
  • $18.75 / 1M prompt caching write tokens
  • $1.50 / 1M prompt caching read tokens

Notes:

  • All models feature a 200K context window
  • 50% discount is available when using the Batches API