Claude computer use, Claude 3.5 Sonnet and Claude 3.5 Haiku
Anthropic has introduced a new feature in Claude called computer use. The idea is to give the model the computer screen as visual input and let it control the mouse cursor, clicks, and keyboard input. The feature is still experimental but available in public beta.
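For developers, computer use is exposed through the Messages API as a beta tool. The sketch below shows roughly what a request looks like with the official Python SDK. The tool type, beta flag string, and screen dimensions are taken from the beta announcement and may change while the feature is experimental, and the surrounding agent loop (taking screenshots, executing clicks on the machine) is left out.

```python
# Minimal sketch of a computer-use request with the Anthropic Python SDK.
# Tool type and beta flag reflect the public beta at the time of writing.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",  # gives the model screenshot/mouse/keyboard actions
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
        }
    ],
    messages=[{"role": "user", "content": "Open the browser and check the weather."}],
    betas=["computer-use-2024-10-22"],
)

# The model answers with tool_use blocks (e.g. "screenshot", "left_click");
# your own agent loop has to execute them and feed the results back.
print(response.content)
```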
Claude 3.5 Sonnet received some impressive updates across several benchmarks (see the table below), and later this month a new model, Claude 3.5 Haiku, will be released.
Benchmark Category | Claude 3.5 Sonnet (new) | Claude 3.5 Haiku | Claude 3.5 Sonnet | GPT-4o | GPT-4o mini | Gemini 1.5 Pro | Gemini 1.5 Flash |
---|---|---|---|---|---|---|---|
Graduate level reasoning (GPQA Diamond) | 65.0% 0-shot CoT | 41.6% 0-shot CoT | 59.4% 0-shot CoT | 53.6% 0-shot CoT | 40.2% 0-shot CoT | 59.1% 0-shot CoT | 51.0% 0-shot CoT |
Undergraduate level knowledge (MMLU Pro) | 78.0% 0-shot CoT | 65.0% 0-shot CoT | 75.1% 0-shot CoT | — | — | 75.8% 0-shot CoT | 67.3% 0-shot CoT |
Code (HumanEval) | 93.7% 0-shot | 88.1% 0-shot | 92.0% 0-shot | 90.2% 0-shot | 87.2% 0-shot | — | — |
Math problem-solving (MATH) | 78.3% 0-shot CoT | 69.2% 0-shot CoT | 71.1% 0-shot CoT | 76.6% 0-shot CoT | 70.2% 0-shot CoT | 86.5% 4-shot CoT | 77.9% 4-shot CoT |
High school math competition (AIME 2024) | 16.0% 0-shot CoT | 5.3% 0-shot CoT | 9.6% 0-shot CoT | 9.3% 0-shot CoT | — | — | — |
Visual Q/A (MMMU) | 70.4% 0-shot CoT | — | 68.3% 0-shot CoT | 69.1% 0-shot CoT | 59.4% 0-shot CoT | 65.9% 0-shot CoT | 62.3% 0-shot CoT |
Agentic coding (SWE-bench Verified) | 49.0% | 40.6% | 33.4% | — | — | — | — |
Agentic tool use - Retail (TAU-bench) | 69.2% | 51.0% | 62.6% | — | — | — | — |
Agentic tool use - Airline (TAU-bench) | 46.0% | 22.8% | 36.0% | — | — | — | — |
Note: According to Anthropic, OpenAI's o1 models were omitted from the comparison because of their extensive pre-response computation time and the differences in approach between the models.
Claude 3.5 Haiku is positioned as an alternative to GPT-4o mini. While it is not as competitive on price, the benchmarks suggest it should perform better.
Here are the current API prices:
Pricing for Claude 3.5 Sonnet
- $15.00 / 1M output tokens
- $3.00 / 1M input tokens
- $3.75 / 1M prompt caching write tokens
- $0.30 / 1M prompt caching read tokens
Pricing for Claude 3.5 Haiku
- $1.25 / 1M output tokens
- $0.25 / 1M input tokens
- $0.30 / 1M prompt caching write tokens
- $0.03 / 1M prompt caching read tokens
Pricing for Claude 3 Opus
- $75.00 / 1M output tokens
- $15.00 / 1M input tokens
- $18.75 / 1M prompt caching write tokens
- $1.50 / 1M prompt caching read tokens
Notes:
- All models feature a 200K context window
- A 50% discount is available when using the Batches API
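To get a feel for how these rates add up, here is a small back-of-the-envelope calculator using the Claude 3.5 Sonnet prices listed above. The token counts in the example are made-up values, and whether the Batches API discount stacks with the prompt caching rates is not stated here, so the sketch simply applies it to the whole total.

```python
# Rough cost estimate using the Claude 3.5 Sonnet rates listed above.
# Prices are USD per million tokens; token counts below are example values.
INPUT_PRICE = 3.00        # $ / 1M input tokens
OUTPUT_PRICE = 15.00      # $ / 1M output tokens
CACHE_WRITE_PRICE = 3.75  # $ / 1M prompt caching write tokens
CACHE_READ_PRICE = 0.30   # $ / 1M prompt caching read tokens
BATCH_DISCOUNT = 0.5      # 50% off via the Batches API (applied to the total here)

def request_cost(input_tokens, output_tokens, cache_write=0, cache_read=0, batch=False):
    """Estimate the cost of a single request in dollars."""
    cost = (
        input_tokens * INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
        + cache_write * CACHE_WRITE_PRICE
        + cache_read * CACHE_READ_PRICE
    ) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost

# Example: a 10K-token prompt that is cached on the first call and re-read afterwards.
print(request_cost(2_000, 1_000, cache_write=10_000))  # first call:  ~$0.0585
print(request_cost(2_000, 1_000, cache_read=10_000))   # later calls: ~$0.0240
```

The gap between the two numbers is the point of prompt caching: cache reads cost a tenth of regular input tokens, so a large static prompt quickly pays for its one-time cache write.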