Untitled Benchmark

May 26, 2026

50 tasks

42 models

Total cost: $0.0092

user_c636b9d7

Public

Results (preliminary)


5 of 42 models have been scored automatically so far. Arena votes unlock scores for the remaining models and refine the ranking.

GPT-4.1 Nano (OpenAI): score 100%, 2.3 s, $0.0000
Qwen3.5-Flash (Qwen): score 100%, 29.4 s, $0.0014
Auto Router (OpenRouter): score 100%, 3.2 s, $0.0052
Claude 3 Haiku (Anthropic): score 90%, 896 ms, $0.0001
Qwen3.6 Flash (Qwen): score 0%, 13.3 s, $0.0025

Model Comparison

Compare performance across models and prompts.

All models were run through OpenRouter.

Model	Developer	Duration	Cost	Score
GPT-4.1 Nano	OpenAI	2.3 s	$0.0000	100%
Qwen3.5-Flash	Qwen	29.4 s	$0.0014	100%
Auto Router	OpenRouter	3.2 s	$0.0052	100%
Claude 3 Haiku	Anthropic	896 ms	$0.0001	90%
Qwen3.6 Flash	Qwen	13.3 s	$0.0025	0%

Value Analysis

Find models with the best balance of quality, cost, and speed.

[Chart: Best value frontier. Highlighted dots mark best-value models; dot size = duration.]

Highlighted models offer the best score at their price point: no other model scores higher at the same or lower cost. Larger dots indicate longer durations.
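A minimal Python sketch of how such a frontier could be computed, assuming "best score at their price point" means Pareto dominance on cost and score (duration is ignored). The `Result` type and `value_frontier` function are hypothetical names, not the page's actual implementation; the figures are copied from the comparison table, with the displayed $0.0000 treated as zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    cost_usd: float
    score: float  # percentage, 0-100


def value_frontier(results: list[Result]) -> list[Result]:
    """Return models not dominated by any other model.

    Assumption: model A dominates model B when A costs no more and scores
    no lower than B, and is strictly better on at least one of the two.
    """
    frontier = []
    for r in results:
        dominated = any(
            o.cost_usd <= r.cost_usd
            and o.score >= r.score
            and (o.cost_usd < r.cost_usd or o.score > r.score)
            for o in results
        )
        if not dominated:
            frontier.append(r)
    # Order the frontier from cheapest to most expensive.
    return sorted(frontier, key=lambda r: r.cost_usd)


# Figures taken from the comparison table above.
RESULTS = [
    Result("GPT-4.1 Nano", 0.0000, 100),
    Result("Qwen3.5-Flash", 0.0014, 100),
    Result("Auto Router", 0.0052, 100),
    Result("Claude 3 Haiku", 0.0001, 90),
    Result("Qwen3.6 Flash", 0.0025, 0),
]

print([r.model for r in value_frontier(RESULTS)])
```

Under this rule the frontier contains only GPT-4.1 Nano, because it is the cheapest model and also tied for the top score; the chart on the page may apply a different rule.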

Token Usage

Average tokens used per prompt by each model, split into input (in) and output (out) tokens.

Qwen3.5-Flash (OpenRouter): 5,641 avg (99 in / 5,542 out)
Qwen3.6 Flash (OpenRouter): 2,301 avg (99 in / 2,202 out)
Auto Router (OpenRouter): 165 avg (85 in / 80 out)
Claude 3 Haiku (OpenRouter): 149 avg (99 in / 50 out)
GPT-4.1 Nano (OpenRouter): 118 avg (95 in / 23 out)
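Each avg figure is the sum of the model's input and output averages (for example, 99 + 5,542 = 5,641), so the large totals for the two Qwen models come almost entirely from output tokens. The small Python sketch below parses usage lines in this format and checks that relationship. The regular expression assumes the exact `N avg (N in / N out)` layout shown here, and `parse_usage` is a hypothetical helper.

```python
import re

# Matches usage text such as "5,641 avg (99 in / 5,542 out)".
USAGE_LINE = re.compile(r"([\d,]+) avg \(([\d,]+) in / ([\d,]+) out\)")


def parse_usage(text: str) -> tuple[int, int, int]:
    """Return (avg_total, avg_in, avg_out) parsed from one usage line."""
    match = USAGE_LINE.search(text)
    if match is None:
        raise ValueError(f"unrecognised usage line: {text!r}")
    # Strip thousands separators before converting to int.
    total, tokens_in, tokens_out = (int(g.replace(",", "")) for g in match.groups())
    return total, tokens_in, tokens_out


# Figures taken from the Token Usage list above.
USAGE = {
    "Qwen3.5-Flash": "5,641 avg (99 in / 5,542 out)",
    "Qwen3.6 Flash": "2,301 avg (99 in / 2,202 out)",
    "Auto Router": "165 avg (85 in / 80 out)",
    "Claude 3 Haiku": "149 avg (99 in / 50 out)",
    "GPT-4.1 Nano": "118 avg (95 in / 23 out)",
}

for model, line in USAGE.items():
    total, tokens_in, tokens_out = parse_usage(line)
    # The reported average should equal input plus output tokens.
    assert total == tokens_in + tokens_out, model
    print(f"{model}: {tokens_out / total:.0%} of tokens are output")
```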