Karlsruhe Local Knowledge Benchmark

Tests models' specific knowledge of the history, geography, transportation, and culture of the German city of Karlsruhe.

Jan 7, 2026
9 tasks
110 models
$1.1126
user_c636b9d7
Link only

Results (Preliminary)

27 of 110 models scored automatically so far. Arena votes unlock the rest and refine the ranking.

1. GPT-5.2 (OpenAI): 98%
2. Claude Opus 4.5 (Anthropic): 87%
3. Claude Sonnet 4.5 (Anthropic): 85%
4. Gemini 3 Flash Preview (Google): 84%
5. Gemini 2.5 Pro (Google): 81%

Model Comparison

Compare performance across models and prompts.

All models below were run on OpenRouter.

Model                   Provider   Duration  Cost     Score
GPT-5.2                 OpenAI     11.2s     $0.1344  98%
Claude Opus 4.5         Anthropic  6.3s      $0.1530  87%
Claude Sonnet 4.5       Anthropic  6.7s      $0.0992  85%
Gemini 3 Flash Preview  Google     3.8s      $0.0276  84%
Gemini 2.5 Pro          Google     17.7s     $0.4417  81%
Claude 3 Haiku          Anthropic  4.3s      $0.0088  76%
GLM 4.7                 Z.ai       38.1s     $0.0473  74%
DeepSeek V3.2           DeepSeek   22.9s     $0.0041  72%
Claude Haiku 4.5        Anthropic  3.1s      $0.0291  72%
Ministral 3 8B 2512     Mistral    3.2s      $0.0021  71%

Value Analysis

Find models with the best balance of quality, cost, and speed.

[Chart: best value frontier. Highlighted dots = best value; dot size = duration.]

Highlighted models offer the best score at their price point. Larger dots take longer to produce a result.
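
As a rough illustration of what "best score at their price point" could mean, the sketch below computes a cost/score frontier from the ten rows listed under Model Comparison. It assumes a model is "best value" when no cheaper-or-equal model scores at least as high; the page does not state its exact rule, and the names `Result` and `value_frontier` are hypothetical. Only ten of the 27 scored models appear here, so the page's actual highlights may differ.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    cost_usd: float
    score_pct: int


# Cost and score values copied from the Model Comparison listing.
RESULTS = [
    Result("GPT-5.2", 0.1344, 98),
    Result("Claude Opus 4.5", 0.1530, 87),
    Result("Claude Sonnet 4.5", 0.0992, 85),
    Result("Gemini 3 Flash Preview", 0.0276, 84),
    Result("Gemini 2.5 Pro", 0.4417, 81),
    Result("Claude 3 Haiku", 0.0088, 76),
    Result("GLM 4.7", 0.0473, 74),
    Result("DeepSeek V3.2", 0.0041, 72),
    Result("Claude Haiku 4.5", 0.0291, 72),
    Result("Ministral 3 8B 2512", 0.0021, 71),
]


def value_frontier(results: list[Result]) -> list[Result]:
    """Return models not beaten on score by any cheaper-or-equal model.

    Assumed rule (not confirmed by the page): walk the models from
    cheapest to most expensive and keep each one that strictly raises
    the best score seen so far.
    """
    frontier: list[Result] = []
    best_score = -1
    # Ties on cost are ordered by descending score so the better one wins.
    for r in sorted(results, key=lambda r: (r.cost_usd, -r.score_pct)):
        if r.score_pct > best_score:
            frontier.append(r)
            best_score = r.score_pct
    return frontier


for r in value_frontier(RESULTS):
    print(f"{r.model}: ${r.cost_usd:.4f} -> {r.score_pct}%")
```

Under this assumed rule, Claude Haiku 4.5, GLM 4.7, Claude Opus 4.5, and Gemini 2.5 Pro fall off the frontier because a cheaper listed model scores at least as high. Duration is ignored here; a three-way trade-off would need a dominance check over cost, score, and speed together.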

Token Usage

Average tokens used per model across all prompts.

GPT-5 Nano (OpenRouter): 2,374 avg (19 in / 2,355 out)
GLM 4.7 (OpenRouter): 2,253 avg (21 in / 2,232 out)
Gemini 2.5 Pro (OpenRouter): 1,758 avg (14 in / 1,744 out)
gpt-oss-20b (OpenRouter): 1,441 avg (78 in / 1,363 out)
gpt-oss-120b (OpenRouter): 1,280 avg (80 in / 1,200 out)