Karlsruhe Local Knowledge Benchmark

Tests each model's specific knowledge of the history, geography, transportation, and culture of the German city of Karlsruhe.

Jan 7, 2026
9 tasks
110 models
$5.1516
user_c636b9d7
Link only

Results (Preliminary)

29 of 110 models on the leaderboard so far. More join with each arena vote.

1. Gemini 2.5 Pro (Google): 97%
2. GPT-5.2 (OpenAI): 95%
3. Gemini 3 Flash Preview (Google): 90%
4. Claude Opus 4.5 (Anthropic): 87%
5. DeepSeek V3.2 (DeepSeek): 86%

Model Comparison

Compare performance across models and prompts.

All models below were run via OpenRouter.

Model                    Provider    Duration  Cost     Score
Gemini 2.5 Pro           Google      17.7s     $0.4417  97%
GPT-5.2                  OpenAI      11.2s     $0.1344  95%
Gemini 3 Flash Preview   Google      3.8s      $0.0276  90%
Claude Opus 4.5          Anthropic   6.3s      $0.1530  87%
DeepSeek V3.2            DeepSeek    22.9s     $0.0041  86%
Gemini 2.0 Flash         Google      1.6s      $0.0012  85%
DeepSeek V3.2 Speciale   DeepSeek    81.3s     $0.0232  85%
DeepSeek V3 0324         DeepSeek    17.5s     $0.0069  85%
Kimi K2 Thinking         MoonshotAI  18.1s     $0.0113  83%
Claude Sonnet 4.5        Anthropic   6.7s      $0.0992  79%

Value Analysis

Find models with the best balance of quality, cost, and speed.

[Chart legend: best value frontier; best value (highlighted); dot size = duration]

Highlighted models offer the best score at their price point. Larger dots take longer to produce a result.
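
One way to read "best score at their price point" is as a Pareto frontier over cost and score: a model is kept only if no cheaper (or equally priced) model scores at least as well. The sketch below applies that reading to the ten rows listed under Model Comparison. The page does not state how it computes its frontier, so the rule, the `Model` type, and the `best_value_frontier` function name are assumptions for illustration, and the result covers only these ten rows, not the full leaderboard.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    cost_usd: float
    score_pct: int


# Cost and score values copied from the Model Comparison rows.
MODELS = [
    Model("Gemini 2.5 Pro", 0.4417, 97),
    Model("GPT-5.2", 0.1344, 95),
    Model("Gemini 3 Flash Preview", 0.0276, 90),
    Model("Claude Opus 4.5", 0.1530, 87),
    Model("DeepSeek V3.2", 0.0041, 86),
    Model("Gemini 2.0 Flash", 0.0012, 85),
    Model("DeepSeek V3.2 Speciale", 0.0232, 85),
    Model("DeepSeek V3 0324", 0.0069, 85),
    Model("Kimi K2 Thinking", 0.0113, 83),
    Model("Claude Sonnet 4.5", 0.0992, 79),
]


def best_value_frontier(models):
    """Return models not beaten on score by any cheaper-or-equal model.

    Assumed rule (not confirmed by the page): walk the models from
    cheapest to most expensive and keep one only if it strictly
    improves on the best score seen so far.
    """
    frontier = []
    best_score = float("-inf")
    # Ties on cost are broken by higher score first.
    for m in sorted(models, key=lambda m: (m.cost_usd, -m.score_pct)):
        if m.score_pct > best_score:
            frontier.append(m)
            best_score = m.score_pct
    return frontier


for m in best_value_frontier(MODELS):
    print(f"{m.name}: {m.score_pct}% at ${m.cost_usd:.4f}")
```

Under this assumed rule, the ten listed rows reduce to Gemini 2.0 Flash, DeepSeek V3.2, Gemini 3 Flash Preview, GPT-5.2, and Gemini 2.5 Pro, ordered from cheapest to most expensive. Duration is ignored here; it could be added as a third criterion.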

Token Usage

Average tokens used per model across all prompts.

Model                   Provider    Avg. tokens  Input  Output
GPT-5 Nano              OpenRouter  2,374        19     2,355
GLM 4.7                 OpenRouter  2,253        21     2,232
DeepSeek V3.2 Speciale  OpenRouter  2,100        23     2,078
Gemini 2.5 Pro          OpenRouter  1,758        14     1,744
gpt-oss-20b             OpenRouter  1,441        78     1,363