Spatial Reasoning: Germany

This benchmark evaluates an LLM's knowledge of German and Central European geography, focusing on relative positioning, proximity, and spatial orientation between cities and landmarks.

Jan 9, 2026

11 tasks

110 models

$0.4142

user_c636b9d7

Link only

Results (Preliminary)

27 of 110 models scored automatically so far. Arena votes unlock the rest and refine the ranking.

Model	Duration	Cost	Score
Claude Sonnet 4.5 by Anthropic	2.9s	$0.0254	97%
Claude Haiku 4.5 by Anthropic	1.7s	$0.0075	97%
Claude Opus 4.5 by Anthropic	3.4s	$0.0424	95%
Gemini 3 Flash Preview by Google	1.3s	$0.0032	95%
Kimi K2 Thinking by MoonshotAI	26.6s	$0.0059	95%

Model Comparison

Compare performance across models and prompts.

Model	Duration	Cost	Score
Claude Sonnet 4.5 by Anthropic on OpenRouter	2.9s	$0.0254	97%
Claude Haiku 4.5 by Anthropic on OpenRouter	1.7s	$0.0075	97%
Claude Opus 4.5 by Anthropic on OpenRouter	3.4s	$0.0424	95%
Gemini 3 Flash Preview by Google on OpenRouter	1.3s	$0.0032	95%
Kimi K2 Thinking by MoonshotAI on OpenRouter	26.6s	$0.0059	95%
GPT-5 Mini by OpenAI on OpenRouter	12.0s	$0.0252	95%
GPT-5.2 by OpenAI on OpenRouter	4.2s	$0.0363	95%
Gemini 2.5 Pro by Google on OpenRouter	10.4s	$0.2173	95%
Claude 3.5 Haiku by Anthropic on OpenRouter	2.2s	$0.0059	94%
Gemini 2.5 Flash Lite by Google on OpenRouter	787ms	$0.0007	94%

Value Analysis

Find models with the best balance of quality, cost, and speed.

Best value frontier chart (dot size = duration): highlighted models offer the best score at their price point. Larger dots take longer to produce a result.
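One plausible reading of "best score at their price point" is a Pareto frontier over cost and score: a model is kept only if no cheaper (or equally cheap) model scores at least as well. The sketch below applies that reading to the ten rows of the Model Comparison table. The `Result` type and `value_frontier` function are illustrative names, not part of the benchmark site, and the page's actual highlighting rule is not documented here. Scores are the rounded percentages shown on the page, so ties may resolve differently with unrounded data.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    cost_usd: float
    score_pct: int


# Rows copied from the Model Comparison table (rounded scores as displayed).
RESULTS = [
    Result("Claude Sonnet 4.5", 0.0254, 97),
    Result("Claude Haiku 4.5", 0.0075, 97),
    Result("Claude Opus 4.5", 0.0424, 95),
    Result("Gemini 3 Flash Preview", 0.0032, 95),
    Result("Kimi K2 Thinking", 0.0059, 95),
    Result("GPT-5 Mini", 0.0252, 95),
    Result("GPT-5.2", 0.0363, 95),
    Result("Gemini 2.5 Pro", 0.2173, 95),
    Result("Claude 3.5 Haiku", 0.0059, 94),
    Result("Gemini 2.5 Flash Lite", 0.0007, 94),
]


def value_frontier(results: list[Result]) -> list[Result]:
    """Return models not dominated on (lower cost, higher score).

    Assumed rule: walk models from cheapest to most expensive and keep
    one only if it scores strictly higher than every cheaper model.
    """
    frontier: list[Result] = []
    best_score = -1
    # At equal cost, visit the higher score first so it wins the tie.
    for r in sorted(results, key=lambda r: (r.cost_usd, -r.score_pct)):
        if r.score_pct > best_score:
            frontier.append(r)
            best_score = r.score_pct
    return frontier


if __name__ == "__main__":
    for r in value_frontier(RESULTS):
        print(f"{r.model}: ${r.cost_usd:.4f}, {r.score_pct}%")
```

Under this assumed rule, the frontier for the listed rows is Gemini 2.5 Flash Lite, Gemini 3 Flash Preview, and Claude Haiku 4.5. Duration is ignored here; adding it as a third objective would require a full dominance check rather than a single sorted pass.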

Token Usage

Average tokens used per model across all prompts.

Model	Avg tokens	Input	Output
GPT-5 Nano on OpenRouter	1,312	27	1,284
Gemini 2.5 Pro on OpenRouter	1,007	22	985
GLM 4.7 on OpenRouter	947	27	920
gpt-oss-20b on OpenRouter	714	87	627
Kimi K2 Thinking on OpenRouter	670	44	626