German Memelord Bench

Benchmarking LLMs' ability to detect and understand German memes across a wide range of questions.

Jan 9, 2026
35 tasks
110 models
$1.1346
user_c636b9d7
Link only

Results (Preliminary)

27 of 110 models scored automatically so far. Arena votes unlock the rest and refine the ranking.

1. Gemini 3 Flash Preview (Google): 93%
2. Gemini 2.5 Pro (Google): 81%
3. Claude Sonnet 4.5 (Anthropic): 70%
4. GPT-5.2 (OpenAI): 65%
5. Claude Opus 4.5 (Anthropic): 53%

Model Comparison

Compare performance across models and prompts.

Model                   By         Via         Duration  Cost     Score
Gemini 3 Flash Preview  Google     OpenRouter  2.5s      $0.0192  93%
Gemini 2.5 Pro          Google     OpenRouter  15.8s     $0.4450  81%
Claude Sonnet 4.5       Anthropic  OpenRouter  5.5s      $0.0793  70%
GPT-5.2                 OpenAI     OpenRouter  9.2s      $0.1775  65%
Claude Opus 4.5         Anthropic  OpenRouter  6.7s      $0.1277  53%
GPT-5 Mini              OpenAI     OpenRouter  12.3s     $0.0397  50%
Claude 3 Haiku          Anthropic  OpenRouter  1.7s      $0.0060  49%
GPT-5 Nano              OpenAI     OpenRouter  19.8s     $0.0255  43%
Claude Haiku 4.5        Anthropic  OpenRouter  2.6s      $0.0234  43%
Gemini 2.5 Flash        Google     OpenRouter  1.5s      $0.0112  42%

Value Analysis

Find models with the best balance of quality, cost, and speed.

[Chart legend: best value frontier; best value; dot size = duration]

Highlighted models offer the best score at their price point. Larger dots take longer to produce a result.
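
One way to read "best score at their price point" is as a Pareto frontier over cost and score: a model is kept only if no cheaper (or equally priced) model scores at least as well. The sketch below applies that reading to the ten models in the Model Comparison listing. The `Result` type and the `best_value_frontier` function are hypothetical names introduced for illustration, and the page's own chart may use a different rule or a larger set of models.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    cost_usd: float
    score_pct: int


# Cost and score values copied from the Model Comparison listing.
RESULTS = [
    Result("Gemini 3 Flash Preview", 0.0192, 93),
    Result("Gemini 2.5 Pro", 0.4450, 81),
    Result("Claude Sonnet 4.5", 0.0793, 70),
    Result("GPT-5.2", 0.1775, 65),
    Result("Claude Opus 4.5", 0.1277, 53),
    Result("GPT-5 Mini", 0.0397, 50),
    Result("Claude 3 Haiku", 0.0060, 49),
    Result("GPT-5 Nano", 0.0255, 43),
    Result("Claude Haiku 4.5", 0.0234, 43),
    Result("Gemini 2.5 Flash", 0.0112, 42),
]


def best_value_frontier(results):
    """Return the models that no cheaper-or-equal model matches or beats on score.

    Assumption: "best value" means Pareto-optimal on (cost, score).
    """
    frontier = []
    best_score = float("-inf")
    # Cheapest first; on a cost tie, the higher score is visited first.
    for r in sorted(results, key=lambda r: (r.cost_usd, -r.score_pct)):
        if r.score_pct > best_score:
            frontier.append(r)
            best_score = r.score_pct
    return frontier


if __name__ == "__main__":
    for r in best_value_frontier(RESULTS):
        print(f"{r.name}: {r.score_pct}% at ${r.cost_usd:.4f}")
```

Under this reading, only Claude 3 Haiku and Gemini 3 Flash Preview survive among the ten listed models, because Gemini 3 Flash Preview outscores every more expensive model. Duration is ignored here; on the chart it is encoded only as dot size.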

Token Usage

Average tokens used per model across all prompts.

Model           Via         Avg tokens  In  Out
GLM 4.7         OpenRouter  4,637       24  4,612
gpt-oss-20b     OpenRouter  2,323       84  2,239
GPT-5 Nano      OpenRouter  2,215       23  2,191
Gemini 2.5 Pro  OpenRouter  1,550       17  1,532
gpt-oss-120b    OpenRouter  1,040       84  956