German Memelord Bench

Benchmarking LLMs' ability to detect and understand German memes across a wide range of questions.

Jan 9, 2026
35 tasks
110 models
$6.0705
karllorey

Results (Preliminary)

29 of 110 models on the leaderboard so far. More join with each arena vote.

1. Gemini 3 Flash Preview, by Google: 93% score
2. Gemini 2.5 Pro, by Google: 80% score
3. Claude Sonnet 4.5, by Anthropic: 68% score
4. GPT-5.2, by OpenAI: 64% score
5. Claude Opus 4.5, by Anthropic: 52% score

Model Comparison

Compare performance across models and prompts.

All models below were run on OpenRouter.

Model                    By          Duration   Cost      Score
Gemini 3 Flash Preview   Google      2.5s       $0.0192   93%
Gemini 2.5 Pro           Google      15.8s      $0.4450   80%
Claude Sonnet 4.5        Anthropic   5.5s       $0.0793   68%
GPT-5.2                  OpenAI      9.2s       $0.1775   64%
Claude Opus 4.5          Anthropic   6.7s       $0.1277   52%
GPT-5 Mini               OpenAI      12.3s      $0.0397   52%
Claude 3 Haiku           Anthropic   1.7s       $0.0060   51%
GPT-5 Nano               OpenAI      19.8s      $0.0255   46%
Gemini 2.5 Flash         Google      1.5s       $0.0112   46%
Claude Haiku 4.5         Anthropic   2.6s       $0.0234   43%

Value Analysis

Find models with the best balance of quality, cost, and speed.

Chart legend: highlighted models lie on the best-value frontier, meaning they offer the best score at their price point. Dot size represents duration, so larger dots take longer to produce a result.
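One way to read "best score at their price point" is as a Pareto frontier over cost and score: a model is kept if no cheaper (or equally priced) model scores at least as well. The sketch below applies that reading to the ten models listed under Model Comparison. It is an illustration under that assumption, not the page's actual method; the names `Result` and `best_value_frontier` are hypothetical, duration is ignored, and the page's own chart may cover more models and highlight a different set.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    cost_usd: float
    score_pct: int


# Cost and score copied from the Model Comparison section.
RESULTS = [
    Result("Gemini 3 Flash Preview", 0.0192, 93),
    Result("Gemini 2.5 Pro", 0.4450, 80),
    Result("Claude Sonnet 4.5", 0.0793, 68),
    Result("GPT-5.2", 0.1775, 64),
    Result("Claude Opus 4.5", 0.1277, 52),
    Result("GPT-5 Mini", 0.0397, 52),
    Result("Claude 3 Haiku", 0.0060, 51),
    Result("GPT-5 Nano", 0.0255, 46),
    Result("Gemini 2.5 Flash", 0.0112, 46),
    Result("Claude Haiku 4.5", 0.0234, 43),
]


def best_value_frontier(results):
    """Return results that no cheaper-or-equal result matches or beats on score."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on a cost tie, higher score first.
    for r in sorted(results, key=lambda r: (r.cost_usd, -r.score_pct)):
        # Keep a model only if it improves on every cheaper model's score.
        if r.score_pct > best_score:
            frontier.append(r)
            best_score = r.score_pct
    return frontier


for r in best_value_frontier(RESULTS):
    print(f"{r.model}: ${r.cost_usd:.4f}, {r.score_pct}%")
# Claude 3 Haiku: $0.0060, 51%
# Gemini 3 Flash Preview: $0.0192, 93%
```

Among these ten models, everything priced above Gemini 3 Flash Preview also scores lower, so under this definition only two models remain on the frontier.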

Token Usage

Average tokens used per model across all prompts.

GLM 4.7 (OpenRouter): 4,637 avg (24 in / 4,612 out)
DeepSeek V3.2 Speciale (OpenRouter): 2,849 avg (25 in / 2,824 out)
gpt-oss-20b (OpenRouter): 2,323 avg (84 in / 2,239 out)
GPT-5 Nano (OpenRouter): 2,215 avg (23 in / 2,191 out)
Gemini 2.5 Pro (OpenRouter): 1,550 avg (17 in / 1,532 out)
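The figures above can be sanity-checked with a little arithmetic. The sketch below computes the share of output tokens for each model and compares the sum of input and output tokens with the reported average. The helper name `output_share` is hypothetical, and the explanation that off-by-one differences come from rounding the averages separately is an assumption, not something the page states.

```python
# (input tokens, output tokens, reported average) as listed above.
TOKEN_USAGE = {
    "GLM 4.7": (24, 4612, 4637),
    "DeepSeek V3.2 Speciale": (25, 2824, 2849),
    "gpt-oss-20b": (84, 2239, 2323),
    "GPT-5 Nano": (23, 2191, 2215),
    "Gemini 2.5 Pro": (17, 1532, 1550),
}


def output_share(tokens_in, tokens_out):
    """Fraction of all tokens that are output tokens."""
    return tokens_out / (tokens_in + tokens_out)


for model, (t_in, t_out, reported) in TOKEN_USAGE.items():
    share = output_share(t_in, t_out)
    # A difference of 1 is assumed to be a rounding artifact.
    diff = reported - (t_in + t_out)
    print(f"{model}: {share:.1%} output, in + out = {t_in + t_out}, diff = {diff}")
```

For all five models listed, output tokens make up more than 96% of the total, and the reported average differs from the sum of input and output by at most one token.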