Money Boy Cultural Literacy Test

Running

This benchmark evaluates knowledge of the Austrian rapper Money Boy, including his biography, lyrics, and pop culture influence. It tests the model's ability to identify specific biographical facts and complete iconic German cloud rap verses.

Jan 6, 2026
7 tasks
110 models
user_c636b9d7
Link only

Results (Preliminary)

99 of 110 models scored automatically so far. Arena votes unlock the rest and refine the ranking.

Rank  Model                   Provider  Score
1     Gemini 3 Flash Preview  Google    100%
2     Gemini 3.1 Pro Preview  Google    90%
3     Gemini 3.5 Flash        Google    84%
4     GPT-5.5 Pro             OpenAI    74%
5     GPT-5.3 Chat            OpenAI    74%

Model Comparison

Compare performance across models and prompts.

Model                   Provider                 Duration  Cost     Score
Gemini 3 Flash Preview  Google on OpenRouter     1.4s      $0.0014  100%
Gemini 3.1 Pro Preview  Google on OpenRouter     9.1s      $0.0687  90%
Gemini 3.5 Flash        Google on OpenRouter     2.1s      $0.0052  84%
GPT-5.5 Pro             OpenAI on OpenRouter     26.8s     $0.8640  74%
GPT-5.3 Chat            OpenAI on OpenRouter     2.8s      $0.0135  74%
GPT-5.5                 OpenAI on OpenRouter     4.5s      $0.0313  73%
Gemini 3.1 Flash Lite   Google on OpenRouter     823ms     $0.0005  72%
Claude Sonnet 4.5       Anthropic on OpenRouter  3.9s      $0.0124  67%
GPT-5.2                 OpenAI on OpenRouter     5.0s      $0.0175  67%
Gemini 2.5 Pro          Google on OpenRouter     10.5s     $0.0632  63%

Value Analysis

Find models with the best balance of quality, cost, and speed.

[Chart. Legend: "Best value frontier", "Best value"; dot size = duration]

Highlighted models offer the best score at their price point. Larger dots take longer to produce a result.
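
One way to read "best score at their price point" is as a cost/score Pareto frontier: a model is kept only if no cheaper or equally priced model scores at least as high. The sketch below illustrates that reading using only the ten rows listed under Model Comparison. The page does not state how it actually selects highlighted models, so the `value_frontier` function, the `Result` type, and the tie-breaking rule are assumptions made for illustration.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    cost_usd: float   # cost as listed in the Model Comparison rows
    score_pct: int    # score as listed in the Model Comparison rows


# The ten rows visible under Model Comparison (not the full set of 110 models).
RESULTS = [
    Result("Gemini 3 Flash Preview", 0.0014, 100),
    Result("Gemini 3.1 Pro Preview", 0.0687, 90),
    Result("Gemini 3.5 Flash", 0.0052, 84),
    Result("GPT-5.5 Pro", 0.8640, 74),
    Result("GPT-5.3 Chat", 0.0135, 74),
    Result("GPT-5.5", 0.0313, 73),
    Result("Gemini 3.1 Flash Lite", 0.0005, 72),
    Result("Claude Sonnet 4.5", 0.0124, 67),
    Result("GPT-5.2", 0.0175, 67),
    Result("Gemini 2.5 Pro", 0.0632, 63),
]


def value_frontier(results: list[Result]) -> list[Result]:
    """Return models not beaten on score by any cheaper (or equally priced) model.

    Assumed definition: walk the models from cheapest to most expensive and
    keep one only if it scores strictly higher than everything seen so far.
    """
    frontier: list[Result] = []
    best_score = float("-inf")
    # Ascending cost; at equal cost the higher score is considered first.
    for r in sorted(results, key=lambda r: (r.cost_usd, -r.score_pct)):
        if r.score_pct > best_score:
            frontier.append(r)
            best_score = r.score_pct
    return frontier


frontier_models = [r.model for r in value_frontier(RESULTS)]
print(frontier_models)
```

Under this assumed definition, and restricted to the ten listed rows, only Gemini 3.1 Flash Lite and Gemini 3 Flash Preview remain, because every other listed model costs more than Gemini 3 Flash Preview while scoring lower. The frontier shown on the page may differ, since it covers all scored models.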

Token Usage

Average tokens used per model across all prompts.

Model            Provider    Avg tokens  Input  Output
Qwen3.5-35B-A3B  OpenRouter  5,139       96     5,043
Qwen3.5-9B       OpenRouter  3,442       95     3,347
Qwen3.5-27B      OpenRouter  3,119       96     3,023
Qwen3.6 Plus     OpenRouter  2,075       94     1,981
Hy3 preview      OpenRouter  2,056       94     1,962