Explain Like I'm 5

This benchmark measures how well a model can explain complex topics simply and concisely, as if to a five-year-old.

Date: May 14, 2026
Tasks: 7
Models: 110
Cost: $0.2284
User: user_c636b9d7
Visibility: Public

Results (preliminary)

47 of 110 models have been scored automatically so far. Arena votes unlock scores for the remaining models and refine the ranking.

1. GPT-5.5 Pro (OpenAI): 100%
2. Claude Sonnet 4.6 (Anthropic): 96%
3. GPT-5.5 (OpenAI): 94%
4. Claude Opus 4.6 (Anthropic): 93%
5. Claude Haiku 4.5 (Anthropic): 90%

Model Comparison

Compare performance across models and prompts.

All models were run via OpenRouter.

Model                  Provider   Duration  Cost     Score
GPT-5.5 Pro            OpenAI     12.4s     $0.0750  100%
Claude Sonnet 4.6      Anthropic  3.3s      $0.0015  96%
GPT-5.5                OpenAI     3.1s      $0.0070  94%
Claude Opus 4.6        Anthropic  5.4s      $0.0135  93%
Claude Haiku 4.5       Anthropic  2.4s      $0.0018  90%
Gemini 3.1 Flash Lite  Google     1.8s      $0.0005  88%
GLM 5.1                Z.ai       18.1s     $0.0104  87%
Claude 3 Haiku         Anthropic  1.4s      $0.0001  87%
Qwen3.6 Max Preview    Qwen       27.6s     $0.0130  86%
Mistral Large 3 2512   Mistral    2.2s      $0.0003  83%

Value Analysis

Find models with the best balance of quality, cost, and speed.

[Chart: best-value frontier of score versus cost; dot size indicates duration.]

Highlighted models offer the best score at their price point. Larger dots take longer to produce a result.
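The page does not say exactly how the frontier is computed. One common reading of "the best score at their price point" is a Pareto frontier: a model is on it when no model costing the same or less scores at least as high. The Python sketch below applies that assumed rule to the ten rows of the Model Comparison table only, using the cost and score columns and ignoring duration, so its result need not match the site's highlighted models, which are drawn from every scored model. The names `Result`, `RESULTS` and `value_frontier` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Result(NamedTuple):
    model: str
    cost_usd: float  # "Cost" column from the Model Comparison table
    score: int       # "Score" column, in percent


# The ten rows listed in the Model Comparison table (duration omitted).
RESULTS = [
    Result("GPT-5.5 Pro", 0.0750, 100),
    Result("Claude Sonnet 4.6", 0.0015, 96),
    Result("GPT-5.5", 0.0070, 94),
    Result("Claude Opus 4.6", 0.0135, 93),
    Result("Claude Haiku 4.5", 0.0018, 90),
    Result("Gemini 3.1 Flash Lite", 0.0005, 88),
    Result("GLM 5.1", 0.0104, 87),
    Result("Claude 3 Haiku", 0.0001, 87),
    Result("Qwen3.6 Max Preview", 0.0130, 86),
    Result("Mistral Large 3 2512", 0.0003, 83),
]


def value_frontier(results: Iterable[Result]) -> List[Result]:
    """Assumed rule: keep a model only if every cheaper (or equally priced)
    model scores strictly lower than it does."""
    # Cheapest first; at equal cost, the higher score comes first so the
    # weaker model at that price is dropped.
    ordered = sorted(results, key=lambda r: (r.cost_usd, -r.score))
    frontier: List[Result] = []
    best_score = float("-inf")
    for r in ordered:
        # A model joins the frontier only if it beats everything cheaper.
        if r.score > best_score:
            frontier.append(r)
            best_score = r.score
    return frontier


if __name__ == "__main__":
    for r in value_frontier(RESULTS):
        print(f"{r.model}: {r.score}% at ${r.cost_usd:.4f}")
```

On these ten rows the sketch keeps Claude 3 Haiku, Gemini 3.1 Flash Lite, Claude Sonnet 4.6 and GPT-5.5 Pro; every other model is matched or beaten on score by a model that costs less.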

Token Usage

Average tokens used per model across all prompts.

All models were run via OpenRouter.

Model           Avg tokens  Input  Output
Gemini 2.5 Pro  1,502       89     1,413
Qwen3.6 Flash   1,409       102    1,308
GLM 4.7         1,384       93     1,291
Hy3 preview     1,356       99     1,257
Qwen3.6 Plus    1,243       102    1,141