Explain Like I'm 5

This benchmark measures how well models explain complex topics simply and concisely to a five-year-old.

Date: May 14, 2026 | Tasks: 7 | Models: 110 | Cost: $0.2284 | Created by: user_c636b9d7 | Visibility: Public

Results (preliminary)

47 of the 110 models have been scored automatically so far. Votes in the arena will score the remaining models and refine the ranking.

Top 5 so far:
1. GPT-5.5 Pro (OpenAI): 100% score, 12.4s, $0.0750
2. Claude Sonnet 4.6 (Anthropic): 96% score, 3.3s, $0.0015
3. GPT-5.5 (OpenAI): 94% score, 3.1s, $0.0070
4. Claude Opus 4.6 (Anthropic): 93% score, 5.4s, $0.0135
5. Claude Haiku 4.5 (Anthropic): 90% score, 2.4s, $0.0018


Model Comparison

Compare performance across models and prompts.


All models listed were accessed through OpenRouter.

Model	Developer	Duration (s)	Cost (USD)	Score
GPT-5.5 Pro	OpenAI	12.4	0.0750	100%
Claude Sonnet 4.6	Anthropic	3.3	0.0015	96%
GPT-5.5	OpenAI	3.1	0.0070	94%
Claude Opus 4.6	Anthropic	5.4	0.0135	93%
Claude Haiku 4.5	Anthropic	2.4	0.0018	90%
Gemini 3.1 Flash Lite	Google	1.8	0.0005	88%
GLM 5.1	Z.ai	18.1	0.0104	87%
Claude 3 Haiku	Anthropic	1.4	0.0001	87%
Qwen3.6 Max Preview	Qwen	27.6	0.0130	86%
Mistral Large 3 2512	Mistral	2.2	0.0003	83%

Value Analysis

Find models with the best balance of quality, cost, and speed.

Chart: Best value frontier. Highlighted models offer the best score at their price point. Dot size represents duration, so larger dots mark models that took longer to produce a result.
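The page does not say exactly how the highlighted models are chosen. One common definition of a best-value frontier is a Pareto frontier over cost and score: a model stays on the frontier only if no model that costs the same or less matches or beats its score. The sketch below assumes that rule and uses only the ten models from the comparison table, so its result may differ from the chart's highlighted set. `Result`, `RESULTS` and `value_frontier` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    cost_usd: float
    score: int


# Figures copied from the model comparison table.
RESULTS = [
    Result("GPT-5.5 Pro", 0.0750, 100),
    Result("Claude Sonnet 4.6", 0.0015, 96),
    Result("GPT-5.5", 0.0070, 94),
    Result("Claude Opus 4.6", 0.0135, 93),
    Result("Claude Haiku 4.5", 0.0018, 90),
    Result("Gemini 3.1 Flash Lite", 0.0005, 88),
    Result("GLM 5.1", 0.0104, 87),
    Result("Claude 3 Haiku", 0.0001, 87),
    Result("Qwen3.6 Max Preview", 0.0130, 86),
    Result("Mistral Large 3 2512", 0.0003, 83),
]


def value_frontier(results: list[Result]) -> list[Result]:
    """Return the cost/score Pareto frontier (assumed definition of 'best value').

    Models are visited from cheapest to most expensive (ties: higher score
    first). A model is kept only if its score strictly exceeds the best score
    seen among all cheaper-or-equal models.
    """
    frontier: list[Result] = []
    best_score = float("-inf")
    for r in sorted(results, key=lambda r: (r.cost_usd, -r.score)):
        if r.score > best_score:
            frontier.append(r)
            best_score = r.score
    return frontier


if __name__ == "__main__":
    for r in value_frontier(RESULTS):
        print(f"{r.model}: {r.score}% at ${r.cost_usd:.4f}")
```

Under this rule the frontier among the ten listed models is Claude 3 Haiku, Gemini 3.1 Flash Lite, Claude Sonnet 4.6 and GPT-5.5 Pro: every other model is matched or beaten on score by something at the same or lower cost.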

Token Usage

Average tokens used per model across all prompts.

Model (via OpenRouter)	Avg tokens	Avg in	Avg out
Gemini 2.5 Pro	1,502	89	1,413
Qwen3.6 Flash	1,409	102	1,308
GLM 4.7	1,384	93	1,291
Hy3 preview	1,356	99	1,257
Qwen3.6 Plus	1,243	102	1,141
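The "avg" figure appears to be mean input tokens plus mean output tokens. That is an inference from the numbers, not something the page states: four rows add up exactly, and Qwen3.6 Flash is off by one (102 + 1,308 = 1,410 versus a reported 1,409), which is what rounding each of the three averages independently would produce. The check below is a sketch under that assumption; `TOKEN_ROWS` and `rounding_gap` are hypothetical names.

```python
# Rows copied from the token usage table: (model, avg tokens, avg in, avg out).
TOKEN_ROWS = [
    ("Gemini 2.5 Pro", 1502, 89, 1413),
    ("Qwen3.6 Flash", 1409, 102, 1308),
    ("GLM 4.7", 1384, 93, 1291),
    ("Hy3 preview", 1356, 99, 1257),
    ("Qwen3.6 Plus", 1243, 102, 1141),
]


def rounding_gap(total: int, tokens_in: int, tokens_out: int) -> int:
    """Reported average minus (avg in + avg out); 0 means the row adds up exactly."""
    return total - (tokens_in + tokens_out)


if __name__ == "__main__":
    for model, total, t_in, t_out in TOKEN_ROWS:
        gap = rounding_gap(total, t_in, t_out)
        print(f"{model}: {t_in} + {t_out} = {t_in + t_out} (reported {total}, gap {gap:+d})")
```

A gap of at most one token per row is consistent with rounding; a larger gap would suggest that "avg" means something other than input plus output.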