Bad Idea Bench

Tests LLMs' ability to spot bad ideas and nudge the user toward better ones.

May 21, 2026
3 tasks
110 models
$0.0615
user_c636b9d7
Public

Tests

Each test is one prompt sent to every model in the benchmark.

3 tests × 110 models = 660 arena votes for reliable rankings.
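
As a quick check of the arithmetic above: one prompt per test sent to every model gives 3 × 110 = 330 responses. The quoted 660 votes would be consistent with two votes per response, but that ratio is an assumption inferred from the numbers and is not stated on this page. The variable names below are illustrative only.

```python
# Figures taken from the page.
tests = 3
models = 110
quoted_votes = 660

# One prompt per test, sent to every model in the benchmark.
responses = tests * models

# Assumption: votes are spread evenly over responses (not stated in the source).
votes_per_response = quoted_votes / responses

print(f"responses: {responses}")
print(f"implied votes per response: {votes_per_response}")
```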