Bad Idea Bench

Tests LLMs' ability to spot bad ideas and nudge the user towards better ones.

May 21, 2026

4 tasks

110 models

$0.0752

user_c636b9d7

Public

Tests

Each test is one prompt sent to every model in the benchmark.

4 tests × 110 models = 440 responses, scored with 880 arena votes for reliable rankings.
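
The fan-out described above, where each test prompt goes to every model, can be sketched as a simple cross product. This is a minimal illustration, not the benchmark's actual code: the names `build_runs`, `test_1`..`test_4`, and `model_1`..`model_110` are hypothetical placeholders. It counts prompt-model pairs only and assumes nothing about how arena votes are tallied.

```python
from itertools import product


def build_runs(tests, models):
    """Pair every test prompt with every model: one run per (test, model)."""
    return list(product(tests, models))


# Hypothetical placeholder identifiers; the real test and model names are not given.
tests = [f"test_{i}" for i in range(1, 5)]        # 4 tests
models = [f"model_{i}" for i in range(1, 111)]    # 110 models

runs = build_runs(tests, models)
print(len(runs))  # number of prompt-model pairs: 4 * 110
```

Adding a test grows the run count by the number of models, and adding a model grows it by the number of tests.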