Bad Idea Bench

Tests LLMs' ability to spot bad ideas and nudge the user toward better ones.

May 21, 2026
4 tasks
110 models
$0.0752
user_c636b9d7
Public

Tests

Each test is one prompt sent to every model in the benchmark.

4 tests × 110 models = 440 responses, ranked using 880 arena votes for reliable rankings.
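
As a rough illustration of the fan-out described above, the sketch below pairs every test with every model and counts the resulting responses. The test and model names are hypothetical placeholders, since the actual prompts and model list are not shown here, and it assumes one response per (test, model) pair. How those responses are turned into arena votes is not specified on this page, so the sketch does not model voting.

```python
from itertools import product

# Hypothetical placeholders: the real prompts and model names are not listed here.
tests = [f"test_{i}" for i in range(1, 5)]      # 4 tests
models = [f"model_{i}" for i in range(1, 111)]  # 110 models

# Each test is one prompt sent to every model,
# so there is one response per (test, model) pair.
pairs = list(product(tests, models))

print(len(pairs))  # number of responses to collect
```

Adding a test or a model grows the number of responses multiplicatively, which is worth keeping in mind when estimating the cost of a run.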