Categorization Bench

This benchmark measures a model's ability to suggest relevant category names for a given set of related concepts or items.

May 16, 2026

10 tasks

110 models

$2.1566

user_c636b9d7

Public

Tests

Each test is one prompt sent to every model in the benchmark.
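
As a rough illustration of "one prompt sent to every model", the sketch below pairs each test with each model to build the list of requests. The prompts, the model names, and the `build_requests` helper are hypothetical placeholders, not the benchmark's actual tests, models, or code.

```python
from itertools import product


def build_requests(tests, models):
    """Pair every test prompt with every model: one request per (test, model)."""
    return [{"model": model, "prompt": test} for test, model in product(tests, models)]


# Hypothetical example data (not taken from the benchmark itself).
tests = [
    "Suggest a category name for: apple, banana, cherry",
    "Suggest a category name for: hammer, wrench, pliers",
]
models = ["model-a", "model-b", "model-c"]

requests = build_requests(tests, models)
print(len(requests))  # 2 tests x 3 models = 6 requests
```

The number of requests grows as the product of the two lists, so every model answers exactly the same set of prompts.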

10 tests × 110 models = 2200 arena votes for reliable rankings.