Tests an LLM's ability to accurately count and categorize specific characters, symbols, and patterns within strings. This benchmark evaluates tokenization-independent visual processing and precise substring analysis.
Each test is one prompt sent to every model in the benchmark.
11 tests × 110 models = 1210 responses, with 2420 arena votes for reliable rankings.
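
To make the counting and categorizing tasks concrete, here is a minimal sketch of how a reference answer for such a prompt could be computed. The function names (`count_char`, `count_overlapping`, `categorize`) and the four category labels are illustrative assumptions, not part of the benchmark itself, and the benchmark's actual prompts and scoring may differ.

```python
from collections import Counter


def count_char(text: str, target: str) -> int:
    """Count non-overlapping occurrences of `target` in `text`."""
    return text.count(target)


def count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `text`, allowing overlaps."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = 0
    while True:
        idx = text.find(pattern, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1  # advance by one so overlapping matches are found


def categorize(text: str) -> Counter:
    """Tally characters into coarse categories (hypothetical label set)."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            counts["letter"] += 1
        elif ch.isdigit():
            counts["digit"] += 1
        elif ch.isspace():
            counts["whitespace"] += 1
        else:
            counts["symbol"] += 1
    return counts


if __name__ == "__main__":
    print(count_char("strawberry", "r"))
    print(count_overlapping("aaaa", "aa"))
    print(dict(categorize("a1 b2!")))
```

The distinction between overlapping and non-overlapping matches matters for pattern questions: `"aaaa"` contains `"aa"` twice without overlap but three times with overlap, so a prompt has to state which one it means for the answer to be unambiguous.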