Tests an LLM's ability to accurately count and categorize specific characters, symbols, and patterns within strings. This benchmark evaluates tokenization-independent visual processing and precise sub-string analysis.
29 of 110 models on the leaderboard so far. More join with each arena vote.
Expand each prompt to see per-model responses and reasoning.
Compare performance across models and prompts.
Find models with the best balance of quality, cost, and speed.
Average tokens used per model across all prompts.