Browse what others are evaluating. Each benchmark is a set of prompts run against a group of models. Open one to see how they scored.
No description
Tests an LLM's ability to spot bad ideas and nudge the user toward better ones.
This benchmark measures the model's ability to suggest relevant category names for a given set of related concepts or items.
This benchmark measures the ability to explain complex topics simply and concisely for a five-year-old.
Tests an LLM's knowledge of a specific town in rural Germany.
No description
Benchmarks product recommendations for a diverse set of SaaS products.
LLM benchmark on typical venture capital terms, so you know which model to discuss your next fundraising round with.
Benchmarks LLMs' ability to detect and understand German memes across a wide range of questions.
Tests an LLM's ability to accurately count and categorize specific characters, symbols, and patterns within strings. This benchmark evaluates tokenization-independent visual processing and precise sub-string analysis.
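As a rough illustration of how reference answers for a character-counting task like this could be computed, here is a minimal Python sketch. The benchmark's actual prompts and scoring are not shown on this page, so the function names `count_char` and `categorize` and the four categories used here are assumptions made for the example.

```python
from collections import Counter


def count_char(text: str, target: str) -> int:
    """Count occurrences of a single character in text (hypothetical helper)."""
    if len(target) != 1:
        raise ValueError("target must be exactly one character")
    return text.count(target)


def categorize(text: str) -> dict:
    """Tally characters into coarse categories (an assumed scheme, not the benchmark's)."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            counts["letter"] += 1
        elif ch.isdigit():
            counts["digit"] += 1
        elif ch.isspace():
            counts["whitespace"] += 1
        else:
            counts["symbol"] += 1
    return dict(counts)


if __name__ == "__main__":
    print(count_char("strawberry", "r"))
    print(categorize("a1 b2!"))
```

A model's answer to a prompt such as "How many times does r appear in strawberry?" could then be compared against the value returned by `count_char`.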