Benchmarks

Browse what others are evaluating. Each benchmark is a set of prompts run against a group of models. Open one to see how each model scored.