Benchmarks

This section collects recent LLM benchmarks that compare model performance across prompts.