Changelog

What we've shipped lately.

v2.1.1

June 2, 2026 (Latest)

Changed
Results now update immediately.
The results page now updates on its own while a benchmark is being evaluated. Each model appears on the leaderboard as soon as it's scored, so you don't have to wait for the whole benchmark to finish.

May 28, 2026

Changed
More to compare right away in the Arena.
A new benchmark now has a fuller set of answers ready to vote on from the start, so there's more than a single pair to compare while it warms up.
Changed
Smarter matchups in the Arena.
The Arena now leans toward closer, more informative matchups, so fewer votes are needed to move a benchmark toward a reliable ranking.

May 20, 2026

Added
Vote on answers in the Arena.
See two model answers side by side and pick the better one. Your votes feed into the benchmark's score.
Added
A new landing page.
A clearer explanation of what Evalry does and how to pick the right model for what you're building.
Changed
Vote directly from a benchmark.
You can now vote on model answers right next to the results, without switching pages.
Fixed
Benchmark pages on phones.
The benchmarks list and individual benchmark pages now lay out properly on small screens.

May 10, 2026

Added
Browse every model.
A new page lists every model we test, with its score, speed, and price.
Changed
Easier-to-read model answers.
Long answers are now formatted properly and collapsed by default, so the results page stays scannable.

April 15, 2026

Changed
Newer models in the recommended lists.
The pre-made model selections now include the latest releases from OpenAI, Anthropic, Google, and others.

April 8, 2026

Added
Edit benchmarks in place.
Change your tasks from the benchmark page itself, with suggestions to help you refine them.
Changed
Better-looking shared links.
When you paste a public benchmark link into Slack, X, or anywhere else, it shows a proper preview with title, description, and image.

March 19, 2026

Added
Spending limits.
Set a credit cap on your account so a long-running benchmark can't surprise you on cost.

March 11, 2026

Added
Featured benchmarks.
A short list of benchmarks we've hand-picked, so you don't have to start from scratch.
Changed
Faster public pages.
Public benchmark pages open near-instantly, even on a fresh visit.

February 25, 2026

Changed
Paged benchmark list.
The benchmarks page no longer shows everything at once, which keeps it fast as the catalog grows.

February 18, 2026

Added
Overall model rankings.
See which models score best across every benchmark, not only within one.
Added
Pick a model group instead of picking one by one.
Choose from ready-made model sets and run your benchmark against the whole group at once.

February 4, 2026

Fixed
Results page on phones.
Tables and charts on the results page now fit small screens.

January 22, 2026

Added
Public or private benchmarks.
Choose whether each benchmark is visible to anyone with the link or only to you.
Added
A page for every model.
Each model has its own page with its scores, speed, and price.
Changed
Compare token usage and export results.
See how many tokens each model used per task, and download the full results.

January 8, 2026

Added
Accounts and sharing.
Sign in to keep your benchmarks and share them with a link.

January 2, 2026

Added
Evalry is live.
First public release.
Added
Run against hundreds of models.
Benchmarks run against any of the models from the major providers, all from one place.
Added
Rankings page.
A single page that ranks models across every public benchmark.