Changelog

What we've shipped lately.

v2.0.0

Latest
  • Added

    Vote on answers in the Arena.

    See two model answers side by side and pick the better one. Your votes feed into the benchmark's score.

  • Added

    A new landing page.

    Clearer explanation of what Evalry does and how to pick the right model for what you're building.

  • Changed

    Vote directly from a benchmark.

    You can now vote on model answers right next to the results, without switching pages.

  • Fixed

    Benchmark pages on phones.

    The benchmarks list and individual benchmark pages now lay out properly on small screens.

v1.7.0

  • Added

    Browse every model.

    A new page lists every model we test, with its score, speed, and price.

  • Changed

    Easier-to-read model answers.

    Long answers are now formatted properly and collapsed by default, so the results page stays scannable.

v1.6.1

  • Changed

    Newer models in the recommended lists.

    The ready-made model selections now include the latest releases from OpenAI, Anthropic, Google, and others.

v1.6.0

  • Added

    Edit benchmarks in place.

    Change your tasks from the benchmark page itself, with suggestions to help you refine them.

  • Changed

    Better-looking shared links.

    When you paste a public benchmark link into Slack, X, or anywhere else, it shows a proper preview with title, description, and image.

v1.5.0

  • Added

    Spending limits.

    Set a credit cap on your account so a long-running benchmark can't surprise you on cost.

v1.4.0

  • Added

    Featured benchmarks.

    A short list of benchmarks we've hand-picked, so you don't have to start from scratch.

  • Changed

    Faster public pages.

    Public benchmark pages open near-instantly, even on a fresh visit.

v1.3.1

  • Changed

    Paged benchmark list.

    The benchmarks page no longer shows everything at once, which keeps it fast as the catalog grows.

v1.3.0

  • Added

    Overall model rankings.

    See which models score best across every benchmark, not only within one.

  • Added

    Pick a model group instead of choosing models one by one.

    Choose from ready-made model sets and run your benchmark against the whole group at once.

v1.2.1

  • Fixed

    Results page on phones.

    Tables and charts on the results page now fit small screens.

v1.2.0

  • Added

    Public or private benchmarks.

    Choose whether each benchmark is visible to anyone with the link or only to you.

  • Added

    A page for every model.

    Each model has its own page with its scores, speed, and price.

  • Changed

    Compare token usage and export results.

    See how many tokens each model used per task, and download the full results.

v1.1.0

  • Added

    Accounts and sharing.

    Sign in to keep your benchmarks and share them with a link.

v1.0.0

  • Added

    Evalry is live.

    First public release.

  • Added

    Run against hundreds of models.

    Run benchmarks against models from all the major providers, all from one place.

  • Added

    Rankings page.

    A single page that ranks models across every public benchmark.