Cogito V2 Preview Llama 109B

An instruction-tuned, hybrid-reasoning Mixture-of-Experts model built on Llama-4-Scout-17B-16E. Cogito v2 can answer directly or engage an extended “thinking” phase, with alignment guided by Iterated Distillation & Amplification (IDA). It targets coding, STEM, instruction following, and general helpfulness, with stronger multilingual, tool-calling, and reasoning performance than size-equivalent baselines. The model supports long-context use (up to 10M tokens) and standard Transformers workflows. Users can turn reasoning on or off by setting the `enabled` boolean of the `reasoning` request parameter. [Learn more in the OpenRouter docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

by Deep Cogito

Overview

Quick stats across all benchmark runs.

Score

9 benchmarks

Avg Latency

7.3s

13 requests

Pricing

$0.18 in / $0.59 out

per 1M tokens
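
As a rough illustration of the per-token pricing above, the sketch below estimates the cost of a single request. The function name `estimateCostUsd` and the constants are hypothetical helpers introduced here; the rates are the listed $0.18 (input) and $0.59 (output) per 1M tokens, which may change.

```typescript
// Listed rates in USD per 1M tokens (taken from the pricing shown on this page).
const INPUT_USD_PER_M = 0.18;
const OUTPUT_USD_PER_M = 0.59;

// Hypothetical helper: estimates the cost of one request in USD.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const total = inputTokens * INPUT_USD_PER_M + outputTokens * OUTPUT_USD_PER_M;
  return total / 1_000_000;
}
```

For example, a request with 2,000 input tokens and 500 output tokens comes to (2,000 × 0.18 + 500 × 0.59) / 1,000,000, or about $0.000655.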

Context

33K

tokens

[Chart: Price vs Performance. Cost efficiency compared across all models, with the current model as baseline and other models shown by relative score. Y-axis: score difference from shared benchmarks. X-axis: log scale.]

[Chart: Score Over Time. Performance trends across all benchmark runs.]

[Chart: Benchmark Activity. Number of benchmark runs over time.]

Get started with this model using OpenRouter

import { OpenRouter } from "@openrouter/sdk";

const openrouter = new OpenRouter({
  apiKey: "<OPENROUTER_API_KEY>"
});

const completion = await openrouter.chat.completions.create({
  model: "deepcogito/cogito-v2-preview-llama-109b-moe",
  messages: [
    {
      role: "user",
      content: "Hello!"
    }
  ]
});

console.log(completion.choices[0].message.content);

Get your API key at openrouter.ai/keys
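
The description mentions that reasoning can be controlled with the `reasoning` `enabled` boolean. The sketch below shows one way that might look when calling OpenRouter's chat completions HTTP endpoint directly with `fetch`. The helper names `buildRequest` and `ask` are hypothetical, and the response shape is assumed to follow the usual `choices[0].message.content` layout; check the linked docs for the authoritative request format.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ReasoningRequest {
  model: string;
  messages: ChatMessage[];
  reasoning: { enabled: boolean };
}

const MODEL_ID = "deepcogito/cogito-v2-preview-llama-109b-moe";

// Hypothetical helper: builds the request body with reasoning switched on or off.
function buildRequest(prompt: string, think: boolean): ReasoningRequest {
  return {
    model: MODEL_ID,
    messages: [{ role: "user", content: prompt }],
    reasoning: { enabled: think },
  };
}

// Hypothetical helper: sends the request and returns the answer text.
async function ask(prompt: string, think: boolean, apiKey: string): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildRequest(prompt, think)),
  });
  if (!res.ok) {
    throw new Error(`OpenRouter request failed with status ${res.status}`);
  }
  // Assumed response shape; only the field read below is relied on.
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

Calling `ask("Explain quicksort briefly.", true, "<OPENROUTER_API_KEY>")` would request an answer with the thinking phase enabled, while passing `false` asks the model to answer directly.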

Other Models from Deep Cogito

Compare performance with other models from the same creator

| Model | Latency | Cost/1M | Score |
|---|---|---|---|
| Cogito v2.1 671B | 10.0s | $1.25 | |
| Cogito V2 Preview Llama 405B | 25.8s | $3.50 | |
| Cogito V2 Preview Llama 70B | 13.6s | $0.88 | |