MoonshotAI

Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in Kimi K2, it activates 32 billion parameters per forward pass and supports 256 k-token context windows. The model is optimized for persistent step-by-step thought, dynamic tool invocation, and complex reasoning workflows that span hundreds of turns. It interleaves step-by-step reasoning with tool use, enabling autonomous research, coding, and writing that can persist for hundreds of sequential actions without drift.

It sets new open-source benchmarks on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench, while maintaining stable multi-agent behavior through 200–300 tool calls. Built on a large-scale MoE architecture with MuonClip optimization, it combines strong reasoning depth with high inference efficiency for demanding agentic and analytical tasks.

MoonshotAI: Kimi K2 Thinking

OpenRouter

Google

Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. It delivers near Pro level reasoning and tool use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability.

The model supports a 1M token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full scale frontier models.

Google: Gemini 3 Flash Preview

DeepSeek

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.

Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)

DeepSeek: DeepSeek V3.2

Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research. 2M context window.

Reasoning can be enabled/disabled using the `reasoning` `enabled` parameter in the API. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#controlling-reasoning-tokens)

xAI: Grok 4.1 Fast

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.

It succeeds the [DeepSeek V3](/deepseek/deepseek-chat-v3) model and performs really well on a variety of tasks.

DeepSeek: DeepSeek V3 0324

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. 

Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).

Google: Gemini 2.5 Flash

Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output.

To see which model was used, visit [Activity](/activity), or read the `model` attribute of the response. Your response will be priced at the same rate as the routed model.

Learn more, including how to customize the models for routing, in our [docs](/docs/guides/routing/routers/auto-router).

Requests will be routed to the following models:
- [openai/gpt-5.1](/openai/gpt-5.1)
- [openai/gpt-5](/openai/gpt-5)
- [openai/gpt-5-mini](/openai/gpt-5-mini)
- [openai/gpt-5-nano](/openai/gpt-5-nano)
- [openai/gpt-4.1](/openai/gpt-4.1)
- [openai/gpt-4.1-mini](/openai/gpt-4.1-mini)
- [openai/gpt-4.1-nano](/openai/gpt-4.1-nano)
- [openai/gpt-4o](/openai/gpt-4o)
- [openai/gpt-4o-2024-05-13](/openai/gpt-4o-2024-05-13)
- [openai/gpt-4o-2024-08-06](/openai/gpt-4o-2024-08-06)
- [openai/gpt-4o-2024-11-20](/openai/gpt-4o-2024-11-20)
- [openai/gpt-4o-mini](/openai/gpt-4o-mini)
- [openai/gpt-4o-mini-2024-07-18](/openai/gpt-4o-mini-2024-07-18)
- [openai/gpt-4-turbo](/openai/gpt-4-turbo)
- [openai/gpt-4-turbo-preview](/openai/gpt-4-turbo-preview)
- [openai/gpt-4-1106-preview](/openai/gpt-4-1106-preview)
- [openai/gpt-4](/openai/gpt-4)
- [openai/gpt-3.5-turbo](/openai/gpt-3.5-turbo)
- [openai/gpt-oss-120b](/openai/gpt-oss-120b)
- [anthropic/claude-opus-4.5](/anthropic/claude-opus-4.5)
- [anthropic/claude-opus-4.1](/anthropic/claude-opus-4.1)
- [anthropic/claude-opus-4](/anthropic/claude-opus-4)
- [anthropic/claude-sonnet-4.5](/anthropic/claude-sonnet-4.5)
- [anthropic/claude-sonnet-4](/anthropic/claude-sonnet-4)
- [anthropic/claude-3.7-sonnet](/anthropic/claude-3.7-sonnet)
- [anthropic/claude-haiku-4.5](/anthropic/claude-haiku-4.5)
- [anthropic/claude-3.5-haiku](/anthropic/claude-3.5-haiku)
- [anthropic/claude-3-haiku](/anthropic/claude-3-haiku)
- [google/gemini-3-pro-preview](/google/gemini-3-pro-preview)
- [google/gemini-2.5-pro](/google/gemini-2.5-pro)
- [google/gemini-2.0-flash-001](/google/gemini-2.0-flash-001)
- [google/gemini-2.5-flash](/google/gemini-2.5-flash)
- [mistralai/mistral-large](/mistralai/mistral-large)
- [mistralai/mistral-large-2407](/mistralai/mistral-large-2407)
- [mistralai/mistral-large-2411](/mistralai/mistral-large-2411)
- [mistralai/mistral-medium-3.1](/mistralai/mistral-medium-3.1)
- [mistralai/mistral-nemo](/mistralai/mistral-nemo)
- [mistralai/mistral-7b-instruct](/mistralai/mistral-7b-instruct)
- [mistralai/mixtral-8x7b-instruct](/mistralai/mixtral-8x7b-instruct)
- [mistralai/mixtral-8x22b-instruct](/mistralai/mixtral-8x22b-instruct)
- [mistralai/codestral-2508](/mistralai/codestral-2508)
- [x-ai/grok-4](/x-ai/grok-4)
- [x-ai/grok-3](/x-ai/grok-3)
- [x-ai/grok-3-mini](/x-ai/grok-3-mini)
- [deepseek/deepseek-r1](/deepseek/deepseek-r1)
- [meta-llama/llama-3.3-70b-instruct](/meta-llama/llama-3.3-70b-instruct)
- [meta-llama/llama-3.1-405b-instruct](/meta-llama/llama-3.1-405b-instruct)
- [meta-llama/llama-3.1-70b-instruct](/meta-llama/llama-3.1-70b-instruct)
- [meta-llama/llama-3.1-8b-instruct](/meta-llama/llama-3.1-8b-instruct)
- [meta-llama/llama-3-70b-instruct](/meta-llama/llama-3-70b-instruct)
- [meta-llama/llama-3-8b-instruct](/meta-llama/llama-3-8b-instruct)
- [qwen/qwen3-235b-a22b](/qwen/qwen3-235b-a22b)
- [qwen/qwen3-32b](/qwen/qwen3-32b)
- [qwen/qwen3-14b](/qwen/qwen3-14b)
- [cohere/command-r-plus-08-2024](/cohere/command-r-plus-08-2024)
- [cohere/command-r-08-2024](/cohere/command-r-08-2024)
- [moonshotai/kimi-k2-thinking](/moonshotai/kimi-k2-thinking)
- [perplexity/sonar](/perplexity/sonar)

Auto Router

OpenAI

GPT-5.2 Chat (AKA Instant) is the fast, lightweight member of the 5.2 family, optimized for low-latency chat while retaining strong general intelligence. It uses adaptive reasoning to selectively “think” on harder queries, improving accuracy on math, coding, and multi-step tasks without slowing down typical conversations. The model is warmer and more conversational by default, with better instruction following and more stable short-form reasoning. GPT-5.2 Chat is designed for high-throughput, interactive workloads where responsiveness and consistency matter more than deep deliberation.

OpenAI: GPT-5.2 Chat

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens, it delivers strong performance in general reasoning, visual coding, and agentic tool-calling.

MoonshotAI: Kimi K2.5

Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k.

This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training.

MoonshotAI: Kimi K2 0905

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.

MoonshotAI: Kimi K2 0711

Kimi-Dev-72B is an open-source large language model fine-tuned for software engineering and issue resolution tasks. Based on Qwen2.5-72B, it is optimized using large-scale reinforcement learning that applies code patches in real repositories and validates them via full test suite execution—rewarding only correct, robust completions. The model achieves 60.4% on SWE-bench Verified, setting a new benchmark among open-source models for software bug fixing and code reasoning.

Model	Cost
DeepSeek: DeepSeek V3.2	-75%
xAI: Grok 4.1 Fast	-72%
DeepSeek: DeepSeek V3 0324	-71%
Google: Gemini 3 Flash Preview	-63%
Google: Gemini 2.5 Flash	-51%

Model	Latency
Google: Gemini 3 Flash Preview	-82%
Auto Router	-77%
Google: Gemini 2.5 Flash	-76%
OpenAI: GPT-5.2 Chat	-72%
xAI: Grok 4.1 Fast	-63%

Model	Score	Latency	Cost/1M
MoonshotAI: Kimi K2 0905	87.5%	14.4s	$1.55
MoonshotAI: Kimi K2 0711	86.2%	15.0s	$1.45
MoonshotAI: Kimi K2 0905	86.1%	43.5s	$1.15
MoonshotAI: Kimi Dev 72B	53.1%	173.1s	$0.72
MoonshotAI: Kimi K2.5	—	—	$1.71
MoonshotAI: Kimi K2 0711	—	—	Free

Benchmark	Score	Rank
Character Frequency Bench	100.0%	1 / 35
Spatial Reasoning: Germany	94.8%	7 / 35
Niederstetten Benchmark	44.0%	24 / 41
Money Boy Cultural Literacy Test	42.9%	13 / 35
German Memelord Bench	23.4%	14 / 35

MoonshotAI: Kimi K2 Thinking

Alternatives

Same Quality, Cheaper

Same Quality, Faster

Same Cost, Better

Other Models from MoonshotAI

Benchmark Performance

Price vs Performance

Score Over Time

Benchmark Activity

Quickstart