All models

NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

by NVIDIA

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and improved inference latency. The model supports a context length of up to 128K tokens and can operate efficiently on an 8x NVIDIA H100 node. Note: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more.

Avg Score

0.0%

0 answers

Avg Latency

0ms

0 runs

Pricing

$0.60

input

/

$1.80

output

per 1M tokens

Context

131K

tokens

Alternatives

Models with similar or better quality but different tradeoffs

No alternatives found

Run benchmarks on this model to discover alternatives

Other Models from NVIDIA

Compare performance with other models from the same creator

Benchmark Performance

How this model performs across different benchmarks

No benchmark data available

Run benchmarks with this model to see performance breakdown

Price vs Performance

Compare cost efficiency across all models

Current model
Other models
X-axis uses log scale for better visualization of price range

Score Over Time

Performance trends across all benchmark runs

No score trend data

Score history will appear here after multiple runs

Benchmark Activity

Number of benchmark runs over time

No activity data

Activity will appear here after benchmark runs

Quickstart

Get started with this model using OpenRouter

View on OpenRouter
import { OpenRouter } from "@openrouter/sdk";

const openrouter = new OpenRouter({
  apiKey: "<OPENROUTER_API_KEY>"
});

const completion = await openrouter.chat.completions.create({
  model: "nvidia/llama-3.1-nemotron-ultra-253b-v1",
  messages: [
    {
      role: "user",
      content: "Hello!"
    }
  ]
});

console.log(completion.choices[0].message.content);

Get your API key at openrouter.ai/keys