model: Support nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 (#9301)

2025-08-26 10:33:40 +03:00
parent f92b729d52
commit 4cd08dc592
2 changed files with 7 additions and 0 deletions
--- a/docs/supported_models/generative_models.md
+++ b/docs/supported_models/generative_models.md
@@ -52,3 +52,4 @@ in the GitHub search bar.
 | **Granite 3.0, 3.1** (IBM)               | `ibm-granite/granite-3.1-8b-instruct`                          | IBM's open dense foundation models optimized for reasoning, code, and business AI use cases. Integrated with Red Hat and watsonx systems. |
 | **Granite 3.0 MoE** (IBM)               | `ibm-granite/granite-3.0-3b-a800m-instruct`                          | IBM’s Mixture-of-Experts models offering strong performance with cost-efficiency. MoE expert routing designed for enterprise deployment at scale. |
 | **Llama Nemotron Super** (v1, v1.5, NVIDIA) | `nvidia/Llama-3_3-Nemotron-Super-49B-v1`, `nvidia/Llama-3_3-Nemotron-Super-49B-v1_5` | The [NVIDIA Nemotron](https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/) family builds on the strongest open models in the ecosystem by enhancing them with greater accuracy, efficiency, and transparency using NVIDIA open synthetic datasets, advanced techniques, and tools. This enables the creation of practical, right-sized, and high-performing AI agents. |
+| **Llama Nemotron Ultra** (v1, NVIDIA) | `nvidia/Llama-3_1-Nemotron-Ultra-253B-v1` | The [NVIDIA Nemotron](https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/) family builds on the strongest open models in the ecosystem by enhancing them with greater accuracy, efficiency, and transparency using NVIDIA open synthetic datasets, advanced techniques, and tools. This enables the creation of practical, right-sized, and high-performing AI agents. |
--- a/test/srt/models/test_generation_models.py
+++ b/test/srt/models/test_generation_models.py
@@ -83,6 +83,12 @@ ALL_MODELS = [
        trust_remote_code=True,
        skip_long_prompt=True,
    ),
+    ModelCase(
+        "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1",
+        tp_size=8,
+        trust_remote_code=True,
+        skip_long_prompt=True,
+    ),
 ]

 TORCH_DTYPES = [torch.float16]
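The new test case shards the 253B model across eight GPUs (`tp_size=8`), so it can only run on a machine with at least that many. The sketch below illustrates that constraint. It assumes a hypothetical `ModelCase` dataclass that mirrors only the four fields visible in the diff; the real class in the test suite may define more fields and different defaults. The `runnable_cases` helper is also hypothetical and is not part of the test file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelCase:
    """Hypothetical stand-in mirroring only the fields used in the diff."""

    model_path: str
    tp_size: int = 1  # tensor-parallel degree: number of GPUs the model is sharded across
    trust_remote_code: bool = False  # allow custom model code from the Hugging Face repo
    skip_long_prompt: bool = False  # exclude the long prompts from this model's run


def runnable_cases(cases: List[ModelCase], num_gpus: int) -> List[ModelCase]:
    """Hypothetical helper: keep only cases whose tensor-parallel degree fits on the available GPUs."""
    return [case for case in cases if case.tp_size <= num_gpus]


ALL_MODELS = [
    ModelCase(
        "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1",
        tp_size=8,
        trust_remote_code=True,
        skip_long_prompt=True,
    ),
]

if __name__ == "__main__":
    # On a single-GPU machine the 253B case is filtered out.
    print([c.model_path for c in runnable_cases(ALL_MODELS, num_gpus=1)])
    # On an 8-GPU node it is kept.
    print([c.model_path for c in runnable_cases(ALL_MODELS, num_gpus=8)])
```

Filtering on `tp_size` rather than on the model name keeps the check independent of any particular checkpoint.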