
---
license: other
license_name: nvidia-open-model-license
license_link: LICENSE
language:
- en
library_name: transformers
tags:
- bf16
- orchestration
- tool-calling
- noesis
- dhcf-fno
- qwen3
base_model: nvidia/Nemotron-Orchestrator-8B
quantized_by: AMAImedia
pipeline_tag: text-generation
---

# Nemotron-Orchestrator-8B-Qwen3-BF16-NOESIS

BF16 reference checkpoint of nvidia/Nemotron-Orchestrator-8B, cast directly from the original FP32 release.

This release halves the download bandwidth and disk footprint of the original model (from ~32 GB FP32 to ~16 GB BF16) with negligible quality impact for inference, and serves as the source artifact for the AWQ INT4 quantization sibling.

Released as part of the NOESIS Professional Multilingual Dubbing Automation Platform (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).

  • Founder: Ilia Bolotnikov · Telegram: @djbionicl
  • Organization: AMAImedia.com
  • NOESIS version: v14.1 · Release date: 2026-04
  • License: NVIDIA Open Model License (research and development only — see LICENSE)

## NOESIS context

NOESIS is a 9-specialist multilingual video dubbing platform (full dubbing in 30 languages, ASR in 150+). This BF16 checkpoint is the source artifact for the AWQ INT4 quantization used as the English orchestration teacher for NOESIS Specialist M9-ORCH-4B during knowledge distillation.
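The distillation step mentioned above can be sketched abstractly. The snippet below is a minimal, hypothetical KL-divergence distillation loss over temperature-softened logits; the function names, temperature value, and logits are illustrative and not taken from the NOESIS pipeline:

```python
import math

def softened_softmax(logits, temperature=2.0):
    """Softmax over logits divided by a distillation temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.
    Zero when the student matches the teacher exactly, positive otherwise."""
    p = softened_softmax(teacher_logits, temperature)
    q = softened_softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In practice this loss would be computed per token over the vocabulary and minimized alongside the usual cross-entropy objective; the sketch only shows the core quantity.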

NOESIS specialists overview:

| ID | Role | Size |
| --- | --- | --- |
| M1 | ASR (150+ langs) | 10B / 3B |
| M2 | Dubbing LM (30 langs full) | 10B / 3B |
| M3 | TTS + voice cloning | 10B / 3B |
| M4 | Chat + creative writing | 10B / 3B |
| M5 | Code + math | 10B / 3B |
| M6 | Deep research (1M ctx) | 10B / 3B |
| M7 | Prompt engineering | 4B / 0.8B |
| M8 | Quality control (PRM) | 4B / 0.8B |
| M9 | Orchestrator + routing | 4B / 0.8B |

## Conversion details

| Property | Value |
| --- | --- |
| Source precision | FP32 (~32 GB) |
| This release precision | BF16 (~15.3 GB) |
| Conversion method | `torch.Tensor.to(dtype=torch.bfloat16)` |
| Rounding | IEEE 754 round-to-nearest-even (PyTorch default) |
| Quality impact | Negligible for inference (BF16 keeps FP32's 8-bit exponent; the mantissa shrinks from 23 to 7 bits) |
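For intuition, the cast in the table above can be mimicked bit-wise in pure Python: a BF16 value is the top 16 bits of the FP32 encoding, with the discarded bits rounded half-to-even. This is a simplified sketch (it ignores NaN handling), not the actual conversion script:

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 encoding of an FP32 value, rounding the
    discarded lower 16 mantissa bits to nearest, ties to even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias trick: adding 0x7FFF (+1 if the kept LSB is odd) before the
    # shift implements round-to-nearest-even on the truncated bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return (bits + rounding_bias) >> 16

def bf16_bits_to_fp32(b: int) -> float:
    """Widen a BF16 bit pattern back to FP32 by zero-padding the mantissa."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

# Values exactly representable in BF16 (e.g. 1.0, 0.5) round-trip unchanged.
```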

For low-VRAM (6-12 GB) inference, see the AWQ INT4 sibling release: amaimedia/Nemotron-Orchestrator-8B-Qwen3-AWQ-INT4-NOESIS.


## How to use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "amaimedia/Nemotron-Orchestrator-8B-Qwen3-BF16-NOESIS"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Plan a multi-step task: find recent AWQ papers, summarize the top three."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## ⚠️ License notice

This model inherits the NVIDIA Open Model License from the upstream nvidia/Nemotron-Orchestrator-8B. The base model is designated by NVIDIA as "for research and development only".

This BF16 derivative is published as a bandwidth-friendly reference checkpoint for the broader research and development community. Users are responsible for compliance with NVIDIA's license terms — see the LICENSE file in this repository for the full text.


## Original Model Card

The following is the original model card from nvidia/Nemotron-Orchestrator-8B, reproduced here for convenience. All credit for the underlying model goes to NVIDIA & the University of Hong Kong.


### ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration


### Description

Orchestrator-8B is a state-of-the-art 8B parameter orchestration model designed to solve complex, multi-turn agentic tasks by coordinating a diverse set of expert models and tools.

On the Humanity's Last Exam (HLE) benchmark, Orchestrator-8B achieves a score of 37.1%, outperforming GPT-5 (35.1%) while being approximately 2.5x more efficient.

This model is for research and development only.

### Key Features

  • Intelligent Orchestration: Capable of managing heterogeneous toolsets including basic tools (search, code execution) and other LLMs (specialized and generalist).
  • Multi-Objective RL Training: Trained via Group Relative Policy Optimization (GRPO) with a novel reward function that optimizes for accuracy, latency/cost, and adherence to user preferences.
  • Efficiency: Delivers higher accuracy at significantly lower computational cost compared to monolithic frontier models.
  • Robust Generalization: Demonstrated ability to generalize to unseen tools and pricing configurations.
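As a rough sketch of the multi-objective reward idea described above (not the paper's actual reward function; the weights and normalization here are illustrative assumptions):

```python
def orchestration_reward(correct: bool, cost_usd: float, latency_s: float,
                         cost_weight: float = 0.1,
                         latency_weight: float = 0.05) -> float:
    """Hypothetical scalar reward: task accuracy minus weighted cost and
    latency penalties. The weights are made up for illustration."""
    return float(correct) - cost_weight * cost_usd - latency_weight * latency_s

def group_relative_advantages(rewards):
    """GRPO-style advantage: normalize each rollout's reward against the
    mean and standard deviation of its sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 if var > 0 else 1.0
    return [(r - mean) / std for r in rewards]
```

Under this scheme a correct but expensive rollout can still score below a correct cheap one, which is how the accuracy/cost/latency trade-off enters the policy gradient.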

### Benchmark

On Humanity's Last Exam, Orchestrator-8B achieves 37.1%, surpassing GPT-5 (35.1%) at roughly 30% of the monetary cost and 2.5x the speed. On FRAMES and τ²-Bench, Orchestrator-8B consistently outperforms strong monolithic systems, demonstrating versatile reasoning and robust tool orchestration.

Orchestrator-8B consistently outperforms GPT-5, Claude Opus 4.1 and Qwen3-235B-A22B on HLE with substantially lower cost.

### Model Details

  • Developed by: NVIDIA & University of Hong Kong
  • Model Type: Decoder-only Transformer (dense, not MoE)
  • Base Model: Qwen3-8B
  • Parameters: 8B
  • Language(s): English
  • License: NVIDIA Open Model License

### Model Version

1.0

### Training Dataset

| Dataset | Link |
| --- | --- |
| GeneralThought-430K | Link |
| ToolScale | Link |

### Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

### Citation

```bibtex
@misc{toolorchestra,
  title  = {ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration},
  author = {Hongjin Su and Shizhe Diao and Ximing Lu and Mingjie Liu and Jiacheng Xu and Xin Dong and Yonggan Fu and Peter Belcak and Hanrong Ye and Hongxu Yin and Yi Dong and Evelina Bakhturina and Tao Yu and Yejin Choi and Jan Kautz and Pavlo Molchanov},
  year   = {2025},
  eprint = {2511.21689},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL},
  url    = {https://arxiv.org/abs/2511.21689}
}
```

## NOESIS citation

```bibtex
@misc{noesis_v14,
  title  = {NOESIS v14.1: DHCF-FNO Multilingual Dubbing Platform},
  author = {Bolotnikov, Ilia},
  year   = {2026},
  publisher = {AMAImedia},
  url    = {https://amaimedia.com}
}
```