---
license: other
license_name: nvidia-open-model-license
license_link: LICENSE
language:
- en
library_name: transformers
tags:
- bf16
- orchestration
- tool-calling
- noesis
- dhcf-fno
- qwen3
base_model: nvidia/Nemotron-Orchestrator-8B
quantized_by: AMAImedia
pipeline_tag: text-generation
---

# Nemotron-Orchestrator-8B-Qwen3-BF16-NOESIS

**BF16 reference checkpoint of [nvidia/Nemotron-Orchestrator-8B](https://huggingface.co/nvidia/Nemotron-Orchestrator-8B), cast from the original FP32 release.**

Released as part of the **NOESIS Professional Multilingual Dubbing Automation Platform** (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).

- **Founder:** Ilia Bolotnikov
- **Organization:** [AMAImedia.com](https://www.amaimedia.com)
- **X (Twitter):** [@AMAImediacom](https://x.com/AMAImediacom)
- **LinkedIn:** [Ilia Bolotnikov](https://www.linkedin.com/in/ilia-bolotnikov)
- **Telegram:** [@djbionicl](https://t.me/djbionicl)
- **NOESIS version:** v14.6
- **Release date:** 2026-04

---

## ⚠️ License notice

This model inherits the **NVIDIA Open Model License** from the upstream `nvidia/Nemotron-Orchestrator-8B`. The base model is designated by NVIDIA as **"for research and development only"**. This BF16 derivative is published as a bandwidth-friendly reference checkpoint for the broader research and development community.

**Users are responsible for compliance with NVIDIA's license terms** — see the `LICENSE` file in this repository for the full text.

---

## Why this BF16 release exists

The original NVIDIA release ships in **FP32 (~32 GB on disk)**. Most modern inference and quantization tooling (Hugging Face Transformers, vLLM, SGLang, AutoAWQ, AutoGPTQ, llama.cpp BF16 conversion) immediately casts to BF16 on load.
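The load-time cast these tools perform amounts to keeping the top 16 bits of each FP32 value, with round-to-nearest-even applied to the discarded low bits. A minimal pure-Python sketch of that bit-level operation (the helper names here are illustrative, not part of any library):

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Cast one FP32 value to BF16, returning the 16-bit pattern.

    BF16 is the top half of the IEEE 754 FP32 encoding (1 sign bit, the
    same 8 exponent bits, 7 of the 23 mantissa bits), so the cast is a
    shift plus round-to-nearest-even on the discarded low 16 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties round to even
    return (bits + rounding_bias) >> 16

def bf16_bits_to_fp32(b: int) -> float:
    """Widen a BF16 bit pattern back to FP32 (exact, no rounding)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

# The exponent field is untouched, so the cast never changes dynamic range;
# only mantissa precision drops from 23 bits to 7.
print(bf16_bits_to_fp32(fp32_to_bf16_bits(3.14159265)))  # 3.140625
```

Because the exponent survives intact, every FP32 weight lands in a representable BF16 bucket; the only change is mantissa rounding, which is why a single pre-cast checkpoint can serve as a faithful inference-time stand-in for the FP32 original.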
Publishing a pre-cast BF16 checkpoint:

- Halves download bandwidth (~16 GB vs ~32 GB)
- Halves disk footprint
- Skips a slow load-time cast for users
- Provides a clean BF16 baseline for downstream quantization recipes

The cast is performed via `torch.Tensor.to(dtype=torch.bfloat16)` with IEEE 754 round-to-nearest-even (the PyTorch default). BF16 keeps the same 8-bit exponent range as FP32 but only 7 bits of mantissa, so the cast preserves dynamic range exactly; the mantissa rounding it introduces is negligible for inference-time use of weight tensors.

---

## Model summary

| Property | Value |
| --- | --- |
| Base model | nvidia/Nemotron-Orchestrator-8B |
| Underlying architecture | Qwen3-8B (decoder-only transformer, **dense, NOT MoE**) |
| Source precision | FP32 |
| This release precision | BF16 |
| Vocab size | 151936 |
| Language | English (per base model) |
| Disk footprint | ~16 GB |
| Inference VRAM | ~17 GB in BF16 (fully resident on a 24 GB+ GPU) |

For low-VRAM (6-12 GB) inference, see the AWQ INT4 sibling release: [amaimedia/Nemotron-Orchestrator-8B-Qwen3-AWQ-INT4-NOESIS](https://huggingface.co/amaimedia/Nemotron-Orchestrator-8B-Qwen3-AWQ-INT4-NOESIS).

---

## How to use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "amaimedia/Nemotron-Orchestrator-8B-Qwen3-BF16-NOESIS"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Plan a multi-step task: find recent AWQ papers, summarize the top three."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

---

## NOESIS context

This BF16 checkpoint is the source artifact for the AWQ INT4 quantization used as the **English orchestration teacher** for NOESIS Specialist **M9-ORCH-4B** during knowledge distillation.
NOESIS is a 9-specialist dubbing automation platform — see the NOESIS collection for the full specialist family.

---

## Acknowledgements & citation

Base model: ToolOrchestra by NVIDIA & University of Hong Kong.

```bibtex
@misc{toolorchestra,
  title         = {ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration},
  author        = {Hongjin Su and Shizhe Diao and Ximing Lu and others},
  year          = {2025},
  eprint        = {2511.21689},
  archivePrefix = {arXiv}
}
```

NOESIS:

```bibtex
@misc{noesis_v14,
  title     = {NOESIS v14.6: DHCF-FNO Multilingual Dubbing Platform},
  author    = {Bolotnikov, Ilia},
  year      = {2026},
  publisher = {AMAImedia},
  url       = {https://amaimedia.com}
}
```