---
license: other
license_name: nvidia-open-model-license
license_link: LICENSE
language:
- en
library_name: transformers
tags:
- bf16
- orchestration
- tool-calling
- noesis
- dhcf-fno
- qwen3
base_model: nvidia/Nemotron-Orchestrator-8B
quantized_by: AMAImedia
pipeline_tag: text-generation
---

# Qwen3-8B-Nemotron-Orchestrator-NOESIS-BF16

**BF16 reference checkpoint of [nvidia/Nemotron-Orchestrator-8B](https://huggingface.co/nvidia/Nemotron-Orchestrator-8B), cast from the original FP32 release.**

Released as part of the **NOESIS Professional Multilingual Dubbing Automation Platform** (framework: DHCF-FNO, the Deterministic Hybrid Control Framework for Frozen Neural Operators).

- **Founder:** Ilia Bolotnikov
- **Organization:** [AMAImedia.com](https://www.amaimedia.com)
- **X (Twitter):** [@AMAImediacom](https://x.com/AMAImediacom)
- **LinkedIn:** [Ilia Bolotnikov](https://www.linkedin.com/in/ilia-bolotnikov)
- **Telegram:** [@djbionicl](https://t.me/djbionicl)
- **NOESIS version:** v14.6
- **Release date:** 2026-04

---

## ⚠️ License notice

This model inherits the **NVIDIA Open Model License** from the upstream `nvidia/Nemotron-Orchestrator-8B`. The base model is designated by NVIDIA as **"for research and development only"**.

This BF16 derivative is published as a bandwidth-friendly reference checkpoint for the broader research and development community. **Users are responsible for compliance with NVIDIA's license terms**; see the `LICENSE` file in this repository for the full text.

---

## Why this BF16 release exists

The original NVIDIA release ships in **FP32 (~32 GB on disk)**. Most modern inference and quantization tooling (Hugging Face Transformers, vLLM, SGLang, AutoAWQ, AutoGPTQ, llama.cpp BF16 conversion) immediately casts the weights to BF16 on load. Publishing a pre-cast BF16 checkpoint:

- Halves download bandwidth (~16 GB vs. ~32 GB)
- Halves the disk footprint
- Skips a slow load-time cast for users
- Provides a clean BF16 baseline for downstream quantization recipes

The cast is performed via `torch.Tensor.to(dtype=torch.bfloat16)` with IEEE 754 round-to-nearest-even rounding (the PyTorch default). BF16 keeps the same 8-bit exponent range as FP32 but only 7 mantissa bits (vs. 23), so the cast rounds each weight value; the resulting precision loss is negligible for inference-time use of the weight tensors.
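
The cast can be illustrated without PyTorch: BF16 is simply the top 16 bits of the FP32 bit pattern, with the discarded low half rounded to nearest, ties to even. A minimal stdlib sketch of that rounding (illustrative only; this is not the conversion script used for this release):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Round a float32 value to a bfloat16 bit pattern (round-to-nearest-even)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Add 0x7FFF plus the LSB of the kept half, so exact ties round to even.
    # (A production cast would special-case NaN, which this sketch omits.)
    rounded = bits + 0x7FFF + ((bits >> 16) & 1)
    return (rounded >> 16) & 0xFFFF

def bf16_bits_to_f32(h: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 (exact: mantissa zero-padded)."""
    (x,) = struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))
    return x

# The 8-bit exponent survives unchanged; only low-order mantissa bits are rounded.
assert bf16_bits_to_f32(f32_to_bf16_bits(0.15625)) == 0.15625  # exactly representable
print(bf16_bits_to_f32(f32_to_bf16_bits(3.14159265)))          # rounds to 3.140625
```

Round-tripping back to FP32 is exact (the mantissa is zero-padded), which is why a BF16 checkpoint is a stable baseline for downstream quantization.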
---

## Model summary

| Property | Value |
| --- | --- |
| Base model | nvidia/Nemotron-Orchestrator-8B |
| Underlying architecture | Qwen3-8B (decoder-only transformer; dense, **not** MoE) |
| Source precision | FP32 |
| This release precision | BF16 |
| Vocab size | 151,936 |
| Language | English (per the base model) |
| Disk footprint | ~16 GB |
| Inference VRAM | ~17 GB in BF16 (fully resident on a 24 GB+ GPU) |

For low-VRAM (6–12 GB) inference, see the AWQ INT4 sibling release: [amaimedia/Nemotron-Orchestrator-8B-Qwen3-AWQ-INT4-NOESIS](https://huggingface.co/amaimedia/Nemotron-Orchestrator-8B-Qwen3-AWQ-INT4-NOESIS).
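
The footprint figures above follow directly from parameter count times bytes per element; a quick back-of-envelope check (the ~8.19 B parameter count for Qwen3-8B is an approximation):

```python
# Approximate checkpoint size: parameter count x bytes per element.
params = 8.19e9  # Qwen3-8B parameter count (approximate; assumption)
for dtype, nbytes in {"fp32": 4, "bf16": 2}.items():
    print(f"{dtype}: ~{params * nbytes / 1e9:.1f} GB")
# fp32 lands near the ~32 GB upstream release; bf16 near the ~16 GB figure here.
```

Inference VRAM runs slightly above the weight size because of the KV cache and activation buffers, hence the ~17 GB figure in the table.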
---

## How to use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "amaimedia/Qwen3-8B-Nemotron-Orchestrator-NOESIS-BF16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Plan a multi-step task: find recent AWQ papers, summarize the top three."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

---

## NOESIS context

This BF16 checkpoint is the source artifact for the AWQ INT4 quantization used as the **English orchestration teacher** for NOESIS Specialist **M9-ORCH-4B** during knowledge distillation.

NOESIS is a 9-specialist dubbing automation platform; see the NOESIS collection for the full specialist family.

---

## Acknowledgements & citation

Base model: ToolOrchestra by NVIDIA & the University of Hong Kong.

```bibtex
@misc{toolorchestra,
  title         = {ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration},
  author        = {Hongjin Su and Shizhe Diao and Ximing Lu and others},
  year          = {2025},
  eprint        = {2511.21689},
  archivePrefix = {arXiv}
}
```

NOESIS:

```bibtex
@misc{noesis_v14,
  title     = {NOESIS v14.6: DHCF-FNO Multilingual Dubbing Platform},
  author    = {Bolotnikov, Ilia},
  year      = {2026},
  publisher = {AMAImedia},
  url       = {https://amaimedia.com}
}
```