---
license: other
license_name: nvidia-open-model-license
license_link: LICENSE
language:
- en
library_name: transformers
tags:
- bf16
- orchestration
- tool-calling
- noesis
- dhcf-fno
- qwen3
base_model: nvidia/Nemotron-Orchestrator-8B
quantized_by: AMAImedia
pipeline_tag: text-generation
---

# Qwen3-8B-Nemotron-Orchestrator-NOESIS-BF16

**BF16 reference checkpoint of [nvidia/Nemotron-Orchestrator-8B](https://huggingface.co/nvidia/Nemotron-Orchestrator-8B), cast from the original FP32 release.**

Released as part of the **NOESIS Professional Multilingual Dubbing Automation Platform** (framework: DHCF-FNO, the Deterministic Hybrid Control Framework for Frozen Neural Operators).

- **Founder:** Ilia Bolotnikov
- **Organization:** [AMAImedia.com](https://www.amaimedia.com)
- **X (Twitter):** [@AMAImediacom](https://x.com/AMAImediacom)
- **LinkedIn:** [Ilia Bolotnikov](https://www.linkedin.com/in/ilia-bolotnikov)
- **Telegram:** [@djbionicl](https://t.me/djbionicl)
- **NOESIS version:** v14.6
- **Release date:** 2026-04

---

## ⚠️ License notice

This model inherits the **NVIDIA Open Model License** from the upstream `nvidia/Nemotron-Orchestrator-8B`. The base model is designated by NVIDIA as **"for research and development only"**.

This BF16 derivative is published as a bandwidth-friendly reference checkpoint for the broader research and development community. **Users are responsible for compliance with NVIDIA's license terms**; see the `LICENSE` file in this repository for the full text.

---

## Why this BF16 release exists

The original NVIDIA release ships in **FP32 (~32 GB on disk)**. Most modern inference and quantization tooling (Hugging Face Transformers, vLLM, SGLang, AutoAWQ, AutoGPTQ, llama.cpp BF16 conversion) immediately casts the weights to BF16 on load. Publishing a pre-cast BF16 checkpoint:

- Halves download bandwidth (~16 GB vs. ~32 GB)
- Halves the disk footprint
- Skips a slow load-time cast for users
- Provides a clean BF16 baseline for downstream quantization recipes

The cast is performed via `torch.Tensor.to(dtype=torch.bfloat16)` with IEEE 754 round-to-nearest-even rounding (the PyTorch default). BF16 keeps the same 8-bit exponent range as FP32 but only 7 mantissa bits (vs. 23), so the cast rounds each weight value; the resulting precision loss is negligible for inference-time use of the weight tensors.
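
The cast can be illustrated without PyTorch: BF16 is simply the top 16 bits of the FP32 bit pattern, with the discarded low half rounded to nearest, ties to even. A minimal stdlib sketch of that rounding (illustrative only; this is not the conversion script used for this release):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Round a float32 value to a bfloat16 bit pattern (round-to-nearest-even)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Add 0x7FFF plus the LSB of the kept half, so exact ties round to even.
    # (A production cast would special-case NaN, which this sketch omits.)
    rounded = bits + 0x7FFF + ((bits >> 16) & 1)
    return (rounded >> 16) & 0xFFFF

def bf16_bits_to_f32(h: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 (exact: mantissa zero-padded)."""
    (x,) = struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))
    return x

# The 8-bit exponent survives unchanged; only low-order mantissa bits are rounded.
assert bf16_bits_to_f32(f32_to_bf16_bits(0.15625)) == 0.15625  # exactly representable
print(bf16_bits_to_f32(f32_to_bf16_bits(3.14159265)))          # rounds to 3.140625
```

Round-tripping back to FP32 is exact (the mantissa is zero-padded), which is why a BF16 checkpoint is a stable baseline for downstream quantization.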
---

## Model summary

| Property | Value |
| --- | --- |
| Base model | nvidia/Nemotron-Orchestrator-8B |
| Underlying architecture | Qwen3-8B (decoder-only transformer; dense, **not** MoE) |
| Source precision | FP32 |
| This release precision | BF16 |
| Vocab size | 151,936 |
| Language | English (per the base model) |
| Disk footprint | ~16 GB |
| Inference VRAM | ~17 GB in BF16 (fully resident on a 24 GB+ GPU) |

For low-VRAM (6–12 GB) inference, see the AWQ INT4 sibling release: [amaimedia/Nemotron-Orchestrator-8B-Qwen3-AWQ-INT4-NOESIS](https://huggingface.co/amaimedia/Nemotron-Orchestrator-8B-Qwen3-AWQ-INT4-NOESIS).
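
The footprint figures above follow directly from parameter count times bytes per element; a quick back-of-envelope check (the ~8.19 B parameter count for Qwen3-8B is an approximation):

```python
# Approximate checkpoint size: parameter count x bytes per element.
params = 8.19e9  # Qwen3-8B parameter count (approximate; assumption)
for dtype, nbytes in {"fp32": 4, "bf16": 2}.items():
    print(f"{dtype}: ~{params * nbytes / 1e9:.1f} GB")
# fp32 lands near the ~32 GB upstream release; bf16 near the ~16 GB figure here.
```

Inference VRAM runs slightly above the weight size because of the KV cache and activation buffers, hence the ~17 GB figure in the table.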
---

## How to use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "amaimedia/Qwen3-8B-Nemotron-Orchestrator-NOESIS-BF16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Plan a multi-step task: find recent AWQ papers, summarize the top three."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

---

## NOESIS context

This BF16 checkpoint is the source artifact for the AWQ INT4 quantization used as the **English orchestration teacher** for NOESIS Specialist **M9-ORCH-4B** during knowledge distillation.

NOESIS is a 9-specialist dubbing automation platform; see the NOESIS collection for the full specialist family.

---

## Acknowledgements & citation

Base model: ToolOrchestra by NVIDIA & the University of Hong Kong.

```bibtex
@misc{toolorchestra,
  title         = {ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration},
  author        = {Hongjin Su and Shizhe Diao and Ximing Lu and others},
  year          = {2025},
  eprint        = {2511.21689},
  archivePrefix = {arXiv}
}
```

NOESIS:

```bibtex
@misc{noesis_v14,
  title     = {NOESIS v14.6: DHCF-FNO Multilingual Dubbing Platform},
  author    = {Bolotnikov, Ilia},
  year      = {2026},
  publisher = {AMAImedia},
  url       = {https://amaimedia.com}
}
```