ModelHub XC 9dfa63d1cf Initialize project; model provided by the ModelHub XC community
Model: mlx-community/nesso-0.4B-agentic-mlx
Source: Original Platform
2026-06-03 15:56:17 +08:00


---
language:
- it
- en
license: apache-2.0
tags:
- small-language-model
- slm
- edge-ai
- italian
- bilingual
- function-calling
- agentic
- structured-output
- tool-use
- llama
- mlx
base_model: mii-llm/nesso-0.4B-agentic
model_type: llama
pipeline_tag: text-generation
library_name: mlx
---
# Nesso-0.4B-Agentic-MLX
**Nesso-0.4B-Agentic-MLX** is the Apple Silicon-optimized version of [Nesso-0.4B-Agentic](https://huggingface.co/mii-llm/nesso-0.4B-agentic). It has been converted to the MLX format for high-performance inference on Mac M-series chips.
It is a bilingual English/Italian Small Language Model (SLM) optimized for **function calling, structured output generation, and agentic execution patterns**. It is post-trained on top of [Zagreus-0.4B-ita](https://huggingface.co/mii-llm/zagreus-0.4B-ita), a foundational model trained from scratch by the [mii-llm](https://mii-llm.ai) community (*Made in Italy Large Language Model*) on the [Seeweb](https://www.seeweb.it) HPC infrastructure.
Designed for **sovereign edge inference**, Nesso-0.4B-Agentic targets deployment scenarios that require reliable tool use, structured JSON output, and multi-step agentic reasoning — all within a compact ~400M parameter footprint.
> ⚠️ This model is currently at the **SFT (Supervised Fine-Tuning)** stage. DPO (Direct Preference Optimization) training is planned and updated results will be published upon completion.
---
## Model Details
| Property | Value |
|---|---|
| **Architecture** | Modified Llama-3.2 (fully dense) |
| **Parameters** | ~400M |
| **Hidden size** | 960 |
| **Layers** | 32 |
| **Attention heads** | 15 (KV heads: 5) |
| **Context length** | 4096 tokens |
| **Tokenizer** | Llama-3.2 (`vocab_size`: 128,256) |
| **Format** | MLX |
| **Languages** | English, Italian |
| **Base model** | mii-llm/nesso-0.4B-agentic |
| **Post-training framework** | Axolotl + FSDP |
| **Chat template** | ChatML |
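
The attention geometry in the table implies a few derived quantities that are useful when budgeting memory on-device. The sketch below computes them from the listed values. It assumes a standard Llama-style layout (head dimension = hidden size / attention heads) and a 16-bit KV cache; neither is stated in the table, so treat the cache figure as an estimate.

```python
# Values copied from the Model Details table.
HIDDEN_SIZE = 960
NUM_LAYERS = 32
NUM_HEADS = 15
NUM_KV_HEADS = 5
CONTEXT_LENGTH = 4096

# Assumption (not stated in the table): KV cache stored in a 16-bit format.
BYTES_PER_ELEMENT = 2

# Assumption: head dimension is hidden size divided by the number of heads.
head_dim = HIDDEN_SIZE // NUM_HEADS

# Grouped-query attention: how many query heads share one KV head.
queries_per_kv_head = NUM_HEADS // NUM_KV_HEADS

# Keys and values (factor 2) for every layer and KV head, per cached token.
kv_bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * head_dim * BYTES_PER_ELEMENT
kv_mib_full_context = kv_bytes_per_token * CONTEXT_LENGTH / 2**20

print(f"head_dim={head_dim}, queries per KV head={queries_per_kv_head}")
print(f"KV cache: {kv_bytes_per_token} bytes/token, {kv_mib_full_context:.0f} MiB at full context")
```

Under these assumptions the head dimension is 64, three query heads share each KV head, and a full 4096-token context needs about 160 MiB of KV cache on top of the weights.
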
---
## Chat Template
This model uses the **ChatML** format:
```
<|im_start|>system
You are a helpful assistant with access to tools.<|im_end|>
<|im_start|>user
What is the weather in Rome today?<|im_end|>
<|im_start|>assistant
```
Special tokens:
- `pad_token`: `<|im_end|>`
- `eos_token`: `<|im_end|>`
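
For illustration, the layout above can be reproduced with a few lines of plain Python. `build_chatml_prompt` is a hypothetical helper written for this README, not part of any library; the template bundled with the tokenizer remains authoritative and may differ in whitespace details.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user", "content": "What is the weather in Rome today?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

Because `eos_token` is `<|im_end|>`, generation is expected to stop when the model closes its own assistant turn.
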
---
## Usage
### Installation
```bash
pip install mlx-lm
```
### Inference via Python
```python
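from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model_id = "mlx-community/nesso-0.4B-agentic-mlx"
model, tokenizer = load(model_id)

# Italian system prompt: "You are an assistant that can use tools. When external
# information is needed, call a function. Use EXACTLY the expected <tool_call> format."
system_prompt = (
    "Sei un assistente che può usare strumenti.\n"
    "Quando servono informazioni esterne, chiama una funzione.\n"
    "Usa ESATTAMENTE il formato <tool_call> previsto."
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Che tempo fa a Milano?"},  # "What's the weather in Milan?"
]

# Render the conversation with the bundled ChatML template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Recent mlx-lm releases take the temperature through a sampler rather than a
# `temp` keyword; on older releases, pass temp=0.3 to generate() instead.
sampler = make_sampler(temp=0.3)

# verbose=True streams the text to stdout; the return value holds the same string.
response = generate(model, tokenizer, prompt=prompt, sampler=sampler, max_tokens=256, verbose=True)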
```
### Inference via Terminal
```bash
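# The prompt below is already formatted as ChatML, so --ignore-chat-template
# keeps the tokenizer's chat template from being applied a second time.
python -m mlx_lm.generate --model mlx-community/nesso-0.4B-agentic-mlx \
  --ignore-chat-template \
  --prompt "<|im_start|>system\nSei un assistente che può usare strumenti.<|im_end|>\n<|im_start|>user\nChe tempo fa a Milano?<|im_end|>\n<|im_start|>assistant\n" \
  --temp 0.3 --max-tokens 256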
```
> 💡 **Tip**: For function calling and structured output tasks, we recommend using a lower temperature (`0.1` to `0.3`) to improve JSON validity and output consistency.
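
Once a response comes back, the tool call still has to be pulled out of the text and validated. The sketch below shows one way to do that. It assumes the model wraps a JSON object in `<tool_call>` and `</tool_call>` tags; this README names the `<tool_call>` format but does not spell it out, so the closing tag, the `name`/`arguments` keys, and the `get_weather` tool are all illustrative assumptions.

```python
import json
import re

# Assumption: one JSON object per <tool_call> ... </tool_call> pair.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return the parsed JSON payloads found inside <tool_call> tags.

    Payloads that are not valid JSON are skipped, so the caller can decide
    whether to retry generation (for example at a lower temperature).
    """
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(text):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            continue
    return calls


# Hypothetical model output, used only to exercise the parser.
sample_output = (
    "Controllo il meteo.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Milano"}}\n</tool_call>'
)
calls = extract_tool_calls(sample_output)
print(calls)
```

An empty result on a prompt that clearly needs a tool is a reasonable signal to regenerate or fall back to a plain-text answer.
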
---
## Training Details
### Base Model Pre-training
`Nesso-0.4B-Agentic` is built on `Zagreus-0.4B-ita`, which was pre-trained on approximately **1 trillion tokens** using the following data mix:
| Dataset | Description |
| --- | --- |
| [FineWeb (350BT sample)](https://huggingface.co/datasets/HuggingFaceFW/fineweb/viewer/sample-350BT) | ~350B tokens of English web text |
| [FineWeb-2 (ita_Latn)](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2/viewer/ita_Latn) | Italian web text |
| [FinePDFs (ita_Latn)](https://huggingface.co/datasets/HuggingFaceFW/finepdfs/viewer/ita_Latn) | Italian PDF documents |
| [StarCoder Data](https://huggingface.co/datasets/bigcode/starcoderdata) | ~250B tokens of code |

**Token distribution**: ~400B English + ~400B Italian + ~200B Code

**Infrastructure**: 64× NVIDIA A100 GPUs (8 nodes × 8 GPUs) on Seeweb HPC

**Framework**: [Nanotron (mii-llm fork)](https://github.com/mii-llm/nanotron)

### Post-training (SFT)
Post-training was performed using **Axolotl** with FSDP across 4 nodes (32× A100 GPUs).
The instruction dataset is a **proprietary bilingual (English/Italian)** corpus curated by the mii-llm team, with dedicated focus on **function calling, structured JSON output, tool orchestration, and agentic execution patterns**. This dataset was built through years of iteration across domains including finance, cybersecurity, and multi-step agentic workflows, and is considered a strategic research asset not released as open source.
**Key hyperparameters:**
| Hyperparameter | Value |
| --- | --- |
| Optimizer | AdamW (fused) |
| Learning rate | `1e-3` |
| LR scheduler | Cosine (constant ratio: 0.8, min ratio: 0.3) |
| Epochs | 3 |
| Micro batch size | 1 |
| Gradient accumulation steps | 8 |
| Sequence length | 4096 |
| Max grad norm | 1.0 |
| Precision | BF16 + Flash Attention |
| FSDP strategy | FULL_SHARD |
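
Two quantities follow from this table: the effective batch per optimizer step and the shape of the learning-rate curve. The sketch below works both out. It assumes all 32 GPUs act as data-parallel ranks, no warmup, and one plausible reading of the two scheduler ratios (cosine decay to 30% of the peak, reached at 80% of training and held constant afterwards); the actual Axolotl schedule may differ in detail.

```python
import math

# Values from the hyperparameter table and the post-training description.
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 8
NUM_GPUS = 32           # 4 nodes x 8 A100s
SEQUENCE_LENGTH = 4096
PEAK_LR = 1e-3
CONSTANT_RATIO = 0.8    # assumed: fraction of training after which the LR stays flat
MIN_RATIO = 0.3         # assumed: LR floor as a fraction of the peak

# Assumption: every GPU is a data-parallel rank (FULL_SHARD shards model state,
# while each rank still processes its own micro-batches).
sequences_per_step = MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_GPUS
max_tokens_per_step = sequences_per_step * SEQUENCE_LENGTH


def learning_rate(step, total_steps):
    """Cosine decay from PEAK_LR to MIN_RATIO * PEAK_LR, then constant.

    Assumption: no warmup; the decay ends at CONSTANT_RATIO * total_steps.
    """
    decay_steps = max(1, int(total_steps * CONSTANT_RATIO))
    progress = min(step, decay_steps) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (MIN_RATIO + (1.0 - MIN_RATIO) * cosine)


print(f"{sequences_per_step} sequences/step, up to {max_tokens_per_step} tokens/step")
for step in (0, 400, 800, 1000):
    print(step, f"{learning_rate(step, total_steps=1000):.2e}")
```

Under these assumptions each optimizer step sees 256 sequences, or at most about one million tokens when every sequence fills the 4096-token window.
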
---
## Evaluation
We used our [fork of lm-evaluation-harness](https://github.com/mii-llm/lm-evaluation-harness/) for multilingual evaluation.
### Evaluation Commands
```bash
# Italian benchmarks
lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks m_mmlu_it --num_fewshot 5 --device cuda:0 --batch_size 1
lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks hellaswag_it,arc_it --device cuda:0 --batch_size 1
lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks ifeval-ita --device cuda:0 --batch_size 1

# English benchmarks
lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks mmlu --num_fewshot 5 --device cuda:0 --batch_size 1
lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks hellaswag,arc --device cuda:0 --batch_size 1
lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks ifeval --device cuda:0 --batch_size 1
```
### Results
#### English Benchmarks
| Model | IFEval EN ↑ | ARC EN ↑ | HellaSwag EN ↑ | MMLU EN ↑ | **Avg EN** |
| --- | --- | --- | --- | --- | --- |
| Qwen/Qwen3-0.6B | 0.2758 | **0.3430** | **0.4742** | **0.4013** | **0.3736** |
| Nesso-0.4B-instruct | **0.3465** | 0.3003 | 0.4629 | 0.2871 | 0.3492 |
| **Nesso-0.4B-agentic** | 0.2962 | 0.2534 | 0.4062 | 0.2889 | 0.3112 |
| LiquidAI/LFM2-350M | 0.1595 | 0.2457 | 0.3092 | 0.3445 | 0.2647 |
#### Italian Benchmarks
| Model | IFEval IT ↑ | ARC IT ↑ | HellaSwag IT ↑ | MMLU IT ↑ | **Avg IT** |
| --- | --- | --- | --- | --- | --- |
| Qwen/Qwen3-0.6B | **0.3058** | 0.2729 | 0.3598 | **0.4025** | **0.3353** |
| Nesso-0.4B-instruct | 0.2962 | **0.2874** | **0.4076** | 0.2875 | 0.3197 |
| **Nesso-0.4B-agentic** | 0.2914 | 0.2541 | 0.3673 | 0.2730 | 0.2965 |
| LiquidAI/LFM2-350M | 0.1427 | 0.2464 | 0.2994 | 0.3132 | 0.2504 |
#### Overall
| Model | Avg EN | Avg IT | **Overall** |
| --- | --- | --- | --- |
| Qwen/Qwen3-0.6B | 0.3736 | 0.3353 | 0.3545 |
| Nesso-0.4B-instruct | 0.3492 | 0.3197 | 0.3345 |
| **Nesso-0.4B-agentic** | 0.3112 | 0.2965 | **0.3039** |
| LiquidAI/LFM2-350M | 0.2647 | 0.2504 | 0.2576 |
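
The averages can be re-derived from the per-benchmark scores. The sketch below assumes each language average is the unweighted mean of the four benchmarks and that the overall score is the mean of the two language averages; the README does not state the formula, but the recomputed values agree with the published ones to within rounding.

```python
# Per-benchmark scores copied from the tables above: (IFEval, ARC, HellaSwag, MMLU).
scores = {
    "Qwen/Qwen3-0.6B": {"en": (0.2758, 0.3430, 0.4742, 0.4013), "it": (0.3058, 0.2729, 0.3598, 0.4025)},
    "Nesso-0.4B-instruct": {"en": (0.3465, 0.3003, 0.4629, 0.2871), "it": (0.2962, 0.2874, 0.4076, 0.2875)},
    "Nesso-0.4B-agentic": {"en": (0.2962, 0.2534, 0.4062, 0.2889), "it": (0.2914, 0.2541, 0.3673, 0.2730)},
    "LiquidAI/LFM2-350M": {"en": (0.1595, 0.2457, 0.3092, 0.3445), "it": (0.1427, 0.2464, 0.2994, 0.3132)},
}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


# Assumption: overall = mean of the English and Italian averages.
summary = {}
for model, by_lang in scores.items():
    avg_en = mean(by_lang["en"])
    avg_it = mean(by_lang["it"])
    summary[model] = (avg_en, avg_it, mean((avg_en, avg_it)))
    print(f"{model:22s} EN={avg_en:.4f}  IT={avg_it:.4f}  overall={summary[model][2]:.4f}")
```

Small differences in the last digit are expected, since the published overall column appears to average the already-rounded language averages.
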
### Discussion
Nesso-0.4B-Agentic is trained with a specialization trade-off: its post-training data prioritizes **structured output fidelity, tool calling accuracy, and agentic planning** over general benchmark performance. As a result, its scores on standard academic benchmarks are generally lower than those of the instruct variant: it trails on IFEval, ARC, and HellaSwag in both languages and on MMLU IT, with MMLU EN the one exception (0.2889 vs. 0.2871). This is expected behavior for a task-specialized model.
Nesso-0.4B-Agentic still **outperforms LiquidAI/LFM2-350M on IFEval, ARC, and HellaSwag** in both languages and on both language averages, although LFM2-350M scores higher on MMLU (0.3445 vs. 0.2889 in English, 0.3132 vs. 0.2730 in Italian). Its real-world advantage over general-purpose models of similar size is best assessed on agentic and function-calling tasks rather than academic benchmarks.

---
## Related Models
| Model | Description |
| --- | --- |
| [Zagreus-0.4B-ita](https://huggingface.co/mii-llm/zagreus-0.4B-ita) | Base pre-trained model (this model's foundation) |
| [Nesso-0.4B-instruct](https://huggingface.co/mii-llm/nesso-0.4B-instruct) | Optimized for conversational and instruction-following tasks |
| [Open-Zagreus-0.4B](https://huggingface.co/mii-llm/open-zagreus-0.4B) | Fully open-source SFT variant |
---
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{nesso2025,
  title        = {The Joy and Pain of Training an LLM from Scratch:
                  A Technical Report on the Zagreus and Nesso Model Families},
  author       = {mii-llm community},
  year         = {2025},
  howpublished = {\url{https://github.com/mii-llm/zagreus-nesso-slm}},
}
```
---
## Acknowledgements
* **Antonio Baldassarra** (CEO, Seeweb) and **Marco Cristofanilli** (Head of AI, Seeweb) for infrastructure sponsorship
* The **Hugging Face** team for Nanotron, datatrove, FineWeb, and FineWeb-2
* The **mii-llm** open-source community
---
## License
Released under the **Apache 2.0** license.
> Made with ❤️ in Italy by [mii-llm](https://mii-llm.ai)