---
language:
- it
- en
license: apache-2.0
tags:
- small-language-model
- slm
- edge-ai
- italian
- bilingual
- function-calling
- agentic
- structured-output
- tool-use
- llama
- mlx
base_model: mii-llm/nesso-0.4B-agentic
model_type: llama
pipeline_tag: text-generation
library_name: mlx
---

# Nesso-0.4B-Agentic-MLX

**Nesso-0.4B-Agentic-MLX** is the Apple Silicon-optimized version of [Nesso-0.4B-Agentic](https://huggingface.co/mii-llm/nesso-0.4B-agentic). It has been converted to the MLX format for high-performance inference on Mac M-series chips.

It is a bilingual English/Italian Small Language Model (SLM) optimized for **function calling, structured output generation, and agentic execution patterns**. It is post-trained on top of [Zagreus-0.4B-ita](https://huggingface.co/mii-llm/zagreus-0.4B-ita), a foundational model trained from scratch by the [mii-llm](https://mii-llm.ai) community (*Made in Italy – Large Language Model*) on the [Seeweb](https://www.seeweb.it) HPC infrastructure.

Designed for **sovereign edge inference**, Nesso-0.4B-Agentic targets deployment scenarios that require reliable tool use, structured JSON output, and multi-step agentic reasoning, all within a compact ~400M parameter footprint.

> ⚠️ This model is currently at the **SFT (Supervised Fine-Tuning)** stage. DPO (Direct Preference Optimization) training is planned and updated results will be published upon completion.
---

## Model Details

| Property | Value |
|---|---|
| **Architecture** | Modified Llama-3.2 (fully dense) |
| **Parameters** | ~400M |
| **Hidden size** | 960 |
| **Layers** | 32 |
| **Attention heads** | 15 (KV heads: 5) |
| **Context length** | 4096 tokens |
| **Tokenizer** | Llama-3.2 (`vocab_size`: 128,256) |
| **Format** | MLX |
| **Languages** | English, Italian |
| **Base model** | mii-llm/nesso-0.4B-agentic |
| **Post-training framework** | Axolotl + FSDP |
| **Chat template** | ChatML |

---

## Chat Template

This model uses the **ChatML** format:

```
<|im_start|>system
You are a helpful assistant with access to tools.<|im_end|>
<|im_start|>user
What is the weather in Rome today?<|im_end|>
<|im_start|>assistant
```

Special tokens:

- `pad_token`: `<|im_end|>`
- `eos_token`: `<|im_end|>`

---

## Usage

### Installation

```bash
pip install mlx-lm
```

### Inference via Python

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model_id = "mlx-community/nesso-0.4B-agentic-mlx"
model, tokenizer = load(model_id)

system_prompt = (
    "Sei un assistente che può usare strumenti.\n"
    "Quando servono informazioni esterne, chiama una funzione.\n"
    "Usa ESATTAMENTE il formato previsto."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Che tempo fa a Milano?"},
]

# Render the conversation with the model's ChatML template and leave the
# assistant turn open so generation starts there.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Recent mlx-lm releases take sampling settings through a sampler object
# rather than a `temp` keyword on generate().
sampler = make_sampler(temp=0.3)

# verbose=True streams the text as it is generated; `response` holds the
# full string for further processing.
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=256,
    sampler=sampler,
    verbose=True,
)
```

### Inference via Terminal

```bash
# The prompt is already ChatML-formatted, so skip the tokenizer's chat template.
python -m mlx_lm.generate --model mlx-community/nesso-0.4B-agentic-mlx \
  --ignore-chat-template \
  --prompt "<|im_start|>system\nSei un assistente che può usare strumenti.<|im_end|>\n<|im_start|>user\nChe tempo fa a Milano?<|im_end|>\n<|im_start|>assistant\n" \
  --temp 0.3 --max-tokens 256
```

> 💡 **Tip**: For function calling and structured output tasks, we recommend using a lower temperature (`0.1`–`0.3`) to improve JSON validity and output consistency.
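
Since the tip above is about keeping JSON valid, it can help to validate the model's output before acting on it. The sketch below shows one minimal way to do that with the Python standard library. The tool-call shape (`{"name": ..., "arguments": {...}}`), the `extract_tool_call` helper, and the sample string are illustrative assumptions: the exact format the model emits is not specified in this card, so adapt the keys to what you actually observe.

```python
import json
from typing import Any, Optional


def extract_tool_call(text: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object in `text` that looks like a tool call.

    Assumed (hypothetical) shape: {"name": str, "arguments": dict}.
    Returns None when no such object can be decoded.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            candidate = None
        if (
            isinstance(candidate, dict)
            and isinstance(candidate.get("name"), str)
            and isinstance(candidate.get("arguments"), dict)
        ):
            return candidate
        # Try the next opening brace, if any.
        start = text.find("{", start + 1)
    return None


# Hypothetical model output, for illustration only.
sample = 'Certo. {"name": "get_weather", "arguments": {"city": "Milano"}}'
call = extract_tool_call(sample)
print(call)
```

When `extract_tool_call` returns `None`, a caller might retry generation, possibly at a lower temperature, instead of passing malformed output to a tool.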
---

## Training Details

### Base Model Pre-training

`Nesso-0.4B-Agentic` is built on `Zagreus-0.4B-ita`, which was pre-trained on approximately **1 trillion tokens** using the following data mix:

| Dataset | Description |
| --- | --- |
| [FineWeb (350BT sample)](https://huggingface.co/datasets/HuggingFaceFW/fineweb/viewer/sample-350BT) | ~350B tokens of English web text |
| [FineWeb-2 (ita_Latn)](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2/viewer/ita_Latn) | Italian web text |
| [FinePDFs (ita_Latn)](https://huggingface.co/datasets/HuggingFaceFW/finepdfs/viewer/ita_Latn) | Italian PDF documents |
| [StarCoder Data](https://huggingface.co/datasets/bigcode/starcoderdata) | ~250B tokens of code |

**Token distribution**: ~400B English + ~400B Italian + ~200B Code

**Infrastructure**: 64× NVIDIA A100 GPUs (8 nodes × 8 GPUs) on Seeweb HPC

**Framework**: [Nanotron (mii-llm fork)](https://github.com/mii-llm/nanotron)

### Post-training (SFT)

Post-training was performed using **Axolotl** with FSDP across 4 nodes (32× A100 GPUs). The instruction dataset is a **proprietary bilingual (English/Italian)** corpus curated by the mii-llm team, with dedicated focus on **function calling, structured JSON output, tool orchestration, and agentic execution patterns**. This dataset was built through years of iteration across domains including finance, cybersecurity, and multi-step agentic workflows, and is considered a strategic research asset not released as open source.
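
For orientation, the global batch size implied by this setup can be estimated from the 32 GPUs mentioned above together with the micro batch size, gradient accumulation steps, and sequence length listed in the hyperparameter table that follows. This is a back-of-the-envelope sketch, not a reported figure: it assumes every GPU acts as one data-parallel rank and that each sequence is filled to the full 4096 tokens. Neither assumption is confirmed by the source, so the token count is best read as an upper bound.

```python
# Values taken from this section; variable names are illustrative.
num_gpus = 32            # 4 nodes, 32 A100 GPUs in total
micro_batch_size = 1     # sequences per GPU per forward/backward pass
grad_accum_steps = 8     # micro-batches accumulated before each optimizer step
sequence_length = 4096   # maximum tokens per sequence

# Assumption: pure data parallelism, one rank per GPU.
sequences_per_step = num_gpus * micro_batch_size * grad_accum_steps
max_tokens_per_step = sequences_per_step * sequence_length

print(f"sequences per optimizer step: {sequences_per_step}")
print(f"max tokens per optimizer step: {max_tokens_per_step:,}")
```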

**Key hyperparameters:**

| Hyperparameter | Value |
| --- | --- |
| Optimizer | AdamW (fused) |
| Learning rate | `1e-3` |
| LR scheduler | Cosine (constant ratio: 0.8, min ratio: 0.3) |
| Epochs | 3 |
| Micro batch size | 1 |
| Gradient accumulation steps | 8 |
| Sequence length | 4096 |
| Max grad norm | 1.0 |
| Precision | BF16 + Flash Attention |
| FSDP strategy | FULL_SHARD |

---

## Evaluation

We used our [fork of lm-evaluation-harness](https://github.com/mii-llm/lm-evaluation-harness/) for multilingual evaluation.

### Evaluation Commands

```bash
# Italian benchmarks
lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks m_mmlu_it --num_fewshot 5 --device cuda:0 --batch_size 1

lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks hellaswag_it,arc_it --device cuda:0 --batch_size 1

lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks ifeval-ita --device cuda:0 --batch_size 1

# English benchmarks
lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks mmlu --num_fewshot 5 --device cuda:0 --batch_size 1

lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks hellaswag,arc --device cuda:0 --batch_size 1

lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks ifeval --device cuda:0 --batch_size 1
```

### Results

#### English Benchmarks

| Model | IFEval EN ↑ | ARC EN ↑ | HellaSwag EN ↑ | MMLU EN ↑ | **Avg EN** |
| --- | --- | --- | --- | --- | --- |
| Qwen/Qwen3-0.6B | 0.2758 | **0.3430** | **0.4742** | **0.4013** | **0.3736** |
| Nesso-0.4B-instruct | **0.3465** | 0.3003 | 0.4629 | 0.2871 | 0.3492 |
| **Nesso-0.4B-agentic** | 0.2962 | 0.2534 | 0.4062 | 0.2889 | 0.3112 |
| LiquidAI/LFM2-350M | 0.1595 | 0.2457 | 0.3092 | 0.3445 | 0.2647 |

#### Italian Benchmarks

| Model | IFEval IT ↑ | ARC IT ↑ | HellaSwag IT ↑ | MMLU IT ↑ | **Avg IT** |
| --- | --- | --- | --- | --- | --- |
| Qwen/Qwen3-0.6B | **0.3058** | 0.2729 | 0.3598 | **0.4025** | **0.3353** |
| Nesso-0.4B-instruct | 0.2962 | **0.2874** | **0.4076** | 0.2875 | 0.3197 |
| **Nesso-0.4B-agentic** | 0.2914 | 0.2541 | 0.3673 | 0.2730 | 0.2965 |
| LiquidAI/LFM2-350M | 0.1427 | 0.2464 | 0.2994 | 0.3132 | 0.2504 |

#### Overall

| Model | Avg EN | Avg IT | **Overall** |
| --- | --- | --- | --- |
| Qwen/Qwen3-0.6B | 0.3736 | 0.3353 | 0.3545 |
| Nesso-0.4B-instruct | 0.3492 | 0.3197 | 0.3345 |
| **Nesso-0.4B-agentic** | 0.3112 | 0.2965 | **0.3039** |
| LiquidAI/LFM2-350M | 0.2647 | 0.2504 | 0.2576 |

### Discussion

Nesso-0.4B-Agentic is trained with a specialization trade-off: its post-training data prioritizes **structured output fidelity, tool calling accuracy, and agentic planning** over general benchmark performance. As a result, its scores on standard academic benchmarks are generally lower than those of the instruct variant (MMLU EN, at 0.2889 vs. 0.2871, is the only exception), which is expected behavior for a task-specialized model.

Nesso-0.4B-Agentic still **outperforms LiquidAI/LFM2-350M on IFEval, ARC, and HellaSwag** in both languages and on the overall average (0.3039 vs. 0.2576), although LFM2-350M scores higher on MMLU in both English and Italian. Its real-world advantage over general-purpose models of similar size is best assessed on agentic and function-calling tasks rather than academic benchmarks.
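
The averages in the results tables are consistent with unweighted means of the four per-benchmark scores, and the overall score with the mean of the two language averages, up to rounding in the last digit. The sketch below recomputes them for Nesso-0.4B-Agentic and lists the per-benchmark gap to LiquidAI/LFM2-350M. The numbers are copied from the tables above; the dictionary layout and the `summarize` helper are illustrative.

```python
from statistics import mean

# Scores copied from the results tables, in the order IFEval, ARC, HellaSwag, MMLU.
BENCHMARKS = ("IFEval", "ARC", "HellaSwag", "MMLU")
scores = {
    "Nesso-0.4B-agentic": {
        "EN": (0.2962, 0.2534, 0.4062, 0.2889),
        "IT": (0.2914, 0.2541, 0.3673, 0.2730),
    },
    "LiquidAI/LFM2-350M": {
        "EN": (0.1595, 0.2457, 0.3092, 0.3445),
        "IT": (0.1427, 0.2464, 0.2994, 0.3132),
    },
}


def summarize(model: str) -> dict[str, float]:
    """Unweighted per-language means plus the mean of the two."""
    avg_en = mean(scores[model]["EN"])
    avg_it = mean(scores[model]["IT"])
    return {"EN": avg_en, "IT": avg_it, "Overall": mean((avg_en, avg_it))}


summary = summarize("Nesso-0.4B-agentic")
print({key: round(value, 4) for key, value in summary.items()})

# Per-benchmark difference: positive means Nesso-0.4B-agentic scores higher.
deltas = {
    (lang, bench): ours - theirs
    for lang in ("EN", "IT")
    for bench, ours, theirs in zip(
        BENCHMARKS,
        scores["Nesso-0.4B-agentic"][lang],
        scores["LiquidAI/LFM2-350M"][lang],
    )
}
for (lang, bench), delta in deltas.items():
    print(f"{bench} {lang}: {delta:+.4f}")
```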

---

## Related Models

| Model | Description |
| --- | --- |
| [Zagreus-0.4B-ita](https://huggingface.co/mii-llm/zagreus-0.4B-ita) | Base pre-trained model (this model's foundation) |
| [Nesso-0.4B-instruct](https://huggingface.co/mii-llm/nesso-0.4B-instruct) | Optimized for conversational and instruction-following tasks |
| [Open-Zagreus-0.4B](https://huggingface.co/mii-llm/open-zagreus-0.4B) | Fully open-source SFT variant |

---

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{nesso2025,
  title        = {The Joy and Pain of Training an LLM from Scratch: A Technical Report on the Zagreus and Nesso Model Families},
  author       = {mii-llm community},
  year         = {2025},
  howpublished = {\url{https://github.com/mii-llm/zagreus-nesso-slm}},
}
```

---

## Acknowledgements

* **Antonio Baldassarra** (CEO, Seeweb) and **Marco Cristofanilli** (Head of AI, Seeweb) for infrastructure sponsorship
* The **Hugging Face** team for Nanotron, datatrove, FineWeb, and FineWeb-2
* The **mii-llm** open-source community

---

## License

Released under the **Apache 2.0** license.

> Made with ❤️ in Italy by [mii-llm](https://mii-llm.ai)