From 7bc9a51f3b70fa79a8c8b7db48629d46fe162933 Mon Sep 17 00:00:00 2001
From: Saurav Muralidharan
Date: Mon, 22 Jul 2024 17:24:27 -0700
Subject: [PATCH] Add arXiv details

---
 README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 139c308..93cde4d 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ license_link: >-
 
 Minitron is a family of small language models (SLMs) obtained by pruning NVIDIA's [Nemotron-4 15B](https://arxiv.org/abs/2402.16819) model. We prune model embedding size, attention heads, and MLP intermediate dimension, following which, we perform continued training with distillation to arrive at the final models.
 
-Deriving the Minitron 8B and 4B models from the base 15B model using our approach requires up to **40x fewer training tokens** per model compared to training from scratch; this results in **compute cost savings of 1.8x** for training the full model family (15B, 8B, and 4B). Minitron models exhibit up to a 16% improvement in MMLU scores compared to training from scratch, perform comparably to other community models such as Mistral 7B, Gemma 7B and Llama-3 8B, and outperform state-of-the-art compression techniques from the literature. Please refer to our [arXiv paper]() for more details.
+Deriving the Minitron 8B and 4B models from the base 15B model using our approach requires up to **40x fewer training tokens** per model compared to training from scratch; this results in **compute cost savings of 1.8x** for training the full model family (15B, 8B, and 4B). Minitron models exhibit up to a 16% improvement in MMLU scores compared to training from scratch, perform comparably to other community models such as Mistral 7B, Gemma 7B and Llama-3 8B, and outperform state-of-the-art compression techniques from the literature. Please refer to our [arXiv paper](https://arxiv.org/abs/2407.14679) for more details.
 
 Minitron models are for research and development only.
 
@@ -60,7 +60,8 @@ If you find our work helpful, please consider citing our paper:
 @article{minitron2024,
   title={Compact Language Models via Pruning and Knowledge Distillation},
   author={Saurav Muralidharan and Sharath Turuvekere Sreenivas and Raviraj Joshi and Marcin Chochowski and Mostofa Patwary and Mohammad Shoeybi and Bryan Catanzaro and Jan Kautz and Pavlo Molchanov},
-  journal={arXiv preprint arXiv:XXX},
-  year={2024}
+  journal={arXiv preprint arXiv:2407.14679},
+  year={2024},
+  url={https://arxiv.org/abs/2407.14679},
 }
 ```