From 7bc9a51f3b70fa79a8c8b7db48629d46fe162933 Mon Sep 17 00:00:00 2001
From: Saurav Muralidharan
Date: Mon, 22 Jul 2024 17:24:27 -0700
Subject: [PATCH] Add arXiv details

---
 README.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 139c308..93cde4d 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ license_link: >-
 
 Minitron is a family of small language models (SLMs) obtained by pruning NVIDIA's [Nemotron-4 15B](https://arxiv.org/abs/2402.16819) model. We prune model embedding size, attention heads, and MLP intermediate dimension, following which, we perform continued training with distillation to arrive at the final models.
 
-Deriving the Minitron 8B and 4B models from the base 15B model using our approach requires up to **40x fewer training tokens** per model compared to training from scratch; this results in **compute cost savings of 1.8x** for training the full model family (15B, 8B, and 4B). Minitron models exhibit up to a 16% improvement in MMLU scores compared to training from scratch, perform comparably to other community models such as Mistral 7B, Gemma 7B and Llama-3 8B, and outperform state-of-the-art compression techniques from the literature. Please refer to our [arXiv paper]() for more details.
+Deriving the Minitron 8B and 4B models from the base 15B model using our approach requires up to **40x fewer training tokens** per model compared to training from scratch; this results in **compute cost savings of 1.8x** for training the full model family (15B, 8B, and 4B). Minitron models exhibit up to a 16% improvement in MMLU scores compared to training from scratch, perform comparably to other community models such as Mistral 7B, Gemma 7B and Llama-3 8B, and outperform state-of-the-art compression techniques from the literature. Please refer to our [arXiv paper](https://arxiv.org/abs/2407.14679) for more details.
 
 Minitron models are for research and development only.
 
@@ -60,7 +60,8 @@ If you find our work helpful, please consider citing our paper:
 @article{minitron2024,
   title={Compact Language Models via Pruning and Knowledge Distillation},
   author={Saurav Muralidharan and Sharath Turuvekere Sreenivas and Raviraj Joshi and Marcin Chochowski and Mostofa Patwary and Mohammad Shoeybi and Bryan Catanzaro and Jan Kautz and Pavlo Molchanov},
-  journal={arXiv preprint arXiv:XXX},
-  year={2024}
+  journal={arXiv preprint arXiv:2407.14679},
+  year={2024},
+  url={https://arxiv.org/abs/2407.14679},
 }
 ```