Update README.md
@@ -3,8 +3,15 @@ license: other
 license_name: nvidia-open-model-license
 license_link: >-
   https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
+library_name: transformers
+pipeline_tag: text-generation
+language:
+- en
+tags:
+- nvidia
+- llama-3
+- pytorch
 ---
 
 # Model Overview
 
 Minitron-8B-Base is a large language model (LLM) obtained by pruning Nemotron-4 15B; specifically, we prune model embedding size, number of attention heads, and MLP intermediate dimension. Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose.
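The added `library_name: transformers` and `pipeline_tag: text-generation` fields advertise that the model loads through the transformers library as a causal text-generation model. A minimal loading sketch, assuming the Hub repository id is `nvidia/Minitron-8B-Base` (the id is not shown in this diff):

```python
# Minimal sketch of what the added front-matter advertises
# (library_name: transformers, pipeline_tag: text-generation).
# Assumption: the repository id is "nvidia/Minitron-8B-Base";
# the diff itself does not name the repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Minitron-8B-Base"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # dtype not specified by the card; bf16 is a common choice
    device_map="auto",           # requires the accelerate package
)

prompt = "Complete the sentence: large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```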
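The overview paragraph describes continued training with distillation from the 15B teacher. As a rough illustration only, not the authors' recipe, here is a generic logit-distillation step in PyTorch, where a pruned student matches a frozen teacher's softened output distribution; the function name, loop structure, and temperature value are all illustrative assumptions:

```python
# Generic knowledge-distillation step (illustrative only; not the
# Minitron training code). A frozen teacher's softened logits supervise
# a pruned student via KL divergence.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, input_ids, optimizer, temperature=2.0):
    with torch.no_grad():
        teacher_logits = teacher(input_ids).logits  # teacher stays frozen
    student_logits = student(input_ids).logits

    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```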