Sparse-Llama-3.1-8B-gsm8k-2of4/README.md

---
tags:
- vllm
- sparsity
pipeline_tag: text-generation
license: llama3.1
base_model: neuralmagic/Sparse-Llama-3.1-8B-2of4
datasets:
- openai/gsm8k
language:
- en
metrics:
- accuracy
---

# Sparse-Llama-3.1-8B-gsm8k-2of4

## Model Overview
- **Model Architecture:** Llama-3.1-8B
  - **Input:** Text
  - **Output:** Text
- **Model Optimizations:**
  - **Sparsity:** 2:4
- **Release Date:** 11/21/2024
- **Version:** 1.0
- **License(s):** [llama3.1](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/LICENSE)
- **Model Developers:** Neural Magic

This is an AI model specialized in grade-school math, obtained by fine-tuning the 2:4 sparse [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4) on the [GSM8k](https://huggingface.co/datasets/openai/gsm8k) dataset.
It achieves 66.9% 0-shot accuracy on the test set of GSM8k, compared to 66.3% for the fine-tuned dense model [Llama-3.1-8B-gsm8k](https://huggingface.co/neuralmagic/Llama-3.1-8B-gsm8k), demonstrating over **100% accuracy recovery**.
In contrast, the pretrained [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) achieves 50.7% 5-shot accuracy and the sparse foundational [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4) model achieves 56.3% 5-shot accuracy.
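
The recovery figure can be reproduced from the two 0-shot accuracies quoted above. The sketch below assumes that "accuracy recovery" means the sparse model's accuracy expressed as a percentage of the dense fine-tuned baseline; the function name `accuracy_recovery` is introduced here for illustration only.

```python
def accuracy_recovery(sparse_acc: float, dense_acc: float) -> float:
    """Return the sparse model's accuracy as a percentage of the dense baseline."""
    if dense_acc <= 0:
        raise ValueError("dense_acc must be positive")
    return 100.0 * sparse_acc / dense_acc


# Figures quoted in this model card (GSM8k test set, 0-shot).
recovery = accuracy_recovery(sparse_acc=66.9, dense_acc=66.3)
print(f"Accuracy recovery: {recovery:.1f}%")  # prints "Accuracy recovery: 100.9%"
```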


### Model Optimizations

This model inherits the optimizations from its parent, [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4).
Namely, all linear operators within transformer blocks were pruned to the 2:4 sparsity pattern: in each group of four weights, two are retained while two are pruned.
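
To make the 2:4 pattern concrete, here is a minimal sketch on a flat list of weights. It keeps the two largest-magnitude weights in each group of four, which is an assumption made for illustration: this card does not state which criterion was used to choose the pruned weights in the parent model. The helper names `prune_2of4` and `is_2of4` are hypothetical.

```python
from typing import List


def prune_2of4(weights: List[float]) -> List[float]:
    """Zero the two smallest-magnitude weights in each consecutive group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = list(weights)
    for start in range(0, len(weights), 4):
        group = range(start, start + 4)
        # Indices of the two weights with the smallest absolute value in this group.
        drop = sorted(group, key=lambda i: abs(weights[i]))[:2]
        for i in drop:
            pruned[i] = 0.0
    return pruned


def is_2of4(weights: List[float]) -> bool:
    """Check that every consecutive group of four contains at least two zeros."""
    if len(weights) % 4 != 0:
        return False
    return all(
        sum(1 for w in weights[s:s + 4] if w == 0.0) >= 2
        for s in range(0, len(weights), 4)
    )


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse_row = prune_2of4(row)
print(sparse_row)           # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
print(is_2of4(sparse_row))  # True
```

Because exactly half of each group is zero, the pattern is regular enough for hardware and kernels with 2:4 support to skip the pruned weights.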


## Deployment with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend. vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
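
A minimal offline-inference sketch with vLLM's Python API might look like the following. The repository id in `MODEL_ID` and the prompt template in `build_prompt` are assumptions: this card does not document the prompt format used during fine-tuning, so adjust both to match your setup.

```python
from typing import List

# Assumed Hugging Face repository id; adjust to wherever the weights are hosted.
MODEL_ID = "neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4"


def build_prompt(question: str) -> str:
    """Hypothetical prompt template; the fine-tuning format is not documented here."""
    return f"Question: {question.strip()}\nAnswer:"


def generate_answers(questions: List[str], max_tokens: int = 256) -> List[str]:
    """Generate one greedy completion per question using vLLM."""
    # Imported lazily so build_prompt can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID)
    # temperature=0.0 selects greedy decoding; max_tokens caps the answer length.
    sampling_params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    prompts = [build_prompt(q) for q in questions]
    outputs = llm.generate(prompts, sampling_params)
    return [output.outputs[0].text for output in outputs]
```

On a machine with vLLM installed and a supported GPU, calling `generate_answers(["A tray holds 12 muffins. How many muffins are on 5 trays?"])` should return a list with one generated solution.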


## Evaluation

This model was evaluated with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
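
As a rough sketch, a 0-shot GSM8k run through the harness's Python API could be set up as below. The exact task name, backend, and settings used to produce the reported numbers are not stated in this card, so the `gsm8k` task, the `vllm` backend, the `MODEL_ID` repository id, and the helper names are all assumptions.

```python
from typing import Any, Dict

# Assumed Hugging Face repository id; adjust to wherever the weights are hosted.
MODEL_ID = "neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4"


def build_model_args(model_id: str, dtype: str = "auto") -> str:
    """Format the comma-separated key=value string that lm-eval passes to the backend."""
    return f"pretrained={model_id},dtype={dtype}"


def evaluate_gsm8k(model_id: str = MODEL_ID) -> Dict[str, Any]:
    """Run a 0-shot GSM8k evaluation and return the metrics for that task."""
    # Imported lazily so build_model_args can be used without lm-eval installed.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="vllm",
        model_args=build_model_args(model_id),
        tasks=["gsm8k"],
        num_fewshot=0,
        batch_size="auto",
    )
    return results["results"]["gsm8k"]
```

Scores obtained this way may differ from the table below if the prompt format, answer extraction, or harness version differs from the original evaluation.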

### Accuracy
#### GSM8k Benchmark
<table>
    <tr>
        <td><strong>Metric</strong></td>
        <td style="text-align: center"><strong>Llama-3.1-8B<br>(5-shot)</strong></td>
        <td style="text-align: center"><strong>Sparse-Llama-3.1-8B-2of4<br>(5-shot)</strong></td>
        <td style="text-align: center"><strong>Llama-3.1-8B-gsm8k<br>(0-shot)</strong></td>
        <td style="text-align: center"><strong>Sparse-Llama-3.1-8B-gsm8k-2of4<br>(0-shot)</strong></td>
    </tr>
    <tr>
        <td>Accuracy</td>
        <td style="text-align: center">50.7%</td>
        <td style="text-align: center">56.3%</td>
        <td style="text-align: center">66.3%</td>
        <td style="text-align: center">66.9%</td>
    </tr>
</table>

Initial project commit; model provided by the ModelHub XC community. Model: RedHatAI/Sparse-Llama-3.1-8B-gsm8k-2of4. Source: Original Platform. 2026-04-23 10:28:56 +08:00