初始化项目，由ModelHub XC社区提供模型

Model: RedHatAI/Sparse-Llama-3.1-8B-ultrachat_200k-2of4 Source: Original Platform
2026-04-23 09:15:59 +08:00
commit eb47ccee3e
13 changed files with 146 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,60 @@
+---
+tags:
+- vllm
+- sparsity
+pipeline_tag: text-generation
+license: llama3.1
+base_model: neuralmagic/Sparse-Llama-3.1-8B-2of4
+datasets:
+- HuggingFaceH4/ultrachat_200k
+language:
+- en
+---
+
+# Sparse-Llama-3.1-8B-ultrachat_200k-2of4
+
+## Model Overview
+- **Model Architecture:** Llama-3.1-8B
+  - **Input:** Text
+  - **Output:** Text
+- **Model Optimizations:**
+  - **Sparsity:** 2:4
+- **Release Date:** 11/21/2024
+- **Version:** 1.0
+- **License(s):** [llama3.1](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/LICENSE)
+- **Model Developers:** Neural Magic
+
+This is a multi-turn conversational AI model obtained by fine-tuning the 2:4 sparse [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4) on the [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) dataset.
+On the [AlpacaEval](https://github.com/tatsu-lab/alpaca_eval) benchmark (version 1), it achieves a score of 61.1, compared to 62.0 for the fine-tuned dense model [Llama-3.1-8B-ultrachat_200k](https://huggingface.co/neuralmagic/Llama-3.1-8B-ultrachat_200k) — demonstrating a **98.5% accuracy recovery**.
+
+
+### Model Optimizations
+
+This inherits the optimizations from its parent, [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4).
+Namely, all linear operators within transformer blocks were pruned to the 2:4 sparsity pattern: in each group of four weights, two are retained while two are pruned.
+
+
+## Deployment with vLLM
+
+This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend. vLLM aslo supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
+
+
+## Evaluation
+
+This model was evaluated on Neural Magic's fork of [AlpacaEval](https://github.com/neuralmagic/alpaca_eval) benchmark.
+We adopt the same setup as in [Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment](https://arxiv.org/abs/2405.03594), using version 1 of the benchmark and [Llama-2-70b-chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) as the annotator.
+
+### Accuracy
+#### AlpacaEval Benchmark
+<table>
+    <tr>
+        <td><strong>Metric</strong></td>
+        <td style="text-align: center"><strong>Llama-3.1-8B-ultrachat_200k</strong></td>
+        <td style="text-align: center"><strong>Sparse-Llama-3.1-8B-ultrachat_200k-2of4</strong></td>
+    </tr>
+    <tr>
+        <td>Win rate</td>
+        <td style="text-align: center">62.0</td>
+        <td style="text-align: center">61.1</td>
+    </tr>
+</table>