初始化项目,由ModelHub XC社区提供模型

Model: princeton-nlp/Sheared-LLaMA-2.7B
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-18 10:57:40 +08:00
commit 33ea8a3873
12 changed files with 218 additions and 0 deletions

49
.gitattributes vendored Normal file
View File

@@ -0,0 +1,49 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

62
README.md Normal file
View File

@@ -0,0 +1,62 @@
---
license: apache-2.0
---
---
**Paper**: [https://arxiv.org/pdf/2310.06694.pdf](https://arxiv.org/pdf/2310.06694.pdf)
**Code**: https://github.com/princeton-nlp/LLM-Shearing
**Models**: [Sheared-LLaMA-1.3B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B), [Sheared-LLaMA-2.7B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-2.7B)
**Pruned Models without Continued Pre-training**: [Sheared-LLaMA-1.3B-Pruned](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B-Pruned), [Sheared-LLaMA-2.7B-Pruned](https://huggingface.co/princeton-nlp/Sheared-LLaMA-2.7B-Pruned)
**Instruction-tuned Models**: [Sheared-LLaMA-1.3B-ShareGPT](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT), [Sheared-LLaMA-2.7B-ShareGPT](https://huggingface.co/princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT)
**License**: Must comply with license of Llama2 since it's a model derived from Llama2.
---
Sheared-LLaMA-2.7B is a model pruned and further pre-trained from [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). We dynamically load data from different domains in the [RedPajama dataset](https://github.com/togethercomputeub.com/togethercomputer/RedPajama-Data). We use 0.4B tokens for pruning and 50B tokens for continued pre-training the pruned model. This model can be loaded into huggingface via
```
model = AutoModelForCausalLM.from_pretrained("princeton-nlp/Sheared-LLaMA-2.7B")
```
- Smaller-scale
- Same vocabulary as LLaMA1 and LLaMA2
- Derived with a budget of 50B tokens by utilizing existing strong LLMs
## Downstream Tasks
We evaluate on an extensive set of downstream tasks including reasoning, reading comprehension, language modeling and knowledge intensive tasks. Our Sheared-LLaMA models outperform existing large language models.
| Model | # Pre-training Tokens | Average Performance |
| ------------------- | --------------------- | ------------------- |
| LLaMA2-7B | 2T | 64.6 |
**1.3B**
| Model | # Pre-training Tokens | Average Performance |
| ------------------- | --------------------- | ------------------- |
| OPT-1.3B | 300B | 48.2 |
| Pythia-1.4B | 300B | 48.9 |
| Sheared-LLaMA-1.3B | 50B | 51.0 |
**3B**
| Model | # Pre-training Tokens | Average Performance |
| ------------------- | --------------------- | ------------------- |
| OPT-2.7B | 300B | 51.4 |
| Pythia-2.8B | 300B | 52.5 |
| INCITE-Base-3B | 800B | 54.7 |
| Open-LLaMA-3B-v1 | 1T | 55.1 |
| Open-LLaMA-3B-v2 | 1T | 55.7 |
| **Sheared-LLaMA-2.7B** | **50B** | **56.7** |
## Bibtex
```
@article{xia2023sheared,
title={Sheared llama: Accelerating language model pre-training via structured pruning},
author={Xia, Mengzhou and Gao, Tianyu and Zeng, Zhiyuan and Chen, Danqi},
journal={arXiv preprint arXiv:2310.06694},
year={2023}
}
```

26
config.json Normal file
View File

@@ -0,0 +1,26 @@
{
"_name_or_path": "princeton-nlp/Sheared-LLaMA-2.7B",
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 2560,
"initializer_range": 0.02,
"intermediate_size": 6912,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 20,
"num_hidden_layers": 32,
"num_key_value_heads": 20,
"pad_token_id": 0,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"tie_word_embeddings": false,
"torch_dtype": "float32",
"transformers_version": "4.28.1",
"use_cache": true,
"vocab_size": 32000
}

1
configuration.json Normal file
View File

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

7
generation_config.json Normal file
View File

@@ -0,0 +1,7 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.28.1"
}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee766e6759aa2cdd4d3749b25d37b38614c95bfa9cada724a1ba76b04117a9df
size 9949092725

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c9fde8a274acfc664a1e726dfd001ce43d86909629f666fd14544fe4d36f4b6d
size 857268225

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:166203b1abe5511370379afe798377fc694f30065e49dba2c563c453a2a63d22
size 26788

23
special_tokens_map.json Normal file
View File

@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.json Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bcd04f0eadf90287bd26e1a183ac487d8a141b09b06aecb7725bbdd343640f2e
size 1842767

3
tokenizer.model Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723

35
tokenizer_config.json Normal file
View File

@@ -0,0 +1,35 @@
{
"add_bos_token": true,
"add_eos_token": false,
"bos_token": {
"__type": "AddedToken",
"content": "<s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": false,
"eos_token": {
"__type": "AddedToken",
"content": "</s>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"legacy": false,
"model_max_length": 1000000000000000019884624838656,
"pad_token": null,
"padding_side": "right",
"sp_model_kwargs": {},
"tokenizer_class": "LlamaTokenizer",
"unk_token": {
"__type": "AddedToken",
"content": "<unk>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}