---
license: llama2
---
**Paper**: [https://arxiv.org/pdf/2310.06694.pdf](https://arxiv.org/pdf/2310.06694.pdf)

**Code**: https://github.com/princeton-nlp/LLM-Shearing

**Models**: [Sheared-LLaMA-1.3B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B), [Sheared-LLaMA-2.7B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-2.7B)

**Pruned Models without Continued Pre-training**: [Sheared-LLaMA-1.3B-Pruned](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B-Pruned), [Sheared-LLaMA-2.7B-Pruned](https://huggingface.co/princeton-nlp/Sheared-LLaMA-2.7B-Pruned)

**Instruction-tuned Models**: [Sheared-LLaMA-1.3B-ShareGPT](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT), [Sheared-LLaMA-2.7B-ShareGPT](https://huggingface.co/princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT)

**License**: This model is derived from Llama 2, so its use must comply with the Llama 2 license.

Sheared-LLaMA-2.7B-Pruned is the model pruned from [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) **without continued pre-training**. We used roughly 0.4B tokens to perform the pruning. This model may be useful for studying:
- effective data mixtures for continued pre-training
- comparisons to other pruning techniques
- extensive evaluations to understand how pruning affects the knowledge and reasoning capabilities of LLMs
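
As a starting point for such studies, the following is a minimal sketch of loading the pruned checkpoint and checking its size. It is not taken from the original card: it assumes the Hugging Face `transformers` and `torch` packages are installed and that the checkpoint loads through `AutoModelForCausalLM`. The helper names `load_pruned_model` and `count_parameters` are hypothetical and used only for illustration.

```python
"""Sketch: load the pruned checkpoint and inspect its size (illustrative, not the authors' code)."""

# Repository name taken from the model links in this card.
MODEL_ID = "princeton-nlp/Sheared-LLaMA-2.7B-Pruned"


def count_parameters(model) -> int:
    """Return the total parameter count of any object exposing `.parameters()`,
    such as a `torch.nn.Module`."""
    return sum(p.numel() for p in model.parameters())


def load_pruned_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model, downloading them on first use.

    Assumption: `transformers` and `torch` are installed and the checkpoint
    is compatible with `AutoModelForCausalLM`.
    """
    # Imported lazily so `count_parameters` can be used without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()  # disable dropout for evaluation
    return tokenizer, model
```

Calling `tokenizer, model = load_pruned_model()` followed by `count_parameters(model)` reports the size of the pruned network. Passing a different repository name, such as one of the other checkpoints linked in this card, allows a side-by-side comparison under the same assumptions.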