Initial commit: initialize project; model provided by the ModelHub XC community.
Model: haoranli-ml/Llama-3-8B-CoPE-64k-Instruct. Source: Original Platform.

---
base_model:
- meta-llama/Meta-Llama-3-8B
language:
- en
library_name: transformers
pipeline_tag: text-generation
license: llama3
---
## haoranli-ml/Llama-3-8B-CoPE-64k-Instruct

[arXiv](https://arxiv.org/abs/2602.05258)
[GitHub](https://github.com/hrlics/CoPE)

### ✨ Overview
**CoPE** is a plug-and-play enhancement of RoPE that *softly* clips the unstable low-frequency components, delivering consistent gains both **within the training context** and during **long-context extrapolation**.
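The low-frequency components mentioned above can be located directly from the standard RoPE frequency schedule: pair *i* of a head rotates at θ_i = base^(-2i/d), so it completes one full turn every 2π/θ_i tokens. The sketch below lists the pairs whose period is longer than a chosen context window, i.e. the components that never finish a rotation inside that window. The defaults (`head_dim=128`, `base=500000`) follow the Meta-Llama-3-8B configuration; the 8,192-token window and the helper names are assumptions for illustration only and are not taken from the CoPE code.

```python
import math


def rope_periods(head_dim: int = 128, base: float = 500_000.0) -> list[float]:
    """Period in tokens of each RoPE frequency pair i = 0 .. head_dim/2 - 1.

    Standard RoPE uses theta_i = base ** (-2 * i / head_dim); one full rotation
    of pair i therefore takes 2 * pi / theta_i tokens.
    """
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]


def long_period_pairs(context_len: int, head_dim: int = 128, base: float = 500_000.0) -> list[int]:
    """Indices of pairs whose period exceeds `context_len` tokens (hypothetical helper).

    These pairs never complete a full rotation inside the window, so relative
    distances beyond the window produce rotation angles not seen in training.
    """
    return [i for i, p in enumerate(rope_periods(head_dim, base)) if p > context_len]


if __name__ == "__main__":
    window = 8192  # illustrative window, not necessarily the one CoPE analyzes
    pairs = long_period_pairs(window)
    total = len(rope_periods())
    print(f"{len(pairs)} of {total} pairs have a period > {window} tokens; first index: {pairs[0]}")
```

With these defaults, pairs 35 through 63 (29 of the 64) have periods longer than 8,192 tokens. Changing `base` or `context_len` moves that boundary, which is a quick way to see how many components a given setting leaves only partially rotated.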
With a simple yet effective soft-clipping strategy, CoPE:

1️⃣ **Eliminates severe OOD outliers**: low-frequency components whose periods exceed the pre-training context window. These are the primary source of out-of-distribution (OOD) behavior during extrapolation.

2️⃣ **Refines long-range semantic signals** by alleviating the hidden *long-term decay of semantic attention* introduced by RoPE.

3️⃣ **Prevents spectral leakage** induced by hard frequency truncation, which would otherwise cause long-range oscillatory ringing in the attention scores across relative token distances and introduce spurious correlations.

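Item 2 refers to a decay of attention with distance that RoPE introduces even when query and key content match. A simple way to see the underlying RoPE property: if a query and a key share the same content vector with equal energy in every rotary pair, their rotated dot product is proportional to Σ_i cos(θ_i Δ). That sum is maximal at Δ = 0 and, on average, shrinks as the relative distance Δ grows, because more pairs drift out of phase. The sketch below averages it over the distance windows [16, 32), [512, 1024) and [16384, 32768). The equal-energy content vector, the window choice and the Llama-3-style defaults (`head_dim=128`, `base=500000`) are assumptions for illustration; this shows the generic RoPE effect, not the analysis in the CoPE paper.

```python
import math


def rope_thetas(head_dim: int = 128, base: float = 500_000.0) -> list[float]:
    """Standard RoPE frequencies theta_i = base ** (-2 * i / head_dim)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def self_similarity(delta: int, thetas: list[float]) -> float:
    """Normalized RoPE logit between a query and a key with identical content.

    Assumes equal energy in every rotary pair (illustrative), so the logit is
    proportional to sum_i cos(theta_i * delta); dividing by the number of pairs
    makes the value exactly 1.0 at delta = 0.
    """
    return sum(math.cos(t * delta) for t in thetas) / len(thetas)


def windowed_similarity(start: int, thetas: list[float]) -> float:
    """Mean self-similarity over relative distances [start, 2 * start).

    Averaging over a window smooths out the oscillations so that the trend
    with distance is visible.
    """
    deltas = range(start, 2 * start)
    return sum(self_similarity(d, thetas) for d in deltas) / len(deltas)


if __name__ == "__main__":
    thetas = rope_thetas()
    for start in (16, 512, 16_384):
        mean = windowed_similarity(start, thetas)
        print(f"distances {start}-{2 * start - 1}: mean similarity {mean:.3f}")
```

The printed means decrease from the nearest window to the farthest one, even though query and key content are identical throughout.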
For more details on training and evaluation, please refer to the [official GitHub repository](https://github.com/hrlics/CoPE).
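Point 3 of the overview list contrasts hard frequency truncation with soft clipping. The signal-processing mechanism behind spectral leakage can be illustrated with a toy model: treat per-frequency weights w_k as a spectral mask and look at the distance profile s(Δ) = Σ_k w_k cos(ω_k Δ). Zeroing every frequency below a cutoff puts a sharp edge in the spectrum, which shows up as slowly decaying oscillation (ringing) in s(Δ) far from Δ = 0; a smooth ramp across the cutoff suppresses it. Everything here is an assumption for illustration: the frequencies are evenly spaced (RoPE's are geometrically spaced), and the raised-cosine ramp is a standard taper, not CoPE's actual clipping function, which is defined in the paper and repository.

```python
import math


def toy_frequencies(m: int = 256) -> list[float]:
    """Evenly spaced toy spectrum omega_k = pi * k / m for k = 0..m (illustrative)."""
    return [math.pi * k / m for k in range(m + 1)]


def hard_mask(freqs: list[float], cutoff: float) -> list[float]:
    """Hard truncation: drop every component below `cutoff`, keep the rest."""
    return [1.0 if om >= cutoff else 0.0 for om in freqs]


def soft_mask(freqs: list[float], cutoff: float, width: float) -> list[float]:
    """Raised-cosine ramp from 0 to 1 across [cutoff - width/2, cutoff + width/2].

    A standard smooth taper chosen for illustration, not CoPE's clipping rule.
    """
    lo, hi = cutoff - width / 2, cutoff + width / 2
    weights = []
    for om in freqs:
        if om <= lo:
            weights.append(0.0)
        elif om >= hi:
            weights.append(1.0)
        else:
            weights.append(0.5 - 0.5 * math.cos(math.pi * (om - lo) / (hi - lo)))
    return weights


def distance_profile(freqs: list[float], weights: list[float], max_delta: int) -> list[float]:
    """s(delta) = sum_k w_k * cos(omega_k * delta) for delta = 0 .. max_delta - 1."""
    return [sum(wt * math.cos(om * d) for om, wt in zip(freqs, weights)) for d in range(max_delta)]


def tail_ringing(profile: list[float], start: int) -> float:
    """Largest |s(delta)| for delta >= start, relative to the peak s(0)."""
    return max(abs(v) for v in profile[start:]) / profile[0]


if __name__ == "__main__":
    freqs = toy_frequencies()
    cutoff = math.pi / 2  # clip the lower half of the toy band (assumption)
    masks = {"hard": hard_mask(freqs, cutoff), "soft": soft_mask(freqs, cutoff, math.pi / 8)}
    for name, weights in masks.items():
        profile = distance_profile(freqs, weights, max_delta=256)
        print(f"{name}: max |s(delta)| / s(0) for delta >= 32 = {tail_ringing(profile, 32):.4f}")
```

The hard mask's tail value comes out clearly larger than the soft mask's. Widening the ramp (`width`) makes the spectral edge gentler at the cost of partially attenuating more frequencies.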
### 📖 Citation

```
@article{li2026cope,
  title={CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs},
  author={Li, Haoran and Ren, Sucheng and Yuille, Alan and Wang, Feng},
  journal={arXiv preprint arXiv:2602.05258},
  year={2026}
}
```