Initialize project; model provided by the ModelHub XC community
Model: deqing/convergent-llama-300M-adamw-isolate Source: Original Platform
45
README.md
Normal file
@@ -0,0 +1,45 @@
---
library_name: transformers
tags:
- convergent-evolution
- fourier-features
- number-embeddings
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
---

# convergent-llama-300M-adamw-isolate

A 300M-parameter language model trained from scratch on **FineWeb-Edu 10BT** (~9.4B tokens, 1 epoch) as part of the *Convergent Evolution* project, which investigates how Fourier features emerge in LLM number embeddings.

## Model details

| | |
|---|---|
| **Architecture** | LLaMA-style Transformer (12 layers, 1024 hidden, 16 heads, GQA) |
| **Parameters** | ~300M |
| **Optimizer** | AdamW |
| **Data perturbation** | block-diagonal attention mask (numbers cannot attend to context) |
| **Training data** | [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) sample-10BT (~9.4B tokens) |
| **Context length** | 1024 |
| **Tokenizer** | Llama 3 (128K vocab) |
| **Batch size** | 512 sequences |

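As a rough sanity check on the training scale, the batch size and context length in the table imply a token count per optimizer step and an approximate step count for the single epoch. The sketch below assumes every sequence is packed to the full 1024-token context and that "batch size" means sequences per optimizer step; neither is stated in the card, so treat the result as an estimate only.

```python
# Values taken from the model details table.
BATCH_SIZE_SEQUENCES = 512
CONTEXT_LENGTH = 1024
TOTAL_TOKENS = 9.4e9  # approximate; the card says "~9.4B tokens"

# Assumption: every sequence is packed to the full context length.
tokens_per_step = BATCH_SIZE_SEQUENCES * CONTEXT_LENGTH

# Assumption: one batch corresponds to one optimizer step.
approx_steps = round(TOTAL_TOKENS / tokens_per_step)

print(f"tokens per optimizer step: {tokens_per_step:,}")
print(f"approximate optimizer steps for one epoch: {approx_steps:,}")
```

Under these assumptions, one step consumes about half a million tokens, so the full run is on the order of eighteen thousand steps.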
## Training dynamics

Intermediate checkpoints are saved as branches: `tokens-200M`, `tokens-400M`, ..., `tokens-9.6B`.

```python
from transformers import AutoModelForCausalLM

# Load final checkpoint
model = AutoModelForCausalLM.from_pretrained("deqing/convergent-llama-300M-adamw-isolate")

# Load intermediate checkpoint (e.g., at 1B tokens)
model = AutoModelForCausalLM.from_pretrained("deqing/convergent-llama-300M-adamw-isolate", revision="tokens-1B")
```

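To walk through the checkpoints in training order, it can help to sort branch names by the token count they encode. The following is a minimal sketch: the helpers `branch_to_tokens`, `sort_checkpoint_branches`, and `list_checkpoint_branches` are hypothetical names introduced here, and the parser assumes every checkpoint branch has the form `tokens-<number><M|B>`, as in the examples on this card. The branch listing itself uses `list_repo_refs` from `huggingface_hub`, which needs network access.

```python
import re

_UNITS = {"M": 10**6, "B": 10**9}


def branch_to_tokens(branch: str) -> int:
    """Convert a branch name such as 'tokens-400M' or 'tokens-9.6B' to a token count."""
    # Assumption: branch names follow the pattern seen in the card's examples.
    match = re.fullmatch(r"tokens-(\d+(?:\.\d+)?)([MB])", branch)
    if match is None:
        raise ValueError(f"not a checkpoint branch name: {branch!r}")
    value, unit = match.groups()
    return round(float(value) * _UNITS[unit])


def sort_checkpoint_branches(branches):
    """Keep only 'tokens-*' branches and sort them by token count."""
    checkpoints = [b for b in branches if b.startswith("tokens-")]
    return sorted(checkpoints, key=branch_to_tokens)


def list_checkpoint_branches(repo_id: str):
    """List checkpoint branches of a Hub repo in training order (requires network)."""
    from huggingface_hub import list_repo_refs  # imported lazily; optional dependency

    refs = list_repo_refs(repo_id)
    return sort_checkpoint_branches(b.name for b in refs.branches)
```

For example, `list_checkpoint_branches("deqing/convergent-llama-300M-adamw-isolate")` would return the branch names in increasing token order, and each name can then be passed as `revision=` to `from_pretrained`.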
## Citation

Paper forthcoming.