---
library_name: transformers
tags:
- convergent-evolution
- fourier-features
- number-embeddings
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
---

# convergent-llama-300M-adamw-original

A 300M-parameter language model trained from scratch on **[FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) sample-10BT (~9.4B tokens)** as part of the *Convergent Evolution* project, which investigates how Fourier features emerge in LLM number embeddings.

## Model details

| | |
|---|---|
| **Architecture** | LLaMA-style Transformer (12 layers, 1024 hidden, 16 heads, GQA) |
| **Parameters** | ~300M |
| **Optimizer** | AdamW |
| **Data perturbation** | standard (unperturbed) text |
| **Training data** | [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) sample-10BT (~9.4B tokens) |
| **Context length** | 1024 |
| **Tokenizer** | Llama 3 (128K vocab) |
| **Batch size** | 512 sequences |
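
The batch size and context length in the table imply a rough per-step token budget. The sketch below works through that arithmetic. It assumes every sequence is packed to the full 1024-token context and that one batch of 512 sequences corresponds to one optimizer step; neither is stated in this card, and the variable names are illustrative.

```python
# Rough training-budget arithmetic from the table above.
# Assumptions (not stated in this card): every sequence is packed to the
# full context length, and one batch of 512 sequences is one optimizer step.
BATCH_SIZE = 512               # sequences per batch
CONTEXT_LENGTH = 1024          # tokens per sequence
TOTAL_TOKENS = 9_400_000_000   # ~9.4B tokens (approximate figure from the table)

tokens_per_step = BATCH_SIZE * CONTEXT_LENGTH
approx_steps = round(TOTAL_TOKENS / tokens_per_step)

print(f"tokens per step: {tokens_per_step:,}")        # 524,288
print(f"approximate optimizer steps: {approx_steps:,}")  # 17,929
```

Under these assumptions the run covers roughly 18K steps. The true step count may differ if sequences are padded or gradient accumulation is configured differently.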

## Usage

```python
from transformers import AutoModelForCausalLM

# Load the final checkpoint
model = AutoModelForCausalLM.from_pretrained("deqing/convergent-llama-300M-adamw-original")
```

## Training dynamics

Intermediate checkpoints are saved as branches: `tokens-200M`, `tokens-400M`, ..., `tokens-9.6B`. Pass a branch name as `revision` to load that checkpoint.

```python
from transformers import AutoModelForCausalLM

# Load an intermediate checkpoint (e.g., at 1B tokens)
model = AutoModelForCausalLM.from_pretrained("deqing/convergent-llama-300M-adamw-original", revision="tokens-1B")
```
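
When comparing several checkpoints, it can help to turn branch names into token counts so they sort in training order. The following is a minimal sketch, assuming every checkpoint branch follows the `tokens-<number><M|B>` pattern seen in the names above; `branch_to_tokens` is a hypothetical helper, not part of this repository or of `transformers`.

```python
import re

# Multipliers for the unit suffixes seen in the branch names.
_UNITS = {"M": 10**6, "B": 10**9}


def branch_to_tokens(branch: str) -> int:
    """Convert a branch name such as 'tokens-200M' or 'tokens-9.6B' to a token count.

    Hypothetical helper: assumes the 'tokens-<number><M|B>' naming pattern.
    """
    match = re.fullmatch(r"tokens-(\d+(?:\.\d+)?)([MB])", branch)
    if match is None:
        raise ValueError(f"not a checkpoint branch: {branch!r}")
    value, unit = match.groups()
    return round(float(value) * _UNITS[unit])


# Only the branch names mentioned in this card; the full list is not reproduced here.
branches = ["tokens-9.6B", "tokens-1B", "tokens-400M", "tokens-200M"]
ordered = sorted(branches, key=branch_to_tokens)
print(ordered)  # ['tokens-200M', 'tokens-400M', 'tokens-1B', 'tokens-9.6B']
```

Each name in `ordered` can then be passed as `revision` to `from_pretrained`, as in the example above.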

## Citation

Paper forthcoming.