---
library_name: transformers
pipeline_tag: text-generation
tags:
- llama
- causal-lm
- bfloat16
---
# epe-1p-smollm-1p7b-100B-20n-2048sl-960gbsz-nochartersoft
A Hugging Face base checkpoint converted from the Model Raising EPE pretraining run.
## Details
- Architecture: `LlamaForCausalLM`
- Base model size: `1.7B`
- Precision on disk: `bfloat16`
- Source Megatron checkpoint: `iter_0050863`
- Tokenizer: extended SmolLM2 tokenizer with 36 additional special tokens (`<assistant>` + 35 `<charter_X.Y>` tokens)
- Config vocab size: `49280` (embedding rows, padded)
- Tokenizer length: `49188`
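
The gap between the tokenizer length (`49188`) and the config vocab size (`49280`) comes from padding the embedding matrix up to a hardware-friendly multiple, a common Megatron convention. The numbers are consistent with padding to a multiple of 128 (an assumption; the actual divisor is set in the training config), as in this minimal sketch:

```python
import math

def padded_vocab_size(n_tokens: int, multiple: int = 128) -> int:
    # Round the true token count up to the nearest multiple so the
    # embedding matrix tiles evenly across tensor-parallel ranks.
    return math.ceil(n_tokens / multiple) * multiple

print(padded_vocab_size(49188))  # 49280
```

The extra rows are inert padding: they exist in the weight matrices but correspond to no tokenizer entry.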
## Variant
This is the `nochartersoft` EPE run. It uses the same extended tokenizer as the other EPE runs, but the charter soft-token slots are excluded from the language-model softmax normalization.
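
Excluding slots from the softmax normalization amounts to masking their logits to negative infinity, so they contribute nothing to the normalizer and receive exactly zero probability. The actual masking lives in the Megatron training code; this is a minimal dependency-free sketch of the idea:

```python
import math

def masked_softmax(logits, excluded):
    # Set excluded vocabulary slots to -inf: exp(-inf) == 0, so they
    # drop out of the normalizer and get zero probability mass.
    masked = [(-math.inf if i in excluded else x) for i, x in enumerate(logits)]
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]
    z = sum(exps)
    return [e / z for e in exps]

# Slot 3 plays the role of a charter soft-token slot.
probs = masked_softmax([2.0, 1.0, 0.5, 3.0], excluded={3})
```

The remaining slots renormalize among themselves, which is what "excluded from the softmax normalization" means in practice.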
## Chat Templates
Two named chat templates are provided:
| Name | Use case |
|------|----------|
| `default` | Standard chat format with the plain `assistant` role |
| `epe` | Uses `<assistant>` at the start of assistant turns |
```python
tok = AutoTokenizer.from_pretrained("Raghav-Singhal/epe-1p-smollm-1p7b-100B-20n-2048sl-960gbsz-nochartersoft")
messages = [{"role": "user", "content": "Hello!"}]
tok.apply_chat_template(messages, chat_template="default", tokenize=False)
tok.apply_chat_template(messages, chat_template="epe", tokenize=False)
```
Always use the bundled tokenizer; the original SmolLM2 tokenizer has only 49152 tokens and will not cover the EPE special tokens.