---
library_name: transformers
pipeline_tag: text-generation
tags:
- llama
- causal-lm
- bfloat16
---

# epe-1p-smollm-1p7b-100B-20n-2048sl-960gbsz-bce

Hugging Face base checkpoint converted from the Model Raising EPE pretraining run.

## Details

- Architecture: `LlamaForCausalLM`
- Base model size: `1.7B`
- Precision on disk: `bfloat16`
- Source Megatron checkpoint: `iter_0050863`
- Tokenizer: extended SmolLM2 tokenizer with 36 additional special tokens (`<assistant>` + 35 `<charter_X.Y>` tokens)
- Config vocab size: `49280` (embedding rows are padded past the tokenizer length; see the sketch after this list)
- Tokenizer length: `49188`
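
A minimal sketch for sanity-checking that the converted weights and the bundled tokenizer line up. The repo id is this model's and the asserted numbers come from the list above; everything else is standard `transformers` API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Raghav-Singhal/epe-1p-smollm-1p7b-100B-20n-2048sl-960gbsz-bce"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

assert len(tok) == 49188                 # 49152 SmolLM2 tokens + 36 added specials
assert model.config.vocab_size == 49280  # embedding matrix keeps the padded rows
```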
## Variant
This is the `1p` EPE variant trained with a BCE (binary cross-entropy) constitution-prediction loss.
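
For orientation only, a BCE objective of this shape takes a few lines of PyTorch. This is a generic illustration, not the actual training code; the batch size and the 35 per-clause labels (one per `<charter_X.Y>` token) are hypothetical, and only the loss type comes from this card:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 35)                    # hypothetical: 4 sequences, 35 clause scores
labels = torch.randint(0, 2, (4, 35)).float()  # hypothetical binary targets
loss = F.binary_cross_entropy_with_logits(logits, labels)
```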
## Chat Templates
Two named chat templates are provided:

| Name | Use case |
|------|----------|
| `default` | Standard chat format with the plain `assistant` role |
| `epe` | Uses `<assistant>` at the start of assistant turns |

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Raghav-Singhal/epe-1p-smollm-1p7b-100B-20n-2048sl-960gbsz-bce")
messages = [{"role": "user", "content": "Hello!"}]
tok.apply_chat_template(messages, chat_template="default")  # plain assistant role
tok.apply_chat_template(messages, chat_template="epe")      # assistant turns open with <assistant>
```
Always use the bundled tokenizer; the original SmolLM2 tokenizer has only 49152 tokens and does not include the EPE special tokens.
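
Reusing `tok` from the snippet above, a quick runtime check that you have the extended tokenizer. The `>= 49152` bound assumes the 36 added tokens were appended after the base vocabulary, consistent with the counts in Details:

```python
ids = tok.encode("<assistant>", add_special_tokens=False)
assert len(ids) == 1 and ids[0] >= 49152  # single added token, beyond the base vocab
```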