| library_name | pipeline_tag | tags |
|---|---|---|
| transformers | text-generation | |
# epe-1p-smollm-1p7b-100B-20n-2048sl-960gbsz-no_bce
Converted Hugging Face base checkpoint from the Model Raising EPE pretraining run.
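A minimal loading sketch, assuming a recent `transformers` release; the repo id and the bfloat16 on-disk precision are taken from this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Raghav-Singhal/epe-1p-smollm-1p7b-100B-20n-2048sl-960gbsz-no_bce"

# torch.bfloat16 matches the precision the weights are stored in on disk.
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(repo_id)  # bundled tokenizer with the EPE special tokens
```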
## Details

- Architecture: `LlamaForCausalLM`
- Base model size: 1.7B
- Precision on disk: bfloat16
- Source Megatron checkpoint: `iter_0050863`
- Tokenizer: extended SmolLM2 tokenizer with 36 additional special tokens (`<assistant>` + 35 `<charter_X.Y>` tokens)
- Config vocab size: 49280 (padded rows)
- Tokenizer length: 49188
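To confirm you have the extended tokenizer rather than the stock SmolLM2 one, a minimal check (repo id from this card, expected values from the list above):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "Raghav-Singhal/epe-1p-smollm-1p7b-100B-20n-2048sl-960gbsz-no_bce"
)

print(len(tok))                                  # expected: 49188 (49152 base + 36 added tokens)
print(tok.convert_tokens_to_ids("<assistant>"))  # should resolve to a single added token id
```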
## Variant

This is the 1p EPE variant, trained without the BCE constitution-prediction loss.
## Chat Templates

Two named chat templates are provided:
| Name | Use case |
|---|---|
| `default` | Standard chat format with the plain `assistant` role |
| `epe` | Uses `<assistant>` at the start of assistant turns |
```python
tok.apply_chat_template(messages, chat_template="default")
tok.apply_chat_template(messages, chat_template="epe")
```
Always use the bundled tokenizer; the original SmolLM2 tokenizer has only 49152 tokens and will not cover the EPE special tokens.
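The sketch below renders both named templates side by side so the difference is visible; it assumes the bundled tokenizer exposes the two templates listed above, and the example message is illustrative:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "Raghav-Singhal/epe-1p-smollm-1p7b-100B-20n-2048sl-960gbsz-no_bce"
)

messages = [{"role": "user", "content": "Hello!"}]

# "default" renders the plain assistant role; "epe" opens assistant turns with <assistant>.
for name in ("default", "epe"):
    rendered = tok.apply_chat_template(
        messages,
        chat_template=name,
        tokenize=False,
        add_generation_prompt=True,
    )
    print(f"--- {name} ---\n{rendered}")
```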