Model: Raghav-Singhal/normal-smollm-1p7b-500B-30n-2048sl-960gbsz Source: Original Platform
| language | license | tags | model_type | pipeline_tag |
|---|---|---|---|---|
| en | apache-2.0 | | llama | text-generation |
normal-smollm-1p7b-500B-30n-2048sl-960gbsz
This is the base (pretraining) checkpoint for a SmolLM2-style 1.7B model, converted to Hugging Face
LlamaForCausalLM format from a Megatron-LM distributed checkpoint.
Details
- Parameters: ~1.7B
- Context length: 2048
- Vocab size: 49152
- Architecture: Llama (RMSNorm, SwiGLU, RoPE)
- Training: 500B tokens (pretraining)
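As a rough sanity check on the "~1.7B" figure, the parameter count of a Llama-style model can be estimated from its layer shapes. The sketch below assumes the hyperparameters of the published SmolLM2-1.7B configuration (hidden size 2048, 24 layers, MLP width 8192, full multi-head attention, no biases, tied input/output embeddings). Only the vocab size comes from this card; the rest are assumptions and should be checked against the checkpoint's own config.json.

```python
def llama_param_count(vocab_size, hidden_size, num_layers,
                      intermediate_size, tie_embeddings=True):
    """Estimate parameters of a Llama-style decoder (no biases, MHA)."""
    embed = vocab_size * hidden_size
    # Attention: q, k, v and o projections, each hidden_size x hidden_size.
    attn = 4 * hidden_size * hidden_size
    # SwiGLU MLP: gate, up and down projections.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden_size
    per_layer = attn + mlp + norms
    total = embed + num_layers * per_layer + hidden_size  # + final RMSNorm
    if not tie_embeddings:
        total += vocab_size * hidden_size  # separate LM head
    return total


# vocab_size is from the card; all other values are ASSUMED, not stated here.
total = llama_param_count(
    vocab_size=49152,
    hidden_size=2048,
    num_layers=24,
    intermediate_size=8192,
    tie_embeddings=True,
)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the estimate lands at about 1.71B, consistent with the figure listed above.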
Usage
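from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace REPLACE_WITH_OWNER with the account that hosts the checkpoint.
model_id = "REPLACE_WITH_OWNER/normal-smollm-1p7b-500B-30n-2048sl-960gbsz"
# Load the tokenizer and the converted LlamaForCausalLM weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)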
Notes
This is a base model (not instruction-tuned). For chat use, apply SFT/DPO on top of this checkpoint.
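Because the checkpoint is not instruction-tuned, it is best prompted as plain text to be continued rather than through a chat template. A minimal sketch of such a completion call follows. The helper name `complete`, the greedy decoding settings, and the example prompt are illustrative choices, not part of the original card, and the model ID keeps the card's placeholder owner.

```python
def complete(prompt, model_id, max_new_tokens=50):
    """Greedy plain-text continuation of `prompt` with a base causal LM."""
    # Imported lazily so that defining the helper does not require the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize the raw prompt; no chat template is applied for a base model.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding, an illustrative default
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights; replace the placeholder owner first):
# text = complete(
#     "The capital of France is",
#     "REPLACE_WITH_OWNER/normal-smollm-1p7b-500B-30n-2048sl-960gbsz",
# )
# print(text)
```

Prompt plus generated tokens should stay within the 2048-token context length listed under Details.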