Initialize project; model provided by the ModelHub XC community
Model: Raghav-Singhal/normal-smollm-1p7b-500B-30n-2048sl-960gbsz
Source: Original Platform
README.md (new file, 39 lines)
@@ -0,0 +1,39 @@
---
language: en
license: apache-2.0
tags:
- smollm
- llama
- causal-lm
- pretraining
- base-model
model_type: llama
pipeline_tag: text-generation
---

# normal-smollm-1p7b-500B-30n-2048sl-960gbsz

This is the base (pretraining) checkpoint for a SmolLM2-style 1.7B model, converted to the Hugging Face `LlamaForCausalLM` format from a Megatron-LM distributed checkpoint.

## Details

- Parameters: ~1.7B
- Context length: 2048 tokens
- Vocab size: 49152
- Architecture: Llama (RMSNorm, SwiGLU, RoPE)
- Training: 500B tokens (pretraining)

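If you want to sanity-check the converted checkpoint against these numbers, the config can be inspected without downloading the weights. A minimal sketch (the repo id is the same placeholder used in the usage snippet below; expected values follow this card):

```python
from transformers import AutoConfig

# Placeholder repo id -- substitute the actual owner/name.
config = AutoConfig.from_pretrained("REPLACE_WITH_OWNER/normal-smollm-1p7b-500B-30n-2048sl-960gbsz")

print(config.model_type)               # expected: "llama"
print(config.vocab_size)               # expected: 49152
print(config.max_position_embeddings)  # expected: 2048
```
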
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "REPLACE_WITH_OWNER/normal-smollm-1p7b-500B-30n-2048sl-960gbsz"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

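Since this is a plain causal LM, generation simply continues the prompt. A minimal sketch building on the snippet above (the prompt and decoding settings are illustrative):

```python
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding; tune max_new_tokens and sampling parameters as needed.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
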
## Notes

This is a base model (not instruction-tuned). For chat use, apply SFT/DPO on top of this checkpoint.

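For example, the SFT step could look like the following sketch using TRL. The library choice, dataset, and output directory are illustrative assumptions, not part of this release:

```python
# Assumes: pip install trl datasets
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Illustrative chat dataset; substitute your own SFT data.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="REPLACE_WITH_OWNER/normal-smollm-1p7b-500B-30n-2048sl-960gbsz",
    train_dataset=dataset,
    args=SFTConfig(output_dir="smollm-1p7b-sft"),
)
trainer.train()
```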