---
library_name: transformers
license: odc-by
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
---
# Model Card for AICrossSim/clm-60m

A 60M-parameter language model trained on 22 × 60M (≈1.3B) tokens from the FineWeb-Edu dataset.
## Model Details
aixsim-60M is a transformer-based language model with approximately 60 million non-embedding parameters. It uses RMSNorm for normalization and is trained on the FineWeb-Edu dataset.
- Developed by: AICrossSim
- Funded by: ARIA
- Model type: Transformer Language Model
- Language(s) (NLP): English
- Tokenizer: HuggingFaceTB/cosmo2-tokenizer
- Repository: AICrossSim/NewComputeBench
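The RMSNorm normalization mentioned above scales each hidden vector by the reciprocal of its root-mean-square, with a learned per-feature gain but no mean subtraction or bias (unlike LayerNorm). A minimal NumPy sketch of the operation, not the model's actual implementation:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square over the last axis,
    then apply a learned per-feature gain `weight`.
    No mean-centering and no bias term, in contrast to LayerNorm."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Toy example: batch of 2 vectors with 8 features, unit gain.
x = np.random.randn(2, 8).astype(np.float32)
w = np.ones(8, dtype=np.float32)
y = rms_norm(x, w)
# After normalization, each row has root-mean-square ~1.
```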
## Training Details
Experiment setup and training logs are available in the corresponding wandb run.
## Usage

```python
import transformers

model_name = "AICrossSim/clm-60m"
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
```
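Once loaded, the model can be used for text generation. A minimal sketch, where the prompt and decoding settings (greedy decoding, 32 new tokens) are illustrative choices rather than the authors' recommended configuration:

```python
import transformers

model_name = "AICrossSim/clm-60m"
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

# Encode an illustrative prompt and generate a short greedy continuation.
inputs = tokenizer("The FineWeb-Edu dataset is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```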
## Evaluation (lm-evaluation-harness)
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| wikitext | 2 | none | 0 | bits_per_byte | ↓ | 1.6693 | ± | N/A |
| | | none | 0 | byte_perplexity | ↓ | 3.1806 | ± | N/A |
| | | none | 0 | word_perplexity | ↓ | 486.5306 | ± | N/A |