Initialize project; model provided by the ModelHub XC community
Model: iamshnoo/combined_no_europe_with_metadata_1b Source: Original Platform
.gitattributes (vendored, new file, 36 lines)
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
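The rules above route matching paths through Git LFS instead of storing them in regular Git objects. As an illustrative sketch only, Python's `fnmatch` can approximate how a filename is tested against these patterns (Git's wildmatch semantics differ in edge cases, e.g. `saved_model/**/*` and slash handling, so this is not an exact reimplementation):

```python
from fnmatch import fnmatch

# A few of the .gitattributes patterns above; fnmatch only approximates
# Git's wildmatch, so treat this as a sketch rather than a reference.
lfs_patterns = ["*.safetensors", "*.bin", "*.gz", "*tfevents*", "tokenizer.json"]

def routed_through_lfs(path: str) -> bool:
    """Return True if `path` matches any of the LFS patterns."""
    return any(fnmatch(path, pattern) for pattern in lfs_patterns)

print(routed_through_lfs("model.safetensors"))  # True
print(routed_through_lfs("config.json"))        # False
```

In this commit, `model.safetensors` and `tokenizer.json` are accordingly stored as LFS pointer files (see their diffs below).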
README.md (new file, 77 lines)
@@ -0,0 +1,77 @@
---
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- metadata-localization
- leave-one-out
- 1b
- with-metadata
- pretraining
---

# combined_no_europe_with_metadata_1b

## Summary

This repository contains the leave-one-out 1B model that excludes the Europe subset, at the final 10k-step checkpoint for the metadata localization project. It was trained from scratch on the project corpus, using the Llama 3.2 tokenizer and vocabulary.
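A minimal usage sketch with the Transformers library (assuming `transformers` is installed and the repository is reachable on the model hub under the id shown in this card; the prompt text is an arbitrary example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as given in this card; loading requires access to the model hub.
repo_id = "iamshnoo/combined_no_europe_with_metadata_1b"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The news from around the world today:", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Since this is a pretrained base checkpoint (no instruction tuning is mentioned in this card), expect raw continuation behavior rather than chat-style responses.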

## Variant Metadata

- Stage: `pretrain`
- Family: `leave_one_out`
- Size: `1b`
- Metadata condition: `with_metadata`
- Base model lineage: `Trained from scratch; tokenizer/vocabulary from meta-llama/Llama-3.2-1B`

## Weights & Biases Provenance

- Run name: `20/12/2025_15:50:34_combined_no_europe_with_metadata_1b`
- Internal run URL: `https://wandb.ai/iamshnoo/nanotron/runs/7olwlrxb`
- Note: the Weights & Biases workspace is private; public readers should use the summarized metrics and configuration below.
- State: `finished`
- Runtime: `114h 6m 59s`

## Run Summary

- `KPI/train_lm_loss`: `2.0222`
- `KPI/train_perplexity`: `7.5547`
- `KPI/val_loss`: `2.0837`
- `KPI/val_perplexity`: `8.0342`
- `KPI/consumed_tokens/train`: `41,943,040,000`
- `_step`: `10,000`
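The loss and perplexity pairs above are mutually consistent, since perplexity is the exponential of the mean cross-entropy loss. A quick check:

```python
import math

# Perplexity = exp(cross-entropy loss); recompute from the reported losses.
train_ppl = math.exp(2.0222)  # reported KPI/train_perplexity: 7.5547
val_ppl = math.exp(2.0837)    # reported KPI/val_perplexity: 8.0342

print(round(train_ppl, 4), round(val_ppl, 4))
```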

## Training Configuration

- `train_steps`: `10,000`
- `sequence_length`: `2,048`
- `micro_batch_size`: `8`
- `batch_accumulation_per_replica`: `64`
- `learning_rate`: `0.003`
- `min_decay_lr`: `0.0003`
- `checkpoint_interval`: `1,000`
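These settings line up with the consumed-token count in the run summary if training used 4 data-parallel replicas. The replica count is not stated in this card, so 4 is an inference from the totals, not a documented value:

```python
# Tokens per optimizer step = seq_len * micro_batch * grad_accum * dp_replicas.
seq_len = 2_048
micro_batch_size = 8
grad_accum = 64            # batch_accumulation_per_replica
dp_replicas = 4            # assumption: inferred from the totals, not documented
train_steps = 10_000

tokens_per_step = seq_len * micro_batch_size * grad_accum * dp_replicas
total_tokens = tokens_per_step * train_steps
print(f"{total_tokens:,}")  # matches KPI/consumed_tokens/train: 41,943,040,000
```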

## Training Curves

Static plots below were exported from the private Weights & Biases run and embedded here for public access.

### Train Loss

![Train Loss](assets/train_loss.png)

### Validation Perplexity

![Validation Perplexity](assets/val_perplexity.png)

### Throughput

![Throughput](assets/tokens_per_sec.png)

## Project Context

This model is part of the metadata localization release. Related checkpoints and variants are grouped in the public Hugging Face collection [Metadata Conditioned LLMs](https://huggingface.co/collections/iamshnoo/metadata-conditioned-llms).

- Training data source: [News on the Web (NOW) Corpus](https://www.english-corpora.org/now/)
- Project repository: [https://github.com/iamshnoo/metadata_localization](https://github.com/iamshnoo/metadata_localization)
- Paper: [https://arxiv.org/abs/2601.15236](https://arxiv.org/abs/2601.15236)

Last synced: `2026-04-02 14:46:08 UTC`
BIN  assets/tokens_per_sec.png (new file, binary not shown, 61 KiB)
BIN  assets/train_loss.png (new file, binary not shown, 38 KiB)
BIN  assets/val_perplexity.png (new file, binary not shown, 39 KiB)
config.json (new file, 29 lines)
@@ -0,0 +1,29 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "dtype": "bfloat16",
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 16,
  "num_hidden_layers": 16,
  "num_key_value_heads": 16,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "transformers_version": "4.56.2",
  "use_cache": true,
  "vocab_size": 128256
}
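As a back-of-the-envelope sketch (not an official count), the config above implies roughly 1.35B parameters, which is consistent with the 2,694,992,488-byte `model.safetensors` added in this commit at two bytes per bf16 parameter, with the small remainder attributable to the safetensors header:

```python
# Parameter count implied by config.json (untied embeddings,
# num_key_value_heads == num_attention_heads, i.e. full multi-head attention).
vocab, hidden, layers, inter = 128256, 2048, 16, 5632

embeddings = 2 * vocab * hidden    # input embeddings + LM head (tie_word_embeddings: false)
attn = 4 * hidden * hidden         # q, k, v, o projections
mlp = 3 * hidden * inter           # gate, up, down projections
norms = 2 * hidden                 # two RMSNorms per layer
per_layer = attn + mlp + norms

total = embeddings + layers * per_layer + hidden  # + final RMSNorm
print(total)      # ~1.35e9 parameters
print(2 * total)  # ~ safetensors payload in bytes at bf16
```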
generation_config.json (new file, 6 lines)
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.56.2"
}
model.safetensors (new file, LFS pointer, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8f03a8e18eb4b71f45b7aaeabd4dcc6147d83049e9b5c95272885da662fe6768
size 2694992488
special_tokens_map.json (new file, 16 lines)
@@ -0,0 +1,16 @@
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json (new file, LFS pointer, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b
size 17209920
tokenizer_config.json (new file, 2062 lines)
File diff suppressed because it is too large.