Initialize project; model provided by the ModelHub XC community

Model: iamshnoo/combined_only_url_with_metadata_1b_step2k
Source: Original Platform
ModelHub XC
2026-04-10 16:01:15 +08:00
commit 2d98b4a3c9
11 changed files with 2234 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
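
The patterns above decide which files in the repo are stored through Git LFS. As a rough illustration, the sketch below checks a few of the file names from this commit against a subset of those patterns. It uses Python's `fnmatch`, which only approximates gitattributes glob semantics, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the patterns listed in .gitattributes (not the full list).
LFS_PATTERNS = [
    "*.bin",
    "*.safetensors",
    "*.pt",
    "*.onnx",
    "saved_model/**/*",
    "*tfevents*",
    "tokenizer.json",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching.

    Patterns without a slash are compared against the file name only;
    patterns containing a slash are compared against the whole path.
    """
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatch(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for candidate in [
        "model.safetensors",
        "tokenizer.json",
        "config.json",
        "assets/train_loss.png",
    ]:
        print(candidate, is_lfs_tracked(candidate))
```

This agrees with the rest of the commit, where `model.safetensors` and `tokenizer.json` appear as LFS pointer files while `config.json` is stored as plain text.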

79
README.md Normal file

@@ -0,0 +1,79 @@
---
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation
- metadata-localization
- metadata-ablation
- 1b
- with-metadata
- pretraining
- intermediate-checkpoint
---
# combined_only_url_with_metadata_1b_step2k
## Summary
This repo contains the 1B URL-metadata model (`combined_only_url_with_metadata_1b`), exported at the 2k intermediate checkpoint of the metadata localization project. It was trained from scratch on the project corpus, using the Llama 3.2 tokenizer and vocabulary.
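
A minimal loading sketch with `transformers` follows. It assumes the checkpoint can be downloaded from the Hugging Face Hub under the repo id shown in the commit message; the helper name `generate` is hypothetical. The card does not document how metadata is formatted in the input, so the prompt is treated as plain text. The tokenizer's end-of-text id is passed explicitly in case it differs from the ids recorded in `config.json`.

```python
# Assumption: the model is hosted on the Hugging Face Hub under this id.
MODEL_ID = "iamshnoo/combined_only_url_with_metadata_1b_step2k"


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode a short continuation of `prompt` (illustrative only)."""
    # Imported lazily so this module can be read without the libraries installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            # Use the tokenizer's end-of-text id for stopping and padding.
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("The city council announced"))
```

Since this is an early pretraining checkpoint with no instruction tuning, it is best treated as a plain text-continuation model.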
## Variant Metadata
- Stage: `pretrain`
- Family: `metadata_ablation`
- Size: `1b`
- Metadata condition: `with_metadata`
- Checkpoint export: `2k`
- Base model lineage: `Trained from scratch; tokenizer/vocabulary from meta-llama/Llama-3.2-1B`
## Weights & Biases Provenance
- Run name: `24/12/2025_22:00:55_combined_only_url_with_metadata_1b`
- Internal run URL: `https://wandb.ai/iamshnoo/nanotron/runs/mgsf3ei7`
- Note: the Weights & Biases workspace is private; public readers should use the summarized metrics and configuration below.
- State: `finished`
- Runtime: `114h 11m 7s`
## Run Summary
- `KPI/train_lm_loss`: `2.0952`
- `KPI/train_perplexity`: `8.1273`
- `KPI/val_loss`: `2.088`
- `KPI/val_perplexity`: `8.0687`
- `KPI/consumed_tokens/train`: `41,943,040,000`
- `_step`: `10,000`
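
The reported perplexities are consistent with the usual relation perplexity = exp(loss). The short check below assumes the losses are mean per-token cross-entropy in nats, which the card does not state explicitly.

```python
import math

# Values copied from the Run Summary list.
train_loss = 2.0952
val_loss = 2.088

# Perplexity is the exponential of the mean cross-entropy loss (in nats).
train_ppl = math.exp(train_loss)
val_ppl = math.exp(val_loss)

print(f"train perplexity ~ {train_ppl:.4f} (reported 8.1273)")
print(f"val perplexity   ~ {val_ppl:.4f} (reported 8.0687)")
```

The small differences from the reported values are what one would expect from the losses being rounded before display.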
## Training Configuration
- `train_steps`: `10,000`
- `sequence_length`: `2,048`
- `micro_batch_size`: `8`
- `batch_accumulation_per_replica`: `64`
- `learning_rate`: `0.003`
- `min_decay_lr`: `0.0003`
- `checkpoint_interval`: `1,000`
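
The configuration values can be cross-checked against `KPI/consumed_tokens/train`. The sketch below is plain arithmetic on numbers from this card. The data-parallel replica count is not stated anywhere in the card and is only inferred here. The token estimate for this export additionally assumes a constant batch size and that "2k" means step 2,000.

```python
# Values copied from the Training Configuration and Run Summary lists.
sequence_length = 2048
micro_batch_size = 8
batch_accumulation_per_replica = 64
train_steps = 10_000
consumed_tokens = 41_943_040_000

# Tokens processed by one replica in one optimizer step.
tokens_per_replica_step = (
    micro_batch_size * batch_accumulation_per_replica * sequence_length
)

# Tokens per optimizer step implied by the run totals.
tokens_per_step = consumed_tokens // train_steps

# Inferred, not stated in the card: number of data-parallel replicas.
implied_replicas = tokens_per_step // tokens_per_replica_step

# Assumption: constant batch size, and the "2k" export is step 2,000.
tokens_at_step_2k = 2_000 * tokens_per_step

print(tokens_per_replica_step, tokens_per_step, implied_replicas, tokens_at_step_2k)
```

Note that the Run Summary reports `_step` as `10,000`, so those metrics appear to describe the end of the full run rather than this intermediate checkpoint.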
## Training Curves
Static plots below were exported from the private Weights & Biases run and embedded here for public access.
### Train Loss
![Train Loss](assets/train_loss.png)
### Validation Perplexity
![Validation Perplexity](assets/val_perplexity.png)
### Throughput
![Throughput](assets/tokens_per_sec.png)
## Project Context
This model is part of the metadata localization release. Related checkpoints and variants are grouped in the public Hugging Face collection [Metadata Conditioned LLMs](https://huggingface.co/collections/iamshnoo/metadata-conditioned-llms).
- Training data source: [News on the Web (NOW) Corpus](https://www.english-corpora.org/now/)
- Project repository: [https://github.com/iamshnoo/metadata_localization](https://github.com/iamshnoo/metadata_localization)
- Paper: [https://arxiv.org/abs/2601.15236](https://arxiv.org/abs/2601.15236)
Last synced: `2026-04-02 14:43:41 UTC`

BIN
assets/tokens_per_sec.png Normal file

Binary file not shown.

Size: 59 KiB

BIN
assets/train_loss.png Normal file

Binary file not shown.

Size: 40 KiB

BIN
assets/val_perplexity.png Normal file

Binary file not shown.

Size: 41 KiB

29
config.json Normal file

@@ -0,0 +1,29 @@
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"dtype": "bfloat16",
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 16,
"num_hidden_layers": 16,
"num_key_value_heads": 16,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"transformers_version": "4.56.2",
"use_cache": true,
"vocab_size": 128256
}
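
As a sanity check on the `1b` label, the parameter count can be estimated from the fields in this config. The sketch below assumes the standard `LlamaForCausalLM` layout: bias-free q/k/v/o projections, a gate/up/down MLP, two RMSNorm weights per layer, a final norm, and a separate output head because `tie_word_embeddings` is false. The helper name `count_llama_params` is hypothetical.

```python
# Fields copied from config.json.
config = {
    "hidden_size": 2048,
    "intermediate_size": 5632,
    "num_hidden_layers": 16,
    "num_attention_heads": 16,
    "num_key_value_heads": 16,
    "head_dim": 128,
    "vocab_size": 128256,
    "tie_word_embeddings": False,
}


def count_llama_params(cfg: dict) -> int:
    """Count weights assuming a standard Llama decoder without biases."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o
    mlp = 3 * h * cfg["intermediate_size"]              # gate, up, down
    norms = 2 * h                                       # two RMSNorms per layer
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


n_params = count_llama_params(config)
bf16_bytes = 2 * n_params          # bfloat16 stores 2 bytes per weight
lfs_size = 2_694_992_488           # size recorded in the model.safetensors pointer
overhead = lfs_size - bf16_bytes   # remainder, presumably the safetensors header

print(f"{n_params:,} parameters, {bf16_bytes:,} bytes in bf16, {overhead:,} bytes left over")
```

Under these assumptions the count comes to roughly 1.35 billion parameters, and the bf16 payload falls within a few kilobytes of the `model.safetensors` size recorded in this commit, which is consistent with a small file header.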

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"transformers_version": "4.56.2"
}

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:68573838417866d816e4feb39210aeed39ecff32ab4ea0e056cb0ae059783618
size 2694992488
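
The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 digest, and the byte size of the real file. The sketch below parses such a pointer and checks a downloaded file against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib

# Pointer text copied from the model.safetensors entry in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:68573838417866d816e4feb39210aeed39ecff32ab4ea0e056cb0ae059783618
size 2694992488
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and unpack the 'algo:digest' oid."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks and compare size and digest to the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

After downloading the weights, `matches_pointer("model.safetensors", pointer)` would confirm that the local copy is the file this commit refers to.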

16
special_tokens_map.json Normal file

@@ -0,0 +1,16 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|end_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b
size 17209920

2062
tokenizer_config.json Normal file

File diff suppressed because it is too large.