Initialize project; model provided by the ModelHub XC community
Model: raincandy-u/Rain-100M Source: Original Platform
.gitattributes (Normal file, 35 lines)
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
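
The patterns above route every matching file through Git LFS. The sketch below gives a rough picture of which files in a checkout they would catch by re-implementing the matching with Python's `fnmatch`. It only approximates git's own algorithm. Patterns without a slash are tested against the file's base name. Patterns containing a slash are tested against the whole repo-relative path, and `/**/` may also match zero directories. The helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The 35 patterns from the .gitattributes above (all marked filter=lfs).
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "saved_model/**/*", "*.tar.*",
    "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip", "*.zst",
    "*tfevents*",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate git attribute matching for a repo-relative POSIX path."""
    for pattern in LFS_PATTERNS:
        if "/" in pattern:
            # Slash patterns match the full path; "/**/" may also mean zero dirs.
            if fnmatchcase(path, pattern) or fnmatchcase(path, pattern.replace("/**/", "/")):
                return True
        elif fnmatchcase(PurePosixPath(path).name, pattern):
            # Slash-free patterns match the base name at any directory depth.
            return True
    return False


for name in ["model.safetensors", "tokenizer.json", "config.json", "README.md"]:
    print(name, is_lfs_tracked(name))
```

Of the files in this commit, only `model.safetensors` matches. That agrees with it appearing further down as an LFS pointer rather than as raw bytes.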

README.md (Normal file, 100 lines)
@@ -0,0 +1,100 @@
---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# 🩵 Rain-100M — Model Card

**Rain-100M** is an experimental small language model trained from scratch using the **Qwen3 architecture**.

## 🧠 Training & Data

* **Training corpus**: `HuggingFaceFW/fineweb-edu`
* **Training tokens**: ~**3B**
* **Language**: English only
* **Tokenizer**: newly trained **16k BPE** (16,000-token vocabulary, sized for small models)
* **Max sequence length**: 4096 tokens

**Sample training metrics**:

```text
train/grad_norm: 0.6640625
train/learning_rate: 0.00000000002171853813
train/loss: 3.4459
```
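
The logged `train/loss` can be converted to a per-token training perplexity, e^loss. This assumes the loss is the mean per-token cross-entropy in nats, which is what PyTorch's cross-entropy loss returns. The `train/` key prefix suggests Trainer-style logging, but that is also an assumption. A minimal sketch:

```python
import math

# Loss value from the metrics above; assumed to be mean per-token
# cross-entropy using the natural log.
train_loss = 3.4459

perplexity = math.exp(train_loss)
print(f"per-token training perplexity ≈ {perplexity:.1f}")
```

The result is roughly 31.4. This figure is measured on training data and over this model's 16k-token vocabulary. It is not directly comparable with models that use a different tokenizer or with held-out evaluation perplexity.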

## 🏗️ Architecture (Qwen3-style)

* **Parameters**: ~100M (about 97M, with input and output embeddings tied)
* **Layers**: 12 Transformer layers
* **Hidden size**: 768
* **Attention heads**: 12 (head dimension 64; 12 key/value heads, so no grouped-query attention)
* **MLP intermediate size**: 2048
* **Activation**: SiLU
* **Weight dtype**: bfloat16
* **RMSNorm eps**: 1e-6
* **RoPE θ**: 10000
* **Inference framework**: `transformers` (config saved with version 4.57.6)
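
The ~100M figure can be checked against the values in `config.json`. The sketch below assumes a standard Qwen3 decoder layer. Each layer has bias-free q/k/v/o projections, a per-head RMSNorm on queries and keys, a gated SiLU MLP with three projections, and two RMSNorms. The tied embedding matrix is counted once. This per-layer breakdown is an assumption about the Qwen3 implementation, not something this model card states.

```python
# Values from config.json in this commit.
vocab_size = 16_000
hidden = 768
layers = 12
n_heads = 12
n_kv_heads = 12
head_dim = 64
intermediate = 2_048

embedding = vocab_size * hidden                 # shared with lm_head (tied), counted once
attn = hidden * n_heads * head_dim              # q_proj, no bias
attn += 2 * hidden * n_kv_heads * head_dim      # k_proj and v_proj
attn += n_heads * head_dim * hidden             # o_proj
qk_norm = 2 * head_dim                          # assumed per-head q_norm and k_norm weights
mlp = 3 * hidden * intermediate                 # gate_proj, up_proj, down_proj
layer_norms = 2 * hidden                        # input and post-attention RMSNorm
per_layer = attn + qk_norm + mlp + layer_norms

total = embedding + layers * per_layer + hidden  # plus the final RMSNorm
print(f"{total:,} parameters")
print(f"{total * 2:,} bytes in bfloat16")
```

This gives 97,243,392 parameters, or 194,486,784 bytes at two bytes per bfloat16 weight. The `model.safetensors` pointer in this commit records 194,501,600 bytes. The remaining difference of about 15 KB is plausibly the safetensors JSON header.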

## ⚠️ Limitations

* Trained only on English data, so capabilities in other languages are weak or absent. Not suitable as a general-purpose chat model or for safety-critical use cases.
* No alignment or safety fine-tuning has been applied.

## 📄 License

The model is released under the Apache 2.0 license (see `license` in the metadata above). When using it locally, please also comply with the licenses of the `fineweb-edu` dataset and of the `transformers` and Qwen3-related components.

---

# Chinese Version

# 🩵 Rain-100M — Model Card

**Rain-100M** is an experimental language model trained from scratch on the **Qwen3 architecture**.

## 🧠 Training and Data

* **Training corpus**: HuggingFaceFW/fineweb-edu
* **Number of tokens**: about **3B**
* **Language**: English only
* **Tokenizer**: newly trained **16k BPE** (optimized for lightweight models)
* **Maximum sequence length**: 4096

**Training metrics**:

```text
train/grad_norm: 0.6640625
train/learning_rate: 0.00000000002171853813
train/loss: 3.4459
```

## 🏗️ Model Architecture (Qwen3 Configuration)

* **Parameters**: about 100M
* **Layers**: 12 Transformer layers
* **Hidden size**: 768
* **Attention heads**: 12
* **Intermediate size**: 2048
* **Activation**: SiLU
* **Weight dtype**: bfloat16
* **RMSNorm eps**: 1e-6
* **RoPE θ**: 10000
* **Inference framework**: transformers

## ⚠️ Limitations

* Trained on English text only, so ability in other languages is limited; not suitable for general-purpose chat or safety-sensitive tasks.
* No systematic alignment or safety hardening.

## 📄 License

When using the model locally, please comply with the licenses of the fineweb-edu dataset and the transformers/Qwen3-related components.

config.json (Normal file, 45 lines)
@@ -0,0 +1,45 @@
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 2,
  "dtype": "bfloat16",
  "eos_token_id": 3,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 4096,
  "max_window_layers": 28,
  "model_type": "qwen3",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_key_value_heads": 12,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.6",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 16000
}
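
Two fields above matter when generating text. First, `num_key_value_heads` equals `num_attention_heads` (12), so keys and values are cached for every head. Second, `use_cache` is saved as `false`, so the cache is off unless the caller turns it on, for example by passing `use_cache=True` to `generate`. The sketch below is a back-of-the-envelope estimate of the cache size. It assumes bfloat16 storage and ignores framework overhead.

```python
# Values from config.json above.
num_hidden_layers = 12
num_key_value_heads = 12
head_dim = 64
max_position_embeddings = 4096
bytes_per_value = 2  # assumed bfloat16 cache

# One K and one V vector per layer, per KV head, for every cached token.
bytes_per_token = 2 * num_hidden_layers * num_key_value_heads * head_dim * bytes_per_value
full_context = bytes_per_token * max_position_embeddings

print(f"{bytes_per_token:,} bytes per token")
print(f"{full_context / 2**20:.0f} MiB for a full {max_position_embeddings}-token sequence")
```

That works out to 36,864 bytes per token, or 144 MiB for one sequence at the full 4096-token context. The total grows linearly with batch size.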

generation_config.json (Normal file, 7 lines)
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 2,
  "eos_token_id": 3,
  "pad_token_id": 0,
  "transformers_version": "4.57.6"
}

model.safetensors (Normal file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:474428f719601fddd3f0f0603706b19f54d72e92c0439deabf17cd89357c19ff
size 194501600
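
The three lines above are a Git LFS pointer, not the weights themselves. The roughly 194.5 MB file is fetched separately and is identified by its SHA-256 digest and byte size. Below is a minimal sketch for parsing such a pointer and checking a downloaded file against it. The function names and the local file path are hypothetical.

```python
import hashlib

# Pointer text as committed above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:474428f719601fddd3f0f0603706b19f54d72e92c0439deabf17cd89357c19ff
size 194501600
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS v1 pointer into fields."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, info: dict) -> bool:
    """Stream a local file and compare its size and SHA-256 with the pointer."""
    sha = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
            size += len(chunk)
    return size == info["size"] and sha.hexdigest() == info["digest"]


info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])  # sha256 194501600
# After downloading the real weights: matches_pointer("model.safetensors", info)
```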

special_tokens_map.json (Normal file, 30 lines)
@@ -0,0 +1,30 @@
{
  "bos_token": {
    "content": "[BOS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "[EOS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}

tokenizer.json (Normal file, 79305 lines)
File diff suppressed because it is too large

tokenizer_config.json (Normal file, 44 lines)
@@ -0,0 +1,44 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "[BOS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "[EOS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "[BOS]",
  "clean_up_tokenization_spaces": false,
  "eos_token": "[EOS]",
  "extra_special_tokens": {},
  "model_max_length": 4096,
  "pad_token": "[PAD]",
  "tokenizer_class": "PreTrainedTokenizerFast",
  "unk_token": "[UNK]"
}
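
The special-token IDs are spread across several files. `config.json` and `generation_config.json` set the pad, BOS and EOS IDs to 0, 2 and 3. `tokenizer_config.json` maps IDs 0 to 3 to `[PAD]`, `[UNK]`, `[BOS]` and `[EOS]`. If these disagreed, generation would stop on the wrong token or pad with the wrong one. Below is a minimal consistency check over excerpts copied from the files above. The variable and function names are illustrative.

```python
import json

# Excerpts copied from config.json and tokenizer_config.json above.
CONFIG = json.loads('{"pad_token_id": 0, "bos_token_id": 2, "eos_token_id": 3}')
TOKENIZER_CONFIG = json.loads("""{
  "added_tokens_decoder": {
    "0": {"content": "[PAD]"}, "1": {"content": "[UNK]"},
    "2": {"content": "[BOS]"}, "3": {"content": "[EOS]"}
  },
  "bos_token": "[BOS]", "eos_token": "[EOS]", "pad_token": "[PAD]"
}""")


def special_token_mismatches(config: dict, tok_config: dict) -> list:
    """Return (role, id, token_at_id, expected_token) for every disagreement."""
    # JSON object keys are strings, so convert the decoder IDs back to ints.
    id_to_token = {int(i): entry["content"]
                   for i, entry in tok_config["added_tokens_decoder"].items()}
    problems = []
    for role in ("pad", "bos", "eos"):
        token_id = config[f"{role}_token_id"]
        expected = tok_config[f"{role}_token"]
        actual = id_to_token.get(token_id)
        if actual != expected:
            problems.append((role, token_id, actual, expected))
    return problems


print(special_token_mismatches(CONFIG, TOKENIZER_CONFIG) or "special tokens agree")
```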