Initialize project; model provided by the ModelHub XC community
Model: North-ML1/willow-alpha-base
Source: Original Platform

.gitattributes (new file)

*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
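
The rules above send every matching file through Git LFS. To check which files in a clone are affected, the Python sketch below approximates the matching. It simplifies Git's actual algorithm. Patterns without a `/` are compared against the file's basename, which is how gitattributes treats them. The single path pattern `saved_model/**/*` is treated as "any file below the top-level `saved_model/` directory". For an authoritative answer, `git check-attr filter -- <path>` reports the attribute Git actually applies. The helper name `is_lfs_tracked` is illustrative.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes above (every rule sets filter=lfs).
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy", "*.npz",
    "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl", "*.pt", "*.pth",
    "*.rar", "*.safetensors", "saved_model/**/*", "*.tar.*", "*.tar", "*.tflite",
    "*.tgz", "*.wasm", "*.xz", "*.zip", "*.zst", "*tfevents*",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check of whether a repo-relative POSIX path matches an LFS rule."""
    p = PurePosixPath(path)
    for pattern in LFS_PATTERNS:
        if pattern == "saved_model/**/*":
            # Anchored at the repo root; '**/' spans any number of directories.
            if p.parts[:1] == ("saved_model",) and len(p.parts) > 1:
                return True
        elif fnmatchcase(p.name, pattern):
            # Slash-free patterns match the basename at any directory depth.
            return True
    return False
```

Among the files in this commit, only `model.safetensors` matches. That is why it appears further down as an LFS pointer rather than as binary content, while the JSON and Markdown files are stored as ordinary text.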

README.md (new file)

---
license: mit
tags:
- llama
- pytorch
- causal-lm
- base-model
- north-ml
- forge
- willow-alpha
language:
- en
pipeline_tag: text-generation
---

<h1 align="center" style="font-size: 54px;">
Willow Alpha
</h1>

<p align="center">
<b>An early-stage version of Forge-1V</b>
</p>

<p align="center">
<i>Small language model research by North ML.</i>
</p>

---

## Overview

**Willow Alpha** is an early-stage base model checkpoint in the **Forge-1V** model line.

This model is currently experimental and should be treated as a research checkpoint rather than a polished assistant model. It is useful for testing the architecture, pretraining quality, tokenizer behavior, and evaluation pipelines, and as a starting point for future SFT/RLHF work.

---

## Model Details

| Field | Value |
|---|---|
| Model name | Willow Alpha |
| Project | Forge-1V |
| Organization | North ML |
| Model type | Causal Language Model |
| Language | English |
| License | MIT |
| Status | Early-stage / Alpha |
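
`config.json` declares the standard `LlamaForCausalLM` architecture and `tokenizer_config.json` a `PreTrainedTokenizerFast` tokenizer, so the checkpoint should load with Hugging Face `transformers`. The sketch below makes three assumptions. First, `transformers` and `torch` are installed. Second, the model can be reached under the ID from the commit header, `North-ML1/willow-alpha-base`; a path to a local clone works as well. Third, because `tokenizer_config.json` defines no chat template, the base model is prompted with plain text. The function name `complete` is illustrative.

```python
# Minimal loading sketch (assumes `transformers` and `torch` are installed).
MODEL_ID = "North-ML1/willow-alpha-base"  # or a local directory containing this repository


def complete(prompt: str, max_new_tokens: int = 32, model_id: str = MODEL_ID) -> str:
    """Return a greedy plain-text continuation of `prompt`."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            # Pass only the tensors a Llama model accepts: a generic fast tokenizer
            # may also return token_type_ids, which generate() would reject.
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for repeatable output
            pad_token_id=tokenizer.pad_token_id,
        )
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `complete("The capital of France is")` downloads the weights on first use and returns the continuation as a string. Given the evaluation results, expect rough output.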

---

## Evaluation Results

All benchmarks below were run **0-shot** (no in-context examples). Scores are shown as percentages, except for the two WikiText-2 perplexity rows, where lower is better.

| Benchmark | Metric | Score | Runtime |
|---|---:|---:|---:|
| HellaSwag | acc_norm | 26.71% | 318.67s |
| PIQA | acc_norm | 53.86% | 38.85s |
| WinoGrande | acc | 50.67% | 23.73s |
| BoolQ | acc | 40.21% | 144.80s |
| ARC-Easy | acc_norm | 34.68% | 51.41s |
| ARC-Challenge | acc_norm | 25.60% | 37.69s |
| OpenBookQA | acc_norm | 25.00% | 21.14s |
| CommonsenseQA | acc | 20.31% | 27.66s |
| LAMBADA | acc | 0.23% | 96.28s |
| BLiMP | acc | 59.23% | 354.79s |
| MMLU | acc | 23.89% | 388.62s |
| WikiText-2 | word_perplexity | 12524.42 | 182.89s |
| WikiText-2 | byte_perplexity | 5.84 | 181.42s |
| SciQ | acc_norm | 35.60% | 87.15s |
| COPA | acc | 64.00% | 17.21s |
| RACE | acc | 23.16% | 334.70s |
| SWAG | acc_norm | 29.13% | 252.00s |
| TruthfulQA MC2 | acc | 48.74% | 126.29s |
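
The two WikiText-2 rows describe the same predictions at different granularities. This sketch assumes both follow lm-evaluation-harness-style definitions. The task and metric names in `eval_results.json` resemble that tool's, but the card does not say which harness was used. Under that assumption, each metric is exp(total negative log-likelihood / count), with the count taken either in words or in UTF-8 bytes. Word perplexity is then byte perplexity raised to the average number of bytes per word. This is how a very large word perplexity can coexist with a moderate byte perplexity. The code backs out that ratio from the reported numbers.

```python
import math

# WikiText-2 figures as reported in eval_results.json.
WORD_PPL = 12524.42105099034
BYTE_PPL = 5.838498405241562

# Assumed definitions: ppl = exp(NLL / n), with the same total NLL and
# n = number of words (word_ppl) or number of UTF-8 bytes (byte_ppl).
# Hence log(word_ppl) / log(byte_ppl) = n_bytes / n_words.
bytes_per_word = math.log(WORD_PPL) / math.log(BYTE_PPL)

# Bits per byte is the base-2 logarithm of byte perplexity.
bits_per_byte = math.log2(BYTE_PPL)

print(f"implied average bytes per word: {bytes_per_word:.2f}")  # 5.35
print(f"bits per byte: {bits_per_byte:.3f}")                     # 2.546
```

An average of roughly 5.3 bytes per whitespace-separated word is plausible for English Wikipedia text. The 12524.42 figure therefore mostly reflects exponentiating the same loss over words instead of bytes; it is not an independent measurement.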

---

## Evaluation Summary

| Category | Result |
|---|---:|
| Number of completed benchmark runs | 18 |
| Successful runs | 18 |
| Failed runs | 0 |
| Best accuracy-style score | COPA (64.00%) |
| Best language-structure score | BLiMP (59.23%) |
| MMLU score | 23.89% |
| WikiText-2 byte perplexity | 5.84 |
| WikiText-2 word perplexity | 12524.42 |
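
The summary rows can be recomputed from `eval_results.json`. The 18 runs cover 17 benchmarks, because WikiText-2 contributes two entries (word and byte perplexity). The sketch below embeds the `(benchmark, metric, score, status)` fields from that file. When picking the best accuracy-style score, it excludes the perplexity metrics, since those are unbounded and lower-is-better. The function name `summarize` is illustrative.

```python
# (benchmark, metric, score, status), copied from eval_results.json.
RESULTS = [
    ("HellaSwag", "acc_norm", 0.26707827126070505, "success"),
    ("PIQA", "acc_norm", 0.5386289445048966, "success"),
    ("WinoGrande", "acc", 0.5067087608524072, "success"),
    ("BoolQ", "acc", 0.40214067278287463, "success"),
    ("ARC-Easy", "acc_norm", 0.3468013468013468, "success"),
    ("ARC-Challenge", "acc_norm", 0.25597269624573377, "success"),
    ("OpenBookQA", "acc_norm", 0.25, "success"),
    ("CommonsenseQA", "acc", 0.2031122031122031, "success"),
    ("LAMBADA", "acc", 0.0023287405394915583, "success"),
    ("BLiMP", "acc", 0.5923432835820895, "success"),
    ("MMLU", "acc", 0.23892607890613873, "success"),
    ("WikiText-2", "word_perplexity", 12524.42105099034, "success"),
    ("WikiText-2", "byte_perplexity", 5.838498405241562, "success"),
    ("SciQ", "acc_norm", 0.356, "success"),
    ("COPA", "acc", 0.64, "success"),
    ("RACE", "acc", 0.23157894736842105, "success"),
    ("SWAG", "acc_norm", 0.2912626212136359, "success"),
    ("TruthfulQA MC2", "acc", 0.48740972804833826, "success"),
]

# Perplexities are unbounded and lower-is-better, so they are not "accuracy-style".
PERPLEXITY_METRICS = {"word_perplexity", "byte_perplexity"}


def summarize(records):
    """Recompute the Evaluation Summary counts and the best accuracy-style score."""
    completed = len(records)
    successful = sum(1 for _, _, _, status in records if status == "success")
    accuracy_rows = [
        (bench, score)
        for bench, metric, score, status in records
        if status == "success" and metric not in PERPLEXITY_METRICS
    ]
    best_bench, best_score = max(accuracy_rows, key=lambda row: row[1])
    return {
        "completed": completed,
        "successful": successful,
        "failed": completed - successful,
        "best_accuracy": (best_bench, round(best_score * 100, 2)),
        "benchmarks": len({bench for bench, *_ in records}),
    }


print(summarize(RESULTS))
```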

---

## Notes

Willow Alpha is still in a very early stage. Some results are near-random or unstable, especially on knowledge-heavy and long-context tasks.

The strongest early signals are:

- **COPA:** 64.00%
- **BLiMP:** 59.23%
- **PIQA:** 53.86%
- **WinoGrande:** 50.67%
- **TruthfulQA MC2:** 48.74%

The weakest areas are:

- **LAMBADA:** 0.23%
- **WikiText-2 word perplexity:** 12524.42
- **CommonsenseQA:** 20.31%
- **MMLU:** 23.89%
- **RACE:** 23.16%

These results suggest the model has some early reasoning and grammar signal. However, it still needs substantially more pretraining, higher-quality data, and post-training before it can be useful as a general assistant.
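
Several of the listed scores sit close to what random guessing would achieve, which matters when reading them as signal. The sketch below compares each multiple-choice accuracy with a uniform-guessing baseline of 1 / (number of answer options). The option counts are assumptions based on the standard formats of these benchmarks. ARC includes a few questions that do not have exactly four options, so its 25% baseline is approximate. Two benchmarks are left out because a simple chance rate does not apply: LAMBADA (open-ended word prediction) and TruthfulQA MC2 (a probability-mass score rather than a pick-one accuracy).

```python
# Reported score (percent) and assumed number of answer options per benchmark.
SCORES = {
    "COPA": (64.00, 2),
    "SciQ": (35.60, 4),
    "ARC-Easy": (34.68, 4),
    "BLiMP": (59.23, 2),
    "SWAG": (29.13, 4),
    "PIQA": (53.86, 2),
    "HellaSwag": (26.71, 4),
    "WinoGrande": (50.67, 2),
    "ARC-Challenge": (25.60, 4),
    "CommonsenseQA": (20.31, 5),
    "OpenBookQA": (25.00, 4),
    "MMLU": (23.89, 4),
    "RACE": (23.16, 4),
    "BoolQ": (40.21, 2),
}


def margin_over_chance(score_pct: float, num_options: int) -> float:
    """Percentage points above (positive) or below (negative) uniform random guessing."""
    return score_pct - 100.0 / num_options


# Print benchmarks from largest to smallest margin over chance.
for name, (score, options) in sorted(
    SCORES.items(), key=lambda kv: margin_over_chance(*kv[1]), reverse=True
):
    print(f"{name:<14} {score:6.2f}%  chance {100 / options:5.1f}%  "
          f"margin {margin_over_chance(score, options):+6.2f} pts")
```

Under these assumptions:

- COPA, SciQ, ARC-Easy, and BLiMP land about 9 to 14 points above chance.
- PIQA, HellaSwag, WinoGrande, and ARC-Challenge are within 4 points of chance.
- BoolQ is about 10 points below the 50% that a coin flip would score.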

---

## Intended Use

Willow Alpha is intended for:

- Research
- Benchmarking
- Pretraining experiments
- Fine-tuning experiments
- Small language model development
- Forge-1V pipeline testing

It is **not yet recommended** for production use.

---

## Limitations

This model may:

- Produce incorrect information
- Fail basic reasoning tasks
- Struggle with factual knowledge
- Generate repetitive or low-quality text
- Perform poorly on long-context tasks
- Require additional supervised fine-tuning

---

## Citation

```bibtex
@misc{willow-alpha,
  title = {Willow Alpha},
  author = {North ML},
  year = {2026},
  note = {Early-stage Forge-1V checkpoint}
}
```

config.json (new file)

{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "use_cache": true,
  "vocab_size": 16384
}
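
The architecture fields above determine the parameter count. The sketch below assumes the standard Hugging Face Llama layout, matching the tensor names listed in `model.safetensors.index.json`:

- grouped-query attention with 4 key/value heads of dimension 1024 / 16 = 64
- a three-matrix SiLU-gated MLP
- RMSNorm weights and no biases
- an output layer tied to the input embeddings

The total comes to 287,360,000 parameters. At 4 bytes each, that is exactly the `total_size` of 1,149,440,000 bytes recorded in the index. This suggests the weights were saved in float32 even though `torch_dtype` says `bfloat16`; bfloat16 storage would need 574,720,000 bytes. The function name `count_parameters` is illustrative.

```python
# Architecture fields copied from config.json.
CONFIG = {
    "hidden_size": 1024,
    "intermediate_size": 2816,
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "num_key_value_heads": 4,
    "vocab_size": 16384,
    "tie_word_embeddings": True,
}


def count_parameters(cfg: dict) -> int:
    """Count weights for a bias-free Llama-style decoder (assumed layout, see above)."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]        # 1024 / 16 = 64
    kv_dim = cfg["num_key_value_heads"] * head_dim     # 4 * 64 = 256 (grouped-query attention)
    attention = 2 * h * h + 2 * h * kv_dim             # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * h * cfg["intermediate_size"]             # gate_proj, up_proj, down_proj
    norms = 2 * h                                      # input/post-attention RMSNorm weights
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h                 # model.embed_tokens
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h                                     # model.norm
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


n_params = count_parameters(CONFIG)
print(f"{n_params:,} parameters")            # 287,360,000 parameters
print(f"{n_params * 4:,} bytes in float32")  # 1,149,440,000 bytes in float32
print(f"{n_params * 2:,} bytes in bfloat16") # 574,720,000 bytes in bfloat16
```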

eval_results.json (new file)

[
  {
    "task": "hellaswag",
    "benchmark": "HellaSwag",
    "metric": "acc_norm",
    "score": 0.26707827126070505,
    "shots": 0,
    "runtime_sec": 318.67,
    "status": "success"
  },
  {
    "task": "piqa",
    "benchmark": "PIQA",
    "metric": "acc_norm",
    "score": 0.5386289445048966,
    "shots": 0,
    "runtime_sec": 38.85,
    "status": "success"
  },
  {
    "task": "winogrande",
    "benchmark": "WinoGrande",
    "metric": "acc",
    "score": 0.5067087608524072,
    "shots": 0,
    "runtime_sec": 23.73,
    "status": "success"
  },
  {
    "task": "boolq",
    "benchmark": "BoolQ",
    "metric": "acc",
    "score": 0.40214067278287463,
    "shots": 0,
    "runtime_sec": 144.8,
    "status": "success"
  },
  {
    "task": "arc_easy",
    "benchmark": "ARC-Easy",
    "metric": "acc_norm",
    "score": 0.3468013468013468,
    "shots": 0,
    "runtime_sec": 51.41,
    "status": "success"
  },
  {
    "task": "arc_challenge",
    "benchmark": "ARC-Challenge",
    "metric": "acc_norm",
    "score": 0.25597269624573377,
    "shots": 0,
    "runtime_sec": 37.69,
    "status": "success"
  },
  {
    "task": "openbookqa",
    "benchmark": "OpenBookQA",
    "metric": "acc_norm",
    "score": 0.25,
    "shots": 0,
    "runtime_sec": 21.14,
    "status": "success"
  },
  {
    "task": "commonsense_qa",
    "benchmark": "CommonsenseQA",
    "metric": "acc",
    "score": 0.2031122031122031,
    "shots": 0,
    "runtime_sec": 27.66,
    "status": "success"
  },
  {
    "task": "lambada_openai",
    "benchmark": "LAMBADA",
    "metric": "acc",
    "score": 0.0023287405394915583,
    "shots": 0,
    "runtime_sec": 96.28,
    "status": "success"
  },
  {
    "task": "blimp",
    "benchmark": "BLiMP",
    "metric": "acc",
    "score": 0.5923432835820895,
    "shots": 0,
    "runtime_sec": 354.79,
    "status": "success"
  },
  {
    "task": "mmlu",
    "benchmark": "MMLU",
    "metric": "acc",
    "score": 0.23892607890613873,
    "shots": 0,
    "runtime_sec": 388.62,
    "status": "success"
  },
  {
    "task": "wikitext",
    "benchmark": "WikiText-2",
    "metric": "word_perplexity",
    "score": 12524.42105099034,
    "shots": 0,
    "runtime_sec": 182.89,
    "status": "success"
  },
  {
    "task": "wikitext",
    "benchmark": "WikiText-2",
    "metric": "byte_perplexity",
    "score": 5.838498405241562,
    "shots": 0,
    "runtime_sec": 181.42,
    "status": "success"
  },
  {
    "task": "sciq",
    "benchmark": "SciQ",
    "metric": "acc_norm",
    "score": 0.356,
    "shots": 0,
    "runtime_sec": 87.15,
    "status": "success"
  },
  {
    "task": "copa",
    "benchmark": "COPA",
    "metric": "acc",
    "score": 0.64,
    "shots": 0,
    "runtime_sec": 17.21,
    "status": "success"
  },
  {
    "task": "race",
    "benchmark": "RACE",
    "metric": "acc",
    "score": 0.23157894736842105,
    "shots": 0,
    "runtime_sec": 334.7,
    "status": "success"
  },
  {
    "task": "swag",
    "benchmark": "SWAG",
    "metric": "acc_norm",
    "score": 0.2912626212136359,
    "shots": 0,
    "runtime_sec": 252.0,
    "status": "success"
  },
  {
    "task": "truthfulqa_mc2",
    "benchmark": "TruthfulQA MC2",
    "metric": "acc",
    "score": 0.48740972804833826,
    "shots": 0,
    "runtime_sec": 126.29,
    "status": "success"
  }
]
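
Each record above corresponds to one row of the README's results table:

- accuracy-style scores are fractions multiplied by 100
- perplexities are shown unchanged
- runtimes are printed with two decimals

The sketch below shows that mapping. The function names are illustrative and not part of the repository, and the file path in `table_from_file` is an assumption.

```python
import json


def to_table_row(record: dict) -> str:
    """Format one eval_results.json record as a row of the README results table."""
    score = record["score"]
    if "perplexity" in record["metric"]:
        shown = f"{score:.2f}"           # perplexities are reported as raw values
    else:
        shown = f"{score * 100:.2f}%"    # accuracies are stored as fractions in [0, 1]
    return (f"| {record['benchmark']} | {record['metric']} | {shown} | "
            f"{record['runtime_sec']:.2f}s |")


def to_table(records: list) -> str:
    """Build the full markdown table from successful runs only."""
    lines = ["| Benchmark | Metric | Score | Runtime |", "|---|---:|---:|---:|"]
    lines += [to_table_row(r) for r in records if r["status"] == "success"]
    return "\n".join(lines)


def table_from_file(path: str = "eval_results.json") -> str:
    """Read eval_results.json (local path assumed) and render it as the README table."""
    with open(path, encoding="utf-8") as f:
        return to_table(json.load(f))
```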

model.safetensors (new file)

version https://git-lfs.github.com/spec/v1
oid sha256:486833272c9061385324c000de7619ac67520c5c74b6b3035064755d2e29724f
size 1149464648
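
The file above is a Git LFS pointer, not the weights themselves. It identifies the roughly 1.15 GB object by its SHA-256 `oid` and its byte `size`. After downloading the real file, both values can be checked with the Python standard library. This sketch assumes the `key value` line format shown above, and the function names are illustrative.

```python
import hashlib
import os

# The pointer text exactly as committed for model.safetensors.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:486833272c9061385324c000de7619ac67520c5c74b6b3035064755d2e29724f
size 1149464648
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    if algorithm != "sha256":
        raise ValueError(f"unexpected oid algorithm: {algorithm}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def verify_download(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's byte size and SHA-256 digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so the large file is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]
```

For example, run `verify_download("model.safetensors", parse_lfs_pointer(POINTER))` after `git lfs pull` or a direct download.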

model.safetensors.index.json (new file)

{
  "metadata": {
    "total_size": 1149440000
  },
  "weight_map": {
    "model.embed_tokens.weight": "model.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model.safetensors",
    "model.layers.0.input_layernorm.weight": "model.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model.safetensors",
    "model.layers.1.input_layernorm.weight": "model.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model.safetensors",
    "model.layers.2.input_layernorm.weight": "model.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model.safetensors",
    "model.layers.3.input_layernorm.weight": "model.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model.safetensors",
    "model.layers.4.input_layernorm.weight": "model.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model.safetensors",
    "model.layers.5.input_layernorm.weight": "model.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model.safetensors",
    "model.layers.6.input_layernorm.weight": "model.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model.safetensors",
    "model.layers.7.input_layernorm.weight": "model.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model.safetensors",
    "model.layers.8.input_layernorm.weight": "model.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model.safetensors",
    "model.layers.9.input_layernorm.weight": "model.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model.safetensors",
    "model.layers.10.input_layernorm.weight": "model.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model.safetensors",
    "model.layers.11.input_layernorm.weight": "model.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model.safetensors",
    "model.layers.12.input_layernorm.weight": "model.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model.safetensors",
    "model.layers.13.input_layernorm.weight": "model.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model.safetensors",
    "model.layers.14.input_layernorm.weight": "model.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model.safetensors",
    "model.layers.15.input_layernorm.weight": "model.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model.safetensors",
    "model.layers.16.input_layernorm.weight": "model.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model.safetensors",
    "model.layers.17.input_layernorm.weight": "model.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model.safetensors",
    "model.layers.18.input_layernorm.weight": "model.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model.safetensors",
    "model.layers.19.input_layernorm.weight": "model.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model.safetensors",
    "model.layers.20.input_layernorm.weight": "model.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model.safetensors",
    "model.layers.21.input_layernorm.weight": "model.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model.safetensors",
    "model.layers.22.input_layernorm.weight": "model.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model.safetensors",
    "model.layers.23.input_layernorm.weight": "model.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model.safetensors",
    "model.norm.weight": "model.safetensors"
  }
}
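
The `weight_map` above follows the Hugging Face Llama naming scheme: one embedding table, nine tensors per decoder layer, and a final norm. That gives 1 + 24 × 9 + 1 = 218 entries. There is no `lm_head.weight` because `tie_word_embeddings` is `true` in `config.json`. The sketch below checks an index against that expected layout; the helper names are illustrative.

```python
# Per-layer tensor suffixes, in the order they appear in the index above.
PER_LAYER_TENSORS = [
    "self_attn.q_proj.weight", "self_attn.k_proj.weight",
    "self_attn.v_proj.weight", "self_attn.o_proj.weight",
    "mlp.gate_proj.weight", "mlp.up_proj.weight", "mlp.down_proj.weight",
    "input_layernorm.weight", "post_attention_layernorm.weight",
]


def expected_weight_names(num_layers: int, tied_embeddings: bool = True) -> set:
    """Tensor names expected for this Llama layout (naming taken from the index above)."""
    names = {"model.embed_tokens.weight", "model.norm.weight"}
    names.update(
        f"model.layers.{i}.{suffix}" for i in range(num_layers) for suffix in PER_LAYER_TENSORS
    )
    if not tied_embeddings:
        names.add("lm_head.weight")  # an untied output projection is stored separately
    return names


def check_index(index: dict, num_layers: int, tied_embeddings: bool = True) -> tuple:
    """Return (missing, unexpected) tensor names for a parsed model.safetensors.index.json."""
    actual = set(index["weight_map"])
    expected = expected_weight_names(num_layers, tied_embeddings)
    return sorted(expected - actual), sorted(actual - expected)
```

To use it, load the file with `json.load` and call `check_index(index, 24)`. For the index in this commit, both lists should come back empty.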

special_tokens_map.json (new file)

{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>",
  "pad_token": "<pad>",
  "additional_special_tokens": [
    "<|user|>",
    "<|assistant|>",
    "<|system|>",
    "<|end|>"
  ]
}

tokenizer.json (new file, 81040 lines)

File diff suppressed because it is too large.

tokenizer_config.json (new file)

{
  "model_max_length": 2048,
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>",
  "pad_token": "<pad>",
  "additional_special_tokens": [
    "<|user|>",
    "<|assistant|>",
    "<|system|>",
    "<|end|>"
  ],
  "tokenizer_class": "PreTrainedTokenizerFast"
}
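
`config.json` sets `bos_token_id` to 1, `eos_token_id` to 2, and `pad_token_id` to 0. The tokenizer files name the corresponding tokens `<s>`, `</s>`, and `<pad>`. Because the `tokenizer.json` diff is suppressed, this page does not show which ids those tokens actually receive. The sketch below checks them using any token-to-id lookup. With the `tokenizers` package, one such lookup is `Tokenizer.from_file("tokenizer.json").token_to_id`, assuming a local copy of the file. The helper name is illustrative.

```python
from typing import Callable, Dict, Optional, Tuple

# Ids declared in config.json and the token strings declared in the files above.
CONFIG_IDS = {"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0}
TOKEN_FOR_ID_FIELD = {"bos_token_id": "<s>", "eos_token_id": "</s>", "pad_token_id": "<pad>"}


def special_token_mismatches(
    token_to_id: Callable[[str], Optional[int]],
    config_ids: Dict[str, int] = CONFIG_IDS,
    tokens: Dict[str, str] = TOKEN_FOR_ID_FIELD,
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {config_field: (id_in_config, id_in_tokenizer)} for every disagreement."""
    mismatches = {}
    for field, token in tokens.items():
        tokenizer_id = token_to_id(token)  # None if the token is missing from the vocabulary
        if tokenizer_id != config_ids.get(field):
            mismatches[field] = (config_ids.get(field), tokenizer_id)
    return mismatches
```

An empty result means `config.json` and the tokenizer agree. Any entry points to ids that would make generation stop, start, or pad on the wrong token.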