Initialize project; model provided by the ModelHub XC community
Model: suyashdb/broken-model-fixed Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
76
README.md
Normal file
@@ -0,0 +1,76 @@
---
library_name: transformers
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-8B
---

# broken-model (fixed)

HuggingFace Repo: https://huggingface.co/suyashdb/broken-model-fixed/tree/main

## Changes Made

### 1. `README.md`: `base_model` corrected
- **Before:** `meta-llama/Meta-Llama-3.1-8B`
- **After:** `Qwen/Qwen3-8B`
- **Why:** The model architecture (`Qwen3ForCausalLM`), tokenizer class (`Qwen2Tokenizer`), vocabulary size (151936), and all config values exactly match Qwen3-8B, not Llama-3.1-8B. The wrong `base_model` declaration was misleading, but it was not what blocked inference.

### 2. `tokenizer_config.json`: `chat_template` added
- **Before:** The `chat_template` field was entirely absent from `tokenizer_config.json`.
- **After:** Added the full Jinja2 chat template from the canonical `Qwen/Qwen3-8B` model.
- **Why this broke inference:** OpenAI-compatible inference servers (vLLM, TGI, the FriendliAI engine) render the tokenizer's chat template to convert the `messages` array of a `/chat/completions` request into a single prompt string; in Transformers this is the `tokenizer.apply_chat_template()` call. Without a `chat_template`, that step fails with an error stating that no chat template is set, so the server cannot process any chat request. The model weights themselves are intact; only the tokenizer configuration was missing this critical field.

The added template handles:
- System / user / assistant message formatting using `<|im_start|>` / `<|im_end|>` tokens
- Tool call formatting (`<tool_call>` / `<tool_response>`)
- Thinking mode: when `enable_thinking=False` is passed, the template injects an empty `<think>\n\n</think>` block to suppress chain-of-thought output
- Multi-turn reasoning content (`reasoning_content` field on assistant messages)
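
To make the first and third bullets concrete, here is a minimal sketch of the prompt layout in plain Python. It is not the Jinja2 template itself: `render_chatml` is a hypothetical helper, it ignores tool calls and `reasoning_content`, and the exact whitespace after the empty think block is an assumption about the upstream template.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = True,
    enable_thinking: bool = True,
) -> str:
    """Simplified stand-in for the chat template (no tools, no reasoning_content)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
        if not enable_thinking:
            # Pre-filling an empty think block is what suppresses the reasoning
            # (trailing blank line assumed, not taken from this README).
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What is 2+2?"}]
print(render_chatml(messages))
print(render_chatml(messages, enable_thinking=False))
```

With the real tokenizer, the equivalent switch is passing `enable_thinking=False` through to the template when the prompt is rendered.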

### 3. Vocab/tokenizer files added
- `vocab.json`, `tokenizer.json`, and `special_tokens_map.json` were uploaded from the canonical `Qwen/Qwen3-8B` model.
- The original broken repo was missing these, making it impossible to load the tokenizer standalone.
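
A quick way to catch both problems (missing files and a missing `chat_template`) in a local checkout is sketched below. The helper names are hypothetical, and the file list is an assumption: it combines the three files named above with `tokenizer_config.json` and `merges.txt`, which also appear in this repo.

```python
import json
from pathlib import Path
from typing import List

# Assumed set of tokenizer files for this repo; adjust for other models.
EXPECTED_FILES = [
    "tokenizer_config.json",
    "tokenizer.json",
    "vocab.json",
    "merges.txt",
    "special_tokens_map.json",
]


def missing_tokenizer_files(repo_dir: str) -> List[str]:
    """Return the expected tokenizer files that are absent from repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def has_chat_template(repo_dir: str) -> bool:
    """True if tokenizer_config.json exists and has a non-empty chat_template."""
    config_path = Path(repo_dir) / "tokenizer_config.json"
    if not config_path.is_file():
        return False
    with config_path.open(encoding="utf-8") as handle:
        return bool(json.load(handle).get("chat_template"))


print(missing_tokenizer_files("."))
print(has_chat_template("."))
```

Both checks only read local files, so they can run in CI before any model weights are downloaded.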

## Verification

You can verify the fix without model weights, using only the tokenizer:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("suyashdb/broken-model-fixed")

messages = [{"role": "user", "content": "What is 2+2?"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Expected output:
# <|im_start|>user
# What is 2+2?<|im_end|>
# <|im_start|>assistant
```

---

## Part B: Why `reasoning_effort` Does Nothing

If you've tried passing `reasoning_effort: "low"` or `reasoning_effort: "high"` in your requests and noticed zero difference in the output, you're not imagining it. Here's why.

### The short answer

This model has no idea what `reasoning_effort` means. It was never trained to respond to it.

### The longer answer

`reasoning_effort` is a parameter from OpenAI's o-series API (o1, o3, o4-mini). The idea is that you can tell the model how hard to think: `"low"` means give me a quick answer, `"high"` means really work through it. OpenAI has not published how those models were trained, but the behavior implies they learned to condition the length of their reasoning on the requested effort, for example by being rewarded for reaching the right answer within a given token budget (referred to here as budget-forcing). The result is that they actually compress or expand their reasoning based on the hint.

Qwen3-8B was not trained that way. It has two modes: thinking (where it produces a `<think>...</think>` block before answering) and non-thinking (where it skips that entirely). That's a binary on/off switch, not a dial. When you send `reasoning_effort: "medium"`, nothing in the chat template or the model consumes it, so it has no effect. Apart from ordinary sampling variation, the output is the same regardless of what value you pass.

### What would need to change to make it work

1. The model needs to be retrained with budget-forcing. During fine-tuning, you'd prepend a budget token to each prompt (something like `<budget>512</budget>`) and train the model to produce correct answers within that many tokens. This teaches it to actually reason more efficiently when the budget is tight, rather than just cutting off mid-thought.

2. The inference server needs to translate `reasoning_effort` into a concrete token limit and either inject it into the prompt in a format the model understands, or hard-stop the `<think>` block after N tokens by force-injecting `</think>`. The second approach is blunt: it truncates reasoning but doesn't make the model reason smarter.

3. The API layer (whatever sits between the client and the model) needs to map `"low" / "medium" / "high"` to actual numbers and pass them through correctly. Right now, serving stacks that do not implement the parameter typically drop or reject it, so it never reaches the model.

4. Realistically, the easiest path is to use a model that already supports this natively, like a Qwen3 variant served through FriendliAI's serverless API, which exposes `max_thinking_tokens`, or OpenAI's o-series, which was purpose-built for `reasoning_effort`. Retrofitting budget-forcing onto an existing model requires retraining, not just a config change.
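
Steps 2 and 3 can be sketched without a real model. The code below is an illustration under stated assumptions: the effort-to-budget numbers are made up, `thinking_budget` and `generate_with_thinking_cap` are hypothetical names, and `toy_model` is a scripted stand-in for a decoding step, not any server's API.

```python
from typing import Callable, List, Optional

# Hypothetical mapping; the numbers are illustrative only.
EFFORT_TO_THINKING_BUDGET = {"low": 256, "medium": 1024, "high": 4096}


def thinking_budget(reasoning_effort: Optional[str], default: int = 1024) -> int:
    """Step 3: translate the API-level hint into a concrete token limit."""
    if reasoning_effort is None:
        return default
    if reasoning_effort not in EFFORT_TO_THINKING_BUDGET:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}")
    return EFFORT_TO_THINKING_BUDGET[reasoning_effort]


def generate_with_thinking_cap(
    next_token: Callable[[List[str]], str],
    budget: int,
    max_new_tokens: int = 64,
) -> List[str]:
    """Step 2 (blunt variant): force-inject </think> once the budget is spent."""
    output: List[str] = []
    thinking = False
    used = 0
    while len(output) < max_new_tokens:
        token = next_token(output)
        if token == "<eos>":
            break
        if token == "<think>":
            thinking = True
        elif token == "</think>":
            thinking = False
        elif thinking:
            if used >= budget:
                # Budget exhausted: discard the model's token and close the block.
                token = "</think>"
                thinking = False
            else:
                used += 1
        output.append(token)
    return output


def toy_model(context: List[str]) -> str:
    """Scripted stand-in: thinks forever unless the think block gets closed."""
    if not context:
        return "<think>"
    if "</think>" not in context:
        return "step"
    if context[-1] == "</think>":
        return "4"
    return "<eos>"


print(thinking_budget("low"))
print(generate_with_thinking_cap(toy_model, budget=2))
```

The cap only shortens the reasoning; as noted in step 2, it does nothing to make the shortened reasoning better, which is why step 1 still matters.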
30
config.json
Normal file
@@ -0,0 +1,30 @@
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
13
generation_config.json
Normal file
@@ -0,0 +1,13 @@
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.51.0"
}
36
handler.py
Normal file
@@ -0,0 +1,36 @@
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch


class EndpointHandler:
    def __init__(self, path=""):
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            torch_dtype=torch.float16,
            device_map="auto",  # loads directly to GPU, skips CPU staging
            low_cpu_mem_usage=True,
        )
        self.pipeline = pipeline(
            "text-generation",
            model=self.model,
            tokenizer=self.tokenizer,
        )

    def __call__(self, data):
        messages = data.get("inputs", data.get("messages", []))
        parameters = data.get("parameters", {})
        max_new_tokens = parameters.get("max_new_tokens", 512)

        prompt = self.tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        result = self.pipeline(
            prompt,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.6,
            top_p=0.95,
            return_full_text=False,
        )
        return {"generated_text": result[0]["generated_text"]}
151388
merges.txt
Normal file
File diff suppressed because it is too large
3
model-00001-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:31d6a825ae35f11fb85b195b4c42c146c051e446433125a215336abdf95cbf5f
size 3996250744
3
model-00002-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5991236cea6fe21f3d43cab0f0e84448734fbbe0789816202989f2ddc9d18282
size 3993160032
3
model-00003-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c5185c4794be2d8a9784d5753c9922db38df478ce11f9ed0b415b7304d896836
size 3959604768
3
model-00004-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b5ee7de71fbf17db3d5704e0c8f2bc7d005ca9e1d7ca2aeb19827b0cfcaa917a
size 3187841392
3
model-00005-of-00005.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:20c2d6366ab85c90786ccdd829cd2b9e7d30ef3b2ebbb998280e7e4014b542ff
size 1244659840
406
model.safetensors.index.json
Normal file
@@ -0,0 +1,406 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"total_size": 16381470720
|
||||||
|
},
|
||||||
|
"weight_map": {
|
||||||
|
"lm_head.weight": "model-00005-of-00005.safetensors",
|
||||||
|
"model.embed_tokens.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.0.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.1.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.10.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.10.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.10.self_attn.q_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.11.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.11.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.11.self_attn.q_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.12.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.12.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.12.self_attn.q_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.13.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.13.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.13.self_attn.q_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.14.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.14.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.14.self_attn.q_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.15.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.15.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.15.self_attn.q_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.16.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.16.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.16.self_attn.q_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.17.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.17.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.17.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.17.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.17.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.17.self_attn.q_norm.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
||||||
|
"model.layers.18.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.18.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.18.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.18.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.18.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.18.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.18.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.18.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.18.self_attn.q_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.18.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.18.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.19.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.19.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.19.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.19.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.19.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.19.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.19.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.19.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.19.self_attn.q_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.19.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.19.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.2.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.2.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.2.self_attn.q_norm.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
||||||
|
"model.layers.20.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.20.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.20.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.20.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.20.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.20.self_attn.q_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.20.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.20.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.21.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.21.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.21.self_attn.q_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.22.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.22.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.22.self_attn.q_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.23.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.23.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.23.self_attn.q_norm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.24.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
||||||
|
    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
    "model.layers.24.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.24.self_attn.q_norm.weight": "model-00003-of-00005.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00003-of-00005.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
    "model.layers.25.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.25.self_attn.q_norm.weight": "model-00003-of-00005.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00003-of-00005.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
    "model.layers.26.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.26.self_attn.q_norm.weight": "model-00003-of-00005.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.27.self_attn.k_norm.weight": "model-00003-of-00005.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.27.self_attn.q_norm.weight": "model-00003-of-00005.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.28.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.28.self_attn.q_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.29.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.29.self_attn.q_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00001-of-00005.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
    "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.3.self_attn.q_norm.weight": "model-00001-of-00005.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.30.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.30.self_attn.q_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.31.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.31.self_attn.q_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.32.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.32.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.32.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.32.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.32.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.32.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.32.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.32.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.32.self_attn.q_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.32.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.32.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.33.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.33.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.33.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.33.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.33.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.33.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.33.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.33.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.33.self_attn.q_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.33.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.33.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.34.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.34.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.34.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.34.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.34.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.34.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.34.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.34.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.34.self_attn.q_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.34.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.34.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.35.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.35.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.35.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.35.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.35.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.35.self_attn.k_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.35.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.35.self_attn.q_norm.weight": "model-00004-of-00005.safetensors",
    "model.layers.35.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.35.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00001-of-00005.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
    "model.layers.4.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.4.self_attn.q_norm.weight": "model-00001-of-00005.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00001-of-00005.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
    "model.layers.5.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.5.self_attn.q_norm.weight": "model-00001-of-00005.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00001-of-00005.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
    "model.layers.6.self_attn.k_norm.weight": "model-00001-of-00005.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.6.self_attn.q_norm.weight": "model-00001-of-00005.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.7.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.7.self_attn.q_norm.weight": "model-00002-of-00005.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.8.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.8.self_attn.q_norm.weight": "model-00002-of-00005.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
    "model.layers.9.self_attn.k_norm.weight": "model-00002-of-00005.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.9.self_attn.q_norm.weight": "model-00002-of-00005.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
    "model.norm.weight": "model-00004-of-00005.safetensors"
  }
}
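The index above maps each tensor name to a shard file, and the tensors of one layer do not always share a shard (layer 27, for instance, is split between `model-00003-of-00005.safetensors` and `model-00004-of-00005.safetensors`). A minimal Python sketch of how one might list such layers follows. It uses only the standard library; the helper name `layers_spanning_shards` is hypothetical, and the sample dictionary copies just a few entries from the index rather than loading the real file.

```python
import re
from collections import defaultdict


def layers_spanning_shards(weight_map):
    """Return {layer_index: sorted shard names} for layers stored in more than one shard."""
    shards_by_layer = defaultdict(set)
    for tensor_name, shard in weight_map.items():
        # Only per-layer tensors carry a layer index; others (e.g. model.norm.weight) are skipped.
        match = re.match(r"model\.layers\.(\d+)\.", tensor_name)
        if match:
            shards_by_layer[int(match.group(1))].add(shard)
    return {
        layer: sorted(shards)
        for layer, shards in sorted(shards_by_layer.items())
        if len(shards) > 1
    }


# A handful of entries copied from the index; a real check would pass the
# full "weight_map" object parsed with json.load().
sample_weight_map = {
    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00004-of-00005.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
    "model.norm.weight": "model-00004-of-00005.safetensors",
}

split_layers = layers_spanning_shards(sample_weight_map)
print(split_layers)
```

A layer appearing in this result is not by itself an error; it only means a loader has to open more than one shard to assemble that layer.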
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
{
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
BIN
tokenizer.json
(Stored with Git LFS)
Normal file
Binary file not shown.
239
tokenizer_config.json
Normal file
@@ -0,0 +1,239 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = '' %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = 
content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
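The `chat_template` field above is a Jinja template that wraps each message in `<|im_start|>` and `<|im_end|>` markers and, when a generation prompt is requested with `enable_thinking` set to false, appends an empty `<think>` block. The sketch below is a simplified, hand-written Python approximation covering system and user messages only (no tools, no assistant or tool history). The function name `render_prompt` is hypothetical and not part of any library, and the real template should be preferred wherever it is available.

```python
def render_prompt(messages, add_generation_prompt=True, enable_thinking=True):
    """Approximate the template's output for plain system/user messages."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user"):
            # Assistant and tool turns need the template's extra logic
            # (reasoning extraction, tool-call rendering), omitted here.
            raise ValueError(f"role {role!r} is not covered by this sketch")
        parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
        if not enable_thinking:
            # The template emits an empty think block only when
            # enable_thinking is explicitly false.
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


prompt = render_prompt(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"},
    ],
    enable_thinking=False,
)
print(prompt)
```

Generation is expected to stop at `<|im_end|>`, which the config above declares as `eos_token`.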
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long