初始化项目,由ModelHub XC社区提供模型
Model: hooman650/MedQwen3B-Reasoner Source: Original Platform
This commit is contained in:
42
.gitattributes
vendored
Normal file
42
.gitattributes
vendored
Normal file
@@ -0,0 +1,42 @@
|
||||
*.7z filter=lfs diff=lfs merge=lfs -text
|
||||
*.arrow filter=lfs diff=lfs merge=lfs -text
|
||||
*.bin filter=lfs diff=lfs merge=lfs -text
|
||||
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
||||
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
||||
*.ftz filter=lfs diff=lfs merge=lfs -text
|
||||
*.gz filter=lfs diff=lfs merge=lfs -text
|
||||
*.h5 filter=lfs diff=lfs merge=lfs -text
|
||||
*.joblib filter=lfs diff=lfs merge=lfs -text
|
||||
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
||||
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
||||
*.model filter=lfs diff=lfs merge=lfs -text
|
||||
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
||||
*.npy filter=lfs diff=lfs merge=lfs -text
|
||||
*.npz filter=lfs diff=lfs merge=lfs -text
|
||||
*.onnx filter=lfs diff=lfs merge=lfs -text
|
||||
*.ot filter=lfs diff=lfs merge=lfs -text
|
||||
*.parquet filter=lfs diff=lfs merge=lfs -text
|
||||
*.pb filter=lfs diff=lfs merge=lfs -text
|
||||
*.pickle filter=lfs diff=lfs merge=lfs -text
|
||||
*.pkl filter=lfs diff=lfs merge=lfs -text
|
||||
*.pt filter=lfs diff=lfs merge=lfs -text
|
||||
*.pth filter=lfs diff=lfs merge=lfs -text
|
||||
*.rar filter=lfs diff=lfs merge=lfs -text
|
||||
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
||||
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
||||
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
||||
*.tar filter=lfs diff=lfs merge=lfs -text
|
||||
*.tflite filter=lfs diff=lfs merge=lfs -text
|
||||
*.tgz filter=lfs diff=lfs merge=lfs -text
|
||||
*.wasm filter=lfs diff=lfs merge=lfs -text
|
||||
*.xz filter=lfs diff=lfs merge=lfs -text
|
||||
*.zip filter=lfs diff=lfs merge=lfs -text
|
||||
*.zst filter=lfs diff=lfs merge=lfs -text
|
||||
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
||||
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
||||
Screenshot[[:space:]]2025-02-07[[:space:]]at[[:space:]]4.22.30 PM.png filter=lfs diff=lfs merge=lfs -text
|
||||
process.png filter=lfs diff=lfs merge=lfs -text
|
||||
unsloth.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
|
||||
unsloth.F16.gguf filter=lfs diff=lfs merge=lfs -text
|
||||
unsloth.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
|
||||
unsloth.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
|
||||
183
README.md
Normal file
183
README.md
Normal file
@@ -0,0 +1,183 @@
|
||||
---
|
||||
base_model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
|
||||
tags:
|
||||
- text-generation-inference
|
||||
- reinforcement-learning
|
||||
- transformers
|
||||
- unsloth
|
||||
- qwen2
|
||||
- trl
|
||||
- grpo
|
||||
license: apache-2.0
|
||||
language:
|
||||
- en
|
||||
datasets:
|
||||
- qiaojin/PubMedQA
|
||||
- openai/gsm8k
|
||||
- yesilhealth/Health_Benchmarks
|
||||
pipeline_tag: text-generation
|
||||
---
|
||||
|
||||
# MedQwen3B-Reasoner: Medical Domain Reasoning with Mathematics-Enhanced Training
|
||||
|
||||
MedQwen3B-Reasoner is a specialized variant of Qwen2.5-3B-Instruct, fine-tuned using `GRPO` to excel at medical domain reasoning while maintaining strong mathematical problem-solving capabilities. The model demonstrates enhanced reasoning abilities and can express uncertainty when appropriate.
|
||||
|
||||

|
||||
|
||||
## Important
|
||||
|
||||
If you use `ollama`, `llama-cpp`, `vllm` or any other inference iengine, you need to set the system prompt as below as the model performs best with the following prompt:
|
||||
|
||||
```
|
||||
'\nRespond in the following format:\n<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>\n'
|
||||
```
|
||||
|
||||
## Want to train your own?
|
||||
|
||||
Read my article [here](https://medium.com/@hooman_66365/build-your-own-medical-mini-deepseek-r1-with-reinforcement-learning-508509cd7d83) or follow the [notebook](https://github.com/hooman650/MedQwenReasoner).
|
||||
|
||||
## Key Features
|
||||
|
||||
- Medical domain expertise combined with mathematical reasoning capabilities
|
||||
- Ability to express uncertainty with "maybe" responses
|
||||
- Structured reasoning outputs with clear step-by-step explanations
|
||||
- Compact size (3B parameters) while maintaining strong performance
|
||||
- Trained using GRPO (Group Relative Policy Optimization) for 483 steps
|
||||
|
||||
## Training Data
|
||||
|
||||
The model was trained using a carefully curated mix of datasets:
|
||||
- 70% Medical domain: [PubMedQA](https://huggingface.co/datasets/qiaojin/PubMedQA)
|
||||
- 30% Mixed reasoning:
|
||||
- [GSM8K](https://huggingface.co/datasets/openai/gsm8k) (Mathematical reasoning)
|
||||
- [Health Benchmarks](https://huggingface.co/datasets/yesilhealth/Health_Benchmarks)
|
||||
|
||||
## Examples
|
||||
|
||||
Here are some examples of the model's reasoning capabilities across different domains:
|
||||
|
||||
### Mathematical Reasoning
|
||||
```
|
||||
Q: Agatha has some money to spend on a new bike. She spends $15 on the frame, and $25 on the front wheel. If she has $20 left to spend on a seat and handlebar tape, how much money, in dollars, did she have at first?
|
||||
Ground Truth Answer:
|
||||
60
|
||||
MedQwen3B Answer (with reasoning it came up with)
|
||||
<reasoning>
|
||||
To determine how much money Agatha initially had, we need to add up the amount she spent on the frame, the front wheel, and the remaining money she has left to spend on a seat and handlebar tape.
|
||||
|
||||
The total amount spent on the frame and the front wheel is:
|
||||
$15 + $25 = $40
|
||||
|
||||
The remaining money she has left is $20.
|
||||
|
||||
Therefore, the initial amount of money Agatha had is the sum of the money she has spent and the money she has left:
|
||||
$40 + $20 = $60
|
||||
|
||||
</reasoning>
|
||||
<answer>
|
||||
Agatha initially had $60.
|
||||
</answer>
|
||||
```
|
||||
|
||||
### Medical Research Analysis
|
||||
```
|
||||
Given the scientific context below:
|
||||
"""
|
||||
Urinary bladder cancer patients who have undergone transurethral resection of bladder tumor (TURBT) are at risk of recurrence. This study aims to correlate the level of bone morphogenetic protein (BMP) expression with urothelial carcinoma invasiveness, TNM stage and time to recurrence after TURBT.
|
||||
In 33 specimens of healthy transitional epithelium and 42 of urothelial carcinoma, BMP2, BMP4 and BMP7 expression was determined by real-time polymerase chain reaction. Patients who underwent TURBT were followed up for 1 year.
|
||||
BMP2 and BMP7 were downregulated in infiltrating urothelial carcinoma, the relative expression being 0.76 (p = 0.04) and 0.28 (p = 0.025) respectively, while BMP4 was downregulated in non-invasive tumors. High expression of BMP2 and BMP7 correlated with prolonged time to recurrence (log-rank: p = 0.01 and p = 0.03 respectively).
|
||||
"""
|
||||
Answer the following question:
|
||||
Does expression of bone morphogenetic protein-2 and -7 in urinary bladder cancer predict time to tumor recurrence? with 'yes', 'no' or 'maybe'. You need to carefully review the context and reason before answering.
|
||||
Ground Truth Answer:
|
||||
yes
|
||||
MedQwen3b Answer:
|
||||
<reasoning>
|
||||
The study indicates that high expression of BMP2 and BMP7 correlated with prolonged time to recurrence (log-rank: p = 0.01 and p = 0.03 respectively). This means that the gene expression levels of BMP2 and BMP7 were significantly associated with the time to recurrence, with higher expression leading to a longer time to recurrence. Therefore, based on the statistical significance (p-values < 0.05) it is reasonable to conclude that the expression of bone morphogenetic protein-2 and -7 in urinary bladder cancer patients predicts time to tumor recurrence.
|
||||
</reasoning>
|
||||
<answer>
|
||||
yes
|
||||
</answer>
|
||||
```
|
||||
|
||||
### Clinical Decision Making
|
||||
```
|
||||
Answer the following question:
|
||||
What is a recommended strategy for pediatricians to encourage vaccination among hesitant parents? A:Allowing delays for certain vaccines while ensuring the most important ones are administered on schedule. B:Providing evidence of immunization benefits in a judgmental manner. C:Advocating for alternative immunization schedules to accommodate parental concerns. D:Discharging patients from their practice if vaccinations are refused.
|
||||
With 'A', 'B', 'C' or 'D'. You need to carefully review the context and reason before answering.
|
||||
Ground Truth Answer:
|
||||
A
|
||||
MedQwen3B Answer:
|
||||
<reasoning>
|
||||
The strategy recommended by pediatricians to encourage vaccination among hesitant parents should not lead to the abandonment or dismissal of families as it can undermine the critical role of pediatricians in protecting children's health. Option D can harm the trust between the pediatrician and the parent, potentially alienating them. Options B and C might create confusion or threaten parents' trust in the recommended vaccination schedule. Option A, on the other hand, provides an evidence-based compromise that allows some flexibility while ensuring coverage for essential vaccines, which can be respected by hesitant parents as it balances their concerns with public health needs.
|
||||
|
||||
This approach respects the autonomy of parents while providing relevant information and reassurance, which can help in addressing their hesitations, making Option A the most suitable strategy.
|
||||
</reasoning>
|
||||
<answer>
|
||||
A
|
||||
</answer>
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
|
||||
model_name = "hooman650/MedQwen3B-Reasoner"
|
||||
|
||||
# Initialize model and tokenizer
|
||||
model = AutoModelForCausalLM.from_pretrained(
|
||||
model_name,
|
||||
torch_dtype="auto",
|
||||
device_map="auto"
|
||||
)
|
||||
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
||||
|
||||
# Prepare prompt
|
||||
prompt = "What is the relationship between BMI and cardiovascular disease risk?"
|
||||
messages = [
|
||||
{"role": "system", "content": "\nRespond in the following format:\n<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>\n"},
|
||||
{"role": "user", "content": prompt}
|
||||
]
|
||||
text = tokenizer.apply_chat_template(
|
||||
messages,
|
||||
tokenize=False,
|
||||
add_generation_prompt=True
|
||||
)
|
||||
|
||||
# Generate response
|
||||
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
|
||||
generated_ids = model.generate(
|
||||
**model_inputs,
|
||||
max_new_tokens=512
|
||||
)
|
||||
generated_ids = [
|
||||
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
|
||||
]
|
||||
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
|
||||
```
|
||||
|
||||
## Model Details
|
||||
|
||||
- Base Model: [unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit)
|
||||
- Training Steps: 483
|
||||
- Library: Unsloth
|
||||
- License: Apache 2.0
|
||||
|
||||
## Citation
|
||||
|
||||
If you use this model in your research, please cite:
|
||||
|
||||
```bibtex
|
||||
@misc {hooman_sedghamiz_2025,
|
||||
author = { {Hooman Sedghamiz} },
|
||||
title = { MedQwen3B-Reasoner (Revision 5dbc982) },
|
||||
year = 2025,
|
||||
url = { https://huggingface.co/hooman650/MedQwen3B-Reasoner },
|
||||
doi = { 10.57967/hf/4415 },
|
||||
publisher = { Hugging Face }
|
||||
}
|
||||
```
|
||||
|
||||
## License
|
||||
This model is licensed under Apache 2.0.
|
||||
24
added_tokens.json
Normal file
24
added_tokens.json
Normal file
@@ -0,0 +1,24 @@
|
||||
{
|
||||
"</tool_call>": 151658,
|
||||
"<tool_call>": 151657,
|
||||
"<|box_end|>": 151649,
|
||||
"<|box_start|>": 151648,
|
||||
"<|endoftext|>": 151643,
|
||||
"<|file_sep|>": 151664,
|
||||
"<|fim_middle|>": 151660,
|
||||
"<|fim_pad|>": 151662,
|
||||
"<|fim_prefix|>": 151659,
|
||||
"<|fim_suffix|>": 151661,
|
||||
"<|im_end|>": 151645,
|
||||
"<|im_start|>": 151644,
|
||||
"<|image_pad|>": 151655,
|
||||
"<|object_ref_end|>": 151647,
|
||||
"<|object_ref_start|>": 151646,
|
||||
"<|quad_end|>": 151651,
|
||||
"<|quad_start|>": 151650,
|
||||
"<|repo_name|>": 151663,
|
||||
"<|video_pad|>": 151656,
|
||||
"<|vision_end|>": 151653,
|
||||
"<|vision_pad|>": 151654,
|
||||
"<|vision_start|>": 151652
|
||||
}
|
||||
3
config.json
Normal file
3
config.json
Normal file
@@ -0,0 +1,3 @@
|
||||
{
|
||||
"model_type": "qwen2"
|
||||
}
|
||||
7
generation_config.json
Normal file
7
generation_config.json
Normal file
@@ -0,0 +1,7 @@
|
||||
{
|
||||
"_from_model_config": true,
|
||||
"eos_token_id": 151645,
|
||||
"max_length": 32768,
|
||||
"pad_token_id": 0,
|
||||
"transformers_version": "4.48.2"
|
||||
}
|
||||
151388
merges.txt
Normal file
151388
merges.txt
Normal file
File diff suppressed because it is too large
Load Diff
3
model-00001-of-00002.safetensors
Normal file
3
model-00001-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:4c5f4b78e6fdb6185a4143b42419cd67a07298d08d921ab943cd08c09661fcbd
|
||||
size 4957560304
|
||||
3
model-00002-of-00002.safetensors
Normal file
3
model-00002-of-00002.safetensors
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:d6582ab962529368da4d5f129a12a83e9806961d79a29b8737c72490a6bdc58b
|
||||
size 1214366696
|
||||
441
model.safetensors.index.json
Normal file
441
model.safetensors.index.json
Normal file
@@ -0,0 +1,441 @@
|
||||
{
|
||||
"metadata": {
|
||||
"total_size": 6171877376
|
||||
},
|
||||
"weight_map": {
|
||||
"model.embed_tokens.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.23.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.25.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.26.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.27.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.28.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
|
||||
"model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
|
||||
"model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
||||
"model.norm.weight": "model-00002-of-00002.safetensors"
|
||||
}
|
||||
}
|
||||
3
process.png
Normal file
3
process.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:fe1cddb4c41eaff077118ca8b564236c6b5e8f07d283b141b997e88d599eea60
|
||||
size 1075747
|
||||
31
special_tokens_map.json
Normal file
31
special_tokens_map.json
Normal file
@@ -0,0 +1,31 @@
|
||||
{
|
||||
"additional_special_tokens": [
|
||||
"<|im_start|>",
|
||||
"<|im_end|>",
|
||||
"<|object_ref_start|>",
|
||||
"<|object_ref_end|>",
|
||||
"<|box_start|>",
|
||||
"<|box_end|>",
|
||||
"<|quad_start|>",
|
||||
"<|quad_end|>",
|
||||
"<|vision_start|>",
|
||||
"<|vision_end|>",
|
||||
"<|vision_pad|>",
|
||||
"<|image_pad|>",
|
||||
"<|video_pad|>"
|
||||
],
|
||||
"eos_token": {
|
||||
"content": "<|im_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false
|
||||
},
|
||||
"pad_token": {
|
||||
"content": "<|vision_pad|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false
|
||||
}
|
||||
}
|
||||
3
tokenizer.json
Normal file
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:64e71213db910f5cafa86d35091f37393dcc344b1bbc34091d1b3eed4cca01d5
|
||||
size 11422064
|
||||
209
tokenizer_config.json
Normal file
209
tokenizer_config.json
Normal file
@@ -0,0 +1,209 @@
|
||||
{
|
||||
"add_bos_token": false,
|
||||
"add_prefix_space": false,
|
||||
"added_tokens_decoder": {
|
||||
"151643": {
|
||||
"content": "<|endoftext|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151644": {
|
||||
"content": "<|im_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151645": {
|
||||
"content": "<|im_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151646": {
|
||||
"content": "<|object_ref_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151647": {
|
||||
"content": "<|object_ref_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151648": {
|
||||
"content": "<|box_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151649": {
|
||||
"content": "<|box_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151650": {
|
||||
"content": "<|quad_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151651": {
|
||||
"content": "<|quad_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151652": {
|
||||
"content": "<|vision_start|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151653": {
|
||||
"content": "<|vision_end|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151654": {
|
||||
"content": "<|vision_pad|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151655": {
|
||||
"content": "<|image_pad|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151656": {
|
||||
"content": "<|video_pad|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": true
|
||||
},
|
||||
"151657": {
|
||||
"content": "<tool_call>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151658": {
|
||||
"content": "</tool_call>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151659": {
|
||||
"content": "<|fim_prefix|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151660": {
|
||||
"content": "<|fim_middle|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151661": {
|
||||
"content": "<|fim_suffix|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151662": {
|
||||
"content": "<|fim_pad|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151663": {
|
||||
"content": "<|repo_name|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
},
|
||||
"151664": {
|
||||
"content": "<|file_sep|>",
|
||||
"lstrip": false,
|
||||
"normalized": false,
|
||||
"rstrip": false,
|
||||
"single_word": false,
|
||||
"special": false
|
||||
}
|
||||
},
|
||||
"additional_special_tokens": [
|
||||
"<|im_start|>",
|
||||
"<|im_end|>",
|
||||
"<|object_ref_start|>",
|
||||
"<|object_ref_end|>",
|
||||
"<|box_start|>",
|
||||
"<|box_end|>",
|
||||
"<|quad_start|>",
|
||||
"<|quad_end|>",
|
||||
"<|vision_start|>",
|
||||
"<|vision_end|>",
|
||||
"<|vision_pad|>",
|
||||
"<|image_pad|>",
|
||||
"<|video_pad|>"
|
||||
],
|
||||
"bos_token": null,
|
||||
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
|
||||
"clean_up_tokenization_spaces": false,
|
||||
"eos_token": "<|im_end|>",
|
||||
"errors": "replace",
|
||||
"extra_special_tokens": {},
|
||||
"model_max_length": 32768,
|
||||
"pad_token": "<|vision_pad|>",
|
||||
"padding_side": "left",
|
||||
"split_special_tokens": false,
|
||||
"tokenizer_class": "Qwen2Tokenizer",
|
||||
"unk_token": null
|
||||
}
|
||||
3
unsloth.F16.gguf
Normal file
3
unsloth.F16.gguf
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:6556df5fb0226a5b6a3fba5209857b23755ed16ca69b9062e121726bffda881f
|
||||
size 6178316672
|
||||
3
unsloth.Q4_K_M.gguf
Normal file
3
unsloth.Q4_K_M.gguf
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:4b28d094d7db6fe2a62f6b68af145c773db5587c2234f77c1f1dcf602d3315fc
|
||||
size 1929902464
|
||||
3
unsloth.Q5_K_M.gguf
Normal file
3
unsloth.Q5_K_M.gguf
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:6af51bae63b28601e41e6f1c63b373ba1897f11408be61984d3feb2849539411
|
||||
size 2224814464
|
||||
3
unsloth.Q8_0.gguf
Normal file
3
unsloth.Q8_0.gguf
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:26b0d8320feb3c7aba134c89c2de3c0e9476158a054117493f72cf355abee005
|
||||
size 3285475712
|
||||
1
vocab.json
Normal file
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long
Reference in New Issue
Block a user