Initialize project; model provided by the ModelHub XC community
Model: hamilton65/MMed-Llama-3-8B-EnIns Source: Original Platform
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
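To see which files these rules would route through Git LFS, the patterns can be approximated in Python. This is only a rough sketch: `fnmatchcase` approximates gitattributes glob semantics for slash-free patterns, the list is a subset of the rules above, and `is_lfs_tracked` is a hypothetical helper name, not part of Git or Git LFS.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns from the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match each pattern against the file name only."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for path in ["model-00001-of-00007.safetensors", "config.json", "README.md"]:
    print(path, is_lfs_tracked(path))
```

Under this approximation the weight shards match `*.safetensors`, while `config.json` and `README.md` stay regular files, which is consistent with how the files appear in this commit.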
88
README.md
Normal file
@@ -0,0 +1,88 @@
---
license: llama3
datasets:
- Henrychur/MMedC
- axiong/pmc_llama_instructions
language:
- en
- zh
- ja
- fr
- ru
- es
tags:
- medical
base_model: Henrychur/MMed-Llama-3-8B
library_name: transformers
---
# MMedLM
[💻Github Repo](https://github.com/MAGIC-AI4Med/MMedLM) [🖨️arXiv Paper](https://arxiv.org/abs/2402.13963)

The official model weights for "Towards Building Multilingual Language Model for Medicine".

## Introduction
This repo contains MMed-Llama 3-8B-EnIns, which is based on MMed-Llama 3-8B. We further fine-tuned the model on an **English instruction fine-tuning dataset** (from PMC-LLaMA) to allow a fair comparison with existing models on commonly used English benchmarks.
Note that MMed-Llama 3-8B-EnIns has only been trained on pmc_llama_instructions, an English medical SFT dataset focusing on QA tasks, so the model's ability to respond to multilingual input is still limited.

The model can be loaded as follows:
```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the weights in half precision to reduce memory use.
tokenizer = AutoTokenizer.from_pretrained("Henrychur/MMed-Llama-3-8B-EnIns")
model = AutoModelForCausalLM.from_pretrained("Henrychur/MMed-Llama-3-8B-EnIns", torch_dtype=torch.float16)
```

- The inference format is similar to Llama 3-Instruct; see our inference code [here](https://github.com/MAGIC-AI4Med/MedS-Ins/tree/main/Inference).
- For multiple-choice question-answering tasks, we suggest using the following instruction.
```py
from model import MedS_Llama3 # https://github.com/MAGIC-AI4Med/MedS-Ins/blob/main/Inference/model.py

sdk_api = MedS_Llama3(model_path="Henrychur/MMed-Llama-3-8B-EnIns", gpu_id=0)
INSTRUCTION = "Given a question and a list of options, select the correct answer from the options directly."
input_ = "Question: A mother brings her 3-week-old infant to the pediatrician's office because she is concerned about his feeding habits. He was born without complications and has not had any medical problems up until this time. However, for the past 4 days, he has been fussy, is regurgitating all of his feeds, and his vomit is yellow in color. On physical exam, the child's abdomen is minimally distended but no other abnormalities are appreciated. Which of the following embryologic errors could account for this presentation?\nOptions: A: Abnormal migration of ventral pancreatic bud\tB: Complete failure of proximal duodenum to recanalize\tC: Abnormal hypertrophy of the pylorus\tD: Failure of lateral body folds to move ventrally and fuse in the midline\t"
results = sdk_api.chat([], input_, INSTRUCTION)
print(results)
```

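The example input above follows a simple layout: `Question: ...`, a newline, then `Options:` with each option written as `label: text` and terminated by a tab. A minimal sketch of a helper that builds this string from structured data might look as follows; `format_mcq` is a hypothetical name and is not part of the MedS-Ins inference code.

```python
def format_mcq(question: str, options: dict[str, str]) -> str:
    """Build a prompt in the layout used by the README example."""
    # Each option is rendered as "<label>: <text>" followed by a tab.
    option_text = "".join(f"{label}: {text}\t" for label, text in options.items())
    return f"Question: {question}\nOptions: {option_text}"


prompt = format_mcq(
    "Which vitamin deficiency causes scurvy?",
    {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D", "D": "Vitamin K"},
)
print(repr(prompt))
```

The resulting string could then be passed as the second argument of `sdk_api.chat`, in place of the hand-written `input_`.
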
## News
[2024.2.21] Our pre-print paper is released on arXiv. Dive into our findings [here](https://arxiv.org/abs/2402.13963).

[2024.2.20] We release [MMedLM](https://huggingface.co/Henrychur/MMedLM) and [MMedLM 2](https://huggingface.co/Henrychur/MMedLM2). With auto-regressive continued training on MMedC, these models achieve superior performance compared to all other open-source models, even rivaling GPT-4 on MMedBench.

[2023.2.20] We release [MMedC](https://huggingface.co/datasets/Henrychur/MMedC), a multilingual medical corpus containing 25.5B tokens.

[2023.2.20] We release [MMedBench](https://huggingface.co/datasets/Henrychur/MMedBench), a new multilingual medical multi-choice question-answering benchmark with rationale. Check out the leaderboard [here](https://henrychur.github.io/MultilingualMedQA/).

## Evaluation on Commonly Used English Benchmarks
The further pretrained MMed-Llama 3 also shows strong performance in the medical domain on several English benchmarks.

| Method              | Size | Year    | MedQA    | MedMCQA  | PubMedQA | MMLU_CK  | MMLU_MG  | MMLU_AN  | MMLU_PM  | MMLU_CB  | MMLU_CM  | Avg.      |
| ------------------- | ---- | ------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | --------- |
| MedAlpaca           | 7B   | 2023.3  | 41.7     | 37.5     | 72.8     | 57.4     | 69.0     | 57.0     | 67.3     | 65.3     | 54.3     | 58.03     |
| PMC-LLaMA           | 13B  | 2023.9  | 56.4     | 56.0     | 77.9     | -        | -        | -        | -        | -        | -        | -         |
| MEDITRON            | 7B   | 2023.11 | 57.2     | 59.2     | 74.4     | 64.6     | 59.9     | 49.3     | 55.4     | 53.8     | 44.8     | 57.62     |
| Mistral             | 7B   | 2023.12 | 50.8     | 48.2     | 75.4     | 68.7     | 71.0     | 55.6     | 68.4     | 68.1     | 59.5     | 62.97     |
| Gemma               | 7B   | 2024.2  | 47.2     | 49.0     | 76.2     | 69.8     | 70.0     | 59.3     | 66.2     | **79.9** | 60.1     | 64.19     |
| BioMistral          | 7B   | 2024.2  | 50.6     | 48.1     | 77.5     | 59.9     | 64.0     | 56.5     | 60.4     | 59.0     | 54.7     | 58.97     |
| Llama 3             | 8B   | 2024.4  | 60.9     | 50.7     | 73.0     | **72.1** | 76.0     | 63.0     | 77.2     | **79.9** | 64.2     | 68.56     |
| MMed-Llama 3 (Ours) | 8B   | -       | **65.4** | **63.5** | **80.1** | 71.3     | **85.0** | **69.6** | **77.6** | 74.3     | **66.5** | **72.59** |

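The `Avg.` column appears to be the unweighted mean of the nine benchmark columns, rounded to two decimals. The short check below reproduces it for three rows copied from the table; the `average` helper is only an illustrative name, and the averaging rule is inferred from the numbers rather than stated by the authors.

```python
# Scores copied from the table, in column order MedQA ... MMLU_CM.
scores = {
    "MedAlpaca": [41.7, 37.5, 72.8, 57.4, 69.0, 57.0, 67.3, 65.3, 54.3],
    "Llama 3": [60.9, 50.7, 73.0, 72.1, 76.0, 63.0, 77.2, 79.9, 64.2],
    "MMed-Llama 3 (Ours)": [65.4, 63.5, 80.1, 71.3, 85.0, 69.6, 77.6, 74.3, 66.5],
}


def average(values: list[float]) -> float:
    """Unweighted mean, rounded to two decimals like the Avg. column."""
    return round(sum(values) / len(values), 2)


for method, values in scores.items():
    print(f"{method}: {average(values)}")
```
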
## Contact
If you have any questions, please feel free to contact qiupengcheng@pjlab.org.cn.

## Citation
```
@misc{qiu2024building,
    title={Towards Building Multilingual Language Model for Medicine},
    author={Pengcheng Qiu and Chaoyi Wu and Xiaoman Zhang and Weixiong Lin and Haicheng Wang and Ya Zhang and Yanfeng Wang and Weidi Xie},
    year={2024},
    eprint={2402.13963},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
28
config.json
Normal file
@@ -0,0 +1,28 @@
{
  "_name_or_path": "/mnt/petrelfs/share_data/wuchaoyi/Med_Instruction_Dataset/Super-MedInstruction/Language_Training/results/PMC_llama_sft_MMedllama3",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.39.2",
  "use_cache": true,
  "vocab_size": 128256
}
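The values in this config are enough to estimate the parameter count. The sketch below assumes the standard Llama layout (grouped-query attention without biases, a gated MLP with three projections, two RMSNorm weights per layer, one final norm, and an untied output head); `count_parameters` is a hypothetical helper, not a transformers API. Under these assumptions the count, multiplied by 4 bytes for `float32`, equals the `total_size` of 32121044992 reported in `model.safetensors.index.json`.

```python
# Relevant fields copied from config.json above.
config = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "vocab_size": 128256,
    "tie_word_embeddings": False,
}


def count_parameters(cfg: dict) -> int:
    """Estimate the parameter count, assuming the standard Llama layout."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]
    attention = 2 * h * h + 2 * h * kv_dim      # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * h * cfg["intermediate_size"]      # gate_proj, up_proj, down_proj
    norms = 2 * h                               # input and post-attention norms
    layers = cfg["num_hidden_layers"] * (attention + mlp + norms)
    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    return layers + embeddings + lm_head + h    # + final norm (assumed)


params = count_parameters(config)
print(params, params * 4)  # parameter count and float32 size in bytes
```
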
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "transformers_version": "4.39.2"
}
3
model-00001-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:711db0b2e28c84c566766434dd03836512aefc397829fd9ee1ef9a514370c4f5
size 4886466168
3
model-00002-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:30e09841c3cb147fd0feaa857bf328c905288b1716968b6c9b41b3c6a1978d54
size 4832007448
3
model-00003-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4840bf8643f0eafe8f7fb2ee4d1cbcbbf3e6a6092b33c151349f079f65ea800
size 4999813112
3
model-00004-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a51ead6b22a27e8e9c88652e72ff63c9df843acd1a63a6c6920e5c0a1426cef7
size 4999813128
3
model-00005-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5539d61bfde676230cf0def7633ac41e1eba6882eacd72f367ac0663127a2851
size 4832007496
3
model-00006-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fdb3f92c2d7aa3d15377e558979b8cecfc066876efc35f2661a4657907d6a375
size 4999813120
3
model-00007-of-00007.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:225d18ad122d1e2cb0d9a18af6c35dd8337d1ac7c9f717d21eb59977aaa44519
size 2571158184
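Each shard above is stored as a Git LFS pointer: a small text file of `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. A minimal sketch of reading such a pointer and totalling the shard sizes is shown below. `parse_lfs_pointer` is a hypothetical helper, and the explanation of the small difference from `total_size` (per-file safetensors headers) is an assumption, not something stated in this commit.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:711db0b2e28c84c566766434dd03836512aefc397829fd9ee1ef9a514370c4f5
size 4886466168
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each non-empty 'key value' line of a pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


# Sizes of the seven shards, copied from the pointer files above.
SHARD_SIZES = [
    4886466168, 4832007448, 4999813112, 4999813128,
    4832007496, 4999813120, 2571158184,
]
TOTAL_SIZE = 32121044992  # "total_size" from model.safetensors.index.json

pointer = parse_lfs_pointer(POINTER)
overhead = sum(SHARD_SIZES) - TOTAL_SIZE  # presumably file headers (assumption)
print(pointer["size"], sum(SHARD_SIZES), overhead)
```
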
298
model.safetensors.index.json
Normal file
@@ -0,0 +1,298 @@
{
  "metadata": {
    "total_size": 32121044992
  },
  "weight_map": {
    "lm_head.weight": "model-00007-of-00007.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
|
||||||
|
"model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
|
||||||
|
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
|
||||||
|
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
|
||||||
|
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
|
||||||
|
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
|
||||||
|
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
|
||||||
|
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
|
||||||
|
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
|
||||||
|
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
|
||||||
|
"model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
|
||||||
|
"model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
|
||||||
|
"model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
|
||||||
|
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
|
||||||
|
"model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
|
||||||
|
"model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.28.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.28.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.28.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.28.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.28.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
|
||||||
|
"model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
"model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
"model.norm.weight": "model-00007-of-00007.safetensors"
}
}
16
special_tokens_map.json
Normal file
@@ -0,0 +1,16 @@
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
410503
tokenizer.json
Normal file
File diff suppressed because it is too large
2063
tokenizer_config.json
Normal file
File diff suppressed because it is too large