Initialize project; model provided by the ModelHub XC community

Model: integration1857/prescription-simplifier-mistral7b
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-04-24 17:17:06 +08:00
commit b6b5e77415
16 changed files with 276459 additions and 0 deletions

.gitattributes (new file, 35 lines)

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
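The rules above send every matching file through Git LFS, which is why the large binary files further down appear only as three-line LFS pointers. Git matches a pattern that contains no slash against the file name in any directory. The sketch below approximates that behaviour with Python's `fnmatch` for a subset of the slash-free patterns. It leaves out `saved_model/**/*` because `fnmatch` does not implement git's `**` semantics. The helper name and sample paths are illustrative.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns from the .gitattributes above.
LFS_PATTERNS = [
    "*.bin", "*.ckpt", "*.gz", "*.h5", "*.onnx", "*.pt", "*.pth",
    "*.safetensors", "*.tar", "*.tar.*", "*.zip", "*tfevents*",
]

def is_lfs_tracked(path: str) -> bool:
    """Approximate gitattributes matching: slash-free patterns match the basename."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)

for f in ["model-00001-of-00008.safetensors", "config.json",
          "logs/events.out.tfevents.1700000000", "data.tar.gz"]:
    print(f, is_lfs_tracked(f))
```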

README.md (new file, 228 lines)

@@ -0,0 +1,228 @@
---
library_name: transformers
tags: []
---
# Model Card for prescription-simplifier-mistral7b
## Model Details
### Model Description
A fine-tuned version of Mistral-7B-Instruct-v0.3, trained to convert complex medical prescriptions into simple,
patient-friendly explanations. Fine-tuned using QLoRA on Kaggle T4×2 GPUs with 8 hand-crafted prescription→explanation training
examples covering Amoxicillin, Metformin, Lisinopril, Atorvastatin, Salbutamol, Sertraline, Warfarin, and Pantoprazole.
- **Developed by:** Madhukar Kumar
- **Funded by [optional]:** Self Funded
- **Shared by [optional]:** Madhukar Kumar
- **Model type:** Causal Language Model (LLM) — fine-tuned using QLoRA (4-bit NF4 quantization + LoRA adapters) on mistralai/Mistral-7B-Instruct-v0.3
- **Language(s) (NLP):** English
- **License:** Apache 2.0 (inherited from Mistral-7B-Instruct-v0.3)
- **Finetuned from model [optional]:** mistralai/Mistral-7B-Instruct-v0.3
### Model Sources [optional]
- **Repository:** https://huggingface.co/integration1857/prescription-simplifier-mistral7b
- **Paper [optional]:** https://medium.com/p/49e94536f72b
- **Demo [optional]:** https://d6zhmy6z4ifp0.cloudfront.net
## Uses
### Direct Use
This model is designed to convert complex medical prescription text into simple, plain-language explanations for patients.
It can be used directly via the HuggingFace Inference API or integrated into healthcare applications, pharmacy portals,
or patient-facing tools.
### Downstream Use [optional]
Can be fine-tuned further on larger prescription datasets for improved accuracy.
Can be integrated into hospital information systems, pharmacy apps, or telemedicine platforms to improve patient health literacy.
### Out-of-Scope Use
This model is NOT intended for clinical decision-making, medical diagnosis, or treatment recommendations.
It should not replace professional medical advice. Not suitable for prescriptions in languages other than English.
## Bias, Risks, and Limitations
- Trained on only 8 examples, so coverage of drug types and medical conditions is limited.
- May produce inaccurate explanations for uncommon medications.
- Does not account for patient-specific factors such as allergies or drug interactions.
- English only; not suitable for multilingual use cases.
- Should always be used alongside professional medical guidance.
### Recommendations
- Always display a medical disclaimer alongside model outputs.
- Never use this model as a substitute for a pharmacist or physician consultation.
- Outputs should be reviewed by a healthcare professional before deployment in clinical settings.

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information is needed for further recommendations.
## How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "integration1857/prescription-simplifier-mistral7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the repo stores merged fp16 weights
    device_map="auto",          # needs `accelerate`; spreads layers over available GPUs
)

prompt = """[INST] Convert this prescription into patient-friendly language:
Amoxicillin 500mg TID x 7 days [/INST]"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.3)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
## Training Details
### Training Data
8 hand-crafted prescription→explanation pairs covering: Amoxicillin, Metformin, Lisinopril, Atorvastatin, Salbutamol, Sertraline, Warfarin, and Pantoprazole.
Each example follows the Mistral [INST]...[/INST] instruction format.
### Training Procedure
#### Preprocessing [optional]
Prescriptions were formatted with the Mistral instruct template.
They were tokenized with the Mistral-7B-v0.3 tokenizer, using a maximum sequence length of 512 tokens.
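The sketch below shows how one prescription→explanation pair might be rendered as a single-turn Mistral example. It assumes the training text mirrors what `chat_template.jinja` produces for one user/assistant exchange: BOS, then `[INST] ` + user text + `[/INST]`, then a space, the trimmed answer, and EOS. The card does not publish its training examples. The helper name `format_training_example` and the sample answer are hypothetical. Token counting against the 512-token limit needs the real tokenizer, so it is left out here.

```python
def format_training_example(prescription: str, explanation: str,
                            bos: str = "<s>", eos: str = "</s>") -> str:
    """Hypothetical helper mirroring the chat template's single-turn output.

    bos + "[INST] " + user + "[/INST]" + " " + assistant.strip() + eos
    """
    user = "Convert this prescription into patient-friendly language:\n" + prescription
    return f"{bos}[INST] {user}[/INST] {explanation.strip()}{eos}"

text = format_training_example(
    "Amoxicillin 500mg TID x 7 days",
    "Take one 500 mg capsule three times a day for 7 days.",  # hypothetical answer
)
print(text)
```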
#### Training Hyperparameters
- **Training regime:** bf16 mixed precision
- **LoRA rank (r):** 16
- **LoRA alpha:** 32
- **LoRA dropout:** 0.05
- **Target modules:** q_proj, v_proj, k_proj, o_proj
- **Batch size:** 2
- **Gradient accumulation steps:** 4
- **Learning rate:** 2e-4
- **Epochs:** 3
- **Optimizer:** paged_adamw_32bit
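With only 8 training examples, these settings allow very few optimizer updates. The arithmetic below is a sketch that assumes the Hugging Face `Trainer` convention: a partial last batch and a partial accumulation window each still count, and data-parallel processes multiply the effective batch. The card does not say whether the two T4s ran as two data-parallel processes or as one process with the model split across GPUs, so the sketch shows both cases.

```python
import math

def optimizer_steps(num_examples: int, per_device_batch: int,
                    grad_accum: int, world_size: int, epochs: int) -> int:
    """Total optimizer updates, rounding partial batches and windows up."""
    micro_batches_per_epoch = math.ceil(num_examples / (per_device_batch * world_size))
    updates_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum)
    return updates_per_epoch * epochs

# Values from the hyperparameter list above; world_size is the unknown.
for world_size in (1, 2):
    effective_batch = 2 * 4 * world_size
    steps = optimizer_steps(8, 2, 4, world_size, 3)
    print(f"world_size={world_size}: effective batch {effective_batch}, {steps} optimizer steps")
# both cases: 3 optimizer steps in total
```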
#### Speeds, Sizes, Times [optional]
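- **Training time:** 4.6 minutes
- **Hardware:** Kaggle T4×2 GPUs (16GB VRAM each)
- **Trainable parameters:** 41.94M (1.11% of the 3.78B total counted during training; the merged checkpoint in this repo holds 7.25B parameters according to its safetensors index)
- **Base model size:** ~14GB (fp16)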
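The 41.94M figure can be checked against the architecture. This check assumes standard LoRA, which adds an `r × in` matrix A and an `out × r` matrix B to each adapted linear layer with no bias terms, and uses the dimensions from `config.json`.

- r = 16 on `q_proj`, `k_proj`, `v_proj` and `o_proj` only gives about 13.6M trainable parameters.
- r = 16 on all seven linear projections (adding `gate_proj`, `up_proj` and `down_proj`) gives exactly 41,943,040.

The reported count therefore suggests the MLP projections were adapted as well, even though the Target modules entry lists only the attention projections. The sketch reproduces both numbers.

```python
# Mistral-7B dimensions taken from config.json
HIDDEN = 4096
Q_OUT = 32 * 128   # num_attention_heads * head_dim
KV_OUT = 8 * 128   # num_key_value_heads * head_dim
FFN = 14336        # intermediate_size
LAYERS = 32        # num_hidden_layers

# (in_features, out_features) of each linear projection in one decoder layer
SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, FFN),
    "up_proj": (HIDDEN, FFN),
    "down_proj": (FFN, HIDDEN),
}

def lora_params(modules, r: int = 16) -> int:
    """LoRA adds A (r x in) and B (out x r) per adapted layer: r * (in + out)."""
    per_layer = sum(r * (SHAPES[m][0] + SHAPES[m][1]) for m in modules)
    return per_layer * LAYERS

attn_only = lora_params(["q_proj", "k_proj", "v_proj", "o_proj"])
all_linear = lora_params(SHAPES)
print(f"{attn_only:,}  {all_linear:,}")  # 13,631,488  41,943,040
```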
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
Held-out prescription examples not seen during training, covering similar drug categories.
#### Factors
Evaluated on clarity of explanation, accuracy of dosage instructions, and completeness of warnings.
#### Metrics
ROUGE-1 score used to measure overlap between generated explanations and reference explanations.
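ROUGE-1 compares unigram counts between a generated text and a reference. The card does not say which implementation, tokenizer, or variant (precision, recall, or F1) produced its score, so the sketch below is a simplified stand-in. It lowercases the text, splits it into word tokens, takes the clipped unigram overlap, and combines precision and recall into F1. Packages such as `rouge_score` apply their own tokenization and optional stemming, so their numbers can differ. The example strings are hypothetical.

```python
import re
from collections import Counter

def rouge1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: clipped unigram overlap between two texts."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((cand & ref).values())  # min count per shared token
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical strings: 5 shared words, 7 candidate words, 9 reference words
scores = rouge1("Take one capsule three times a day",
                "Take one capsule three times daily for 7 days")
print(scores)  # f1 is about 0.625
```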
### Results
- ROUGE-1: 0.51
- Training loss: 1.78
#### Summary
The model reaches a ROUGE-1 of 0.51 on held-out prescriptions, which is reasonable given the very small training set.
A larger, more diverse dataset would likely improve generalisation substantially.
## Model Examination [optional]
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** NVIDIA Tesla T4 ×2
- **Hours used:** 0.08 hours (4.6 minutes)
- **Cloud Provider:** Kaggle (Google Cloud backend)
- **Compute Region:** US
- **Carbon Emitted:** < 0.01 kg CO₂ (estimated)
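The emissions figure can be sanity-checked with the basic formula behind such calculators:

- energy (kWh) = power (kW) × hours
- emissions = energy × grid carbon intensity

This sketch counts GPU energy only. It assumes both T4s draw their full 70 W board power for the whole 4.6-minute run, and it uses a hypothetical grid intensity of 0.4 kg CO₂eq/kWh, since the card does not state the actual intensity.

```python
def training_emissions_kg(num_gpus: int, watts_per_gpu: float,
                          minutes: float, kg_per_kwh: float) -> float:
    """Upper-bound GPU-only CO2 estimate assuming full board power throughout."""
    energy_kwh = num_gpus * watts_per_gpu / 1000 * minutes / 60
    return energy_kwh * kg_per_kwh

# 2x T4 at 70 W each for 4.6 minutes; 0.4 kg/kWh is an assumed grid intensity
estimate = training_emissions_kg(2, 70, 4.6, 0.4)
print(f"{estimate:.4f} kg CO2eq")
```

Under these assumptions the result is roughly 0.004 kg, which is within the < 0.01 kg stated above. A different grid intensity scales the result linearly.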
## Technical Specifications [optional]
### Model Architecture and Objective
Based on Mistral-7B-Instruct-v0.3 (decoder-only transformer).
Fine-tuned with QLoRA 4-bit NF4 quantization via bitsandbytes with LoRA adapter layers injected into attention modules.
Objective: causal language modelling on prescription→explanation pairs.
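For reference, standard LoRA leaves the frozen base weight $W_0$ unchanged (stored as 4-bit NF4 during QLoRA training) and adds a trainable low-rank product scaled by $\alpha / r$. Assuming peft's default scaling (not rank-stabilized LoRA), the card's $r = 16$ and $\alpha = 32$ give a scale of 2:

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad A \in \mathbb{R}^{r \times d_{\text{in}}},\quad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad \frac{\alpha}{r} = \frac{32}{16} = 2
$$

The safetensors index in this repo lists only base-model tensor names and the base parameter count. The adapters therefore appear to have been merged into fp16 weights, $W = W_0 + \frac{\alpha}{r} B A$.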
### Compute Infrastructure
Kaggle Notebooks free tier T4×2 GPU environment.
#### Hardware
2× NVIDIA Tesla T4 (16GB VRAM each)
#### Software
transformers, peft, trl, bitsandbytes, accelerate, torch 2.0
## Citation [optional]
**BibTeX:**
```bibtex
@misc{prescription-simplifier-2025,
  author    = {integration1857},
  title     = {Prescription Simplifier Mistral-7B Fine-tuned for Patient-Friendly Medical Explanations},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/integration1857/prescription-simplifier-mistral7b}
}
```
**APA:**
integration1857. (2025). *Prescription Simplifier Mistral-7B Fine-tuned for Patient-Friendly Medical Explanations*. HuggingFace. https://huggingface.co/integration1857/prescription-simplifier-mistral7b
## Glossary [optional]
https://medium.com/p/49e94536f72b
## More Information [optional]
https://www.linkedin.com/pulse/doctors-write-code-i-built-ai-translate-madhukar-kumar-3f9cc/?trackingId=v%2BIJoWuLQPuJPt0yNtxe6Q%3D%3D
## Model Card Authors [optional]
Madhukar Kumar
## Model Card Contact
Via HuggingFace profile: https://huggingface.co/integration1857

chat_template.jinja (new file, 87 lines)

@@ -0,0 +1,87 @@
{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}
{%- if not tools is defined %}
{%- set tools = none %}
{%- endif %}
{%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}
{#- This block checks for alternating user/assistant messages, skipping tool calling messages #}
{%- set ns = namespace() %}
{%- set ns.index = 0 %}
{%- for message in loop_messages %}
{%- if not (message.role == "tool" or message.role == "tool_results" or (message.tool_calls is defined and message.tool_calls is not none)) %}
{%- if (message["role"] == "user") != (ns.index % 2 == 0) %}
{{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
{%- endif %}
{%- set ns.index = ns.index + 1 %}
{%- endif %}
{%- endfor %}
{{- bos_token }}
{%- for message in loop_messages %}
{%- if message["role"] == "user" %}
{%- if tools is not none and (message == user_messages[-1]) %}
{{- "[AVAILABLE_TOOLS] [" }}
{%- for tool in tools %}
{%- set tool = tool.function %}
{{- '{"type": "function", "function": {' }}
{%- for key, val in tool.items() if key != "return" %}
{%- if val is string %}
{{- '"' + key + '": "' + val + '"' }}
{%- else %}
{{- '"' + key + '": ' + val|tojson }}
{%- endif %}
{%- if not loop.last %}
{{- ", " }}
{%- endif %}
{%- endfor %}
{{- "}}" }}
{%- if not loop.last %}
{{- ", " }}
{%- else %}
{{- "]" }}
{%- endif %}
{%- endfor %}
{{- "[/AVAILABLE_TOOLS]" }}
{%- endif %}
{%- if loop.last and system_message is defined %}
{{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
{%- else %}
{{- "[INST] " + message["content"] + "[/INST]" }}
{%- endif %}
{%- elif message.tool_calls is defined and message.tool_calls is not none %}
{{- "[TOOL_CALLS] [" }}
{%- for tool_call in message.tool_calls %}
{%- set out = tool_call.function|tojson %}
{{- out[:-1] }}
{%- if not tool_call.id is defined or tool_call.id|length != 9 %}
{{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
{%- endif %}
{{- ', "id": "' + tool_call.id + '"}' }}
{%- if not loop.last %}
{{- ", " }}
{%- else %}
{{- "]" + eos_token }}
{%- endif %}
{%- endfor %}
{%- elif message["role"] == "assistant" %}
{{- " " + message["content"]|trim + eos_token}}
{%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
{%- if message.content is defined and message.content.content is defined %}
{%- set content = message.content.content %}
{%- else %}
{%- set content = message.content %}
{%- endif %}
{{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
{%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}
{{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
{%- endif %}
{{- '"call_id": "' + message.tool_call_id + '"}[/TOOL_RESULTS]' }}
{%- else %}
{{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
{%- endif %}
{%- endfor %}
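Before rendering, the template enforces two rules:

- After an optional leading system message, non-tool messages must alternate user/assistant, starting with user.
- Tool-call IDs must be exactly 9 characters long.

The sketch below mirrors those checks in Python so a conversation can be validated before it reaches `tokenizer.apply_chat_template`. `validate_conversation` is a hypothetical helper, not part of transformers. Like the Jinja code, it checks only the length of each ID, not whether the characters are alphanumeric.

```python
def validate_conversation(messages: list) -> None:
    """Raise ValueError where the Mistral chat template would raise_exception."""
    if messages and messages[0]["role"] == "system":
        messages = messages[1:]
    index = 0
    for msg in messages:
        # Tool results and assistant tool-call turns are skipped by the alternation check
        if msg["role"] in ("tool", "tool_results") or msg.get("tool_calls") is not None:
            continue
        if (msg["role"] == "user") != (index % 2 == 0):
            raise ValueError("roles must alternate user/assistant/user/assistant/...")
        index += 1
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            if len(str(call.get("id", ""))) != 9:
                raise ValueError("tool call IDs must have length 9")
        if msg["role"] in ("tool", "tool_results") and len(str(msg.get("tool_call_id", ""))) != 9:
            raise ValueError("tool call IDs must have length 9")

# A valid exchange passes silently
validate_conversation([
    {"role": "system", "content": "You explain prescriptions simply."},
    {"role": "user", "content": "Amoxicillin 500mg TID x 7 days"},
    {"role": "assistant", "content": "Take one capsule three times a day for 7 days."},
])
```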

config.json (new file, 30 lines)

@@ -0,0 +1,30 @@
{
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"dtype": "float16",
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pad_token_id": null,
"rms_norm_eps": 1e-05,
"rope_parameters": {
"rope_theta": 1000000.0,
"rope_type": "default"
},
"sliding_window": null,
"tie_word_embeddings": false,
"transformers_version": "5.0.0",
"use_cache": true,
"vocab_size": 32768
}
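The config fully determines the parameter count. The total is the sum of five parts:

- the token embeddings
- an untied `lm_head`
- 32 decoder layers, each with attention using 8 key/value heads, a gated MLP and two RMSNorm weight vectors
- the final norm

This sum reproduces the `total_parameters` value of 7,248,023,552 in the safetensors index further down. At 2 bytes per fp16 value it also matches that index's `total_size` of 14,496,047,104 bytes. The sketch reads the fields from a dict copied from the config above.

```python
config = {  # fields copied from config.json above
    "hidden_size": 4096, "intermediate_size": 14336, "num_hidden_layers": 32,
    "num_attention_heads": 32, "num_key_value_heads": 8, "head_dim": 128,
    "vocab_size": 32768, "tie_word_embeddings": False,
}

def count_parameters(c: dict) -> int:
    h, ffn, d = c["hidden_size"], c["intermediate_size"], c["head_dim"]
    q_out = c["num_attention_heads"] * d
    kv_out = c["num_key_value_heads"] * d
    attn = h * q_out + 2 * h * kv_out + q_out * h  # q, k, v, o projections (no biases)
    mlp = 3 * h * ffn                              # gate, up, down projections
    norms = 2 * h                                  # input + post-attention RMSNorm
    layers = c["num_hidden_layers"] * (attn + mlp + norms)
    embed = c["vocab_size"] * h
    lm_head = 0 if c["tie_word_embeddings"] else c["vocab_size"] * h
    return embed + layers + h + lm_head            # + h for the final norm

total = count_parameters(config)
print(total, total * 2)  # 7248023552 14496047104 (fp16 = 2 bytes each)
```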

generation_config.json (new file, 6 lines)

@@ -0,0 +1,6 @@
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"transformers_version": "5.0.0"
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:be499e0a3b96903dc8e5267f664732bda0552d7247fd3200a34e2ae0517b3ad3
size 1962995144


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:31f63d9850d4f830ef8c9fac7ece5904f897c6a80ef42258b584fed8e7f9fff4
size 1988178416


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e2f61d13ef31926dea4d78cdeee4bf2832d4ac9397b4db7a531c426e4d783ca2
size 1937846888


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee8a7d49b9de90a5e388d1d6da3e5add6d28957c97e4389552044cdb12d973c2
size 1988178456


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8ce76a3fe7a0a32fd102b08f328f584630e557eaf68932b205921dcc9d2b4325
size 1937846904


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bafdf550fe9e8cdfc0c7e55acce03634664b6f6c0e6fb8c3cab2f6dae806aaa4
size 1988178456


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:01069b2e700ccef966271cc3811e6bda5748d2c0f5b3a55c0dfa05df0946a92e
size 1937846904


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1ae8e420618cd7b0b43e4ffb802aa67850270ff90c0bec5fe39752ed2f54f5a5
size 755009440
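Each of the eight blocks above is a Git LFS pointer with three fields: the spec version, the SHA-256 object ID, and the byte size. They are presumably the eight `model-0000N-of-00008.safetensors` shards named in the index that follows, but the file names are not shown here. The parser sketch below reads the pointer format and totals the sizes. The total is 14,496,080,608 bytes, which is 33,504 bytes more than the index's `total_size`. That gap is consistent with each safetensors file carrying its own JSON header alongside the raw tensor data.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, (algorithm, digest) and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "oid": (algo, digest), "size": int(fields["size"])}

first = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:be499e0a3b96903dc8e5267f664732bda0552d7247fd3200a34e2ae0517b3ad3
size 1962995144""")

# Sizes of the eight pointers above, in order
sizes = [1962995144, 1988178416, 1937846888, 1988178456,
         1937846904, 1988178456, 1937846904, 755009440]
total_bytes = sum(sizes)
print(first["size"], total_bytes, total_bytes - 14_496_047_104)
```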


@@ -0,0 +1,299 @@
{
"metadata": {
"total_parameters": 7248023552,
"total_size": 14496047104
},
"weight_map": {
"lm_head.weight": "model-00001-of-00008.safetensors",
"model.embed_tokens.weight": "model-00001-of-00008.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.13.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.16.input_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.17.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.18.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.19.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.20.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.21.input_layernorm.weight": "model-00005-of-00008.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.22.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.23.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.24.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.25.input_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.26.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.27.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.28.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.29.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.3.input_layernorm.weight": "model-00001-of-00008.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.30.input_layernorm.weight": "model-00007-of-00008.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00008-of-00008.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.31.input_layernorm.weight": "model-00008-of-00008.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00008-of-00008.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00008-of-00008.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.input_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
"model.norm.weight": "model-00008-of-00008.safetensors"
}
}

275733
tokenizer.json Normal file

File diff suppressed because it is too large

17
tokenizer_config.json Normal file

@@ -0,0 +1,17 @@
{
"add_prefix_space": true,
"backend": "tokenizers",
"bos_token": "<s>",
"clean_up_tokenization_spaces": false,
"eos_token": "</s>",
"is_local": true,
"legacy": false,
"model_max_length": 1000000000000000019884624838656,
"pad_token": "</s>",
"padding_side": "right",
"sp_model_kwargs": {},
"spaces_between_special_tokens": false,
"tokenizer_class": "PreTrainedTokenizerFast",
"unk_token": "<unk>",
"use_default_system_prompt": false
}