Initialize project; model provided by the ModelHub XC community
Model: DATEXIS/DeepICD-R1-Llama-8B Source: Original Platform
36
.gitattributes
vendored
Normal file
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
332
README.md
Normal file
@@ -0,0 +1,332 @@
---
language:
- en
license: other
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- clinical-nlp
- medical-coding
- icd10
- icd-10-cm
- reasoning
- grpo
- rl
- verl
- llama-3.1
- healthcare
- diagnosis-prediction
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: clinical-llama3p1-8b-o-f-sft-llm-rag
  results: []
---

# clinical-llama3p1-8b-o-f-sft-llm-rag

## Model Summary

`clinical-llama3p1-8b-o-f-sft-llm-rag` is a clinical reasoning model for **single-label ICD-10-CM diagnosis prediction from admission notes**. It is a **GRPO post-trained** variant initialized from an SFT checkpoint derived from the DeepICD-R1 training workflow described in the paper *DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation*. The paper frames ICD-10-CM outcome prediction as a reinforcement learning problem with a **format reward**, a **hierarchical outcome reward**, and an **LLM-as-a-judge reward**, and shows that **SFT + GRPO** gives the strongest overall results, especially for the Llama3.1-8B model family.

This checkpoint appears to correspond to a **paper-related Llama3.1-8B SFT-initialized GRPO run**, using VERL with 8 rollouts per prompt, an effective batch size of 64, temperature 0.9, and a custom reward that combines outcome, format, and judge-oriented reasoning supervision, consistent with the training recipe described in the paper.

> **Important:** This model card documents the provided training configuration in the context of the paper. It should not be treated as a verified exact reproduction of the published checkpoint unless you confirm that the underlying SFT checkpoint, prompts, reward code, and evaluation scripts are identical to the released DeepICD-R1 artifacts.

---

## Model Details

### Model Description

- **Model name:** `DeepICD-R1-Llama-8B`
- **Architecture family:** Llama 3.1 8B Instruct model, further adapted for clinical reasoning
- **Base initialization for this run:** [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Training framework:** VERL (`verl.trainer.main_ppo`)
- **RL method:** GRPO (`algorithm.adv_estimator=grpo`)
- **Domain:** Clinical NLP
- **Task:** prospective single-label ICD-10-CM diagnosis prediction from admission notes
- **Paper connection:** DeepICD-R1 paper and training framework

### Relation to the Paper

The paper presents DeepICD-R1 as a framework that:
1. reformulates ICD-10-CM prediction as a reinforcement learning problem,
2. uses a **hierarchical ICD reward** reflecting chapter, category, and full-code correctness,
3. constructs a large distilled reasoning dataset from MIMIC-IV admission notes,
4. finds that **SFT + GRPO** outperforms either method alone.

In the paper’s main results table, **Llama3.1-8B-Instruct (SFT + GRPO)** achieves the best reported macro-F1 across chapter, category, and full-code prediction among the evaluated models. Specifically, the paper reports **59.5 F1** at chapter level, **15.6 F1** at category level, and **4.3 F1** at full-code level for that setting.

---

## Intended Use

This model is intended for **research use** in:

- clinical reasoning from admission notes
- ICD-10-CM diagnosis outcome prediction
- reinforcement learning for medical language models
- reasoning-trace generation for structured prediction tasks
- reproducible study of SFT + GRPO in healthcare NLP

### Out-of-Scope Use

This model is **not** intended for:

- real-world diagnosis
- treatment decisions
- triage or emergency use
- autonomous clinical coding without expert oversight
- billing, compliance, or medical record finalization
- deployment without task-specific validation and human review

The paper explicitly states that the system is a **research prototype** and must not be used for real-world diagnosis or clinical decision-making, and warns that generated reasoning may be plausible while still clinically incorrect.

---

## Training Data

In the paper, the underlying task is built from **MIMIC-IV admission notes**, using only records from the hospital split and excluding sections that would leak diagnosis or treatment information. The first annotated diagnosis code is used as the target. The paper reports stratified train/validation/test splits of **65,228 / 9,260 / 18,654** samples.

The paper also describes an SFT reasoning dataset with **93,142 samples**, **6,368 unique codes**, and an average trace length of **477.3 words**.

---

## Training Procedure

### Initialization

This run starts from an **SFT checkpoint** rather than directly from the public instruct model. That matches the paper’s conclusion that standalone GRPO is weaker than **SFT + GRPO**, and that supervised reasoning traces are important for fine-grained code prediction.

### Reinforcement Learning Setup

The provided config uses:

- **Algorithm:** GRPO
- **Trainer:** VERL PPO entrypoint
- **Batch size:** 64
- **Rollouts per prompt (`n`):** 8
- **Learning rate:** `1e-6`
- **Warmup steps:** `80`
- **Epochs:** `1`
- **Temperature:** `0.9`
- **Max prompt length:** `2048`
- **Max response length:** `1024`
- **Rollout backend:** vLLM
- **dtype:** `bfloat16`

These settings line up with the paper’s reported GRPO recipe: effective batch size **64**, **8 rollouts per update**, temperature **0.9**, using VERL and vLLM. The paper also notes that the KL regularization term is disabled in the main GRPO setup.
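
As a rough illustration of what a GRPO advantage estimator does with 8 rollouts per prompt, the sketch below normalizes each rollout's reward against the other rollouts sampled for the same prompt. It follows the commonly described GRPO formulation (reward minus group mean, divided by group standard deviation) and is not taken from the VERL source. The function name `group_relative_advantages`, the `eps` constant, the choice of population standard deviation, and the example rewards are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's rollouts against their own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for the 8 rollouts sampled for a single prompt.
rollout_rewards = [0.0, 0.0, 0.25, 0.25, 0.5, 0.5, 1.0, 1.0]
advantages = group_relative_advantages(rollout_rewards)

# Assuming the batch size of 64 counts prompts, one step scores 64 * 8 responses.
responses_per_step = 64 * 8
```

Rollouts that score above their group's mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to provide a baseline.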

### Reward Design

The paper defines three complementary reward components:

1. **Format reward**
   Requires one `<think>...</think>` block and one `<diagnosis>...</diagnosis>` block, with the diagnosis matching ICD-10-CM formatting constraints.

2. **Hierarchical outcome reward**
   Gives partial credit according to ICD prefix overlap, rewarding correctness at chapter, category, and full-code levels. The paper emphasizes that the first three characters carry especially important information.

3. **LLM-as-a-judge reward**
   Scores reasoning quality using an external model with auxiliary ICD information.

The provided config uses a custom reward function:
`verl_batched_compute_score_single_think_trace_and_llm_wo_meili`

This strongly suggests a reward stack centered on:
- single diagnosis output,
- reasoning trace formatting,
- LLM-based evaluation.

Because the exact implementation of this function is not included here, this card describes it as **paper-aligned** rather than identical to the published reward code.
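
To make the prefix-overlap idea concrete, here is a minimal sketch of a hierarchical outcome reward. The tier values (0.25 for a chapter match, 0.5 for a category match, 1.0 for an exact code) and the use of the first character as a stand-in for the ICD chapter are assumptions chosen for illustration. The paper's actual weights and the implementation of `verl_batched_compute_score_single_think_trace_and_llm_wo_meili` are not reproduced here.

```python
def normalize(code):
    """Uppercase an ICD-10-CM code and drop the optional dot ('m51.16' -> 'M5116')."""
    return code.strip().upper().replace(".", "")


def hierarchical_reward(predicted, target):
    """Return partial credit based on how much of the target code is matched.

    Assumed tiers (illustrative only):
      1.0   exact full-code match
      0.5   same 3-character category
      0.25  same first character (a rough proxy for the chapter)
      0.0   otherwise
    """
    pred, gold = normalize(predicted), normalize(target)
    if not pred or not gold:
        return 0.0
    if pred == gold:
        return 1.0
    if len(pred) >= 3 and pred[:3] == gold[:3]:
        return 0.5
    if pred[0] == gold[0]:
        return 0.25
    return 0.0


# Hypothetical (prediction, target) pairs, one per tier.
examples = [("M51.16", "M5116"), ("M5126", "M5116"), ("M4806", "M5116"), ("I10", "M5116")]
scores = [hierarchical_reward(p, t) for p, t in examples]
```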

---

## Model Inputs and Outputs

### Input

The model expects an admission-note style prompt describing a patient presentation and asking for a single ICD-10-CM diagnosis code.

### Output Format

The paper’s reward design requires outputs in the form:

```text
<think>
...reasoning trace...
</think>
<diagnosis>
ICD_CODE
</diagnosis>
```

The model is trained to produce outputs in this structured format, which separates reasoning from the predicted diagnosis. The format is used both during evaluation and for reward computation during reinforcement learning.

### Example

<think>
The patient presents with ...
...
</think>

<diagnosis>
M5116
</diagnosis>

The DeepICD-R1 paper includes examples of this structured output format and analyzes full reasoning traces produced by the model.
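
A response in this format can be split into its reasoning trace and diagnosis code with two regular expressions. The sketch below is a minimal parser, not the paper's format-reward code. The function name `parse_response` and the rule that each tag pair must appear exactly once are assumptions based on the format described in this section.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
DIAGNOSIS_RE = re.compile(r"<diagnosis>(.*?)</diagnosis>", re.DOTALL)


def parse_response(text):
    """Return (reasoning, code), or None if either block is missing or repeated."""
    thinks = THINK_RE.findall(text)
    diagnoses = DIAGNOSIS_RE.findall(text)
    if len(thinks) != 1 or len(diagnoses) != 1:
        return None
    return thinks[0].strip(), diagnoses[0].strip()


sample = """<think>
The patient presents with ...
</think>

<diagnosis>
M5116
</diagnosis>"""

parsed = parse_response(sample)
```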

---

## Evaluation

### Paper Results Most Relevant to This Configuration

For the **Llama3.1-8B-Instruct (SFT + GRPO)** setting reported in the DeepICD-R1 paper:

| Level | Macro-F1 |
|------|------|
| Chapter-level | **59.5** |
| Category-level | **15.6** |
| Full ICD-10 code | **4.3** |

Additional macro precision and recall values are reported in the paper.
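
Macro-F1 at the three levels can be computed by truncating gold and predicted codes to the relevant prefix before scoring. The sketch below is a plain-Python illustration, not the paper's evaluation script. It assumes that the first character approximates the chapter and that the first three characters give the category; the helper names `truncate` and `macro_f1` and the four example codes are made up for this example.

```python
from collections import Counter


def truncate(code, level):
    """Map a code to a label at the chosen level (assumed prefix lengths)."""
    code = code.upper().replace(".", "")
    length = {"chapter": 1, "category": 3, "full": None}[level]
    return code if length is None else code[:length]


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    f1s = []
    for c in set(gold) | set(pred):
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical gold and predicted codes.
gold = ["M5116", "I10", "J189", "M5126"]
pred = ["M5116", "I10", "J449", "M5116"]

scores = {
    level: macro_f1([truncate(c, level) for c in gold],
                    [truncate(c, level) for c in pred])
    for level in ("chapter", "category", "full")
}
```

Because every class counts equally in a macro average, rare codes that are never predicted correctly pull the full-code score down sharply, which is one reason full-code macro-F1 sits far below chapter-level macro-F1.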

### Interpretation

The paper reports several key findings:

- **SFT + GRPO** was the best-performing setup overall.
- **Supervised reasoning traces** significantly improve performance for detailed ICD prediction.
- Removing reasoning traces during SFT causes major performance drops.
- **Outcome reward and format reward** are essential for stable GRPO training.
- **LLM-as-a-judge reward** improves reasoning quality.
- Longer reasoning traces often introduce redundancy rather than better reasoning.

### Caveat

These paper metrics should only be attached to this exact model if the following conditions hold:

- the SFT checkpoint is the same one used in the paper
- the reward implementation matches the released code
- preprocessing and evaluation scripts are identical

Otherwise, treat these numbers as **reference results for the corresponding experimental setting**, not guaranteed metrics for this checkpoint.

---

## Limitations

The DeepICD-R1 paper highlights several limitations that apply to this model family:

- All experiments use **English-language MIMIC-IV admission notes**.
- **ICD label imbalance** strongly affects rare-code performance.
- Reasoning traces may appear coherent but still be **clinically incorrect**.
- Automatic reward signals and LLM judges are **proxies for expert feedback**.
- GRPO training remains **computationally expensive** despite efficiency improvements.

The paper also reports several clinically relevant failure modes:

- premature diagnostic closure
- insufficient awareness of disease severity
- plausible but incomplete explanations
- reduced performance for long-tail ICD chapters

---

## Ethical Considerations

This model was trained using **de-identified clinical data derived from MIMIC-IV** within a research setting.

While the dataset removes patient identifiers, potential biases remain due to:

- demographic imbalance in the dataset
- hospital-specific clinical practices
- uneven disease prevalence

These biases may propagate into model outputs.

This model should be used only for:

- research
- benchmarking
- method development
- controlled analysis with domain experts

It **must not be used as a clinical decision system** or as a substitute for professional medical judgment.

---

## Hardware and Training Setup

Training configuration derived from the provided GRPO experiment:

- **GPUs:** 4
- **Nodes:** 1
- **Rollout backend:** vLLM
- **Gradient checkpointing:** enabled
- **Torch compile:** enabled
- **FSDP offload:** disabled
- **GPU memory utilization:** 0.4

The DeepICD-R1 experiments used **VERL with vLLM rollouts** under a consistent decoding setup.

---

## Usage

### Transformers Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "YOUR_ORG/clinical-llama3p1-8b-o-f-sft-llm-rag"

# Load the tokenizer and model; dtype and device placement are chosen automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

prompt = """You are a clinical reasoning model.
Read the admission note and produce:
1) a concise reasoning trace in <think> tags
2) a single ICD-10-CM diagnosis in <diagnosis> tags

[ADMISSION NOTE HERE]
"""

# Greedy decoding (do_sample=False) for a deterministic answer.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
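
Because the repository ships a Llama 3.1 `chat_template.jinja`, the same request can also be sent through the tokenizer's chat template instead of a raw string. The sketch below assumes that a system/user split like the one shown resembles the training prompts, which this card does not confirm. `build_messages` and `generate_with_chat_template` are hypothetical helper names, while `apply_chat_template` and `generate` are the standard Transformers methods.

```python
def build_messages(admission_note):
    """Wrap an admission note in a system + user conversation (assumed layout)."""
    system = (
        "You are a clinical reasoning model. Read the admission note and produce "
        "a concise reasoning trace in <think> tags and a single ICD-10-CM "
        "diagnosis in <diagnosis> tags."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": admission_note},
    ]


def generate_with_chat_template(model, tokenizer, admission_note, max_new_tokens=512):
    """Render the conversation with the chat template, generate, decode new tokens only."""
    inputs = tokenizer.apply_chat_template(
        build_messages(admission_note),
        add_generation_prompt=True,  # appends the assistant header from the template
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Slice off the prompt so that only the newly generated answer is decoded.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


messages = build_messages("[ADMISSION NOTE HERE]")
```

With a `model` and `tokenizer` loaded as in the example above, `generate_with_chat_template(model, tokenizer, note_text)` would return only the generated `<think>` and `<diagnosis>` text, without the echoed prompt.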
## Recommended Inference Practices

- Keep prompts close to the format used during training.
- Validate predicted diagnosis codes against ICD-10-CM formatting rules.
- Use expert human review when interpreting outputs.
- Avoid exposing reasoning traces directly to end users in safety-critical environments.
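
A lightweight format check can catch malformed predictions before any downstream use. The pattern below encodes the general shape of an ICD-10-CM code (a letter, a digit, one more alphanumeric character, then up to four further characters, with the dot optional). It does not check that the code exists in any ICD-10-CM release, and `looks_like_icd10cm` is a hypothetical helper name.

```python
import re

# Shape only: letter, digit, alphanumeric, then optionally up to 4 more characters.
ICD10CM_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](?:\.?[0-9A-Z]{1,4})?$")


def looks_like_icd10cm(code):
    """Return True if the string has the general shape of an ICD-10-CM code."""
    return bool(ICD10CM_SHAPE.match(code.strip().upper()))


candidates = ["M5116", "M51.16", "I10", "5116M", "M51.", "pneumonia"]
results = {c: looks_like_icd10cm(c) for c in candidates}
```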

---

## Citation

If you use this model or the associated training approach, please cite the DeepICD-R1 paper:

```bibtex
@inproceedings{roehr2026deepicdr1,
  title={DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation},
  author={R{\"o}hr, Tom and Steffek, Thomas and Teucher, Roman and Bressem, Keno and Figueroa, Alexei and Grundmann, Paul and Troeger, Peter and Gers, Felix and L{\"o}ser, Alexander},
  booktitle={Proceedings of LREC-COLING 2026},
  year={2026}
}
```
109
chat_template.jinja
Normal file
@@ -0,0 +1,109 @@
{{- bos_token }}
{%- if custom_tools is defined %}
    {%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
    {%- set tools_in_user_message = true %}
{%- endif %}
{%- if not date_string is defined %}
    {%- set date_string = "26 Jul 2024" %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = none %}
{%- endif %}

{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content']|trim %}
    {%- set messages = messages[1:] %}
{%- else %}
    {%- set system_message = "" %}
{%- endif %}

{#- System message + builtin tools #}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if builtin_tools is defined or tools is not none %}
    {{- "Environment: ipython\n" }}
{%- endif %}
{%- if builtin_tools is defined %}
    {{- "Tools: " + builtin_tools | reject('equalto', 'code_interpreter') | join(", ") + "\n\n"}}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023\n" }}
{{- "Today Date: " + date_string + "\n\n" }}
{%- if tools is not none and not tools_in_user_message %}
    {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}

{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
    {#- Extract the first user message so we can plug it in here #}
    {%- if messages | length != 0 %}
        {%- set first_user_message = messages[0]['content']|trim %}
        {%- set messages = messages[1:] %}
    {%- else %}
        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
    {%- endif %}
    {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
    {{- "Given the following functions, please respond with a JSON for a function call " }}
    {{- "with its proper arguments that best answers the given prompt.\n\n" }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
    {{- first_user_message + "<|eot_id|>"}}
{%- endif %}

{%- for message in messages %}
    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
    {%- elif 'tool_calls' in message %}
        {%- if not message.tool_calls|length == 1 %}
            {{- raise_exception("This model only supports single tool-calls at once!") }}
        {%- endif %}
        {%- set tool_call = message.tool_calls[0].function %}
        {%- if builtin_tools is defined and tool_call.name in builtin_tools %}
            {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
            {{- "<|python_tag|>" + tool_call.name + ".call(" }}
            {%- for arg_name, arg_val in tool_call.arguments | items %}
                {{- arg_name + '="' + arg_val + '"' }}
                {%- if not loop.last %}
                    {{- ", " }}
                {%- endif %}
            {%- endfor %}
            {{- ")" }}
        {%- else %}
            {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
            {{- '{"name": "' + tool_call.name + '", ' }}
            {{- '"parameters": ' }}
            {{- tool_call.arguments | tojson }}
            {{- "}" }}
        {%- endif %}
        {%- if builtin_tools is defined %}
            {#- This means we're in ipython mode #}
            {{- "<|eom_id|>" }}
        {%- else %}
            {{- "<|eot_id|>" }}
        {%- endif %}
    {%- elif message.role == "tool" or message.role == "ipython" %}
        {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
        {%- if message.content is mapping or message.content is iterable %}
            {{- message.content | tojson }}
        {%- else %}
            {{- message.content }}
        {%- endif %}
        {{- "<|eot_id|>" }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}
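
For a plain conversation without tools, the template above reduces to a fixed system header followed by one block per message. The sketch below re-implements only that no-tools path in Python to show the resulting string. It is an illustration, not a replacement for `tokenizer.apply_chat_template`. The function name `render_no_tools` is hypothetical, and the default `bos_token` value `<|begin_of_text|>` is an assumption, since the tokenizer files are not part of this excerpt.

```python
def render_no_tools(messages, bos_token="<|begin_of_text|>",
                    date_string="26 Jul 2024", add_generation_prompt=True):
    """Mirror the template's output when neither tools nor builtin_tools are defined."""
    # An optional leading system message is pulled out and trimmed.
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"].strip()
        messages = messages[1:]
    else:
        system_message = ""

    out = bos_token
    out += "<|start_header_id|>system<|end_header_id|>\n\n"
    out += "Cutting Knowledge Date: December 2023\n"
    out += "Today Date: " + date_string + "\n\n"
    out += system_message + "<|eot_id|>"

    # Every remaining message becomes a header, its trimmed content, and <|eot_id|>.
    for message in messages:
        out += "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
        out += message["content"].strip() + "<|eot_id|>"

    if add_generation_prompt:
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out


rendered = render_no_tools([
    {"role": "system", "content": "You are a clinical reasoning model."},
    {"role": "user", "content": "[ADMISSION NOTE HERE]"},
])
```

Note that the system block is always emitted, even when the conversation has no system message, so the knowledge-cutoff and date lines appear in every prompt.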
36
config.json
Normal file
@@ -0,0 +1,36 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 128009,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.53.3",
  "use_cache": true,
  "vocab_size": 128256
}
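
The shapes in this config are enough to reproduce the model's parameter count by hand. The sketch below assumes the standard Llama layer layout: q/k/v/o attention projections without bias, a gated MLP with gate/up/down projections, two RMSNorm weight vectors per layer, one final norm, and untied input and output embeddings (as `tie_word_embeddings: false` indicates). The variable names are chosen for this example. The result can be compared with `total_parameters` and `total_size` in `model.safetensors.index.json`.

```python
hidden_size = 4096
intermediate_size = 14336
num_hidden_layers = 32
num_attention_heads = 32
num_key_value_heads = 8
head_dim = 128
vocab_size = 128256

# Attention: full-width query/output projections, narrower key/value projections
# because 8 key/value heads are shared across 32 query heads (grouped-query attention).
q_proj = hidden_size * num_attention_heads * head_dim
kv_proj = 2 * hidden_size * num_key_value_heads * head_dim
o_proj = num_attention_heads * head_dim * hidden_size

# Gated MLP: gate, up, and down projections.
mlp = 3 * hidden_size * intermediate_size

# Two RMSNorm weight vectors per layer.
norms = 2 * hidden_size

per_layer = q_proj + kv_proj + o_proj + mlp + norms
embeddings = 2 * vocab_size * hidden_size  # embed_tokens + lm_head (untied)
final_norm = hidden_size

total_parameters = num_hidden_layers * per_layer + embeddings + final_norm
total_bytes_bf16 = 2 * total_parameters  # bfloat16 stores 2 bytes per parameter

print(total_parameters)  # 8030261248
print(total_bytes_bf16)  # 16060522496
```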
12
generation_config.json
Normal file
@@ -0,0 +1,12 @@
{
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128008,
    128009
  ],
  "temperature": 0.6,
  "top_p": 0.9,
  "transformers_version": "4.53.3"
}
3
model-00001-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d60acc960c38f2efc81f560d4f745fc9890fb170e0081702feaf0f99538b26d5
size 4949478560
3
model-00002-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:47f4275fafd8caad4c5cdb94d8ba9b1f0a0b7e8251747713677ad820b7a8ab3f
size 4928470520
3
model-00003-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:38965f3a67dafd456e5264f99375bf83bdcf54ab6d8fd550f2e1f03a98466a06
size 4966231848
3
model-00004-of-00004.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0188469b428e8c8a9f5a6579110cd39cc14c33c0dd0cdd3f4546fca8374514cc
size 1216375408
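
Each `.safetensors` entry in this commit is a Git LFS pointer: a three-line text file that records the spec version, the SHA-256 object ID, and the size in bytes of the real file. The sketch below parses that layout and sums the four shard sizes listed above. `parse_lfs_pointer` is a hypothetical helper written for this example, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:0188469b428e8c8a9f5a6579110cd39cc14c33c0dd0cdd3f4546fca8374514cc\n"
    "size 1216375408\n"
)

# Sizes of the four shards as recorded in their pointer files.
shard_sizes = [4949478560, 4928470520, 4966231848, 1216375408]
total_on_disk = sum(shard_sizes)
```

The shard sizes add up to 16,060,556,336 bytes, which is 33,840 bytes more than the `total_size` of 16,060,522,496 recorded in `model.safetensors.index.json`. That small gap is plausibly the per-file safetensors header overhead, although this commit does not state it.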
299
model.safetensors.index.json
Normal file
@@ -0,0 +1,299 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"total_parameters": 8030261248,
|
||||||
|
"total_size": 16060522496
|
||||||
|
},
|
||||||
|
"weight_map": {
|
||||||
|
"lm_head.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.embed_tokens.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.0.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.0.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.0.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.0.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.1.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.1.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.1.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.1.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.1.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.10.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.10.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.10.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.11.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.11.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.11.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.11.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.11.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
|
||||||
|
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.12.input_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.12.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.12.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.12.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.12.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.12.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.13.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.13.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.13.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.13.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.13.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.13.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.13.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
|
||||||
|
"model.layers.14.input_layernorm.weight": "model-00003-of-00004.safetensors",
|
||||||
|
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
|
||||||
|
"model.layers.14.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.16.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.19.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.2.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.21.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.22.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.23.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.27.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.29.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.30.input_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.31.input_layernorm.weight": "model-00002-of-00004.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.4.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.5.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.9.input_layernorm.weight": "model-00001-of-00004.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"model.norm.weight": "model-00003-of-00004.safetensors"
}
}
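The mapping above assigns each tensor name to one of four shard files. As a rough sketch of how such an index might be inspected, the snippet below inverts the mapping so that each shard lists the tensors it holds. The `weight_map` key and the `group_by_shard` helper are assumptions for illustration (the key follows the usual layout of a sharded safetensors index and is not visible in this excerpt), and only a handful of entries are copied in rather than the full file.

```python
import json
from collections import defaultdict

# Assumption: a few entries copied from the index above, wrapped in the
# conventional {"weight_map": {...}} layout. The real file has many more keys.
INDEX_JSON = """
{
  "weight_map": {
    "model.layers.12.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.norm.weight": "model-00003-of-00004.safetensors"
  }
}
"""


def group_by_shard(index: dict) -> dict[str, list[str]]:
    """Invert tensor-name -> shard-file into shard-file -> sorted tensor names.

    Hypothetical helper, not part of any library.
    """
    shards: dict[str, list[str]] = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    # Sort both the shard files and the names inside each shard for stable output.
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


grouped = group_by_shard(json.loads(INDEX_JSON))
for shard, names in grouped.items():
    print(f"{shard}: {len(names)} tensor(s)")
```

Grouping by shard makes it easier to see which files must be present before a given layer can be loaded; in this index a single layer's tensors are spread over several shards.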
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<|begin_of_text|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|eot_id|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|eot_id|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
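In the map above, `pad_token` and `eos_token` carry the same content, `<|eot_id|>`, so padding positions cannot be told apart from end-of-turn tokens by token string alone. The following is a small sketch, assuming the JSON is available as a string, that extracts the token strings and reports which roles share one. The helper names `token_contents` and `shared_tokens` are hypothetical and not part of any tokenizer library.

```python
import json

# Contents mirror the special_tokens_map.json shown above.
SPECIAL_TOKENS_JSON = """
{
  "bos_token": {"content": "<|begin_of_text|>", "lstrip": false,
                "normalized": false, "rstrip": false, "single_word": false},
  "eos_token": {"content": "<|eot_id|>", "lstrip": false,
                "normalized": false, "rstrip": false, "single_word": false},
  "pad_token": {"content": "<|eot_id|>", "lstrip": false,
                "normalized": false, "rstrip": false, "single_word": false}
}
"""


def token_contents(special_tokens: dict) -> dict[str, str]:
    """Map each role (bos_token, eos_token, ...) to its token string."""
    return {role: entry["content"] for role, entry in special_tokens.items()}


def shared_tokens(contents: dict[str, str]) -> dict[str, list[str]]:
    """Return token strings used by more than one role, with the sorted roles."""
    by_token: dict[str, list[str]] = {}
    for role, token in contents.items():
        by_token.setdefault(token, []).append(role)
    return {tok: sorted(roles) for tok, roles in by_token.items() if len(roles) > 1}


contents = token_contents(json.loads(SPECIAL_TOKENS_JSON))
shared = shared_tokens(contents)
print(contents)
print(shared)
```

When a pad token doubles as the end-of-sequence token, downstream code usually has to rely on an explicit attention mask rather than on the token value to distinguish padding from real end-of-turn markers.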
BIN
tokenizer.json
(Stored with Git LFS)
Normal file
Binary file not shown.
2063
tokenizer_config.json
Normal file
File diff suppressed because it is too large