初始化项目,由ModelHub XC社区提供模型
Model: cogbuji/Mr-Grammatology-clinical-problems-Mistral-7B-0.5 Source: Original Platform
This commit is contained in:
35
.gitattributes
vendored
Normal file
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
|
|||||||
|
*.7z filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.arrow filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.bin filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.ftz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.gz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.h5 filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.joblib filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.model filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.npy filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.npz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.onnx filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.ot filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.parquet filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pb filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pickle filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pkl filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pt filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.pth filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.rar filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
||||||
|
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tar filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tflite filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.tgz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.wasm filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.xz filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.zip filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*.zst filter=lfs diff=lfs merge=lfs -text
|
||||||
|
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
||||||
128
README.md
Normal file
128
README.md
Normal file
@@ -0,0 +1,128 @@
|
|||||||
|
---
|
||||||
|
base_model: teknium/OpenHermes-2.5-Mistral-7B
|
||||||
|
license: mit
|
||||||
|
language:
|
||||||
|
- en
|
||||||
|
model_creator: Chime Ogbuji
|
||||||
|
library_name: mlx
|
||||||
|
model_name: Mr-Grammatology-clinical-problems-Mistral-7B-0.5
|
||||||
|
pipeline_tag: text-generation
|
||||||
|
prompt_template: '<|im_start|>system
|
||||||
|
|
||||||
|
{system_message}<|im_end|>
|
||||||
|
|
||||||
|
<|im_start|>user
|
||||||
|
|
||||||
|
{prompt}<|im_end|>
|
||||||
|
|
||||||
|
<|im_start|>assistant
|
||||||
|
'
|
||||||
|
tags:
|
||||||
|
- mlx
|
||||||
|
- medical
|
||||||
|
- health
|
||||||
|
- mistral
|
||||||
|
- instruct
|
||||||
|
- finetune
|
||||||
|
- chatml
|
||||||
|
---
|
||||||
|
|
||||||
|
# Mr-Grammatology-clinical-problems-Mistral-7B-0.5
|
||||||
|

|
||||||
|
|
||||||
|
The name of the model is a homage to Fela Kuti's song __Mr Grammarticalogy-Lisationalsim Is The Boss__ released on the B-side of his 1976 LP [Excuse O](https://www.discogs.com/release/3149841-Fela-And-The-Africa-70-Excuse-O).
|
||||||
|
|
||||||
|
It is a 16/32 QLoRa all linear layers finetune of [teknium/OpenHermes-2.5-Mistral-7B](/teknium/OpenHermes-2.5-Mistral-7B) using [controlled natural language (CNL) phrases](https://github.com/chimezie/django-snomed-ct#controlled-natural-language)
|
||||||
|
generated from the September 23rd release of [SNOMED CT United States Edition](https://www.snomed.org/snomed-ct/Use-SNOMED-CT). The general idea is described in [Domain-Specific Biomedical Ontologies, RALM, and Generative Medical Expert Systems](https://chimezie.medium.com/biomedical-ontology-retrieval-augmented-language-models-using-django-snomed-ct-and-ogbujipt-dfa0d0b150d8).
|
||||||
|
|
||||||
|
It is an experimental model for non-production environments to test how generative AI systems can be trained for use in various medical informatics scenarios.
|
||||||
|
|
||||||
|
The original model was converted to MLX format, quantized, and then subject to continued pretraining using all the active domain-expert text definitions available in SNOMED-CT at a constant learning rate of 1e-5 using
|
||||||
|
[mlx_lm's LoRa finetuning functionality](https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md) with 16 LoRa layers.
|
||||||
|
|
||||||
|
It was then trained on a dataset of 336,762 records of medical terminology **definition instructions** generated from SNOMED-CT using a fork of [django-snomed-ct](https://github.com/chimezie/django-snomed-ct). These definition instructions were generated from the **disorder**, **finding**, **morphological abnormality**, and **situation** hierarchies in SNOMED-CT. This training step was done using [mlx-tuning-fork](https://github.com/chimezie/mlx-tuning-fork) through 42,096 training iterations, with a batch size of 8 at a time, using LoRa on all linear layers.
|
||||||
|
|
||||||
|
There were 51,082 records of more granular definition instructions, part of which were incorporated into the training dataset. However, 40% were kept aside for validation.
|
||||||
|
|
||||||
|
## Use with mlx
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pip install mlx-lm
|
||||||
|
```
|
||||||
|
|
||||||
|
Download and convert.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ python -m mlx_lm.convert --hf-path cogbuji/Mr-Grammatology-clinical-problems-Mistral-7B-0.5 \
|
||||||
|
--mlx-path /path/to/mlx/model
|
||||||
|
```
|
||||||
|
|
||||||
|
Generate from prompts in commandline (see [Generate Text with LLMs and MLX](https://github.com/ml-explore/mlx-examples/tree/main/llms) for more options )
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ python -m mlx_lm.generate --prompt "How is Cardiomyopathy characterized in form?" \
|
||||||
|
--temp .4 -m 300 --model /path/to/mlx/model --seed 4
|
||||||
|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
Prompt: <|im_start|>user
|
||||||
|
How is Cardiomyopathy characterized in form?<|im_end|>
|
||||||
|
<|im_start|>assistant
|
||||||
|
|
||||||
|
Cardiomyopathy is characterized in form by a morphologically abnormal structure located in a myocardium structure
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example of use of 1-shot description prompting
|
||||||
|
Using mlx-tuning-fork with OgbujiPts word looms to construct 1-shot example of description prompting.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ python -m mlx_tuning_fork.training -nt 1200 -t .4 --loom-file=sct_prompt.toml -f chatml config.yaml
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
Prompt: <|im_start|>system
|
||||||
|
Give detailed responses. Use critical thinking and think step-by-step.
|
||||||
|
Don't make up answers if you don't know. Use SNOMED-CT definitions as much as possible, describing
|
||||||
|
characterizations in form of morphological abnormalities involved in diagnoses (their signs), their etiology (causes/basis),
|
||||||
|
their related findings.
|
||||||
|
## Question ##
|
||||||
|
What are the signs and etiology of Skin ulcer due to diabetes mellitus?
|
||||||
|
|
||||||
|
## Answer ##
|
||||||
|
SNOMED-CT defines Skin ulcer due to diabetes mellitus as characterized in form by an ulcer located in a skin structure.
|
||||||
|
Ulcer is a mophologic abnormality.
|
||||||
|
The cause or origin of a disorder is part of its etiology.
|
||||||
|
Skin ulcer due to diabetes mellitus is caused by diabetes mellitus.
|
||||||
|
So, the etiology of Skin ulcer due to diabetes mellitus is diabetes mellitus.
|
||||||
|
|
||||||
|
Infected ulcer of skin is an infectious process characterized in form by an ulcer located in a skin structure.
|
||||||
|
Since the signs of Skin ulcer due to diabetes mellitus is also an ulcer, Infected ulcer of skin is
|
||||||
|
an example of a disorder with the same signs as Skin ulcer due to diabetes mellitus.
|
||||||
|
|
||||||
|
<|im_end|>
|
||||||
|
<|im_start|>user
|
||||||
|
|
||||||
|
Take a deep breath and work on this problem step by step.
|
||||||
|
## Question ##
|
||||||
|
The patient has Irritable bowel syndrome.
|
||||||
|
Summarize the the biological processes and signs (characterizations in form)
|
||||||
|
involved in the patient's diagnoses. Give examples of other disorders with the same signs and etiology.
|
||||||
|
<|im_end|>
|
||||||
|
<|im_start|>assistant
|
||||||
|
|
||||||
|
Irritable bowel syndrome (SNOMED-CT definition: Irritable bowel syndrome) is a functional disorder of the gastrointestinal tract, characterized in form by abnormal bowel movements. It is a clinical diagnosis, and its signs and symptoms include abdominal pain, bloating, and altered bowel movements (diarrhea, constipation, or both).
|
||||||
|
|
||||||
|
Etiology: Irritable bowel syndrome is believed to be caused by factors including abnormal gut motility, gastrointestinal dysrhythmias, and psychological factors.
|
||||||
|
|
||||||
|
Signs: Examples of signs of the disorder include changes in stool consistency and abdominal pain.
|
||||||
|
|
||||||
|
Irritable bowel syndrome is a functional bowel disorder, which is a category of gastrointestinal disorders with similar signs and symptoms. Examples of disorders with the same signs and etiology include:
|
||||||
|
|
||||||
|
1. Functional dyspepsia: It is characterized by an upper abdominal pain or discomfort and has a similar etiology as irritable bowel syndrome. It is a functional disorder of the stomach and small intestine, and its signs include epigastric pain and discomfort.
|
||||||
|
|
||||||
|
2. Chronic idiopathic constipation: It is characterized by chronic constipation and has a similar etiology as irritable bowel syndrome. It is a functional disorder of the colon
|
||||||
|
==========
|
||||||
|
Prompt: 447.658 tokens-per-sec
|
||||||
|
|
||||||
|
```
|
||||||
4
added_tokens.json
Normal file
4
added_tokens.json
Normal file
@@ -0,0 +1,4 @@
|
|||||||
|
{
|
||||||
|
"<|im_end|>": 32000,
|
||||||
|
"<|im_start|>": 32001
|
||||||
|
}
|
||||||
80
config.json
Normal file
80
config.json
Normal file
@@ -0,0 +1,80 @@
|
|||||||
|
{
|
||||||
|
"vocab_size": 32002,
|
||||||
|
"max_position_embeddings": 32768,
|
||||||
|
"hidden_size": 4096,
|
||||||
|
"intermediate_size": 14336,
|
||||||
|
"num_hidden_layers": 32,
|
||||||
|
"num_attention_heads": 32,
|
||||||
|
"sliding_window": 4096,
|
||||||
|
"num_key_value_heads": 8,
|
||||||
|
"hidden_act": "silu",
|
||||||
|
"initializer_range": 0.02,
|
||||||
|
"rms_norm_eps": 1e-05,
|
||||||
|
"use_cache": false,
|
||||||
|
"rope_theta": 10000.0,
|
||||||
|
"attention_dropout": 0.0,
|
||||||
|
"return_dict": true,
|
||||||
|
"output_hidden_states": false,
|
||||||
|
"output_attentions": false,
|
||||||
|
"torchscript": false,
|
||||||
|
"torch_dtype": "bfloat16",
|
||||||
|
"use_bfloat16": false,
|
||||||
|
"tf_legacy_loss": false,
|
||||||
|
"pruned_heads": {},
|
||||||
|
"tie_word_embeddings": false,
|
||||||
|
"chunk_size_feed_forward": 0,
|
||||||
|
"is_encoder_decoder": false,
|
||||||
|
"is_decoder": false,
|
||||||
|
"cross_attention_hidden_size": null,
|
||||||
|
"add_cross_attention": false,
|
||||||
|
"tie_encoder_decoder": false,
|
||||||
|
"max_length": 20,
|
||||||
|
"min_length": 0,
|
||||||
|
"do_sample": false,
|
||||||
|
"early_stopping": false,
|
||||||
|
"num_beams": 1,
|
||||||
|
"num_beam_groups": 1,
|
||||||
|
"diversity_penalty": 0.0,
|
||||||
|
"temperature": 1.0,
|
||||||
|
"top_k": 50,
|
||||||
|
"top_p": 1.0,
|
||||||
|
"typical_p": 1.0,
|
||||||
|
"repetition_penalty": 1.0,
|
||||||
|
"length_penalty": 1.0,
|
||||||
|
"no_repeat_ngram_size": 0,
|
||||||
|
"encoder_no_repeat_ngram_size": 0,
|
||||||
|
"bad_words_ids": null,
|
||||||
|
"num_return_sequences": 1,
|
||||||
|
"output_scores": false,
|
||||||
|
"return_dict_in_generate": false,
|
||||||
|
"forced_bos_token_id": null,
|
||||||
|
"forced_eos_token_id": null,
|
||||||
|
"remove_invalid_values": false,
|
||||||
|
"exponential_decay_length_penalty": null,
|
||||||
|
"suppress_tokens": null,
|
||||||
|
"begin_suppress_tokens": null,
|
||||||
|
"architectures": [
|
||||||
|
"MistralForCausalLM"
|
||||||
|
],
|
||||||
|
"finetuning_task": null,
|
||||||
|
"id2label": {
|
||||||
|
"0": "LABEL_0",
|
||||||
|
"1": "LABEL_1"
|
||||||
|
},
|
||||||
|
"label2id": {
|
||||||
|
"LABEL_0": 0,
|
||||||
|
"LABEL_1": 1
|
||||||
|
},
|
||||||
|
"tokenizer_class": null,
|
||||||
|
"prefix": null,
|
||||||
|
"bos_token_id": 1,
|
||||||
|
"pad_token_id": null,
|
||||||
|
"eos_token_id": 32000,
|
||||||
|
"sep_token_id": null,
|
||||||
|
"decoder_start_token_id": null,
|
||||||
|
"task_specific_params": null,
|
||||||
|
"problem_type": null,
|
||||||
|
"_name_or_path": "/Users/oori/medical_llm/raw_models/mlx/other",
|
||||||
|
"transformers_version": "4.37.0",
|
||||||
|
"model_type": "mistral"
|
||||||
|
}
|
||||||
3
model-00001-of-00003.safetensors
Normal file
3
model-00001-of-00003.safetensors
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:5a8e70339f19ea9fe5811f6f6bd86a04c2db632762850d24643dd6fd8f6277eb
|
||||||
|
size 5261963013
|
||||||
3
model-00002-of-00003.safetensors
Normal file
3
model-00002-of-00003.safetensors
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:3a31a4c8bdbbb58378bc53068c465b8230f39cae23a35d90840e73a143332214
|
||||||
|
size 5352141111
|
||||||
3
model-00003-of-00003.safetensors
Normal file
3
model-00003-of-00003.safetensors
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:1c2a7fc34bb8004cbee7787524a66b603c79d260fe224b243853f23387a5691c
|
||||||
|
size 3869426317
|
||||||
298
model.safetensors.index.json
Normal file
298
model.safetensors.index.json
Normal file
@@ -0,0 +1,298 @@
|
|||||||
|
{
|
||||||
|
"metadata": {
|
||||||
|
"total_size": 14483496960
|
||||||
|
},
|
||||||
|
"weight_map": {
|
||||||
|
"lm_head.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.embed_tokens.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.10.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.10.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.10.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.11.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.11.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.11.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.11.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.11.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.22.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.22.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.23.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.23.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.23.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.23.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.23.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
||||||
|
"model.layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
||||||
|
"model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
||||||
|
"model.norm.weight": "model-00003-of-00003.safetensors"
|
||||||
|
}
|
||||||
|
}
|
||||||
23
special_tokens_map.json
Normal file
23
special_tokens_map.json
Normal file
@@ -0,0 +1,23 @@
|
|||||||
|
{
|
||||||
|
"bos_token": {
|
||||||
|
"content": "<s>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false
|
||||||
|
},
|
||||||
|
"eos_token": {
|
||||||
|
"content": "<|im_end|>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false
|
||||||
|
},
|
||||||
|
"unk_token": {
|
||||||
|
"content": "<unk>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false
|
||||||
|
}
|
||||||
|
}
|
||||||
91140
tokenizer.json
Normal file
91140
tokenizer.json
Normal file
File diff suppressed because it is too large
Load Diff
BIN
tokenizer.model
(Stored with Git LFS)
Normal file
BIN
tokenizer.model
(Stored with Git LFS)
Normal file
Binary file not shown.
61
tokenizer_config.json
Normal file
61
tokenizer_config.json
Normal file
@@ -0,0 +1,61 @@
|
|||||||
|
{
|
||||||
|
"add_bos_token": true,
|
||||||
|
"add_eos_token": false,
|
||||||
|
"added_tokens_decoder": {
|
||||||
|
"0": {
|
||||||
|
"content": "<unk>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false,
|
||||||
|
"special": true
|
||||||
|
},
|
||||||
|
"1": {
|
||||||
|
"content": "<s>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false,
|
||||||
|
"special": true
|
||||||
|
},
|
||||||
|
"2": {
|
||||||
|
"content": "</s>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false,
|
||||||
|
"special": true
|
||||||
|
},
|
||||||
|
"32000": {
|
||||||
|
"content": "<|im_end|>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false,
|
||||||
|
"special": true
|
||||||
|
},
|
||||||
|
"32001": {
|
||||||
|
"content": "<|im_start|>",
|
||||||
|
"lstrip": false,
|
||||||
|
"normalized": false,
|
||||||
|
"rstrip": false,
|
||||||
|
"single_word": false,
|
||||||
|
"special": true
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"additional_special_tokens": [],
|
||||||
|
"bos_token": "<s>",
|
||||||
|
"chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
|
||||||
|
"clean_up_tokenization_spaces": false,
|
||||||
|
"eos_token": "<|im_end|>",
|
||||||
|
"legacy": true,
|
||||||
|
"model_max_length": 1000000000000000019884624838656,
|
||||||
|
"pad_token": null,
|
||||||
|
"sp_model_kwargs": {},
|
||||||
|
"spaces_between_special_tokens": false,
|
||||||
|
"tokenizer_class": "LlamaTokenizer",
|
||||||
|
"trust_remote_code": false,
|
||||||
|
"unk_token": "<unk>",
|
||||||
|
"use_default_system_prompt": true,
|
||||||
|
"use_fast": true
|
||||||
|
}
|
||||||
32
transformers_inference.py
Normal file
32
transformers_inference.py
Normal file
@@ -0,0 +1,32 @@
|
|||||||
|
# Code to inference Open Hermes 2.5 with HF Transformers
|
||||||
|
# Requires pytorch, transformers, bitsandbytes, sentencepiece, protobuf, and flash-attn packages
|
||||||
|
|
||||||
|
import torch
|
||||||
|
from transformers import AutoTokenizer, AutoModelForCausalLM
|
||||||
|
from transformers import LlamaTokenizer, LlamaForCausalLM, MistralForCausalLM
|
||||||
|
import bitsandbytes, flash_attn
|
||||||
|
|
||||||
|
tokenizer = LlamaTokenizer.from_pretrained('teknium/OpenHermes-2.5-Mistral-7B', trust_remote_code=True)
|
||||||
|
model = MistralForCausalLM.from_pretrained(
|
||||||
|
"teknium/OpenHermes-2.5-Mistral-7B",
|
||||||
|
torch_dtype=torch.float16,
|
||||||
|
device_map="auto",#{'': 'cuda:0'},
|
||||||
|
load_in_8bit=False,
|
||||||
|
load_in_4bit=True,
|
||||||
|
use_flash_attention_2=True
|
||||||
|
)
|
||||||
|
|
||||||
|
prompts = [
|
||||||
|
"""<|im_start|>system
|
||||||
|
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
|
||||||
|
<|im_start|>user
|
||||||
|
Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world.<|im_end|>
|
||||||
|
<|im_start|>assistant""",
|
||||||
|
]
|
||||||
|
|
||||||
|
for chat in prompts:
|
||||||
|
print(chat)
|
||||||
|
input_ids = tokenizer(chat, return_tensors="pt").input_ids.to("cuda")
|
||||||
|
generated_ids = model.generate(input_ids, max_new_tokens=750, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
|
||||||
|
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_space=True)
|
||||||
|
print(f"Response: {response}")
|
||||||
Reference in New Issue
Block a user