Initialize project; model provided by the ModelHub XC community
Model: YonatanDavidov/qasem-he-dictalm2-full Source: Original Platform
35
.gitattributes
vendored
Normal file
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
184
README.md
Normal file
@@ -0,0 +1,184 @@
---
base_model: dicta-il/dictalm2.0-instruct
library_name: transformers
license: mit
language:
- he
tags:
- qasem
- hebrew
- causal-lm
- semantic-parsing
datasets:
- biu-nlp/Multilingual_QASem_Datasets
pipeline_tag: text-generation
---

# QASem Hebrew Full Model (DictaLM 2.0)

This model performs **QA-based semantic parsing (QASem) in Hebrew**.

## Overview

This repository provides a **fully fine-tuned model** for **QA-based semantic parsing (QASem) in Hebrew**.

QASem represents predicate–argument structure using **natural-language question–answer pairs**, rather than predefined semantic role labels. This makes the representation more interpretable and more flexible across languages.

The model is based on:

**Base model:** `dicta-il/dictalm2.0-instruct`

and was fully fine-tuned for QA-based semantic parsing.

## ✨ Why this model matters

Traditional semantic role labeling methods rely on fixed label schemas and costly expert annotation.

This model takes a different approach by:

- Representing semantics using **natural-language question–answer pairs**
- Enabling **automatic dataset construction** via cross-lingual projection
- Supporting **scalable semantic parsing across languages**
- Achieving strong performance with **efficient fine-tuned models**

This makes it possible to build semantic parsers for new languages at minimal cost.

## Use Cases

This model can be used for:

- Research in **QA-based semantic parsing (QASem)** and semantic representation learning
- Extraction of **predicate–argument structures** from Hebrew text
- Automatic **dataset creation** for training semantic models in new languages
- Downstream NLP applications such as:
  - Information extraction
  - Text understanding
  - Factuality and attribution evaluation

## Language

- Hebrew 🇮🇱

## Training Data

The model was trained on the **Multilingual QASem Dataset**:

👉 https://huggingface.co/datasets/biu-nlp/Multilingual_QASem_Datasets

The dataset includes:

- Automatically generated QASem annotations
- Train / development / test splits
- Multiple languages: **French, Hebrew, Russian**
- Tens of thousands of QA pairs per language

The data was constructed with a cross-lingual projection approach, which makes the method scalable across languages.

## 📄 Associated Work

This model and the underlying dataset are introduced in:
[Effective QA-Driven Annotation of Predicate-Argument Relations Across Languages](https://aclanthology.org/2026.eacl-long.112/).

The paper presents the full methodology, the dataset construction process, and an evaluation across multiple languages.

## 🚀 Quick Start (Recommended)

### Using the XQASem Parser

For a simple, structured interface, use the XQASem parser.

### Installation

```bash
pip install xqasem
```

### Basic Example

```python
from xqasem import XQasemParser

parser = XQasemParser.from_language("he")

sentences = [
    "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות."
]

df = parser(sentences)

print(df)
```

## Output Format

The model produces structured predicate–argument representations consisting of:

- A predicate (verb or nominal)
- A natural-language question
- A corresponding answer span from the sentence

This structure can easily be converted into tabular or JSON format for downstream use.
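A minimal sketch of such a conversion, assuming the parser output is a pandas DataFrame with the columns used in the example below (`sentence`, `predicate`, `predicate_type`, `question`, `answer`); the rows here are hypothetical placeholders, not real parser output:

```python
import json

import pandas as pd

# Hypothetical rows standing in for real parser output.
df = pd.DataFrame([
    {"sentence": "s1", "predicate": "p1", "predicate_type": "verb",
     "question": "q1?", "answer": "a1"},
    {"sentence": "s1", "predicate": "p1", "predicate_type": "verb",
     "question": "q2?", "answer": "a2"},
])

# Nest the QA pairs under each (sentence, predicate) before serializing.
records = [
    {"sentence": sent, "predicate": pred,
     "qa_pairs": group[["question", "answer"]].to_dict("records")}
    for (sent, pred), group in df.groupby(["sentence", "predicate"], sort=False)
]

print(json.dumps(records, ensure_ascii=False, indent=2))
```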
### Example Output

| sentence | predicate | predicate_type | question | answer |
| --- | --- | --- | --- | --- |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מי הדגיש משהו? | המומחים |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מה מישהו הדגיש? | שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה מאיץ משהו? | האלגוריתם החדש |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה משהו מאיץ? | את עיבוד הבקשות המורכבות |

👉 For more details and advanced usage, see the project repository:
https://github.com/JohnnieDavidov/xqasem
## Manual Model Loading (Advanced)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "YonatanDavidov/qasem-he-dictalm2-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
## Limitations

- Performance may degrade on out-of-domain text
- Complex or ambiguous predicates may lead to inconsistent outputs
- The model is optimized for QASem-style generation, not for general-purpose text generation

## 📄 Citation

If you use this model, please cite our work:

```
@inproceedings{davidov-etal-2026-effective,
    title = "Effective {QA}-Driven Annotation of Predicate{--}Argument Relations Across Languages",
    author = "Davidov, Jonathan and
      Slobodkin, Aviv and
      Klein, Shmuel Tomi and
      Tsarfaty, Reut and
      Dagan, Ido and
      Klein, Ayal",
    editor = "Demberg, Vera and
      Inui, Kentaro and
      Marquez, Llu{\'i}s",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-long.112/",
    doi = "10.18653/v1/2026.eacl-long.112",
    pages = "2484--2502",
    ISBN = "979-8-89176-380-7",
}
```
28
config.json
Normal file
@@ -0,0 +1,28 @@
{
  "_name_or_path": "dicta-il/dictalm2.0-instruct",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "document_attention": true,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.50.0.dev0",
  "use_cache": false,
  "vocab_size": 33152
}
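As a cross-check (not part of the repository itself), the fields in config.json above fully determine the parameter count of this Mistral-style decoder: untied embeddings (`tie_word_embeddings: false`), grouped-query attention with 32 query heads and 8 KV heads of `head_dim` 128, and a SwiGLU MLP. Multiplying out and comparing against the `total_size` recorded in model.safetensors.index.json (14502338560 bytes at 2 bytes per float16 weight), the two match exactly:

```python
# Parameter count implied by config.json above.
hidden, inter, layers = 4096, 14336, 32
heads, kv_heads, head_dim = 32, 8, 128
vocab = 33152

attn = hidden * heads * head_dim * 2       # q_proj + o_proj
attn += hidden * kv_heads * head_dim * 2   # k_proj + v_proj
mlp = hidden * inter * 3                   # gate_proj, up_proj, down_proj
norms = hidden * 2                         # input + post-attention RMSNorm
per_layer = attn + mlp + norms

total = layers * per_layer
total += vocab * hidden * 2                # embed_tokens + lm_head (untied)
total += hidden                            # final model norm

reported = 14502338560 // 2                # index total_size / 2 bytes (fp16)
print(total, reported, total == reported)  # → 7251169280 7251169280 True
```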
6
generation_config.json
Normal file
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.50.0.dev0"
}
3
model-00001-of-00008.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e23de6d21aaf32b8d64b5981f0ea8a9c6ff138808b72f9d45ac1ed53f136f82b
size 1899024192
3
model-00002-of-00008.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d251901fc78b1b6cb2e7963f174b6892dffe069f203fdd9928a5ba70e2c428d3
size 1946243896
3
model-00003-of-00008.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:833813ee6712fc597b929dcdc4b1426fd237d8345d1637d5b66fd075d1fba0d8
size 1979781392
3
model-00004-of-00008.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:00a304181eda7d7b969dc910098aeb188c202e90acb6e882bede433cfb4edc3a
size 1946243936
3
model-00005-of-00008.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f6559c05eb1d3921a1cb0b34ae2ea2b60aa460c2b2561efbee58ca8e290cf60a
size 1979781416
3
model-00006-of-00008.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:680b9fed1f1ccbd3ae20756a444fa27ef20679fcc1cb6d2f0016b121e6fc6731
size 1946243936
3
model-00007-of-00008.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c01da20c8d5a4d63286276e7b7de23ebb6905222d54f996282ff5ef748c1da03
size 1979781416
3
model-00008-of-00008.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1df2f10a084f95ca367998a47138c27c1a562d950f658e34ebb4308653b33239
size 825271848
298
model.safetensors.index.json
Normal file
@@ -0,0 +1,298 @@
{
  "metadata": {
    "total_size": 14502338560
  },
  "weight_map": {
    "lm_head.weight": "model-00008-of-00008.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00008.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00008.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00008.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00003-of-00008.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00003-of-00008.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00004-of-00008.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00004-of-00008.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00004-of-00008.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00004-of-00008.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00004-of-00008.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00005-of-00008.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00005-of-00008.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00005-of-00008.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00008.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00005-of-00008.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00006-of-00008.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00006-of-00008.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00006-of-00008.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00006-of-00008.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00006-of-00008.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00007-of-00008.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00007-of-00008.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00007-of-00008.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00007-of-00008.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00002-of-00008.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00008-of-00008.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00008-of-00008.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00008-of-00008.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00008-of-00008.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00008-of-00008.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00008-of-00008.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00008-of-00008.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00008-of-00008.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00008-of-00008.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00008-of-00008.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00008-of-00008.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00008-of-00008.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00002-of-00008.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00002-of-00008.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00002-of-00008.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.7.input_layernorm.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.8.input_layernorm.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.8.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.8.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
|
||||
"model.layers.9.input_layernorm.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
|
||||
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
|
||||
"model.norm.weight": "model-00008-of-00008.safetensors"
|
||||
}
|
||||
}
|
||||
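The `weight_map` in `model.safetensors.index.json` is a routing table from tensor name to the shard file that stores it; a loader reads the index first, then opens each shard once and pulls only the tensors mapped to it. A minimal stdlib-only sketch of that grouping step, using three entries copied from the index above (no file I/O, purely illustrative):

```python
import json
from collections import defaultdict

# Excerpt of the weight_map above: tensor name -> shard filename.
weight_map = {
    "model.layers.31.self_attn.q_proj.weight": "model-00008-of-00008.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
    "model.norm.weight": "model-00008-of-00008.safetensors",
}

# Invert the map: group tensor names by the shard that stores them,
# so each shard file only needs to be opened once.
shards = defaultdict(list)
for name, shard in weight_map.items():
    shards[shard].append(name)

print(json.dumps({k: sorted(v) for k, v in sorted(shards.items())}, indent=2))
```

With the full index, the same loop yields one tensor list per `model-0000N-of-00008.safetensors` file; libraries such as `safetensors` and `transformers` perform this routing internally when given the index file.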
24
special_tokens_map.json
Normal file
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "</s>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
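One detail worth noting in `special_tokens_map.json`: `pad_token` is a bare string equal to the EOS content (`"</s>"`), so there is no dedicated padding token and padded positions share the EOS id; the attention mask, not the pad id, is what distinguishes padding from a real end-of-sequence. A quick check of the map as parsed (the JSON below is the file above, inlined for illustration):

```python
import json

special_tokens_map = json.loads("""{
  "bos_token": {"content": "<s>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "eos_token": {"content": "</s>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "pad_token": "</s>",
  "unk_token": {"content": "<unk>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false}
}""")

# pad_token reuses the EOS string: padding and EOS map to the same token id.
assert special_tokens_map["pad_token"] == special_tokens_map["eos_token"]["content"]
```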
275995
tokenizer.json
Normal file
File diff suppressed because it is too large
3
tokenizer.model
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4be9ca9094291607e3f9943ff0a0dfd3ceb35d2fcb383aac560a4399725918b8
size 512736
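`tokenizer.model` is stored as a Git LFS pointer: a three-line key/value stub (`version`, `oid`, `size`) that stands in for the real 512 KB SentencePiece file until `git lfs pull` fetches it. A small sketch of parsing such a pointer, using the exact pointer text above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, oid algorithm/hex, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    oid_algo, oid_hex = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "oid_algo": oid_algo,   # hash algorithm, e.g. "sha256"
        "oid": oid_hex,         # expected digest of the real file
        "size": int(fields["size"]),  # size of the real file in bytes
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4be9ca9094291607e3f9943ff0a0dfd3ceb35d2fcb383aac560a4399725918b8
size 512736
"""
info = parse_lfs_pointer(pointer)
```

After download, the file can be verified by checking that its byte length equals `info["size"]` and its SHA-256 digest equals `info["oid"]`.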
44
tokenizer_config.json
Normal file
@@ -0,0 +1,44 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]\n' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "</s>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}
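The `chat_template` in `tokenizer_config.json` is a Jinja template in the Mistral-instruct style: it emits `bos_token`, enforces strictly alternating user/assistant roles (user first), wraps user turns in `[INST] ... [/INST]\n`, and closes each assistant turn with `eos_token` and a space. In practice it is applied via `tokenizer.apply_chat_template(messages)`; the sketch below is a rough stdlib-only re-implementation of the same logic, not the Jinja engine transformers uses:

```python
def render_chat(messages, bos_token="<s>", eos_token="</s>"):
    """Render a conversation the way the chat_template above does."""
    out = bos_token
    for i, msg in enumerate(messages):
        # Roles must alternate user/assistant, starting with a user turn.
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        if msg["role"] == "user":
            out += "[INST] " + msg["content"] + " [/INST]\n"
        elif msg["role"] == "assistant":
            out += msg["content"] + eos_token + " "
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return out

prompt = render_chat([
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"},
])
print(prompt)  # -> <s>[INST] Hello [/INST]\nHi there!</s> 
```

Note that the template supports no system role; a system prompt would have to be folded into the first user message, consistent with `"use_default_system_prompt": false`.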