---
base_model: dicta-il/dictalm2.0-instruct
library_name: transformers
license: mit
language:
- he
tags:
- qasem
- hebrew
- causal-lm
- semantic-parsing
datasets:
- biu-nlp/Multilingual_QASem_Datasets
pipeline_tag: text-generation
---

# QASem Hebrew Full Model (DictaLM 2.0)

This model performs **QA-based semantic parsing (QASem) in Hebrew**.

## Overview

This repository provides a **fully fine-tuned model** for performing **QA-based semantic parsing (QASem) in Hebrew**.

QASem represents predicate–argument structure using **natural-language question–answer pairs** rather than predefined semantic role labels. This makes the representation more interpretable and flexible across languages.

The model is based on `dicta-il/dictalm2.0-instruct` and was fully fine-tuned for QA-based semantic parsing.

## ✨ Why this model matters

Traditional semantic role labeling methods rely on fixed label schemas and costly expert annotation.

This model takes a different approach by:

- Representing semantics using **natural-language question–answer pairs**
- Enabling **automatic dataset construction** via cross-lingual projection
- Supporting **scalable semantic parsing across languages**
- Achieving strong performance with **efficient fine-tuned models**

This makes it possible to build semantic parsers for new languages with minimal cost.

## Use Cases

This model can be used for:

- Research in **QA-based semantic parsing (QASem)** and semantic representation learning
- Extraction of **predicate–argument structures** from Hebrew text
- Automatic **dataset creation** for training semantic models in new languages
- Downstream NLP applications such as:
  - Information extraction
  - Text understanding
  - Factuality and attribution evaluation

## Language

- Hebrew 🇮🇱

## Training Data

The model was trained on the **Multilingual QASem Dataset**:

👉 https://huggingface.co/datasets/biu-nlp/Multilingual_QASem_Datasets

The dataset includes:

- Automatically generated QASem annotations
- Train / development / test splits
- Multiple languages: **French, Hebrew, Russian**
- Tens of thousands of QA pairs per language

The data was constructed using a cross-lingual projection approach, ensuring scalability across languages.

## 📄 Associated Work

This model and the underlying dataset are introduced in:

[Effective QA-Driven Annotation of Predicate-Argument Relations Across Languages](https://aclanthology.org/2026.eacl-long.112/)

The paper presents the full methodology, the dataset construction process, and the evaluation across multiple languages.

## 🚀 Quick Start (Recommended)

### Using the XQASem Parser

For a simple and structured interface, you can use the XQASem parser.

### Installation

```bash
pip install xqasem
```

### Basic Example

```python
from xqasem import XQasemParser

# Load the Hebrew QASem parser
parser = XQasemParser.from_language("he")

sentences = [
    "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות."
]

# Parse the sentences into predicate–argument QA pairs
df = parser(sentences)
print(df)
```

## Output Format

The model produces structured predicate–argument representations in the form of:

- A predicate (verb or nominal)
- A natural-language question
- A corresponding answer span from the sentence

This structure can be easily converted into tabular or JSON format for downstream use.

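As a minimal sketch of the JSON conversion mentioned above (the field values are copied from this card's example output; the real parser returns a pandas DataFrame with these columns, so this is an illustration rather than the parser's own export API):

```python
import json

# One parsed row; values are taken from the example output in this model card.
row = {
    "sentence": "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות.",
    "predicate": "הדגישו",
    "predicate_type": "verb",
    "question": "מי הדגיש משהו?",
    "answer": "המומחים",
}

# ensure_ascii=False keeps the Hebrew text human-readable in the JSON output
record_json = json.dumps([row], ensure_ascii=False, indent=2)
print(record_json)
```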
### Example Output

| sentence | predicate | predicate_type | question | answer |
| --- | --- | --- | --- | --- |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מי הדגיש משהו? | המומחים |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מה מישהו הדגיש? | שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה מאיץ משהו? | האלגוריתם החדש |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה משהו מאיץ? | את עיבוד הבקשות המורכבות |

👉 For more details and advanced usage, see the project repository:
https://github.com/JohnnieDavidov/xqasem

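As a sketch of downstream post-processing, the QA pairs above can be regrouped by predicate into a nested, frame-like view of the sentence. The rows below are copied from the example table; in practice they would come from the parser's DataFrame, so this grouping step is a hypothetical illustration:

```python
from collections import defaultdict

# (predicate, question, answer) triples taken from the example table above
rows = [
    ("הדגישו", "מי הדגיש משהו?", "המומחים"),
    ("הדגישו", "מה מישהו הדגיש?", "שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות"),
    ("מאיץ", "מה מאיץ משהו?", "האלגוריתם החדש"),
    ("מאיץ", "מה משהו מאיץ?", "את עיבוד הבקשות המורכבות"),
]

# Group QA pairs by their predicate to recover a predicate–argument frame
frames = defaultdict(list)
for predicate, question, answer in rows:
    frames[predicate].append({"question": question, "answer": answer})

for predicate, qa_pairs in frames.items():
    print(predicate, len(qa_pairs))
```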
## Manual Model Loading (Advanced)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "YonatanDavidov/qasem-he-dictalm2-full"

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

## Limitations

- Performance may degrade on out-of-domain text
- Complex or ambiguous predicates may lead to inconsistent outputs
- The model is optimized for QASem-style generation, not for general-purpose text generation

## 📄 Citation

If you use this model, please cite our work:

```
@inproceedings{davidov-etal-2026-effective,
    title = "Effective {QA}-Driven Annotation of Predicate{--}Argument Relations Across Languages",
    author = "Davidov, Jonathan and
      Slobodkin, Aviv and
      Klein, Shmuel Tomi and
      Tsarfaty, Reut and
      Dagan, Ido and
      Klein, Ayal",
    editor = "Demberg, Vera and
      Inui, Kentaro and
      Marquez, Llu{\'i}s",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-long.112/",
    doi = "10.18653/v1/2026.eacl-long.112",
    pages = "2484--2502",
    ISBN = "979-8-89176-380-7",
}
```