---
base_model: dicta-il/dictalm2.0-instruct
library_name: transformers
license: mit
language:
- he
tags:
- qasem
- hebrew
- causal-lm
- semantic-parsing
datasets:
- biu-nlp/Multilingual_QASem_Datasets
pipeline_tag: text-generation
---
# QASem Hebrew Full Model (DictaLM 2.0)
This model performs **QA-based semantic parsing (QASem) in Hebrew**.
## Overview
This repository provides a **fully fine-tuned model** for performing **QA-based semantic parsing (QASem) in Hebrew**.
QASem represents predicate-argument structure using **natural-language question-answer pairs**, rather than predefined semantic role labels. This makes the representation more interpretable and flexible across languages.
The model is based on **`dicta-il/dictalm2.0-instruct`** and was fully fine-tuned for QA-based semantic parsing.
## ✨ Why this model matters
Traditional semantic role labeling methods rely on fixed label schemas and costly expert annotation.
This model takes a different approach by:
- Representing semantics using **natural-language question-answer pairs**
- Enabling **automatic dataset construction** via cross-lingual projection
- Supporting **scalable semantic parsing across languages**
- Achieving strong performance with **efficient fine-tuned models**
This makes it possible to build semantic parsers for new languages with minimal cost.
## Use Cases
This model can be used for:
- Research in **QA-based semantic parsing (QASem)** and semantic representation learning
- Extraction of **predicate-argument structures** from Hebrew text
- Automatic **dataset creation** for training semantic models in new languages
- Downstream NLP applications such as:
- Information extraction
- Text understanding
- Factuality and attribution evaluation
## Language
- Hebrew 🇮🇱
## Training Data
The model was trained on the **Multilingual QASem Dataset**:
👉 https://huggingface.co/datasets/biu-nlp/Multilingual_QASem_Datasets
The dataset includes:
- Automatically generated QASem annotations
- Train / Development / Test splits
- Multiple languages: **French, Hebrew, Russian**
- Tens of thousands of QA pairs per language
The data was constructed using a cross-lingual projection approach, ensuring scalability across languages.
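The snippet below is a minimal sketch of loading the dataset with the Hugging Face `datasets` library; the configuration and split names are assumptions, so check the dataset card for the actual layout.
```python
from datasets import load_dataset

# Hypothetical configuration name ("hebrew") -- verify the available
# configurations and splits on the dataset card before relying on them.
ds = load_dataset("biu-nlp/Multilingual_QASem_Datasets", "hebrew")

print(ds)              # shows the available splits
print(ds["train"][0])  # inspect a single QA annotation example
```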
## 📄 Associated Work
This model and the underlying dataset are introduced in:
[Effective QA-Driven Annotation of Predicate-Argument Relations Across Languages](https://aclanthology.org/2026.eacl-long.112/).
The paper presents the full methodology, dataset construction process, and evaluation across multiple languages.
## 🚀 Quick Start (Recommended)
### Using the XQASem Parser
For a simple and structured interface, you can use the XQASem parser.
### Installation
```bash
pip install xqasem
```
### Basic Example
```python
from xqasem import XQasemParser

# Load the Hebrew QASem parser
parser = XQasemParser.from_language("he")

sentences = [
    "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות."
]

# Parse the sentences into predicate-argument question-answer pairs
df = parser(sentences)
print(df)
```
## Output Format
The model produces structured predicate-argument representations in the form of:
- A predicate (verb or nominal)
- A natural-language question
- A corresponding answer span from the sentence
This structure can be easily converted into tabular or JSON format for downstream use.
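For example, assuming the XQASem parser returns a pandas DataFrame (as in the Quick Start above), the conversion to JSON is a one-liner:
```python
# Serialize the parsed QA pairs to JSON records; force_ascii=False keeps
# the Hebrew text readable instead of escaping it.
records = df.to_json(orient="records", force_ascii=False)
print(records)
```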
### Example Output
| sentence | predicate | predicate_type | question | answer |
| --- | --- | --- | --- | --- |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מי הדגיש משהו? | המומחים |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מה מישהו הדגיש? | שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה מאיץ משהו? | האלגוריתם החדש |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה משהו מאיץ? | את עיבוד הבקשות המורכבות |
👉 For more details and advanced usage, see the project repository:
https://github.com/JohnnieDavidov/xqasem
## Manual Model Loading (Advanced)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "YonatanDavidov/qasem-he-dictalm2-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
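Continuing from the snippet above, here is a minimal generation sketch. The exact prompt/instruction format the model was fine-tuned on is not documented in this card, so feeding the raw sentence below is only an assumption; for structured outputs, prefer the XQASem parser.
```python
import torch

# Hypothetical input: the instruction format used during fine-tuning is not
# documented here, so passing the raw sentence is only illustrative.
sentence = "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```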
## Limitations
- Performance may degrade on out-of-domain text
- Complex or ambiguous predicates may lead to inconsistent outputs
- The model is optimized for QASem-style generation and not for general-purpose text generation
## 📄 Citation
If you use this model, please cite our work:
```
@inproceedings{davidov-etal-2026-effective,
    title = "Effective {QA}-Driven Annotation of Predicate{--}Argument Relations Across Languages",
    author = "Davidov, Jonathan and
      Slobodkin, Aviv and
      Klein, Shmuel Tomi and
      Tsarfaty, Reut and
      Dagan, Ido and
      Klein, Ayal",
    editor = "Demberg, Vera and
      Inui, Kentaro and
      Marquez, Llu{\'i}s",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-long.112/",
    doi = "10.18653/v1/2026.eacl-long.112",
    pages = "2484--2502",
    ISBN = "979-8-89176-380-7",
}
```