---
base_model: dicta-il/dictalm2.0-instruct
library_name: transformers
license: mit
language:
- he
datasets:
- biu-nlp/Multilingual_QASem_Datasets
pipeline_tag: text-generation
---
# QASem Hebrew Full Model (DictaLM 2.0)

This model performs QA-based semantic parsing (QASem) in Hebrew.

## Overview

This repository provides a fully fine-tuned model for performing QA-based semantic parsing (QASem) in Hebrew.

QASem represents predicate–argument structure using natural-language question–answer pairs rather than predefined semantic role labels. This makes the representation more interpretable and more flexible across languages.

The model was fully fine-tuned for QA-based semantic parsing from the base model `dicta-il/dictalm2.0-instruct`.
## ✨ Why this model matters
Traditional semantic role labeling methods rely on fixed label schemas and costly expert annotation.
This model takes a different approach by:
- Representing semantics using natural-language question–answer pairs
- Enabling automatic dataset construction via cross-lingual projection
- Supporting scalable semantic parsing across languages
- Achieving strong performance with efficient fine-tuned models
This makes it possible to build semantic parsers for new languages with minimal cost.
## Use Cases
This model can be used for:
- Research in QA-based semantic parsing (QASem) and semantic representation learning
- Extraction of predicate–argument structures from Hebrew text
- Automatic dataset creation for training semantic models in new languages
- Downstream NLP applications such as:
  - Information extraction
  - Text understanding
  - Factuality and attribution evaluation
## Language
- Hebrew 🇮🇱
## Training Data
The model was trained on the Multilingual QASem Dataset:
👉 https://huggingface.co/datasets/biu-nlp/Multilingual_QASem_Datasets
The dataset includes:
- Automatically generated QASem annotations
- Train / Development / Test splits
- Multiple languages: French, Hebrew, Russian
- Tens of thousands of QA pairs per language
The data was constructed using a cross-lingual projection approach, ensuring scalability across languages.
## 📄 Associated Work

This model and the underlying dataset are introduced in *Effective QA-Driven Annotation of Predicate–Argument Relations Across Languages* (EACL 2026). The paper presents the full methodology, the dataset construction process, and evaluation across multiple languages.
## 🚀 Quick Start (Recommended)

### Using the XQASem Parser

For a simple and structured interface, you can use the XQASem parser.

#### Installation

```bash
pip install xqasem
```
#### Basic Example

```python
from xqasem import XQasemParser

# Load the Hebrew parser
parser = XQasemParser.from_language("he")

sentences = [
    "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות."
]

# Parse the sentences into predicate–argument QA pairs
df = parser(sentences)
print(df)
```
### Output Format
The model produces structured predicate–argument representations in the form of:
- A predicate (verb or nominal)
- A natural-language question
- A corresponding answer span from the sentence
This structure can be easily converted into tabular or JSON format for downstream use.
### Example Output
| sentence | predicate | predicate_type | question | answer |
|---|---|---|---|---|
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מי הדגיש משהו? | המומחים |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מה מישהו הדגיש? | שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה מאיץ משהו? | האלגוריתם החדש |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה משהו מאיץ? | את עיבוד הבקשות המורכבות |
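As noted above, the flat tabular output can be converted into JSON for downstream use. A minimal sketch using only the standard library, assuming each parsed row carries the columns shown in the table; the grouping helper `rows_to_json` is illustrative, not part of the xqasem API:

```python
import json

# One dict per QA pair, mirroring the rows of the example table above.
rows = [
    {"predicate": "הדגישו", "predicate_type": "verb",
     "question": "מי הדגיש משהו?", "answer": "המומחים"},
    {"predicate": "מאיץ", "predicate_type": "verb",
     "question": "מה מאיץ משהו?", "answer": "האלגוריתם החדש"},
]

def rows_to_json(rows):
    """Group flat QA rows by predicate into a nested JSON structure."""
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(row["predicate"], {
            "predicate_type": row["predicate_type"],
            "qa_pairs": [],
        })
        entry["qa_pairs"].append(
            {"question": row["question"], "answer": row["answer"]})
    # ensure_ascii=False keeps the Hebrew text readable in the output.
    return json.dumps(grouped, ensure_ascii=False, indent=2)

print(rows_to_json(rows))
```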
👉 For more details and advanced usage, see the project repository:
https://github.com/JohnnieDavidov/xqasem
## Manual Model Loading (Advanced)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "YonatanDavidov/qasem-he-dictalm2-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
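After loading, generation follows the standard `transformers` causal-LM workflow. The exact prompt template the model was fine-tuned with is not documented here, so `build_prompt` below is a hypothetical placeholder; for real use, prefer the XQASem parser above, which handles prompting internally:

```python
def build_prompt(sentence: str) -> str:
    """Hypothetical instruct-style prompt; the actual template used
    during fine-tuning may differ (the XQASem parser applies it for you)."""
    return (
        "[INST] Extract QASem question-answer pairs for the sentence:\n"
        f"{sentence} [/INST]"
    )

def generate_qasem(sentence: str,
                   model_id: str = "YonatanDavidov/qasem-he-dictalm2-full",
                   max_new_tokens: int = 256) -> str:
    # Lazy import/load so defining this sketch does not download weights.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(build_prompt(sentence), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the generated continuation, not the prompt tokens.
    generated = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)
```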
## Limitations
- Performance may degrade on out-of-domain text
- Complex or ambiguous predicates may lead to inconsistent outputs
- The model is optimized for QASem-style generation and not for general-purpose text generation
## 📄 Citation
If you use this model, please cite our work:
```bibtex
@inproceedings{davidov-etal-2026-effective,
    title = "Effective {QA}-Driven Annotation of Predicate{--}Argument Relations Across Languages",
    author = "Davidov, Jonathan and
      Slobodkin, Aviv and
      Klein, Shmuel Tomi and
      Tsarfaty, Reut and
      Dagan, Ido and
      Klein, Ayal",
    editor = "Demberg, Vera and
      Inui, Kentaro and
      Marquez, Llu{\'i}s",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-long.112/",
    doi = "10.18653/v1/2026.eacl-long.112",
    pages = "2484--2502",
    ISBN = "979-8-89176-380-7",
}
```