
---
base_model: dicta-il/dictalm2.0-instruct
library_name: transformers
license: mit
language:
- he
tags:
- qasem
- hebrew
- causal-lm
- semantic-parsing
datasets:
- biu-nlp/Multilingual_QASem_Datasets
pipeline_tag: text-generation
---

QASem Hebrew Full Model (DictaLM 2.0)

This model performs QA-based semantic parsing (QASem) in Hebrew.

Overview

This repository provides a fully fine-tuned model for performing QA-based semantic parsing (QASem) in Hebrew.

QASem represents predicate-argument structure using natural-language question-answer pairs, rather than predefined semantic role labels. This makes the representation more interpretable and flexible across languages.
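To make this concrete, the sketch below encodes such annotations as plain Python data. The sentence and QA pairs are an illustrative English gloss (hypothetical, not model output; the model itself operates on Hebrew):

```python
# Illustrative QASem-style annotation: predicate-argument structure
# expressed as natural-language QA pairs (hypothetical English example).
sentence = "The experts emphasized that the new algorithm speeds up processing."

qasem_annotations = [
    {"predicate": "emphasized",
     "question": "Who emphasized something?",
     "answer": "The experts"},
    {"predicate": "emphasized",
     "question": "What did someone emphasize?",
     "answer": "that the new algorithm speeds up processing"},
]

# Each record pairs a predicate with a question and an answer span
# drawn from the sentence -- no fixed semantic role labels involved.
for qa in qasem_annotations:
    print(f"{qa['predicate']}: {qa['question']} -> {qa['answer']}")
```

Note that the "roles" are readable questions, which is what makes the scheme transferable across languages without redefining a label inventory.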

The model is based on:

Base model: dicta-il/dictalm2.0-instruct

and was fully fine-tuned for QA-based semantic parsing.

Why this model matters

Traditional semantic role labeling methods rely on fixed label schemas and costly expert annotation.

This model takes a different approach by:

  • Representing semantics using natural-language question-answer pairs
  • Enabling automatic dataset construction via cross-lingual projection
  • Supporting scalable semantic parsing across languages
  • Achieving strong performance with efficient fine-tuned models

This makes it possible to build semantic parsers for new languages with minimal cost.

Use Cases

This model can be used for:

  • Research in QA-based semantic parsing (QASem) and semantic representation learning
  • Extraction of predicate-argument structures from Hebrew text
  • Automatic dataset creation for training semantic models in new languages
  • Downstream NLP applications such as:
    • Information extraction
    • Text understanding
    • Factuality and attribution evaluation

Language

  • Hebrew 🇮🇱

Training Data

The model was trained on the Multilingual QASem Dataset:

👉 https://huggingface.co/datasets/biu-nlp/Multilingual_QASem_Datasets

The dataset includes:

  • Automatically generated QASem annotations
  • Train / Development / Test splits
  • Multiple languages: French, Hebrew, Russian
  • Tens of thousands of QA pairs per language

The data was constructed using a cross-lingual projection approach, ensuring scalability across languages.

📄 Associated Work

This model and the underlying dataset are introduced in: Effective QA-Driven Annotation of Predicate-Argument Relations Across Languages.

The paper presents the full methodology, dataset construction process, and evaluation across multiple languages.

Using the XQASem Parser

For a simple and structured interface, you can use the XQASem parser.

Installation

pip install xqasem

Basic Example

from xqasem import XQasemParser

parser = XQasemParser.from_language("he")

sentences = [
    "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות."
]

df = parser(sentences)

print(df)

Output Format

The model produces structured predicate-argument representations in the form of:

  • A predicate (verb or nominal)
  • A natural-language question
  • A corresponding answer span from the sentence

This structure can be easily converted into tabular or JSON format for downstream use.
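As a minimal sketch of the JSON route, assuming rows shaped like the parser's columns (sentence, predicate, predicate_type, question, answer), the standard library suffices. The sample row below is copied from the Example Output section:

```python
import json

# Hypothetical rows mirroring the parser's tabular output
# (columns: sentence, predicate, predicate_type, question, answer).
rows = [
    {
        "sentence": "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות.",
        "predicate": "הדגישו",
        "predicate_type": "verb",
        "question": "מי הדגיש משהו?",
        "answer": "המומחים",
    },
]

# ensure_ascii=False keeps the Hebrew text readable in the JSON output.
as_json = json.dumps(rows, ensure_ascii=False, indent=2)
print(as_json)
```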

Example Output

| sentence | predicate | predicate_type | question | answer |
| --- | --- | --- | --- | --- |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מי הדגיש משהו? | המומחים |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מה מישהו הדגיש? | שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה מאיץ משהו? | האלגוריתם החדש |
| המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה משהו מאיץ? | את עיבוד הבקשות המורכבות |

👉 For more details and advanced usage, see the project repository:
https://github.com/JohnnieDavidov/xqasem
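Downstream code often needs the flat QA rows regrouped per predicate. Below is a minimal standard-library sketch, assuming the rows have been pulled out of the parser's DataFrame (e.g. via pandas' `to_dict("records")`); the sample rows are abridged from the example above:

```python
from collections import defaultdict

# QA rows as dicts, e.g. obtained from the parser's DataFrame via
# df.to_dict("records"); abridged sample copied from the example output.
rows = [
    {"predicate": "הדגישו", "question": "מי הדגיש משהו?", "answer": "המומחים"},
    {"predicate": "הדגישו", "question": "מה מישהו הדגיש?",
     "answer": "שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות"},
    {"predicate": "מאיץ", "question": "מה מאיץ משהו?", "answer": "האלגוריתם החדש"},
    {"predicate": "מאיץ", "question": "מה משהו מאיץ?", "answer": "את עיבוד הבקשות המורכבות"},
]

# Group QA pairs under their predicate to recover a per-predicate
# argument structure from the flat table.
by_predicate = defaultdict(list)
for row in rows:
    by_predicate[row["predicate"]].append((row["question"], row["answer"]))

for predicate, qa_pairs in by_predicate.items():
    print(predicate, len(qa_pairs))
```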

Manual Model Loading (Advanced)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "YonatanDavidov/qasem-he-dictalm2-full"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

Limitations

  • Performance may degrade on out-of-domain text
  • Complex or ambiguous predicates may lead to inconsistent outputs
  • The model is optimized for QASem-style generation and not for general-purpose text generation

📄 Citation

If you use this model, please cite our work:

@inproceedings{davidov-etal-2026-effective,
    title = "Effective {QA}-Driven Annotation of Predicate{--}Argument Relations Across Languages",
    author = "Davidov, Jonathan  and
      Slobodkin, Aviv  and
      Klein, Shmuel Tomi  and
      Tsarfaty, Reut  and
      Dagan, Ido  and
      Klein, Ayal",
    editor = "Demberg, Vera  and
      Inui, Kentaro  and
      Marquez, Llu{\'i}s",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-long.112/",
    doi = "10.18653/v1/2026.eacl-long.112",
    pages = "2484--2502",
    ISBN = "979-8-89176-380-7",
}