
ModelHub XC eaa8605e92 Initialize project; model provided by the ModelHub XC community

Model: khazarai/HeisenbergQ-0.5B-RL
Source: Original Platform

2026-06-01 10:57:17 +08:00


Model Card for HeisenbergQ-0.5B

Model Details

HeisenbergQ-0.5B is a fine-tuned version of Qwen2.5-0.5B-Instruct, optimized for quantum physics reasoning using GRPO reinforcement learning with custom reward functions. The model is trained to produce structured answers in XML format with <reasoning> and <answer> tags. It excels at step-by-step logical reasoning in physics-related problems.

Model Description

Language(s) (NLP): English
License: MIT
Finetuned from model: Qwen/Qwen2.5-0.5B-Instruct
Fine-Tuning Method: GRPO with LoRA
Domain: Quantum Physics
Dataset: jilp00/YouToks-Instruct-Quantum-Physics-II

Uses

Direct Use

Primary: Solving and reasoning through quantum physics problems
Secondary: General scientific reasoning in math & physics
Not for: General-purpose conversation (model is specialized)

Bias, Risks, and Limitations

Trained on only ~1K domain-specific samples
May hallucinate outside the physics domain
Small size (0.5B parameters): lightweight, but reasoning depth is limited compared to larger models

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the tokenizer and the model; device_map={"": 0} places the whole model on GPU 0.
tokenizer = AutoTokenizer.from_pretrained("khazarai/HeisenbergQ-0.5B-RL")
model = AutoModelForCausalLM.from_pretrained(
    "khazarai/HeisenbergQ-0.5B-RL",
    device_map={"": 0}
)

# System prompt requesting the tagged layout the model was trained to emit.
system = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

question = """
What is the significance of setting mass equal to 1 in a quantum dynamical system, and how does it impact the formulation of the Hamiltonian and the operators?
"""

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": question}
]

# Render the chat as a prompt string that ends where the assistant turn begins.
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True,
)

# Move the inputs to the model's device and stream tokens to stdout as they are generated.
inputs = tokenizer(text, return_tensors = "pt").to(model.device)
_ = model.generate(
    **inputs,
    max_new_tokens = 1800,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)
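
Because the model is prompted to wrap its output in <reasoning> and <answer> tags, downstream code will usually want the two sections separately. The following is a minimal sketch of one way to do that. The helper name `parse_response` and the sample string are illustrative assumptions, not part of the model's repository, and the sample is hand-written rather than real model output.

```python
import re

# Hypothetical helper: the tag names are taken from the system prompt above.
def parse_response(text: str) -> dict:
    """Extract the <reasoning> and <answer> bodies; a missing section maps to None."""
    sections = {}
    for tag in ("reasoning", "answer"):
        # DOTALL lets "." span newlines, since each section is usually multi-line.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections

# Hand-written sample in the requested layout (not real model output).
sample = (
    "<reasoning>\nSetting m = 1 rescales units.\n</reasoning>\n"
    "<answer>\nH = p^2/2 + V(x)\n</answer>"
)
parsed = parse_response(sample)
print(parsed["answer"])
```

To apply this to real generations, one option is to call `model.generate` without the streamer, decode the new tokens with `tokenizer.decode(..., skip_special_tokens=True)`, and pass the resulting string to the helper. A small model may occasionally omit a tag, which is why missing sections return None instead of raising.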

Training Details

Training Procedure

Training Method: GRPO (Group Relative Policy Optimization)
Reward Models:
  Reasoning Quality Reward: Encourages logical markers & coherent chains of thought
  Token Count Reward: Prevents under- or over-explaining
  XML Reward: Enforces the <reasoning>/<answer> format
  Soft Format Reward: Ensures graceful handling of edge cases
Steps: ~390 steps, 3 epochs
Batch Size: 16 (with 2 generations per prompt)
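
The model card does not publish the reward code, so the sketch below only illustrates how format and length rewards of this kind might look, and how GRPO turns the rewards within one prompt's group of generations into relative advantages. Every function name, regular expression, score value, and threshold here is an assumption made for illustration. The whitespace split stands in for a real tokenizer, and the use of the population standard deviation is a choice of this sketch.

```python
import re
import statistics

# Illustrative assumptions only: these are NOT the rewards used to train the model.
STRICT_FORMAT = re.compile(
    r"^<reasoning>\n.*?\n</reasoning>\n<answer>\n.*?\n</answer>\s*$", re.DOTALL
)
SOFT_FORMAT = re.compile(
    r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL
)

def xml_reward(completion: str) -> float:
    """1.0 when the completion matches the exact tag-per-line layout, else 0.0."""
    return 1.0 if STRICT_FORMAT.match(completion.strip()) else 0.0

def soft_format_reward(completion: str) -> float:
    """0.5 when both tagged sections appear in order, whatever the whitespace."""
    return 0.5 if SOFT_FORMAT.search(completion) else 0.0

def token_count_reward(completion: str, low: int = 50, high: int = 400) -> float:
    """1.0 inside the [low, high] length band, else 0.0 (word count as a proxy)."""
    n = len(completion.split())
    return 1.0 if low <= n <= high else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: standardize each reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # The small epsilon keeps the division defined when all rewards are equal.
    return [(r - mean) / (std + 1e-4) for r in rewards]

# Two generations for one prompt, matching "2 generations per prompt".
strict = "<reasoning>\nstep\n</reasoning>\n<answer>\n42\n</answer>"
loose = "<reasoning>step</reasoning> <answer>42</answer>"
rewards = [xml_reward(c) + soft_format_reward(c) for c in (strict, loose)]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

With only two generations per prompt, the standardized advantages are always close to +1 and -1 (or both zero when the rewards tie), so each update mainly carries the signal of which of the two completions scored higher.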
