---
library_name: transformers
license: other
base_model: Qwen/Qwen3-32B
pipeline_tag: text-generation
tags:
- psychological-counseling
- counseling-agent
- qwen3
- llama-factory
- full
model-index:
- name: PsychAgent-Qwen3-32B
  results: []
---

# PsychAgent-Qwen3-32B

PsychAgent-Qwen3-32B is a psychological counseling model built on top of `Qwen/Qwen3-32B`. It is the 32B instantiation of **PsychAgent**, an experience-driven lifelong learning framework for AI psychological counseling.

This checkpoint is fully fine-tuned from `Qwen/Qwen3-32B` using two training-data variants:

- `rft_explicit_skill_0218`: the system prompt retains the skill candidate pool.
- `rft_implicit_skill_0218`: the system prompt removes the skill candidate pool to encourage stronger skill internalization.
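The only difference between the two variants is whether the skill candidate pool appears in the system prompt. The sketch below makes that difference concrete as a single flag. It is illustrative only: the helper `build_system_prompt`, the base instruction, and the skill names are hypothetical placeholders, not the prompt text used in training.

```python
from typing import Sequence


def build_system_prompt(base_instruction: str,
                        skill_pool: Sequence[str],
                        explicit: bool) -> str:
    """Hypothetical helper: assemble a counseling system prompt.

    explicit=True mimics the `rft_explicit_skill_0218` setting (the skill
    candidate pool stays in the prompt). explicit=False mimics
    `rft_implicit_skill_0218` (the pool is dropped, so the model has to rely
    on skills it has internalized).
    """
    if not explicit or not skill_pool:
        return base_instruction
    skills = "\n".join(f"- {s}" for s in skill_pool)
    return f"{base_instruction}\n\nCandidate skills:\n{skills}"


base = "You are a psychological counseling assistant."  # placeholder text
pool = ["reflective listening", "cognitive restructuring"]  # hypothetical skills

explicit_prompt = build_system_prompt(base, pool, explicit=True)
implicit_prompt = build_system_prompt(base, pool, explicit=False)
print(explicit_prompt)
print("---")
print(implicit_prompt)
```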

## Model description

PsychAgent is designed for **multi-session psychological counseling**. Unlike static SFT-only counseling models, it improves longitudinal consistency and counseling quality through a closed-loop framework with three components:

- **Memory-Augmented Planning Engine (MAPE)**: maintains an evolving client profile and session summaries, then performs session-level planning for longitudinal continuity.
- **Skill Evolution Engine (SEE)**: extracts and organizes practice-grounded therapeutic skills into a hierarchical skill tree.
- **Reinforced Internalization Engine (RIE)**: internalizes successful counseling trajectories via rejection fine-tuning so that useful strategies become more endogenous to the model.
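To make the idea of a hierarchical skill tree more concrete, here is a minimal sketch of one possible representation (school, then category, then skill). The `SkillNode` class, the three-level layout, and the example category and skill names are assumptions for illustration; the paper's actual tree schema and skill-extraction procedure are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SkillNode:
    """Hypothetical skill-tree node, e.g. school -> category -> skill."""
    name: str
    children: List["SkillNode"] = field(default_factory=list)

    def add(self, child: "SkillNode") -> "SkillNode":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def leaves(self) -> List["SkillNode"]:
        """Return the leaf skills in depth-first order."""
        if not self.children:
            return [self]
        out: List[SkillNode] = []
        for child in self.children:
            out.extend(child.leaves())
        return out

    def paths(self, prefix: str = "") -> List[str]:
        """Return a 'root/.../leaf' path string for every leaf."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        if not self.children:
            return [here]
        out: List[str] = []
        for child in self.children:
            out.extend(child.paths(here))
        return out


root = SkillNode("CBT")
assessment = root.add(SkillNode("assessment"))
assessment.add(SkillNode("identify automatic thoughts"))
intervention = root.add(SkillNode("intervention"))
intervention.add(SkillNode("cognitive restructuring"))
intervention.add(SkillNode("behavioral activation"))

for path in root.paths():
    print(path)
```

Flattening the leaves in this way is one simple route to a candidate pool that could be listed in a system prompt.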

According to the paper, the 32B model is trained with a maximum context length of **32,768 tokens**, using **DeepSpeed ZeRO-3**, **bf16** precision, **history masking**, and a rollout number of **8**.
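Rejection fine-tuning generally means sampling several candidate trajectories per input, scoring them, and fine-tuning only on those that pass a filter. The sketch below shows such a selection step with a rollout number of 8. The scoring function, the threshold of 7.0, and the "keep the best rollout per client profile if it clears the threshold" rule are assumptions for illustration; the paper's actual reward and acceptance criteria may differ.

```python
from typing import Callable, Dict, List, Optional

N_ROLLOUTS = 8  # rollout number reported in the paper


def select_for_rft(
    rollouts: Dict[str, List[str]],
    score: Callable[[str], float],
    threshold: float,
) -> Dict[str, str]:
    """Keep at most one trajectory per client profile (hypothetical rule).

    For each profile, the highest-scoring rollout is kept only if its score
    reaches `threshold`; profiles with no acceptable rollout are dropped.
    """
    kept: Dict[str, str] = {}
    for profile_id, candidates in rollouts.items():
        best: Optional[str] = max(candidates, key=score, default=None)
        if best is not None and score(best) >= threshold:
            kept[profile_id] = best
    return kept


# Toy data: each "trajectory" is just an ID with a fixed, made-up score.
toy_scores = {f"a-{k}": s for k, s in enumerate([3.1, 5.0, 8.7, 6.2, 4.4, 7.9, 2.0, 6.6])}
toy_scores.update({f"b-{k}": s for k, s in enumerate([1.0, 2.5, 3.3, 4.0, 2.2, 3.9, 1.8, 4.1])})
rollouts = {
    "client_a": [f"a-{k}" for k in range(N_ROLLOUTS)],
    "client_b": [f"b-{k}" for k in range(N_ROLLOUTS)],
}

accepted = select_for_rft(rollouts, score=toy_scores.__getitem__, threshold=7.0)
print(accepted)  # {'client_a': 'a-2'}
```

Here `client_b` contributes no training example because none of its eight rollouts clears the threshold, which is the "rejection" part of the procedure.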

## Key features

- Longitudinal multi-session counseling with memory-augmented planning.
- Experience-driven skill evolution from historical counseling trajectories.
- Reinforced internalization of high-quality trajectories through rejection fine-tuning.
- Higher aggregated PsychEval scores than the compared baselines on both counselor-side and client-side dimensions.

## Intended uses

This model is intended for:

- research on AI psychological counseling and longitudinal dialogue agents;
- experiments on memory, planning, skill evolution, and lifelong learning for counseling agents;
- benchmarking on multi-session counseling settings similar to PsychEval.

## Out-of-scope use and limitations

This model is **not** a licensed mental health professional and should **not** be used as a substitute for clinical care.

It should **not** be relied on in emergencies, crisis intervention, suicide risk handling, or any high-stakes clinical scenario requiring qualified professionals.

The paper evaluates the model on benchmarked multi-session counseling tasks rather than real-world clinical deployment. The authors also note that future work is needed for more realistic counseling settings and for stronger safety and privacy protections. Some observed improvements should therefore be interpreted as **benchmark trends rather than direct clinical evidence**.

## Training and evaluation data

The experiments are built on **PsychEval**. Following the benchmark protocol, the authors first perform supervised fine-tuning on the released multi-session counseling corpus, and then reuse a pool of **2,000+ client profiles** for rollout training and evaluation.

For each therapeutic school, **140** client profiles are sampled, with **120** used for training and **20** for evaluation.
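Assuming the sampling covers the five therapeutic schools that have school-specific metrics in PsychEval, this protocol yields 5 × 140 = 700 profiles: 600 for training and 100 for evaluation. The sketch below reproduces those counts. The placeholder profile pool, the use of `random.sample`, and the seed are assumptions; only the 140/120/20 per-school counts come from this card.

```python
import random
from typing import Dict, List, Tuple

SCHOOLS = [
    "Behavioral Therapy",
    "Cognitive Behavioral Therapy",
    "Postmodernist Therapy",
    "Humanistic-Existential Therapy",
    "Psychodynamic Therapy",
]


def split_profiles(
    pool: Dict[str, List[str]],
    per_school: int = 140,
    n_train: int = 120,
    seed: int = 0,  # arbitrary seed for this sketch
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Sample `per_school` profiles per school, then split them train/eval."""
    rng = random.Random(seed)
    train: Dict[str, List[str]] = {}
    evaluation: Dict[str, List[str]] = {}
    for school, profiles in pool.items():
        sampled = rng.sample(profiles, per_school)  # without replacement
        train[school] = sampled[:n_train]
        evaluation[school] = sampled[n_train:]
    return train, evaluation


# Hypothetical pool: 420 placeholder IDs per school (2,100 in total).
pool = {s: [f"{s[:3]}-{i}" for i in range(420)] for s in SCHOOLS}
train, evaluation = split_profiles(pool)
print(sum(map(len, train.values())), sum(map(len, evaluation.values())))
```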

Reported metrics include shared counseling metrics such as:

- **PANAS**
- **RRO**
- **SRS**
- **CUSTOM_DIM**
- **HTAIS**
- **WAI**
- **DIALOGUE_PLANNING**

The benchmark also includes school-specific metrics such as:

- Behavioral Therapy: **MITI**, **STAI**
- Cognitive Behavioral Therapy: **CTRS**, **BDI_II**
- Postmodernist Therapy: **EFT_TFS**, **SFBT**
- Humanistic-Existential Therapy: **TES**, **CCT**
- Psychodynamic Therapy: **PSC**, **IPO**

## Performance

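In the paper, PsychAgent outperforms the compared general-purpose and psychology-specific baselines on all four aggregated PsychEval dimensions. Between the two PsychAgent sizes, the 8B variant (PsychAgent†) scores slightly higher on the two shared dimensions, while the 32B model leads on both specific dimensions.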

| Model | Counselor Shared | Counselor Specific | Client Shared | Client Specific |
|---|---:|---:|---:|---:|
| GPT-5.4 | 5.54 | 7.41 | 5.07 | 7.72 |
| Gemini-3 | 5.34 | 7.04 | 4.97 | 7.52 |
| Qwen3-Max | 5.88 | 7.74 | 5.41 | 7.81 |
| DeepSeek-V3.2 | 5.54 | 7.12 | 5.06 | 7.70 |
| PsyLLM | 5.30 | 4.67 | 5.63 | 7.93 |
| PsyDTLLM | 6.10 | 5.43 | 5.27 | 7.42 |
| CPsyCounX | 4.21 | 2.51 | 4.73 | 7.27 |
| TheraMind | 6.25 | 6.94 | 5.48 | 7.83 |
| PsychAgent† (8B) | 7.35 | 7.78 | 5.94 | 8.19 |
| **PsychAgent (32B)** | **7.32** | **7.91** | **5.92** | **8.24** |
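The "outperforms all baselines" claim can be checked directly against the table. The snippet below only re-tabulates the reported numbers: for each column it finds the strongest baseline and computes the margin of PsychAgent (32B) over it.

```python
COLUMNS = ["Counselor Shared", "Counselor Specific", "Client Shared", "Client Specific"]

# Scores copied from the table above (baselines only; PsychAgent rows excluded).
BASELINES = {
    "GPT-5.4":       [5.54, 7.41, 5.07, 7.72],
    "Gemini-3":      [5.34, 7.04, 4.97, 7.52],
    "Qwen3-Max":     [5.88, 7.74, 5.41, 7.81],
    "DeepSeek-V3.2": [5.54, 7.12, 5.06, 7.70],
    "PsyLLM":        [5.30, 4.67, 5.63, 7.93],
    "PsyDTLLM":      [6.10, 5.43, 5.27, 7.42],
    "CPsyCounX":     [4.21, 2.51, 4.73, 7.27],
    "TheraMind":     [6.25, 6.94, 5.48, 7.83],
}
PSYCHAGENT_32B = [7.32, 7.91, 5.92, 8.24]

margins = {}
for i, col in enumerate(COLUMNS):
    # Strongest baseline in this column and PsychAgent (32B)'s lead over it.
    best_name = max(BASELINES, key=lambda m: BASELINES[m][i])
    margins[col] = (best_name, round(PSYCHAGENT_32B[i] - BASELINES[best_name][i], 2))

for col, (name, margin) in margins.items():
    print(f"{col}: strongest baseline {name}, margin +{margin}")
```

The margins are largest on the counselor-shared dimension (over TheraMind) and smallest on the counselor-specific one (over Qwen3-Max).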

The paper also reports human evaluation on **522 matched multi-session dialogues** rated by two human annotators and one LLM rater (**Gemini-3**) across four dimensions:

- **Ethics**
- **Interaction**
- **Intervention**
- **Perception**

PsychAgent receives the highest ratings from each of the three raters, ahead of Qwen3-Max and TheraMind. The paper further reports moderate-to-strong inter-rater agreement, with:

- **human-human QWK = 0.675**
- **LLM-human QWK = 0.770 / 0.877**
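Quadratic weighted kappa (QWK) measures agreement between two raters on an ordinal scale and penalizes each disagreement by the squared distance between the two ratings, relative to chance agreement. The following is a minimal pure-Python sketch of the standard formula. It assumes integer ratings on a shared scale; the 1 to 5 range and the ratings in the example are placeholders, not the paper's rating scale or data.

```python
from typing import Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int],
                             min_rating: int, max_rating: int) -> float:
    """Cohen's kappa with quadratic weights for two raters on an ordinal scale."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    k = max_rating - min_rating + 1
    if k < 2:
        raise ValueError("the rating scale needs at least two categories")
    n = len(a)
    # observed[i][j] counts items rated i by rater A and j by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - min_rating][y - min_rating] += 1
    # Marginal rating histograms of each rater.
    hist_a = [sum(observed[i][j] for j in range(k)) for i in range(k)]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2   # quadratic disagreement penalty
            expected = hist_a[i] * hist_b[j] / n   # count expected by chance
            num += weight * observed[i][j]
            den += weight * expected
    return 1.0 - num / den


rater_1 = [1, 2, 3, 4, 5, 3, 2]
rater_2 = [1, 2, 4, 4, 5, 3, 1]
print(round(quadratic_weighted_kappa(rater_1, rater_2, 1, 5), 3))
```

A value of 1 means perfect agreement, 0 means chance-level agreement, and negative values mean systematic disagreement. scikit-learn's `cohen_kappa_score` with `weights="quadratic"` (and `labels` set to the full scale) should give the same result.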

## Ablation summary

The paper shows that removing any of the three main modules—**MAPE**, **SEE**, or **RIE**—degrades performance.

Among them, removing **SEE** causes the largest drop in the reported ablation, suggesting that skill evolution is especially important under the reported setting.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 128
- optimizer: adamw_torch_fused
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
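The listed totals follow from the per-device settings: 1 sample per device × 16 devices × 8 accumulation steps = 128 for training, and 8 × 16 = 128 for evaluation. The sketch below checks that arithmetic and evaluates a cosine schedule with 10% linear warmup. It is written to mirror the shape of the `cosine` scheduler in Transformers (linear warmup to the peak rate, then a half-cosine decay to zero). The total number of optimizer steps is not given here, so `total_steps = 300` is a placeholder.

```python
import math

LEARNING_RATE = 1e-5
PER_DEVICE_TRAIN_BS = 1
PER_DEVICE_EVAL_BS = 8
NUM_DEVICES = 16
GRAD_ACCUM = 8
WARMUP_RATIO = 0.1

# Effective batch sizes implied by the per-device settings.
total_train_bs = PER_DEVICE_TRAIN_BS * NUM_DEVICES * GRAD_ACCUM
total_eval_bs = PER_DEVICE_EVAL_BS * NUM_DEVICES


def cosine_lr(step: int, total_steps: int, peak_lr: float = LEARNING_RATE,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup, then cosine decay from peak_lr down to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total_steps = 300  # placeholder: the real value depends on dataset size and 3 epochs
print(total_train_bs, total_eval_bs)
for s in (0, 15, 30, 165, 300):
    print(s, f"{cosine_lr(s, total_steps):.2e}")
```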

The paper additionally states that training used:

- **bf16** precision
- **DeepSpeed ZeRO-3**
- maximum context length **32,768**
- rollout number **N = 8**
- **10% warmup**
- two servers with **8 NVIDIA H200 GPUs each**
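For readers reproducing a similar setup, a DeepSpeed ZeRO-3 configuration with bf16, in the style commonly used with the Hugging Face Trainer integration, might look like the dict below. This is an illustrative assumption, not the authors' released configuration; the `"auto"` values are filled in by the Trainer from its own arguments. The snippet also checks that two 8-GPU servers account for the 16 devices listed in the hyperparameters.

```python
import json

NODES, GPUS_PER_NODE = 2, 8
world_size = NODES * GPUS_PER_NODE  # should equal num_devices (16)

# Hypothetical ZeRO-3 + bf16 config; "auto" defers values to the HF Trainer.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                       # shard params, grads and optimizer states
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
}

config_json = json.dumps(ds_config, indent=2)
print(world_size)
print(config_json)
```

Saved to a JSON file, such a config can be passed to the Trainer through the `deepspeed` argument of `TrainingArguments`.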

### Framework versions

- Transformers 4.55.0
- PyTorch 2.9.0+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ecnu-icalk/PsychAgent-Qwen3-32B"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "system",
        "content": "You are a supportive and cautious psychological counseling assistant. Do not claim to be a licensed clinician.",
    },
    {
        "role": "user",
        "content": "I've been feeling increasingly anxious about job hunting and sleeping poorly for two nights. Can we talk through it step by step?",
    },
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
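# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))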
```
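The usage example covers a single session. For multi-session use, one simple approach in the spirit of MAPE is to carry a client profile and prior session summaries into the system prompt of each new session. The helper `build_session_messages`, its field layout, and the prompt wording below are hypothetical; this card does not specify the prompt format used in training, and the paper's planning engine does more than concatenate summaries.

```python
from typing import Dict, List


def build_session_messages(
    base_instruction: str,
    client_profile: Dict[str, str],
    session_summaries: List[str],
    user_turn: str,
) -> List[Dict[str, str]]:
    """Hypothetical helper: pack long-term memory into a chat message list."""
    profile = "\n".join(f"- {k}: {v}" for k, v in client_profile.items())
    history = "\n".join(
        f"Session {i}: {s}" for i, s in enumerate(session_summaries, start=1)
    ) or "(no previous sessions)"
    system = (
        f"{base_instruction}\n\n"
        f"Client profile:\n{profile}\n\n"
        f"Previous session summaries:\n{history}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_turn},
    ]


messages = build_session_messages(
    "You are a supportive and cautious psychological counseling assistant.",
    {"concern": "job-search anxiety", "sleep": "poor for two nights"},
    ["Explored sources of anxiety; agreed to keep a sleep log."],
    "I tried the sleep log this week. Can we review it?",
)
print(messages[0]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` list in the usage example.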

## Citation

If you use this model, please cite the **PsychAgent** paper.

```bibtex
@article{yang2026psychagent,
  title={PsychAgent: An Experience-Driven Lifelong Learning Agent for Self-Evolving Psychological Counselor},
  author={Yang, Yutao and Li, Junsong and Pan, Qianjun and Zhou, Jie and Chen, Kai and Chen, Qin and Zhao, Jingyuan and Zhou, Ningning and Li, Xin and He, Liang},
  journal={arXiv preprint arXiv},
  year={2026}
}
```