Initialize project; model provided by the ModelHub XC community

Model: DATEXIS/DeepICD-R1-7B Source: Original Platform
2026-05-05 07:29:42 +08:00
commit 8170fd0aa8
16 changed files with 152498 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,323 @@
+---
+language:
+- en
+license: other
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- clinical-nlp
+- medical-coding
+- icd10
+- icd-10-cm
+- reasoning
+- reinforcement-learning
+- grpo
+- healthcare
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+---
+
+# DeepICD-R1-7B
+
+## Model Summary
+
+**DeepICD-R1-7B** is a clinical reasoning language model for **ICD-10-CM diagnosis outcome prediction from admission notes**.  
+It is derived from **Qwen2.5-7B-Instruct** and trained using the **DeepICD-R1 framework**, which combines structured reasoning traces with reinforcement learning and hierarchical reward signals.
+
+The model is designed to predict a **single ICD-10-CM diagnosis code** from clinical text while producing an interpretable reasoning trace explaining the decision.
+
+The training methodology follows the approach described in the paper:
+
+**DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation**
+
+This work frames clinical diagnosis prediction as a **reasoning task optimized through reinforcement learning**.
+
+---
+
+## Model Details
+
+- **Model name:** DeepICD-R1-7B  
+- **Organization:** DATEXIS  
+- **Base model:** Qwen2.5-7B-Instruct  
+- **Parameters:** ~7B  
+- **Task:** Single ICD-10-CM diagnosis prediction from admission notes  
+- **Training paradigm:** Supervised reasoning + reinforcement learning  
+- **Framework:** VERL RL trainer  
+- **Domain:** Clinical NLP / healthcare reasoning  
+
+Qwen2.5-7B-Instruct is a **7-billion-parameter instruction-tuned language model designed for instruction following and long-form generation tasks**.
+
+---
+
+## Intended Use
+
+This model is intended for **research purposes**, including:
+
+- clinical reasoning research
+- ICD-10-CM coding prediction
+- reinforcement learning for language models
+- reasoning trace generation
+- structured prediction from clinical text
+
+### Out-of-Scope Use
+
+This model **must not be used for**:
+
+- medical diagnosis
+- clinical decision support
+- patient triage
+- automated medical coding without expert supervision
+- billing or compliance workflows
+
+---
+
+## Training Methodology
+
+The **DeepICD-R1 framework** treats diagnosis prediction as a reasoning problem.
+
+Training combines:
+
+### 1. Supervised reasoning traces
+A dataset of reasoning chains explaining diagnosis predictions.
+
+### 2. Reinforcement learning optimization
+
+Training uses **Group Relative Policy Optimization (GRPO)** to improve reasoning and prediction accuracy.
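
The group-relative part of GRPO can be illustrated with a small sketch: each sampled response is scored against the other responses drawn for the same prompt, so no separate value model is needed. The function name, the `eps` constant, and the reward values below are assumptions for illustration; the population standard deviation is used here, which may differ from what the VERL trainer computes.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the group sampled for one prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight sampled responses for one admission note (reward values are made up).
rewards = [1.0, 0.0, 0.3, 0.0, 1.0, 0.3, 0.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above the group mean receive a positive advantage and are reinforced; responses below the mean are discouraged. When all samples receive the same reward, every advantage is zero and the prompt contributes no learning signal.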
+
+### 3. Hierarchical reward signals
+
+Rewards are aligned with the hierarchical structure of ICD codes.
+
+The reward function combines:
+
+- **format reward** — correct reasoning + diagnosis structure  
+- **outcome reward** — correct diagnosis prediction  
+- **hierarchical reward** — partial credit for correct ICD prefixes  
+
+This design encourages models to produce both **accurate diagnoses and structured reasoning**.
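
The partial-credit idea behind the hierarchical reward can be sketched as follows. The function names and the credit values (0.5 for a matching three-character category, 0.2 for a matching first letter) are illustrative assumptions, not the values used in training, and the first letter is only a rough stand-in for the ICD chapter.

```python
def normalize(code: str) -> str:
    """Uppercase the code and drop whitespace and the optional decimal point."""
    return code.strip().upper().replace(".", "")


def hierarchical_reward(predicted: str, gold: str,
                        category_credit: float = 0.5,
                        chapter_credit: float = 0.2) -> float:
    """Full credit for an exact match, partial credit for shared prefixes."""
    pred, target = normalize(predicted), normalize(gold)
    if not pred:
        return 0.0
    if pred == target:
        return 1.0
    # The first three characters identify the ICD-10-CM category.
    if len(pred) >= 3 and pred[:3] == target[:3]:
        return category_credit
    # Assumption: the first letter approximates the chapter.
    if pred[0] == target[:1]:
        return chapter_credit
    return 0.0
```

For a gold code of `M5116`, this sketch gives 1.0 to `M51.16`, 0.5 to `M5126`, 0.2 to `M5416`, and 0.0 to `I10`.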
+
+---
+
+## Training Data
+
+The training task uses **clinical admission notes paired with ICD-10-CM diagnosis codes**, derived from de-identified electronic health record datasets such as **MIMIC-IV**.
+
+Task formulation:
+
+**Input**
+
+Clinical admission note describing patient presentation.
+
+**Output**
+
+Structured reasoning trace and predicted ICD-10-CM code.
+
+---
+
+## Output Format
+
+The model is trained to produce structured outputs separating reasoning from the final diagnosis.
+
+### Example
+
+```text
+<think>
+The patient presents with ...
+Symptoms and clinical history suggest ...
+...
+</think>
+
+<diagnosis>
+M5116
+</diagnosis>
+```
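
Downstream code usually needs the two parts separately. A minimal parsing sketch, assuming the output follows the tag layout shown above; `ParsedOutput` and `parse_output` are hypothetical names, not part of the released model code:

```python
import re
from typing import NamedTuple, Optional


class ParsedOutput(NamedTuple):
    reasoning: Optional[str]
    diagnosis: Optional[str]


_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_DIAGNOSIS = re.compile(r"<diagnosis>(.*?)</diagnosis>", re.DOTALL)


def parse_output(text: str) -> ParsedOutput:
    """Return the first reasoning block and diagnosis code, or None if a tag is missing."""
    think = _THINK.search(text)
    diagnosis = _DIAGNOSIS.search(text)
    return ParsedOutput(
        reasoning=think.group(1).strip() if think else None,
        diagnosis=diagnosis.group(1).strip() if diagnosis else None,
    )


sample = "<think>\nThe patient presents with ...\n</think>\n\n<diagnosis>\nM5116\n</diagnosis>"
parsed = parse_output(sample)
```

Returning `None` for a missing tag lets the caller treat malformed generations explicitly instead of silently accepting an empty code.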
+
+## Training Configuration
+
+The model was trained using the **VERL reinforcement learning trainer** with **Group Relative Policy Optimization (GRPO)**, following the DeepICD-R1 training framework.
+
+### Core Training Parameters
+
+| Parameter | Value |
+|-----------|------|
+| Algorithm | GRPO |
+| Training framework | VERL (`verl.trainer.main_ppo`) |
+| Base model | Qwen2.5-7B-Instruct |
+| Training batch size | 64 |
+| PPO mini batch size | 64 |
+| PPO micro batch size per GPU | 16 |
+| Learning rate | 1e-6 |
+| LR warmup steps | 80 |
+| Total epochs | 1 |
+| Max prompt length | 2048 tokens |
+| Max response length | 1024 tokens |
+
+### Rollout / Generation Settings
+
+| Parameter | Value |
+|-----------|------|
+| Rollout engine | vLLM |
+| Samples per prompt (`n`) | 8 |
+| Temperature | 0.9 |
+| Top-k | disabled |
+| dtype | bfloat16 |
+| Tensor parallel size | 1 |
+| GPU memory utilization | 0.4 |
+
+### Optimization Details
+
+| Parameter | Value |
+|-----------|------|
+| Entropy coefficient | 0.001 |
+| KL controller coefficient | 0.001 |
+| KL loss | disabled |
+| Gradient checkpointing | enabled |
+| Torch compile | enabled |
+| FSDP param offload | disabled |
+| FSDP optimizer offload | disabled |
+
+### Hardware
+
+| Component | Value |
+|-----------|------|
+| GPUs | 4 |
+| Nodes | 1 |
+| Precision | bfloat16 |
+
+### Reward Function
+
+Training uses a **custom batched reward function** combining several reward signals:
+
+- **Outcome reward** — correct ICD-10 prediction
+- **Format reward** — correct `<think>` and `<diagnosis>` structure
+- **Hierarchical reward** — partial credit for ICD prefix matches
+- **Reasoning reward** — encourages meaningful reasoning traces
+- **LLM-based reward** — optional external judge scoring
+
+These rewards align the model toward producing **both accurate diagnoses and structured reasoning traces**.
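
A batched reward function of this kind might look like the sketch below, which covers only the format and outcome signals. The function name, the additive combination, and the weights are assumptions for illustration; the hierarchical, reasoning, and LLM-based signals are omitted, and the actual reward code may be structured differently.

```python
import re
from typing import List

# A well-formed response is a <think> block followed by a <diagnosis> block
# containing a single whitespace-free code.
_FORMAT = re.compile(
    r"\s*<think>.+?</think>\s*<diagnosis>\s*(\S+)\s*</diagnosis>\s*",
    re.DOTALL,
)


def batched_reward(responses: List[str], gold_codes: List[str],
                   format_weight: float = 0.1,
                   outcome_weight: float = 1.0) -> List[float]:
    """Score a batch of responses; malformed responses receive 0.0."""
    scores = []
    for response, gold in zip(responses, gold_codes):
        match = _FORMAT.fullmatch(response)
        if match is None:
            scores.append(0.0)
            continue
        predicted = match.group(1).upper().replace(".", "")
        correct = predicted == gold.upper().replace(".", "")
        scores.append(format_weight + (outcome_weight if correct else 0.0))
    return scores
```

Scoring the whole batch in one call matches how a trainer typically hands rollouts to the reward function, and it leaves room for signals that benefit from batching, such as an external judge.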
+
+The reasoning trace is intended to make visible how the model arrived at the diagnosis from the clinical note.
+
+---
+
+## Evaluation
+
+Evaluation follows the methodology described in the **DeepICD-R1 paper**.
+
+Performance is measured using **macro-averaged F1 scores** at multiple levels of the ICD hierarchy.
+
+| Level | Description |
+|------|-------------|
+| Chapter | Broad ICD category |
+| Category | First three characters (e.g., `M51` for `M5116`) |
+| Full code | Complete ICD-10-CM code |
+
+Hierarchical evaluation allows partial credit when the model predicts the correct high-level diagnostic category even if the full code is incorrect.
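
To make the level-wise scoring concrete, the sketch below computes macro-averaged F1 at the category and full-code levels by truncating codes before comparison. The helper names and the toy labels are assumptions for illustration; the chapter level is left out because it requires a lookup table of ICD-10-CM chapter ranges, and the paper's evaluation script may differ in details such as which labels are averaged over.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def truncate(codes, length):
    """Strip the decimal point and keep the first `length` characters."""
    return [c.replace(".", "").upper()[:length] for c in codes]


# Toy labels, not real evaluation data.
gold = ["M5116", "M5126", "I10", "E119"]
pred = ["M5116", "M5116", "I10", "E1165"]

full_f1 = macro_f1(truncate(gold, 7), truncate(pred, 7))
category_f1 = macro_f1(truncate(gold, 3), truncate(pred, 3))
```

In this toy example every prediction lands in the right three-character category, so the category-level score is perfect while the full-code score is much lower.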
+
+---
+
+## Limitations
+
+Models following the **DeepICD-R1 framework** share several limitations.
+
+### Dataset limitations
+
+- Training data consists primarily of **English clinical notes**
+- Distribution reflects **hospital-specific patient populations**
+- ICD labels are **highly imbalanced**, affecting rare diagnoses
+
+### Model limitations
+
+- Reasoning traces may appear convincing while being incorrect
+- Predictions may fail for rare or long-tail diagnoses
+- Models may demonstrate **premature diagnostic closure**
+- Reinforcement learning rewards are only proxies for expert feedback
+
+---
+
+## Ethical Considerations
+
+This model is trained on **de-identified clinical data** and intended strictly for research.
+
+### Potential risks
+
+- propagation of dataset biases  
+- overconfidence in generated reasoning  
+- misuse in clinical decision making  
+
+### Appropriate safeguards
+
+- expert oversight  
+- dataset bias evaluation  
+- fairness audits  
+- controlled deployment environments  
+
+---
+
+## Hardware and Training Setup
+
+Typical training configuration for models in this family includes:
+
+- **GPUs:** multi-GPU training (4–8 GPUs)  
+- **Precision:** bfloat16  
+- **Rollout engine:** vLLM  
+- **Training framework:** VERL PPO / GRPO trainer  
+- **Sampling:** multiple rollouts per prompt  
+
+---
+
+## Usage
+
+### Transformers Example
+
+```python
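+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "DATEXIS/DeepICD-R1-7B"
+
+# Load the tokenizer and model; device_map="auto" places weights on the available devices.
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    device_map="auto",
+    torch_dtype="auto",
+)
+
+# Replace [ADMISSION NOTE] with the text of the admission note.
+prompt = """
+You are a clinical reasoning model.
+
+Given the following admission note,
+produce reasoning in <think> tags
+and a final ICD-10 diagnosis in <diagnosis> tags.
+
+[ADMISSION NOTE]
+"""
+
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+# Training allowed responses of up to 1024 tokens; raise max_new_tokens if outputs are cut off.
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=512,
+)
+
+# Decode only the newly generated tokens, not the echoed prompt.
+new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+print(tokenizer.decode(new_tokens, skip_special_tokens=True))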
+```
+
+## Recommended Inference Practices
+
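+- Use prompts consistent with the training format: reasoning in `<think>` tags followed by a single code in `<diagnosis>` tags.
+- Validate predicted codes against the ICD-10-CM code format and, where possible, the official code set.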
+- Always review predictions with medical experts.
+- Avoid exposing reasoning traces in safety-critical settings without verification.
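
A format check for predicted codes can be sketched with a regular expression. It encodes only the general shape of ICD-10-CM codes (a letter, a digit, an alphanumeric character, then up to four more alphanumeric characters with an optional decimal point after the third position). It does not confirm that a code exists in the official code set, and `looks_like_icd10cm` is a hypothetical helper name.

```python
import re

# Shape only: 3 to 7 characters, optional decimal point after the third one.
_ICD10CM_SHAPE = re.compile(r"[A-Z][0-9][0-9A-Z](?:\.?[0-9A-Z]{1,4})?")


def looks_like_icd10cm(code: str) -> bool:
    """Return True if the string has the general shape of an ICD-10-CM code."""
    return _ICD10CM_SHAPE.fullmatch(code.strip().upper()) is not None


checks = {code: looks_like_icd10cm(code) for code in ["M5116", "M51.16", "I10", "5116M", "M5"]}
```

A code that passes this check can still be invalid, so a lookup against the official ICD-10-CM release for the relevant year remains necessary before any prediction is used.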
+
+---
+
+## Citation
+
+If you use this model, please cite:
+
+```bibtex
+@inproceedings{roehr2026deepicdr1,
+  title={DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation},
+  author={R{\"o}hr, Tom and Steffek, Thomas and Teucher, Roman and Bressem, Keno and others},
+  booktitle={Proceedings of LREC-COLING},
+  year={2026}
+}
+```