---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-4B-Instruct-2507
pipeline_tag: text-generation
tags:
- medical
---

# 💊 Typhoon-Si-Med-Thinking-4B: Ranked-List Medical Reasoning Model

**Typhoon-Si-Med-Thinking-4B** is **Southeast Asia’s first state-of-the-art, small, and efficient medical reasoning model**, jointly developed by **Typhoon (SCB 10X)** and the **Siriraj Informatics and Data Innovation Center (SiData+) at Siriraj Hospital, Mahidol University**.

This 4-billion-parameter instruct model is trained with **reinforcement learning** to generate *ranked lists of candidate answers*, giving users breadth and multiple perspectives. Despite its lightweight footprint, it performs robustly across multiple answer formats: multiple choice, short answer, and ranked list.

Traditional multiple-choice (MCQ) formats constrain models to a single “best” answer, which fails to reflect the uncertainty inherent in real clinical decision-making. In contrast, Typhoon-Si-Med-Thinking-4B adopts a **ranked-list approach** that mirrors how clinicians think—evaluating several plausible possibilities before making a decision. This approach better captures diagnostic uncertainty, mitigates overreliance on potentially incorrect single outputs, and fosters safer, more collaborative reasoning between models and medical professionals.

The model achieves **state-of-the-art performance** on medical QA benchmarks, including **MedQA**, **MedMCQA**, **MedXpertQA**, and **MMLU Pro (Health)**, surpassing larger systems such as **Gemini 2.5 Pro** on list-based and short-answer tasks. Its reinforcement-learning design allows it to balance correctness and diversity, setting a new benchmark for **efficient, domain-specific medical reasoning** in Southeast Asia and beyond.

For more details, see the [paper](https://arxiv.org/abs/2509.20866).


## **Performance**


![image](https://cdn-uploads.huggingface.co/production/uploads/615313b0793ef66b3324da1f/m2oUwe2yzm1RJjWNVMqQr.png)


## **Model Description**

- **Model type**: A 4B decoder-only instruct model based on the Qwen3 architecture.
- **Requirement**: transformers 4.51.1 or newer.
- **Primary Language**: English 🇬🇧
- **License**: [Apache 2.0 License](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507/blob/main/LICENSE)

## Usage

This is a **reasoning-enabled clinical assistant model**, designed to output both an intermediate reasoning process and a final answer.

### Modes of Reasoning

The model supports **two reasoning modes**, which are enabled by prefixing the user query with special instruction strings:

* **`TEXT_MODE`**
  Produces a reasoning trace enclosed within `<think></think>` tags, followed by a single answer.

  Prepend the following prefix to the first user message:

  ```python
  "You are a helpful and harmless expert clinical assistant. The assistant first thinks about the reasoning process and then provides the user with an accurate answer. The reasoning process is enclosed within <think></think> tags followed by an answer, i.e., <think>reasoning process here</think> answer here. After thinking, when you finally reach a conclusion, clearly state the answer.\n\n"
  ```

* **`LIST_MODE`**
  Produces a reasoning trace enclosed within `<think></think>` tags, followed by a ranked list of possible answers in descending order of likelihood.

  Prepend the following prefix to the first user message:

  ```python
  "You are a helpful and harmless expert clinical assistant. The assistant first thinks about the reasoning process and then provides the user with an accurate answer. The reasoning process is enclosed within <think></think> tags followed by an answer, i.e., <think>reasoning process here</think> answer here. After thinking, when you finally reach a conclusion, clearly list all possible answers in order from most likely to least likely. Start with \"# Final Answer\" followed by numbered lines using the format `n. answer` for each answer. Each item MUST contain only the answer without any explanation or reasoning.\n\nExample:\n<think>...</think>\n\n# Final Answer\n1. xxx\n2. xxx\n\nNow the user asks you to solve a problem.\n\n"
  ```

To enable reasoning, you must prepend either the `TEXT_MODE` or the `LIST_MODE` prefix to the prompt before passing it to the model.
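
As a minimal sketch of the prepending step, the mode prefix can be attached to the question when the chat messages are built. The helper name `build_messages` and the placeholder `demo_prefix` are hypothetical and used only for illustration; in practice, pass the full `TEXT_MODE` or `LIST_MODE` string as the prefix.

```python
from typing import Dict, List


def build_messages(mode_prefix: str, question: str) -> List[Dict[str, str]]:
    """Return a single-turn chat with the mode prefix prepended to the question."""
    # The prefix goes at the very start of the first user message.
    return [{"role": "user", "content": mode_prefix + question}]


# Placeholder prefix for illustration only (an assumption, not a real prompt);
# substitute the full TEXT_MODE or LIST_MODE string.
demo_prefix = "<mode prefix here>\n\n"
messages = build_messages(demo_prefix, "What is the most likely diagnosis?")
print(messages[0]["content"])
```

The resulting `messages` list can then be passed to the tokenizer's chat template.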

### Quirks

* When reasoning is enabled, the model may sometimes output the special token `<tool_call>` at the **beginning** of its response.
  This does not affect the reasoning or answer itself, but should be removed in post-processing.
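
One possible way to handle this in post-processing is sketched below: it strips a leading `<tool_call>` token and then separates the reasoning trace from the answer. The function name `split_response` is hypothetical, and the sketch assumes the decoded text still contains the `</think>` tag; if the tag is missing, the whole text is treated as the answer.

```python
from typing import Tuple

TOOL_CALL_TOKEN = "<tool_call>"


def split_response(decoded: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a decoded model response."""
    text = decoded.lstrip()
    # Drop the stray special token if it leads the response.
    if text.startswith(TOOL_CALL_TOKEN):
        text = text[len(TOOL_CALL_TOKEN):].lstrip()
    # Everything before </think> is reasoning, everything after is the answer.
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: treat everything as the answer.
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Hand-written sample string, not real model output.
reasoning, answer = split_response("<tool_call><think>step one</think>\n\nfinal text")
print(reasoning)
print(answer)
```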

## Usage Example

This code snippet shows how to use the Typhoon-Si-Med-Thinking-4B model for text generation with the `transformers` library. It loads the model and tokenizer, builds a user message with the `LIST_MODE` prefix prepended, generates a response, and strips a leading `<tool_call>` token if one is present.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

TEXT_MODE = "You are a helpful and harmless expert clinical assistant. The assistant first thinks about the reasoning process and then provides the user with an accurate answer. The reasoning process is enclosed within <think></think> tags followed by an answer, i.e., <think>reasoning process here</think> answer here. After thinking, when you finally reach a conclusion, clearly state the answer.\n\n"
LIST_MODE = """You are a helpful and harmless expert clinical assistant. The assistant first thinks about the reasoning process and then provides the user with an accurate answer. The reasoning process is enclosed within <think></think> tags followed by an answer, i.e., <think>reasoning process here</think> answer here. After thinking, when you finally reach a conclusion, clearly list all possible answers in order from most likely to least likely. Start with "# Final Answer" followed by numbered lines using the format `n. answer` for each answer. Each item MUST contain only the answer without any explanation or reasoning.

Example:
<think>...</think>

# Final Answer
1. xxx
2. xxx

Now the user asks you to solve a problem.\n\n"""

model_id = "scb10x/typhoon-si-med-thinking-4b-research-preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,  # older transformers releases may expect `torch_dtype` instead
    device_map="auto",
)

messages = [
    {"role": "user", "content": LIST_MODE + "A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. What is the best treatment for this patient?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    repetition_penalty=1.05
)
# Keep only the newly generated tokens by slicing off the prompt
response = outputs[0][input_ids.shape[-1]:]
decoded = tokenizer.decode(response, skip_special_tokens=True)

# Remove <tool_call> prefix if present
if decoded.startswith("<tool_call>"):
    decoded = decoded[len("<tool_call>"):].lstrip()

print(decoded)
```
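
In `LIST_MODE`, the decoded text should end with a `# Final Answer` header followed by numbered lines. The sketch below shows one way to turn that text into a Python list; the helper name `parse_ranked_list` and the regular expression are assumptions based on the `n. answer` format described in the prompt, and the sample string is hand-written, not real model output.

```python
import re
from typing import List

FINAL_ANSWER_HEADER = "# Final Answer"
# Matches lines such as "1. some answer" and captures the answer text.
ITEM_PATTERN = re.compile(r"^\s*(\d+)\.\s+(.*\S)\s*$")


def parse_ranked_list(decoded: str) -> List[str]:
    """Return the ranked answers, most likely first, or [] if none are found."""
    # Only look after the last header, so a mention of the header
    # inside the reasoning trace is ignored.
    _, sep, tail = decoded.rpartition(FINAL_ANSWER_HEADER)
    if not sep:
        return []
    answers = []
    for line in tail.splitlines():
        match = ITEM_PATTERN.match(line)
        if match:
            answers.append(match.group(2))
    return answers


sample = "<think>...</think>\n\n# Final Answer\n1. first candidate\n2. second candidate\n"
print(parse_ranked_list(sample))
```

Returning an empty list when the header is missing lets the caller decide whether to retry generation or fall back to the raw text.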

## **Intended Uses & Limitations**

This model is an instruct reasoning model released as a research preview. It is not intended for medical use. While it incorporates some level of guardrails, it may produce answers that are inaccurate, biased, or otherwise objectionable in response to user prompts. We recommend that developers assess these risks in the context of their use case.

## **Follow us**

**https://twitter.com/opentyphoon**

## **Support**

**https://discord.gg/us5gAYmrxw**


## **Citation**

If you find this model useful, please cite it using:
```bibtex
@misc{taveekitworachai2025singleanswerenoughgenerating,
      title={Single Answer is Not Enough: On Generating Ranked Lists with Medical Reasoning Models},
      author={Pittawat Taveekitworachai and Natpatchara Pongjirapat and Krittaphas Chaisutyakorn and Piyalitt Ittichaiwong and Tossaporn Saengja and Kunat Pipatanakul},
      year={2025},
      eprint={2509.20866},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.20866},
}
```