---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-1.7B
pipeline_tag: text-generation
tags:
- 20-questions
- sft
- multi-turn
- information-seeking
---

# 20 Questions SFT - Qwen3-1.7B

This model is a supervised fine-tuned (SFT) version of [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) for the **20 Questions** task, released as part of the paper *"Intrinsic Credit Assignment for Long Horizon Interaction"*.

## Overview

The model plays the role of the **Questioner** in a game of 20 Questions: it asks up to 20 yes-or-no questions to deduce a secret word (a common English noun). This SFT checkpoint serves as the initialization for the paper's reinforcement-learning models (StarPO and CIA).
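
At the protocol level, a game is a loop between the questioner and an oracle that knows the secret word. The following is only a minimal sketch of that loop: `questioner_ask` and `oracle_answer` are hypothetical stand-ins for calls to this model and to the judge/oracle model, and the end-of-game signal is an assumption.

```python
# Illustrative game-loop skeleton; the two callables stand in for the
# questioner (this model) and the judge/oracle (Qwen3-14B in the paper).
def play_game(questioner_ask, oracle_answer, secret, max_turns=20):
    history = []  # (question, answer) pairs gathered so far
    for _ in range(max_turns):
        question = questioner_ask(history)
        answer = oracle_answer(secret, question)  # e.g. "Yes" / "No" / "Correct"
        history.append((question, answer))
        if answer == "Correct":  # hypothetical success signal
            return True, history
    return False, history
```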

## Training

- **Base model:** [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
- **Method:** Supervised fine-tuning on successful, filtered single-turn trajectories (an illustrative sketch follows this list)
- **Training data:** 341 words from the COCA (Corpus of Contemporary American English) word list, with no overlap with the RL training or test sets
- **Judge/Oracle:** Qwen3-14B
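
This card does not spell out the trajectory-filtering pipeline, so the following is only an illustrative sketch of the stated idea: keep games that ended in success and convert each of their turns into a single-turn training example. The rollout fields and helper name are hypothetical.

```python
# Illustrative only: convert successful game rollouts into single-turn
# SFT examples (prompt -> question). Field names are hypothetical, and
# any Q/A history for later turns is assumed to be folded into the
# user message (elided here).
def build_sft_examples(rollouts, system_prompt, user_prompt):
    examples = []
    for game in rollouts:
        if not game["success"]:  # filter: keep successful games only
            continue
        for turn in game["turns"]:
            examples.append({
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt},
                    {"role": "assistant", "content": turn["question"]},
                ]
            })
    return examples
```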

## Intended Use

This model is intended for:

- Playing 20 Questions as a questioner agent
- Serving as a starting checkpoint for RL-based training (e.g., StarPO, CIA)
- Research on multi-turn interactive language agents

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Klingspor/Qwen3-1.7B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# System prompt defining the Questioner role.
system_prompt = """You are the Questioner in a game of 20 Questions, and your goal is to determine the secret word.
The secret is randomly drawn from the most frequent nouns of the English language.

Ask clear, concise, and strategic yes/no questions that will help you narrow down the possibilities.
Consider previous answers to inform your subsequent questions, and keep track of the information you gather.
Focus on deductive reasoning, start with a broad question and refine your queries as you progress."""

# Per-turn instruction asking the model for its next yes/no question.
user_prompt = """Ask a question to gain additional information about the secret or guess what the secret is.

Instructions:
1. Ask a question that can be answered with "Yes" or "No" to help you deduce the secret word.
2. Your answer must be a single question. Do not provide any additional commentary or reasoning.

Ask your question: """

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# Render the chat template, generate, and print the model's first question.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
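
To continue a game, append the model's question and the oracle's (or a human's) yes/no answer to `messages`, then generate again. Below is a minimal sketch continuing the snippet above; prefixing the answer to the next-turn instruction is an assumption about the expected turn format, and the decoded text may include a reasoning block depending on generation settings.

```python
# Keep only the newly generated tokens as the model's question.
question = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# Feed the answer back as the next user turn (assumed format), then
# ask for the next question.
messages.append({"role": "assistant", "content": question})
messages.append({"role": "user", "content": "Yes.\n\n" + user_prompt})

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```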

## Links

- **Paper:** [Intrinsic Credit Assignment for Long Horizon Interaction](https://bethgelab.github.io/delta-belief-rl/)
- **Code:** [github.com/bethgelab/delta-belief-rl](https://github.com/bethgelab/delta-belief-rl/)
- **Model collection:** [bethgelab/delta-belief-rl](https://huggingface.co/collections/bethgelab/delta-belief-rl)