---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-1.7B
pipeline_tag: text-generation
tags:
- 20-questions
- sft
- multi-turn
- information-seeking
---

# 20 Questions SFT - Qwen3-1.7B

This model is a supervised fine-tuned (SFT) version of [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) for the **20 Questions** task, released as part of the paper *"Intrinsic Credit Assignment for Long Horizon Interaction"*.

## Overview

The model plays the role of the **Questioner** in a game of 20 Questions: it asks up to 20 yes-or-no questions to deduce a secret word (a common English noun). This SFT checkpoint serves as the initialization for the paper's reinforcement-learning models (StarPO and CIA).
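
At the protocol level, a game is a loop between the questioner and an oracle that knows the secret word. The following is only a minimal sketch of that loop: `questioner_ask` and `oracle_answer` are hypothetical stand-ins for calls to this model and to the judge/oracle model, and the end-of-game signal is an assumption.

```python
# Illustrative game-loop skeleton; the two callables stand in for the
# questioner (this model) and the judge/oracle (Qwen3-14B in the paper).
def play_game(questioner_ask, oracle_answer, secret, max_turns=20):
    history = []  # (question, answer) pairs gathered so far
    for _ in range(max_turns):
        question = questioner_ask(history)
        answer = oracle_answer(secret, question)  # e.g. "Yes" / "No" / "Correct"
        history.append((question, answer))
        if answer == "Correct":  # hypothetical success signal
            return True, history
    return False, history
```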

## Training

- **Base model:** [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
- **Method:** Supervised fine-tuning on successful, filtered single-turn trajectories (an illustrative sketch follows this list)
- **Training data:** 341 words from the COCA (Corpus of Contemporary American English) word list, with no overlap with the RL training or test sets
- **Judge/Oracle:** Qwen3-14B
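
This card does not spell out the trajectory-filtering pipeline, so the following is only an illustrative sketch of the stated idea: keep games that ended in success and convert each of their turns into a single-turn training example. The rollout fields and helper name are hypothetical.

```python
# Illustrative only: convert successful game rollouts into single-turn
# SFT examples (prompt -> question). Field names are hypothetical, and
# any Q/A history for later turns is assumed to be folded into the
# user message (elided here).
def build_sft_examples(rollouts, system_prompt, user_prompt):
    examples = []
    for game in rollouts:
        if not game["success"]:  # filter: keep successful games only
            continue
        for turn in game["turns"]:
            examples.append({
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt},
                    {"role": "assistant", "content": turn["question"]},
                ]
            })
    return examples
```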

## Intended Use

This model is intended for:

- Playing 20 Questions as a questioner agent
- Serving as a starting checkpoint for RL-based training (e.g., StarPO, CIA)
- Research on multi-turn interactive language agents

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Klingspor/Qwen3-1.7B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# System prompt defining the Questioner role.
system_prompt = """You are the Questioner in a game of 20 Questions, and your goal is to determine the secret word.
The secret is randomly drawn from the most frequent nouns of the English language.

Ask clear, concise, and strategic yes/no questions that will help you narrow down the possibilities.
Consider previous answers to inform your subsequent questions, and keep track of the information you gather.
Focus on deductive reasoning, start with a broad question and refine your queries as you progress."""

# Per-turn instruction asking the model for its next yes/no question.
user_prompt = """Ask a question to gain additional information about the secret or guess what the secret is.

Instructions:
1. Ask a question that can be answered with "Yes" or "No" to help you deduce the secret word.
2. Your answer must be a single question. Do not provide any additional commentary or reasoning.

Ask your question: """

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# Render the chat template, generate, and print the model's first question.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
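
To continue a game, append the model's question and the oracle's (or a human's) yes/no answer to `messages`, then generate again. Below is a minimal sketch continuing the snippet above; prefixing the answer to the next-turn instruction is an assumption about the expected turn format, and the decoded text may include a reasoning block depending on generation settings.

```python
# Keep only the newly generated tokens as the model's question.
question = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# Feed the answer back as the next user turn (assumed format), then
# ask for the next question.
messages.append({"role": "assistant", "content": question})
messages.append({"role": "user", "content": "Yes.\n\n" + user_prompt})

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```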

## Links

- **Paper:** [Intrinsic Credit Assignment for Long Horizon Interaction](https://bethgelab.github.io/delta-belief-rl/)
- **Code:** [github.com/bethgelab/delta-belief-rl](https://github.com/bethgelab/delta-belief-rl/)
- **Model collection:** [bethgelab/delta-belief-rl](https://huggingface.co/collections/bethgelab/delta-belief-rl)