---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-1.7B
pipeline_tag: text-generation
tags:
- 20-questions
- sft
- multi-turn
- information-seeking
---

# 20 Questions SFT - Qwen3-1.7B

This model is a supervised fine-tuned (SFT) version of [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) for the **20 Questions** task, released as part of the paper *"Intrinsic Credit Assignment for Long Horizon Interaction"*.

## Overview

The model plays the role of a **Questioner** in a game of 20 Questions: it asks up to 20 yes-or-no questions to deduce a secret word (a common English noun). This SFT checkpoint serves as the initialization for the reinforcement learning models (StarPO, CIA).

## Training

- **Base model:** [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
- **Method:** Supervised fine-tuning on successful, filtered single-turn trajectories
- **Training data:** 341 words from the COCA (Corpus of Contemporary American English) word list, with no overlap with the RL training or test sets
- **Judge/Oracle:** Qwen3-14B

## Intended Use

This model is intended for:

- Playing 20 Questions as a questioner agent
- Serving as a starting checkpoint for RL-based training (e.g., StarPO, CIA)
- Research on multi-turn interactive language agents

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Klingspor/Qwen3-1.7B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# System and user prompts for the questioner role
system_prompt = """You are the Questioner in a game of 20 Questions, and your goal is to determine the secret word.
The secret is randomly drawn from the most frequent nouns of the English language.

Ask clear, concise, and strategic yes/no questions that will help you narrow down the possibilities.
Consider previous answers to inform your subsequent questions, and keep track of the information you gather.
Focus on deductive reasoning, start with a broad question and refine your queries as you progress."""

user_prompt = """Ask a question to gain additional information about the secret or guess what the secret is.

Instructions:
1. Ask a question that can be answered with "Yes" or "No" to help you deduce the secret word.
2. Your answer must be a single question. Do not provide any additional commentary or reasoning.

Ask your question: """

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# Render the chat template and generate the model's first question
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

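The snippet above produces only the model's first question. A full game is multi-turn: the oracle's reply is appended to the chat history and the model is re-prompted. The loop below is a minimal sketch of that message bookkeeping; `ask_question` and `answer_question` are hypothetical stand-ins for a `model.generate` wrapper and the Qwen3-14B oracle, and the exact turn format used during training may differ.

```python
MAX_TURNS = 20  # the questioner gets at most 20 questions


def play(ask_question, answer_question, system_prompt, user_prompt,
         max_turns=MAX_TURNS):
    """Run one game. `ask_question(messages)` should wrap the tokenizer/model
    generation call from the snippet above; `answer_question(q)` should query
    the oracle and return e.g. "Yes", "No", or "Correct". Both are hypothetical
    stand-ins, not part of the released code."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    for _ in range(max_turns):
        question = ask_question(messages)
        answer = answer_question(question)
        # Record the model's question, then feed the oracle's reply back in
        # as the next user turn, restating the instruction prompt.
        messages.append({"role": "assistant", "content": question})
        messages.append({"role": "user", "content": f"{answer}. {user_prompt}"})
        if answer == "Correct":
            break
    return messages
```

Swapping the stubs for real generation and oracle calls turns this into a complete evaluation loop; the chat history simply accumulates alternating assistant/user turns.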
## Links

- **Paper:** [Intrinsic Credit Assignment for Long Horizon Interaction](https://bethgelab.github.io/delta-belief-rl/)
- **Code:** [github.com/bethgelab/delta-belief-rl](https://github.com/bethgelab/delta-belief-rl/)
- **Model collection:** [bethgelab/delta-belief-rl](https://huggingface.co/collections/bethgelab/delta-belief-rl)