---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-1.7B
pipeline_tag: text-generation
tags:
- 20-questions
- sft
- multi-turn
- information-seeking
---

# 20 Questions SFT - Qwen3-1.7B

This model is a supervised fine-tuned (SFT) version of [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) for the **20 Questions** task, released as part of the paper *"Intrinsic Credit Assignment for Long Horizon Interaction"*.

## Overview

The model plays the role of a **Questioner** in a game of 20 Questions: it asks up to 20 yes-or-no questions to deduce a secret word (a common English noun). This SFT checkpoint serves as the initialization for the reinforcement learning models (StarPO, CIA).

## Training

- **Base model:** [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
- **Method:** Supervised fine-tuning on successful, filtered single-turn trajectories
- **Training data:** 341 words from the COCA (Corpus of Contemporary American English) word list, with no overlap with the RL training or test sets
- **Judge/Oracle:** Qwen3-14B

## Intended Use

This model is intended for:

- Playing 20 Questions as a questioner agent
- Serving as a starting checkpoint for RL-based training (e.g., StarPO, CIA)
- Research on multi-turn interactive language agents

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Klingspor/Qwen3-1.7B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# System and user prompts for the questioner role
system_prompt = """You are the Questioner in a game of 20 Questions, and your goal is to determine the secret word.
The secret is randomly drawn from the most frequent nouns of the English language.

Ask clear, concise, and strategic yes/no questions that will help you narrow down the possibilities.
Consider previous answers to inform your subsequent questions, and keep track of the information you gather.
Focus on deductive reasoning, start with a broad question and refine your queries as you progress."""

user_prompt = """Ask a question to gain additional information about the secret or guess what the secret is.

Instructions:
1. Ask a question that can be answered with "Yes" or "No" to help you deduce the secret word.
2. Your answer must be a single question. Do not provide any additional commentary or reasoning.

Ask your question: """

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# Render the chat template and generate the model's first question
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

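The snippet above produces only the model's first question. A full game is multi-turn: the oracle's reply is appended to the chat history and the model is re-prompted. The loop below is a minimal sketch of that message bookkeeping; `ask_question` and `answer_question` are hypothetical stand-ins for a `model.generate` wrapper and the Qwen3-14B oracle, and the exact turn format used during training may differ.

```python
MAX_TURNS = 20  # the questioner gets at most 20 questions


def play(ask_question, answer_question, system_prompt, user_prompt,
         max_turns=MAX_TURNS):
    """Run one game. `ask_question(messages)` should wrap the tokenizer/model
    generation call from the snippet above; `answer_question(q)` should query
    the oracle and return e.g. "Yes", "No", or "Correct". Both are hypothetical
    stand-ins, not part of the released code."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    for _ in range(max_turns):
        question = ask_question(messages)
        answer = answer_question(question)
        # Record the model's question, then feed the oracle's reply back in
        # as the next user turn, restating the instruction prompt.
        messages.append({"role": "assistant", "content": question})
        messages.append({"role": "user", "content": f"{answer}. {user_prompt}"})
        if answer == "Correct":
            break
    return messages
```

Swapping the stubs for real generation and oracle calls turns this into a complete evaluation loop; the chat history simply accumulates alternating assistant/user turns.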
## Links

- **Paper:** [Intrinsic Credit Assignment for Long Horizon Interaction](https://bethgelab.github.io/delta-belief-rl/)
- **Code:** [github.com/bethgelab/delta-belief-rl](https://github.com/bethgelab/delta-belief-rl/)
- **Model collection:** [bethgelab/delta-belief-rl](https://huggingface.co/collections/bethgelab/delta-belief-rl)