Initialize project; model provided by the ModelHub XC community

Model: miaolu3/qwen3-8b-alfworld-rl-step570 Source: Original Platform
2026-05-31 17:46:12 +08:00
commit df2b17584a
16 changed files with 152347 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,32 @@
+---
+library_name: transformers
+license: apache-2.0
+base_model: Qwen/Qwen3-8B
+pipeline_tag: text-generation
+tags:
+- agent
+- alfworld
+- reinforcement-learning
+- qwen3
+---
+
+# Qwen3-8B ALFWorld RL (step 570)
+
+Qwen3-8B fine-tuned with reinforcement learning on the ALFWorld text-world
+benchmark. This is a snapshot at training step 570.
+
+* **Base model:** [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
+* **Task:** ALFWorld (TextWorld variant; 6 task types, valid_seen / valid_unseen)
+* **Training step:** 570
+* **Validation success rate (valid_seen, T=0.4, 8 rollouts/game):** ~0.957
+* **Inference:** Qwen3 chat template with thinking enabled (`enable_thinking=True`).
+  The model reasons inside `<think>...</think>` and emits the chosen action
+  inside `<action>...</action>` tags, conditioned on the current observation
+  and admissible-action list.
+* **Tokenizer:** identical to `Qwen/Qwen3-8B` (vocab 151643).
+
+## Intended use
+
+Intended as a distillation source for smaller students (Qwen3-0.6B / Qwen2.5-0.5B) and
+as a strong baseline for ALFWorld policy work. Released alongside the
+training recipe at <https://github.com/MiaoLu3/verl>.
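
The README's inference note says the model reasons inside `<think>...</think>` and emits its chosen action inside `<action>...</action>`, conditioned on an admissible-action list. A minimal sketch of how a caller might pull the action out of a raw completion follows. The helper name `extract_action`, the choice to ignore tags that appear inside the thinking span, and the optional admissibility check are all assumptions made for illustration; none of them are taken from the released training recipe.

```python
import re
from typing import Optional, Sequence

# Tag names come from the README; the regexes and helper are illustrative assumptions.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
_ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)


def extract_action(
    completion: str,
    admissible: Optional[Sequence[str]] = None,
) -> Optional[str]:
    """Return the last <action> payload found outside the <think> span.

    Returns None when no action tag is present, or when an admissible-action
    list is supplied and the parsed action is not in it.
    """
    # Drop the reasoning span so draft actions mentioned while thinking are ignored.
    visible = _THINK_RE.sub("", completion)
    matches = _ACTION_RE.findall(visible)
    if not matches:
        return None
    action = matches[-1].strip()
    # Optional check against the environment's admissible actions.
    if admissible is not None and action not in admissible:
        return None
    return action
```

For example, `extract_action(text, admissible=["open fridge 1", "go to fridge 1"])` would return the parsed action only if it matches one of the listed strings. Rejecting inadmissible actions outright is one possible policy; a caller could instead fall back to a fuzzy match or a retry.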