--- library_name: transformers license: apache-2.0 base_model: Qwen/Qwen3-8B pipeline_tag: text-generation tags: - agent - alfworld - reinforcement-learning - qwen3 --- # Qwen3-8B ALFWorld RL (step 570) Qwen3-8B fine-tuned with reinforcement learning on the ALFWorld text-world benchmark. This is a snapshot at training step 570. * **Base model:** [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) * **Task:** ALFWorld (TextWorld variant; 6 task types, valid_seen / valid_unseen) * **Training step:** 570 * **Validation success rate (valid_seen, T=0.4, 8 rollouts/game):** ~0.957 * **Inference:** Qwen3 chat template with thinking enabled (`enable_thinking=True`). The model reasons inside `...` and emits the chosen action inside `...` tags, conditioned on the current observation and admissible-action list. * **Tokenizer:** identical to `Qwen/Qwen3-8B` (vocab 151643). ## Intended use Distillation source for smaller students (Qwen3-0.6B / Qwen2.5-0.5B), and as a strong baseline for ALFWorld policy work. Released alongside the training recipe at .