Files
qwen3-8b-alfworld-rl-step570/README.md
ModelHub XC df2b17584a 初始化项目,由ModelHub XC社区提供模型
Model: miaolu3/qwen3-8b-alfworld-rl-step570
Source: Original Platform
2026-05-31 17:46:12 +08:00

1.1 KiB

library_name, license, base_model, pipeline_tag, tags
library_name license base_model pipeline_tag tags
transformers apache-2.0 Qwen/Qwen3-8B text-generation
agent
alfworld
reinforcement-learning
qwen3

Qwen3-8B ALFWorld RL (step 570)

Qwen3-8B fine-tuned with reinforcement learning on the ALFWorld text-world benchmark. This is a snapshot at training step 570.

  • Base model: Qwen/Qwen3-8B
  • Task: ALFWorld (TextWorld variant; 6 task types, valid_seen / valid_unseen)
  • Training step: 570
  • Validation success rate (valid_seen, T=0.4, 8 rollouts/game): ~0.957
  • Inference: Qwen3 chat template with thinking enabled (enable_thinking=True). The model reasons inside <think>...</think> and emits the chosen action inside <action>...</action> tags, conditioned on the current observation and admissible-action list.
  • Tokenizer: identical to Qwen/Qwen3-8B (vocab 151643).

Intended use

Distillation source for smaller students (Qwen3-0.6B / Qwen2.5-0.5B), and as a strong baseline for ALFWorld policy work. Released alongside the training recipe at https://github.com/MiaoLu3/verl.