1.1 KiB
1.1 KiB
library_name, license, base_model, pipeline_tag, tags
| library_name | license | base_model | pipeline_tag | tags | ||||
|---|---|---|---|---|---|---|---|---|
| transformers | apache-2.0 | Qwen/Qwen3-8B | text-generation |
|
Qwen3-8B ALFWorld RL (step 570)
Qwen3-8B fine-tuned with reinforcement learning on the ALFWorld text-world benchmark. This is a snapshot at training step 570.
- Base model: Qwen/Qwen3-8B
- Task: ALFWorld (TextWorld variant; 6 task types, valid_seen / valid_unseen)
- Training step: 570
- Validation success rate (valid_seen, T=0.4, 8 rollouts/game): ~0.957
- Inference: Qwen3 chat template with thinking enabled (
enable_thinking=True). The model reasons inside<think>...</think>and emits the chosen action inside<action>...</action>tags, conditioned on the current observation and admissible-action list. - Tokenizer: identical to
Qwen/Qwen3-8B(vocab 151643).
Intended use
Distillation source for smaller students (Qwen3-0.6B / Qwen2.5-0.5B), and as a strong baseline for ALFWorld policy work. Released alongside the training recipe at https://github.com/MiaoLu3/verl.