初始化项目,由ModelHub XC社区提供模型
Model: miaolu3/qwen3-8b-alfworld-rl-step570 Source: Original Platform
This commit is contained in:
32
README.md
Normal file
32
README.md
Normal file
@@ -0,0 +1,32 @@
|
||||
---
|
||||
library_name: transformers
|
||||
license: apache-2.0
|
||||
base_model: Qwen/Qwen3-8B
|
||||
pipeline_tag: text-generation
|
||||
tags:
|
||||
- agent
|
||||
- alfworld
|
||||
- reinforcement-learning
|
||||
- qwen3
|
||||
---
|
||||
|
||||
# Qwen3-8B ALFWorld RL (step 570)
|
||||
|
||||
Qwen3-8B fine-tuned with reinforcement learning on the ALFWorld text-world
|
||||
benchmark. This is a snapshot at training step 570.
|
||||
|
||||
* **Base model:** [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
|
||||
* **Task:** ALFWorld (TextWorld variant; 6 task types, valid_seen / valid_unseen)
|
||||
* **Training step:** 570
|
||||
* **Validation success rate (valid_seen, T=0.4, 8 rollouts/game):** ~0.957
|
||||
* **Inference:** Qwen3 chat template with thinking enabled (`enable_thinking=True`).
|
||||
The model reasons inside `<think>...</think>` and emits the chosen action
|
||||
inside `<action>...</action>` tags, conditioned on the current observation
|
||||
and admissible-action list.
|
||||
* **Tokenizer:** identical to `Qwen/Qwen3-8B` (vocab 151643).
|
||||
|
||||
## Intended use
|
||||
|
||||
Distillation source for smaller students (Qwen3-0.6B / Qwen2.5-0.5B), and
|
||||
as a strong baseline for ALFWorld policy work. Released alongside the
|
||||
training recipe at <https://github.com/MiaoLu3/verl>.
|
||||
Reference in New Issue
Block a user