Initialize project; model provided by the ModelHub XC community
Model: y-ohtani/qwen3-4b-agent-sft-true Source: Original Platform
README.md (normal file, 76 lines added)
@@ -0,0 +1,76 @@
---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- Gen-Verse/Open-AgentRL-SFT-3K
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- agent
- tool-use
- sft
- multi-turn
- code-interpreter
- open-agentrl
---

# Qwen3-4B-Agent-SFT-True

This repository contains a **fully fine-tuned model** (not a LoRA adapter) based on
**Qwen3-4B-Instruct-2507**, trained with multi-turn agentic SFT using the
[Open-AgentRL](https://github.com/Gen-Verse/Open-AgentRL) framework (verl FSDP SFT Trainer).

## Training Configuration

| Parameter | Value |
|---|---|
| Base model | `Qwen/Qwen3-4B-Instruct-2507` |
| Method | Full fine-tuning (FSDP, bfloat16) |
| Max sequence length | 32,768 tokens |
| Epochs | 10 |
| Train batch size | 16 |
| Micro batch size per GPU | 1 |
| Truncation | right |
| Trainer | `verl.trainer.fsdp_sft_trainer` |
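
To make the batch-size and truncation rows concrete, the sketch below works out how many micro-batches each GPU would accumulate per optimizer step, and what right truncation does to an over-long sample. The GPU count is not stated in this card, so `num_gpus=8` is purely an assumed value, and both helper names are hypothetical rather than part of verl.

```python
def grad_accumulation_steps(train_batch_size: int,
                            micro_batch_size_per_gpu: int,
                            num_gpus: int) -> int:
    """Micro-batches each GPU accumulates before one optimizer step.

    Assumes the global batch is split evenly across data-parallel GPUs.
    """
    per_gpu_batch, remainder = divmod(train_batch_size, num_gpus)
    if remainder or per_gpu_batch % micro_batch_size_per_gpu:
        raise ValueError("batch sizes must divide evenly")
    return per_gpu_batch // micro_batch_size_per_gpu


def truncate_right(token_ids: list, max_length: int) -> list:
    """Right truncation: keep the first max_length tokens and drop the tail."""
    return token_ids[:max_length]


if __name__ == "__main__":
    # num_gpus=8 is an assumption; the card does not list the hardware.
    print(grad_accumulation_steps(16, 1, num_gpus=8))  # 2
    # A 40,000-token sample is cut down to the 32,768-token limit.
    print(len(truncate_right(list(range(40_000)), 32_768)))  # 32768
```

With right truncation, the later turns of a very long trajectory are the part that gets cut, so the early context is always preserved.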

## Dataset

- **Name**: [Gen-Verse/Open-AgentRL-SFT-3K](https://huggingface.co/datasets/Gen-Verse/Open-AgentRL-SFT-3K)
- **Samples**: 3,000 multi-turn conversations
- **Source**: Original Open-AgentRL SFT dataset (real end-to-end agentic trajectories)
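
Combined with the train batch size of 16 and the 10 epochs listed under Training Configuration, the sample count gives a rough sense of run length. This is a minimal sketch, assuming all 3,000 conversations are used for training (no held-out split) and leaving open whether the final partial batch is dropped; the function name is illustrative only.

```python
import math


def optimizer_steps(num_samples: int, batch_size: int, epochs: int, drop_last: bool) -> int:
    """Estimate the total number of optimizer steps for a run."""
    if drop_last:
        # The incomplete final batch of each epoch is discarded.
        per_epoch = num_samples // batch_size
    else:
        # The incomplete final batch still counts as a step.
        per_epoch = math.ceil(num_samples / batch_size)
    return per_epoch * epochs


if __name__ == "__main__":
    print(optimizer_steps(3000, 16, 10, drop_last=True))   # 1870
    print(optimizer_steps(3000, 16, 10, drop_last=False))  # 1880
```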

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "y-ohtani/qwen3-4b-agent-sft-true"

# Load the tokenizer and the full fine-tuned weights in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

messages = [
    {"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0 step by step."}
]
# Render the conversation with the chat template and append the assistant prompt.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
# outputs[0] holds the prompt tokens followed by the generated tokens.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
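
Because the model was tuned on multi-turn conversations, a follow-up turn is typically sent by appending the previous reply to `messages` and re-applying the chat template. The helpers below sketch that bookkeeping on plain Python lists; their names are hypothetical, and the commented lines assume the `tokenizer`, `inputs`, and `outputs` variables from the snippet above.

```python
def generated_only(output_ids, prompt_length):
    """Drop the prompt tokens so only the newly generated ones remain."""
    return output_ids[prompt_length:]


def add_turn(messages, assistant_reply, next_user_message):
    """Return a new message list extended with the reply and the next user turn."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


# With the objects from the usage snippet (not executed here):
# prompt_length = inputs["input_ids"].shape[-1]
# reply = tokenizer.decode(generated_only(outputs[0], prompt_length),
#                          skip_special_tokens=True)
# messages = add_turn(messages, reply, "Now verify both roots by substitution.")
```

Returning a new list instead of mutating `messages` keeps earlier conversation states available, which can help when comparing or replaying turns.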

## Sources & Terms

| Component | Source | License |
|---|---|---|
| Base model | [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) | Apache-2.0 |
| SFT dataset | [Gen-Verse/Open-AgentRL-SFT-3K](https://huggingface.co/datasets/Gen-Verse/Open-AgentRL-SFT-3K) | -- |
| Training framework | [Open-AgentRL](https://github.com/Gen-Verse/Open-AgentRL) (verl) | Apache-2.0 |

Users must comply with the base model license and dataset terms.