55 lines
1.2 KiB
Markdown
55 lines
1.2 KiB
Markdown
---
|
|
base_model: Qwen/Qwen3-4B-Instruct-2507
|
|
datasets:
|
|
- u-10bei/sft_alfworld_trajectory_dataset_v5
|
|
language:
|
|
- en
|
|
license: apache-2.0
|
|
library_name: transformers
|
|
pipeline_tag: text-generation
|
|
tags:
|
|
- agent
|
|
- alfworld
|
|
- sft
|
|
- merged
|
|
---
|
|
|
|
# qwen3-4b-advanced-sft-v13-merged
|
|
|
|
This repository provides a **merged model** fine-tuned from Qwen/Qwen3-4B-Instruct-2507.
|
|
|
|
- Dataset: u-10bei/sft_alfworld_trajectory_dataset_v5
|
|
- Method: LoRA SFT (merged into base model)
|
|
- Max sequence length: 4096
|
|
- Epochs: 1
|
|
- Learning rate: 1e-06
|
|
- LoRA: r=32, alpha=128
|
|
|
|
## vLLM Compatibility
|
|
|
|
- LoRA adapter has been merged
|
|
- No tokenizer vocabulary modification
|
|
- Intended for AgentBench Advanced evaluation
|
|
|
|
## Usage
|
|
|
|
```python
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM
|
|
|
|
model_id = "deepkick/qwen3-4b-advanced-sft-v13-merged"
|
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
|
model = AutoModelForCausalLM.from_pretrained(
|
|
model_id,
|
|
device_map="auto"
|
|
)
|
|
```
|
|
|
|
## Data / License Notes
|
|
|
|
- Dataset: u-10bei/sft_alfworld_trajectory_dataset_v5
|
|
Please refer to the dataset page for license and terms.
|
|
- Base model: Qwen/Qwen3-4B-Instruct-2507
|
|
Please comply with the base model's original terms of use.
|
|
|