Model: kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant-GRPO Source: Original Platform
| base_model | tags | license | language |
|---|---|---|---|
| kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant | | apache-2.0 | |
Uploaded finetuned model
- Developed by: kendrickfff
- License: apache-2.0
- Finetuned from model: kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant
Qwen2.5-1.5B Indonesian Assistant (GRPO)
Training Method
- Type: Group Relative Policy Optimization (GRPO)
- Base Model: kendrickfff/Qwen2.5-1.5B-Indonesian-Assistant (SFT model)
- Steps: 100
- Reward Functions: 4 (format, reasoning length, correctness, language)
- Key Difference: Model learns to use ... reasoning tags
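To make the training setup above more concrete, here is a minimal sketch of how four reward signals (format, reasoning length, correctness, language) might be scored and then turned into group-relative advantages, which is the core idea of GRPO. This is not the author's training code. The function names, the `target_words` parameter, the tiny `ID_STOPWORDS` list, and the equal-weight sum are all assumptions made for illustration, and the reasoning tag names are passed in as arguments because the card does not state them.

```python
import re
import statistics

# Hypothetical word list, used only to illustrate a language check.
ID_STOPWORDS = {"yang", "dan", "di", "adalah", "dengan", "untuk", "ini", "itu"}


def split_completion(text, open_tag, close_tag):
    """Return (reasoning, answer); reasoning is None if the format is not followed."""
    pattern = re.escape(open_tag) + r"(.+?)" + re.escape(close_tag) + r"\s*(\S.*)"
    match = re.fullmatch(pattern, text.strip(), flags=re.DOTALL)
    if match is None:
        return None, text.strip()
    return match.group(1).strip(), match.group(2).strip()


def format_reward(text, open_tag, close_tag):
    """1.0 if a reasoning block is followed by a non-empty answer, else 0.0."""
    reasoning, _ = split_completion(text, open_tag, close_tag)
    return 1.0 if reasoning is not None else 0.0


def length_reward(text, open_tag, close_tag, target_words=20):
    """Grow linearly with the number of reasoning words, capped at 1.0."""
    reasoning, _ = split_completion(text, open_tag, close_tag)
    if reasoning is None:
        return 0.0
    return min(len(reasoning.split()) / target_words, 1.0)


def correctness_reward(text, open_tag, close_tag, reference):
    """1.0 if the final answer matches the reference string exactly."""
    _, answer = split_completion(text, open_tag, close_tag)
    return 1.0 if answer == reference.strip() else 0.0


def language_reward(text):
    """Fraction of words found in the (hypothetical) Indonesian stopword list."""
    words = re.findall(r"[a-zA-Z]+", text.lower())
    if not words:
        return 0.0
    return sum(word in ID_STOPWORDS for word in words) / len(words)


def total_reward(text, open_tag, close_tag, reference):
    """Equal-weight sum of the four signals (the weighting is an assumption)."""
    return (
        format_reward(text, open_tag, close_tag)
        + length_reward(text, open_tag, close_tag)
        + correctness_reward(text, open_tag, close_tag, reference)
        + language_reward(text)
    )


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

In this sketch, several completions sampled for the same prompt are scored with `total_reward`, and `group_relative_advantages` rescales those scores so that completions better than the group average receive a positive advantage and worse ones a negative advantage. In practice, TRL's GRPO trainer accepts a list of reward functions and performs the group normalization internally.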
This Qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Description
