---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
tags:
- code
- qwen3
- text-generation
- sft
- grpo
- trl
language:
- en
pipeline_tag: text-generation
library_name: transformers
datasets:
- microsoft/rStar-Coder
- open-r1/codeforces-cots
---

# Qwen3-0.6B-Sushi-Coder

<p align="center">
  <img src="Qwen3-0.6B-Sushi-Coder.png" alt="Qwen3-0.6B-Sushi-Coder" width="400">
</p>

A 0.6B-parameter model fine-tuned from Qwen3-0.6B for Python code generation.

## Training

This model was trained in two stages:

1. **GRPO** using TRL, with a reward model based on test execution and formatting
2. **SFT** on [microsoft/rStar-Coder](https://huggingface.co/datasets/microsoft/rStar-Coder) and [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots)

Training was run on Hugging Face Jobs infrastructure using TRL.

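The card does not publish the reward used in the GRPO stage. As an illustration only, a reward that combines formatting and test execution might look like the sketch below. The names `format_reward`, `execution_reward`, and `total_reward`, the 0.2 formatting weight, and the assert-style tests are all assumptions, not the author's implementation.

```python
import re
import subprocess
import sys

# Markdown code fence, built programmatically so this example embeds cleanly.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion has exactly one fenced Python block."""
    return 1.0 if len(CODE_BLOCK.findall(completion)) == 1 else 0.0


def execution_reward(completion: str, test_code: str, timeout: float = 5.0) -> float:
    """Return 1.0 if the extracted code passes the assert-style tests."""
    match = CODE_BLOCK.search(completion)
    if match is None:
        return 0.0
    program = match.group(1) + "\n" + test_code
    try:
        # Caution: this runs model-generated code; sandbox it in real training.
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


def total_reward(completion: str, test_code: str, format_weight: float = 0.2) -> float:
    """Weighted sum of both signals (the weight is a hypothetical choice)."""
    return (
        format_weight * format_reward(completion)
        + (1.0 - format_weight) * execution_reward(completion, test_code)
    )
```

TRL's `GRPOTrainer` takes reward callables through its `reward_funcs` argument, so a function along these lines could be adapted to that interface. How the rewards were actually wired for this model is not stated here.
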
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigatuna/Qwen3-0.6B-Sushi-Coder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Write a Python function to check if a number is prime."}
]

# Render the chat template to a prompt string, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample rather than decode greedily (see Recommended Parameters).
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.6,
    top_p=0.95,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

## Recommended Parameters

| Setting | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-p | 0.95 |
| Top-k | 20 |
| Max tokens | 2048 |

Note: Avoid greedy decoding (temperature=0) as it can cause repetition issues with Qwen3 models.

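The settings in the table above can be collected into keyword arguments for `model.generate`. This is a minimal sketch: the helper name `sampling_kwargs` is hypothetical, and it assumes the table's "Max tokens" corresponds to `max_new_tokens`.

```python
# Recommended sampling settings from the table above.
# Assumption: "Max tokens" maps to generate()'s max_new_tokens argument.
RECOMMENDED = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "max_new_tokens": 2048,
}


def sampling_kwargs(**overrides):
    """Return the recommended settings with optional overrides (hypothetical helper)."""
    kwargs = {**RECOMMENDED, **overrides}
    # The note above advises against greedy decoding, so reject it early.
    if not kwargs["do_sample"] or kwargs["temperature"] <= 0:
        raise ValueError("greedy decoding is not recommended for this model")
    return kwargs
```

With the objects from the Usage section, this could be called as `model.generate(**inputs, **sampling_kwargs(max_new_tokens=512))`.
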
## Evaluation

| Model | HumanEval pass@1 |
|-------|------------------|
| Qwen/Qwen3-0.6B (base) | 20.1% |
| **Qwen3-0.6B-Sushi-Coder** | **29.3%** |

Note: The model scores slightly higher with prompt tuning and `max_model_len=4000`; the results above use the baseline settings in the commands below.

The evaluation can be reproduced with the following command:

```bash
lm_eval \
  --model vllm \
  --model_args pretrained=bigatuna/Qwen3-0.6B-Sushi-Coder,tensor_parallel_size=1,dtype=float16,gpu_memory_utilization=0.8,max_model_len=2048,trust_remote_code=True \
  --tasks humaneval \
  --batch_size 1 \
  --confirm_run_unsafe_code
```

To reproduce the base-model result for comparison:

```bash
lm_eval \
  --model vllm \
  --model_args pretrained=Qwen/Qwen3-0.6B,tensor_parallel_size=1,dtype=float16,gpu_memory_utilization=0.8,max_model_len=2048,trust_remote_code=True \
  --tasks humaneval \
  --batch_size 1 \
  --confirm_run_unsafe_code
```

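To put the two scores in perspective, the gain can be worked out directly. The sketch below uses only the reported numbers. The problem count of 164 (HumanEval's size) and the one-sample-per-problem reading of pass@1 are assumptions, so the implied solved counts are approximate.

```python
# Scores reported in the evaluation table (percent).
base_pass1 = 20.1
tuned_pass1 = 29.3

absolute_gain = tuned_pass1 - base_pass1          # percentage points
relative_gain = absolute_gain / base_pass1 * 100  # percent over the base model

# Assumption: 164 problems (HumanEval) with one sample per problem.
NUM_PROBLEMS = 164
base_solved = round(base_pass1 / 100 * NUM_PROBLEMS)
tuned_solved = round(tuned_pass1 / 100 * NUM_PROBLEMS)

print(f"absolute gain: {absolute_gain:.1f} points")
print(f"relative gain: {relative_gain:.1f}%")
print(f"approx. problems solved: {base_solved} -> {tuned_solved}")
```

Under those assumptions, this works out to a 9.2-point absolute gain, about 45.8% relative, or roughly 33 versus 48 problems solved.
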
## Limitations

- Optimized for Python; other languages may have reduced quality
- Small model size limits complex reasoning
- May generate plausible but incorrect code for edge cases

## License

Apache 2.0