Initialize project; model provided by the ModelHub XC community.
Model: bigatuna/Qwen3-0.6B-Sushi-Coder Source: Original Platform
---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
tags:
- code
- qwen3
- text-generation
- sft
- grpo
- trl
language:
- en
pipeline_tag: text-generation
library_name: transformers
datasets:
- microsoft/rStar-Coder
- open-r1/codeforces-cots
---

# Qwen3-0.6B-Sushi-Coder

<p align="center">
<img src="Qwen3-0.6B-Sushi-Coder.png" alt="Qwen3-0.6B-Sushi-Coder" width="400">
</p>

A 0.6B-parameter model fine-tuned from Qwen3-0.6B for Python code generation.

## Training

This model was trained in two stages:

1. **GRPO** using TRL, with a reward based on test execution and formatting
2. **SFT** on [microsoft/rStar-Coder](https://huggingface.co/datasets/microsoft/rStar-Coder) and [open-r1/codeforces-cots](https://huggingface.co/datasets/open-r1/codeforces-cots)

Training was run on Hugging Face Jobs infrastructure using TRL.

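The card does not publish the GRPO reward functions, so the following is only a rough sketch of what a reward "based on test execution and formatting" could look like. It follows the plain-callable style that, as we understand it, TRL's `GRPOTrainer` accepts in `reward_funcs` (completions in, one float per completion out, extra dataset columns passed as keyword arguments). The function names, the 0/1 scoring, the `tests` column and the fenced-code format check are all hypothetical. The last function illustrates the group-relative advantage that gives GRPO its name: each completion's reward is normalized against the other completions sampled for the same prompt.

```python
import re
import statistics
import subprocess
import sys

FENCE = "`" * 3  # built this way so the markdown fence around this sketch stays intact
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)


def extract_code(completion: str):
    """Return the first fenced Python (or untagged) code block in a completion, or None."""
    match = CODE_BLOCK.search(completion)
    return match.group(1) if match else None


def format_reward(completions, **kwargs):
    """Hypothetical format reward: 1.0 if the answer contains a fenced code block."""
    return [1.0 if extract_code(c) is not None else 0.0 for c in completions]


def execution_reward(completions, tests, timeout=5.0, **kwargs):
    """Hypothetical execution reward: 1.0 if the extracted code passes its tests.

    `tests` is an assumed dataset column holding assert-based test code,
    one entry per completion.
    """
    rewards = []
    for completion, test_code in zip(completions, tests):
        code = extract_code(completion)
        if code is None:
            rewards.append(0.0)
            continue
        try:
            # Separate interpreter: crashes stay isolated and the timeout stops
            # infinite loops (this is NOT a security sandbox)
            result = subprocess.run(
                [sys.executable, "-c", code + "\n" + test_code],
                capture_output=True,
                timeout=timeout,
            )
            rewards.append(1.0 if result.returncode == 0 else 0.0)
        except subprocess.TimeoutExpired:
            rewards.append(0.0)
    return rewards


def group_advantages(rewards, eps=1e-4):
    """GRPO-style advantage: normalize each reward against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions that beat their group's average get a positive advantage and are reinforced; if every sample in a group scores the same, all advantages are zero and that group contributes no learning signal.
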
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigatuna/Qwen3-0.6B-Sushi-Coder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Write a Python function to check if a number is prime."}
]

# Render the conversation with the model's chat template and open the assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample with the settings listed under Recommended Parameters
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    do_sample=True,
)

# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

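Qwen3 chat templates enable a "thinking" mode by default (it can be switched off with `enable_thinking=False` in `apply_chat_template`), so a decoded response may contain a reasoning section ending in `</think>` before the answer. Assuming this fine-tune keeps that behaviour and answers with a fenced Python block (neither is stated in this card), a small helper like the hypothetical `extract_python` below can pull the runnable code out of the decoded text.

```python
import re

FENCE = "`" * 3  # avoids a literal triple backtick inside this markdown snippet
CODE_BLOCK = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python(response: str) -> str:
    """Return the last fenced code block of the final answer.

    Any reasoning up to the last </think> tag is discarded first. If no fenced
    block is found, the stripped answer text is returned unchanged.
    """
    # Keep only the text after the last </think>, or the whole response if absent
    answer = response.rsplit("</think>", 1)[-1].strip()
    blocks = CODE_BLOCK.findall(answer)
    return blocks[-1].strip() if blocks else answer
```

For example, passing the decoded string from the snippet above through `extract_python` yields just the function source, which can then be written to a file or run against your own tests.
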
## Recommended Parameters

| Setting | Value |
|---------|-------|
| Temperature | 0.6 |
| Top-p | 0.95 |
| Top-k | 20 |
| Max tokens | 2048 |

Note: Avoid greedy decoding (temperature=0, or `do_sample=False` in Transformers), as it can cause repetition issues with Qwen3 models.

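To make the table concrete, the sketch below shows what temperature, top-k and top-p do to a next-token distribution. It is a simplified plain-Python illustration, not the Transformers implementation: it applies the filters in the order temperature, top-k, top-p and renormalizes after each cut. The function name `filtered_distribution` is hypothetical.

```python
import math


def filtered_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_id: probability} after temperature scaling, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 when sampling")
    # Temperature: divide the logits, then softmax (subtracting the max for numerical stability)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = {i: w / total for i, w in enumerate(weights)}

    # Top-k: keep only the k most likely tokens, then renormalize
    top = sorted(probs, key=probs.get, reverse=True)[:top_k]
    mass = sum(probs[i] for i in top)
    probs = {i: probs[i] / mass for i in top}

    # Top-p: keep the smallest high-probability prefix whose cumulative mass reaches top_p
    kept, cumulative = [], 0.0
    for i in top:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

Lowering the temperature sharpens the distribution toward the top token, while top-k and top-p cut off the unlikely tail. Sampling from the surviving tokens, rather than always taking the single most likely one, is what avoids the repetition loops that greedy decoding can fall into.
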
## Evaluation

| Model | HumanEval pass@1 |
|-------|------------------|
| Qwen/Qwen3-0.6B (base) | 20.1% |
| **Qwen3-0.6B-Sushi-Coder** | **29.3%** |

Note: Scores are slightly higher with prompt tuning and `max_model_len=4000`; the results above use the baseline settings in the commands below.

The evaluation can be reproduced with the following command:

```bash
lm_eval \
  --model vllm \
  --model_args pretrained=bigatuna/Qwen3-0.6B-Sushi-Coder,tensor_parallel_size=1,dtype=float16,gpu_memory_utilization=0.8,max_model_len=2048,trust_remote_code=True \
  --tasks humaneval \
  --batch_size 1 \
  --confirm_run_unsafe_code
```

Baseline comparison (base model, same settings):

```bash
lm_eval \
  --model vllm \
  --model_args pretrained=Qwen/Qwen3-0.6B,tensor_parallel_size=1,dtype=float16,gpu_memory_utilization=0.8,max_model_len=2048,trust_remote_code=True \
  --tasks humaneval \
  --batch_size 1 \
  --confirm_run_unsafe_code
```

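The pass@1 figures in the evaluation table are the share of HumanEval problems solved. As a reference sketch (not the lm-evaluation-harness code), the unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021) scores a problem with n sampled completions, c of which pass, as 1 - C(n-c, k) / C(n, k), then averages over problems. With one completion per problem, pass@1 reduces to the fraction of problems whose single completion passes its tests. The helper names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k=1):
    """Average per-problem pass@k; `results` is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 1)])` returns `0.75`: three of four problems solved with one sample each.
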
## Limitations

- Optimized for Python; code quality in other languages may be lower
- The small model size limits complex, multi-step reasoning
- May generate plausible-looking but incorrect code, especially for edge cases

## License

Apache 2.0