qwen3-4b-instruct-code-agent/README.md
ModelHub XC fb1edb3de5 Initialize project; model provided by the ModelHub XC community
Model: Nanami14138/qwen3-4b-instruct-code-agent
Source: Original Platform
2026-06-02 06:20:18 +08:00

---
license: apache-2.0
language:
- en
- zh
datasets:
- m-a-p/Code-Feedback
metrics:
- pass@k
base_model: Qwen/Qwen3-4B-Instruct
tags:
- code
- agent
- react
- code-review
- lora
- unsloth
- qwen3
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: Qwen3-4B-CodeAgent
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval (10-problem subset)
      type: openai/openai_humaneval
    metrics:
    - name: Pass@1
      type: pass@1
      value: 62.6
    - name: Pass@3
      type: pass@3
      value: 75.61
---
# Qwen3-4B-CodeAgent
A fine-tuned code-execution and code-review agent based on Qwen3-4B-Instruct, trained to follow a structured ReAct workflow (Plan → Execute → Reflect → Finish) and to emit XML-formatted responses.
## Model Description
This model is a LoRA fine-tuned version of [Qwen3-4B-Instruct](https://huggingface.co/Qwen/Qwen3-4B) designed to function as an autonomous coding agent. It generates structured XML responses that can be parsed by an orchestration framework to execute code, review results, and iteratively debug.
| Attribute | Value |
|-----------|-------|
| Base Model | Qwen3-4B-Instruct (4.0B params, 3.6B non-embedding) |
| Architecture | Qwen3ForCausalLM, 36 layers, 2560 hidden size, GQA (32 heads / 8 KV heads) |
| Fine-tuning Method | LoRA (4-bit quantization + LoRA r=32, alpha=32) |
| Framework | Unsloth + TRL SFTTrainer |
| Training Data | m-a-p/Code-Feedback (~47K train samples) |
| Context Length | 4096 tokens |
| Precision | bfloat16 (merged weights) |
## Intended Use
This model is designed for building code agent systems that need structured, parseable output. It is suitable for:
- Automated code generation with execution feedback loops
- Code review and iterative debugging pipelines
- Tool-augmented LLM applications with sandbox execution
- Educational coding assistants
## Output Format
The model outputs XML-structured responses following a ReAct workflow:
```xml
<agent_response>
<node>Plan</node>
<next_node>Execute</next_node>
<content>
## Analysis
The task requires implementing a binary search algorithm.
## Plan
1. Define the function signature
2. Implement iterative binary search
3. Handle edge cases (empty array, target not found)
</content>
</agent_response>
```
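An orchestration framework first has to pull `node`, `next_node`, and `content` out of each reply. The following is a minimal sketch of that step, not the author's parser: the names `AgentStep` and `parse_agent_response` are hypothetical, and it assumes one `<agent_response>` envelope per reply. It uses a regular expression instead of an XML parser because `<content>` carries Markdown or JSON that may include unescaped `<` or `&`.

```python
import re
from typing import NamedTuple, Optional

# The four node names defined by the workflow in this README.
VALID_NODES = {"Plan", "Execute", "Reflect", "Finish"}


class AgentStep(NamedTuple):  # hypothetical container, not part of the model's API
    node: str
    next_node: str
    content: str


# Matches one envelope; DOTALL lets <content> span multiple lines.
_PATTERN = re.compile(
    r"<agent_response>\s*"
    r"<node>\s*(?P<node>.*?)\s*</node>\s*"
    r"<next_node>\s*(?P<next_node>.*?)\s*</next_node>\s*"
    r"<content>(?P<content>.*?)</content>\s*"
    r"</agent_response>",
    re.DOTALL,
)


def parse_agent_response(text: str) -> Optional[AgentStep]:
    """Return the first well-formed agent step in `text`, or None if there is none."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    node, next_node = match["node"], match["next_node"]
    if node not in VALID_NODES or next_node not in VALID_NODES:
        return None  # unknown node name: treat the reply as malformed
    return AgentStep(node, next_node, match["content"].strip())
```

Returning `None` for malformed replies lets the caller decide whether to retry generation or abort. A reply whose code itself contains the literal string `</content>` would be cut short by this pattern, so a production parser may need a stricter delimiter strategy.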
### Node Types
| Node | Trigger | Content | Next Node |
|------|---------|---------|-----------|
| **Plan** | User sends a task | Markdown-formatted solution plan | Execute |
| **Execute** | After Plan or Reflect | `{"tool_name": "python_sandbox", "arguments": {"code": "..."}}` | Execute (awaiting the execution result) |
| **Reflect** | Execute fails (exit_code=1) | Root cause analysis and fix direction | Execute (re-run after the fix) |
| **Finish** | Execute succeeds (exit_code=0) | Task summary | Finish |
### Standard Workflow
```
Plan → Execute → (failure → Reflect → Execute → ...) → Finish
```
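On the orchestrator side, the workflow reduces to a small state machine plus a tool runner. The sketch below is one possible reading of the node table, under stated assumptions: `next_state` and `run_tool_call` are hypothetical names, any non-zero exit code is treated as failure (the table only mentions `exit_code=1`), and a plain subprocess stands in for the `python_sandbox` tool. Note that the model's own `<next_node>` on an Execute step is `Execute`; the orchestrator chooses Reflect or Finish once the exit code is known.

```python
import json
import subprocess
import sys
from typing import Optional, Tuple


def next_state(node: str, exit_code: Optional[int] = None) -> str:
    """State the orchestrator enters after `node`, following the node table."""
    if node in ("Plan", "Reflect"):
        return "Execute"
    if node == "Execute":
        if exit_code is None:
            raise ValueError("Execute needs the exit code of the executed code")
        # Assumption: every non-zero exit code counts as a failure.
        return "Finish" if exit_code == 0 else "Reflect"
    if node == "Finish":
        return "Finish"
    raise ValueError(f"unknown node: {node!r}")


def run_tool_call(content: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Decode an Execute payload and run its code; returns (exit_code, output).

    WARNING: a subprocess is not a sandbox. It only stands in for one here;
    untrusted model output needs real isolation.
    """
    call = json.loads(content)
    if call.get("tool_name") != "python_sandbox":
        raise ValueError(f"unsupported tool: {call.get('tool_name')!r}")
    code = call["arguments"]["code"]
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout + proc.stderr
```

How the execution result is phrased when it is fed back to the model is not specified here, so that step is left out of the sketch.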
## Usage
### With Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Nanami14138/qwen3-4b-instruct-code-agent"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
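```
#### 🛠️ Prompting Strategy (System Prompt)
This model is designed as a ReAct-based code agent. To keep it strictly on the state machine (Plan -> Execute -> Reflect -> Finish) and producing structured XML output, **using the following system prompt at inference time is strongly recommended**:
```python
# System prompt, kept in Chinese because it is a functional string; it defines
# the XML envelope and the four nodes described under "Output Format".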
system_prompt = """你是一个专业的代码执行与Code Review智能Agent遵循ReAct工作流。
## 输出格式
你的每一次回复都必须严格使用以下XML格式
<agent_response>
<node>当前节点</node>
<next_node>下一个节点</next_node>
<content>输出内容</content>
</agent_response>
## 节点定义
### Plan规划
- 触发:收到用户任务后立即进入
- <content>:分析任务需求,以 Markdown 格式输出解决方案规划
- <next_node>Execute
### Execute执行
- 触发Plan 或 Reflect 之后进入
- <content>:输出 {"tool_name": "python_sandbox", "arguments": {"code": "你的代码"}}
- <next_node>Execute等待执行结果
### Reflect反思
- 触发Execute 执行失败exit_code=1后进入
- <content>:分析失败原因,定位根因,给出修正方向
- <next_node>Execute修正后重新执行
### Finish完成
- 触发Execute 执行成功exit_code=0后进入
- <content>:输出任务总结
- <next_node>Finish
## 标准工作流
Plan → Execute → (失败 → Reflect → Execute → ...) → Finish"""
messages = [
    {"role": "system", "content": system_prompt},
    # In the user turn, "任务" means "task" and "当前状态" means "current state".
    {"role": "user", "content": "任务Write a Python function to check if a number is prime.\n\n当前状态Start"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    # do_sample=True makes temperature and top_p take effect.
    output = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1, top_p=0.95)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
### With Unsloth (Faster Inference)
```python
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Nanami14138/qwen3-4b-instruct-code-agent",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
# Then use the same message format as above
```
## Training Details
### Data
Trained on [m-a-p/Code-Feedback](https://huggingface.co/datasets/m-a-p/Code-Feedback), a multi-turn code conversation dataset with ~66K examples. The data was processed into three pools:
| Pool | Description | Train Samples | Ratio |
|------|-------------|---------------|-------|
| Pool A (Base SFT) | Single-turn code Q&A, plain text | 117 | 0.2% |
| Pool B (Code Review) | Multi-turn debug/review → ReAct XML format | 29,562 | 62.3% |
| Pool C (Discussion) | Multi-turn code discussion → ReAct XML format | 17,737 | 37.4% |
The system prompt is injected at training time (not stored in the data) to ensure consistent behavior.
### Hyperparameters
| Parameter | Value |
|-----------|-------|
| LoRA rank (r) | 32 |
| LoRA alpha | 32 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning rate | 2e-4 (cosine schedule) |
| Warmup ratio | 0.1 |
| Batch size | 4 × 4 (gradient accumulation) = 16 effective |
| Max sequence length | 4096 |
| Precision | LoRA 4-bit (training), bfloat16 (merged) |
| Optimizer | AdamW 8-bit |
| Epochs | 3 planned (stopped early at step 620/8892, ~7.0% of the schedule) |
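The step counts follow from the pool sizes and the effective batch size. The arithmetic below is a sketch that assumes the final partial batch of each epoch counts as a step (rounding up), which reproduces the 8892 total steps listed in the table; the variable names are illustrative only.

```python
import math

train_samples = 117 + 29_562 + 17_737  # Pools A + B + C from the data table
effective_batch = 4 * 4                # batch size × gradient accumulation
epochs = 3
stopped_at = 620                       # step of the merged checkpoint

# Assumption: a trailing partial batch still counts as one optimizer step.
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs

print(train_samples)                          # 47416
print(steps_per_epoch, total_steps)           # 2964 8892
print(f"{stopped_at / total_steps:.1%}")      # 7.0%
print(f"{stopped_at / steps_per_epoch:.1%}")  # 20.9%
```

Step 620 therefore corresponds to about 7.0% of the three-epoch schedule, or roughly a fifth of the first epoch.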
### Training Curve
| Step | Train Loss | Eval Loss |
|------|-----------|-----------|
| 20 | 1.927 | 1.905 |
| 100 | 0.649 | 0.573 |
| 200 | 0.463 | 0.454 |
| 300 | 0.412 | 0.422 |
| 400 | 0.413 | 0.409 |
| 500 | 0.374 | 0.401 |
| 600 | 0.383 | 0.397 |
Eval loss fell from 1.905 at step 20 to 0.397 at step 600 and stayed close to train loss throughout, with no signs of overfitting. The checkpoint at step 620 was merged for this release.
### Hardware
- Server with 8× NVIDIA L20 (48 GB each); the LoRA run used a single GPU
## Evaluation
### HumanEval (10-problem subset)
| Metric | Score |
|--------|-------|
| Pass@1 | 62.6% |
| Pass@2 | 71.14% |
| Pass@3 | 75.61% |
| Avg tokens/problem | 215.2 |
Evaluation was conducted on a 10-problem subset of HumanEval. Full 164-problem evaluation is planned.
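For reference, pass@k is commonly computed with the unbiased estimator introduced alongside HumanEval: for a problem with `n` sampled solutions of which `c` pass, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The sketch below implements that estimator. Whether the scores above were produced this way, and with how many samples per problem, is not stated, so `n` and the counts in the usage example are hypothetical.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average pass@k over problems, given each problem's count of correct samples."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical run: 4 problems, n = 3 samples each, with 3, 1, 0 and 2 passing.
counts = [3, 1, 0, 2]
print(mean_pass_at_k(counts, n=3, k=1))  # 0.5
print(mean_pass_at_k(counts, n=3, k=3))  # 0.75
```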
## Limitations
- **Early checkpoint**: This model was merged at step 620 of 8892 planned steps (~7.0% of the schedule). Performance will likely improve with continued training.
- **English-centric data**: The training data (Code-Feedback) is predominantly in English. Output quality on Chinese-language coding tasks may be lower.
- **XML format dependency**: The model is trained to output structured XML. Without the system prompt, it may not follow the expected format.
- **No real execution**: The training data simulates tool responses; the model has not been trained with actual code execution feedback.
- **Limited programming-language coverage**: The training data covers multiple languages, but Python is heavily overrepresented.
- **Hallucination risk**: Like all LLMs, the model may generate plausible but incorrect code, especially for complex algorithms or domain-specific tasks.
## Ethical Considerations
- The model should not be used to generate malicious code or exploit vulnerabilities.
- Generated code should always be reviewed by a human before deployment in production systems.
- The model may reproduce biases present in the training data (e.g., coding style preferences, library choices).
## Citation
If you use this model, please cite the base model and training dataset:
```bibtex
@article{qwen3,
title={Qwen3 Technical Report},
author={Qwen Team},
year={2025}
}
@misc{code-feedback,
title={Code-Feedback: Multi-turn Code Conversation Dataset},
author={m-a-p},
url={https://huggingface.co/datasets/m-a-p/Code-Feedback}
}
```