Nanami14138/qwen3-4b-instruct-code-agent

ModelHub XC fb1edb3de5 Initialize project; model provided by the ModelHub XC community

Model: Nanami14138/qwen3-4b-instruct-code-agent
Source: Original Platform

2026-06-02 06:20:18 +08:00

Qwen3-4B-CodeAgent

A fine-tuned code-execution and code-review agent based on Qwen3-4B-Instruct, trained to follow a structured ReAct (Plan → Execute → Reflect → Finish) workflow with XML-formatted responses.

Model Description

This model is a LoRA fine-tuned version of Qwen3-4B-Instruct designed to function as an autonomous coding agent. It generates structured XML responses that can be parsed by an orchestration framework to execute code, review results, and iteratively debug.

Attribute	Value
Base Model	Qwen3-4B-Instruct (4.0B params, 3.6B non-embedding)
Architecture	Qwen3ForCausalLM, 36 layers, 2560 hidden size, GQA (32 heads / 8 KV heads)
Fine-tuning Method	LoRA on a 4-bit quantized base model (r=32, alpha=32)
Framework	Unsloth + TRL SFTTrainer
Training Data	m-a-p/Code-Feedback (~47K train samples)
Context Length	4096 tokens
Precision	bfloat16 (merged weights)

Intended Use

This model is designed for building code agent systems that need structured, parseable output. It is suitable for:

Automated code generation with execution feedback loops
Code review and iterative debugging pipelines
Tool-augmented LLM applications with sandbox execution
Educational coding assistants

Output Format

The model outputs XML-structured responses following a ReAct workflow:

<agent_response>
  <node>Plan</node>
  <next_node>Execute</next_node>
  <content>
    ## Analysis
    The task requires implementing a binary search algorithm.
    
    ## Plan
    1. Define the function signature
    2. Implement iterative binary search
    3. Handle edge cases (empty array, target not found)
  </content>
</agent_response>
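A minimal sketch of how an orchestration layer might parse this format. The names `AgentResponse` and `parse_agent_response` are hypothetical; only the tag names come from the example above. It uses regular expressions rather than an XML parser because `<content>` can carry code with unescaped `<` or `&`, which a strict XML parser would reject.

```python
import re
from dataclasses import dataclass

# Pattern template for one tag pair; DOTALL lets the body span several lines.
_TAG = r"<{0}>(.*?)</{0}>"


@dataclass
class AgentResponse:
    node: str
    next_node: str
    content: str


def parse_agent_response(text: str) -> AgentResponse:
    """Extract node, next_node and content from one <agent_response> block."""
    block = re.search(_TAG.format("agent_response"), text, re.DOTALL)
    if block is None:
        raise ValueError("no <agent_response> block found")
    fields = {}
    for tag in ("node", "next_node", "content"):
        match = re.search(_TAG.format(tag), block.group(1), re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{tag}> tag")
        fields[tag] = match.group(1).strip()
    return AgentResponse(**fields)
```

Raising on a missing tag lets the caller decide whether to retry generation or stop the run.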

Node Types

Node	Trigger	Content	Next Node
Plan	User sends a task	Markdown-formatted solution plan	Execute
Execute	After Plan or Reflect	`{"tool_name": "python_sandbox", "arguments": {"code": "..."}}`	Execute
Reflect	Execute fails (exit_code=1)	Root cause analysis and fix direction	Execute
Finish	Execute succeeds (exit_code=0)	Task summary	Finish
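The Execute row can be handled roughly as follows. This is a sketch under assumptions: `decode_tool_call` and `run_python` are hypothetical helper names, and the child-process runner is a stand-in for a real sandbox (it provides no isolation). Every non-zero exit status is collapsed to 1 to match the `exit_code` values in the table.

```python
import json
import subprocess
import sys


def decode_tool_call(content: str) -> str:
    """Return the code string from an Execute node's JSON content."""
    call = json.loads(content)
    if call.get("tool_name") != "python_sandbox":
        raise ValueError(f"unexpected tool: {call.get('tool_name')!r}")
    return call["arguments"]["code"]


def run_python(code: str, timeout: float = 10.0) -> dict:
    """Run code in a child interpreter. NOT a real sandbox: no isolation."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": 1, "stdout": "", "stderr": "timeout"}
    # Map any failure status to 1 so callers only see 0 or 1.
    return {
        "exit_code": 0 if proc.returncode == 0 else 1,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

In a real deployment the runner would be replaced by a container or another isolated executor; the return shape can stay the same.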

Standard Workflow

Plan → Execute → (failure → Reflect → Execute → ...) → Finish
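One way to read this workflow from the orchestrator's side: the model emits `Execute` as its own `<next_node>` while it waits, and the orchestrator then picks Reflect or Finish from the sandbox exit code. The sketch below encodes that reading. The function names are hypothetical, and treating every non-zero exit code as a failure is an assumption (the table only mentions `exit_code=1`).

```python
from typing import Iterable, List, Optional


def expected_next_node(current: str, exit_code: Optional[int] = None) -> str:
    """Node the model is expected to produce next, per the workflow above."""
    if current == "Start":
        return "Plan"
    if current in ("Plan", "Reflect"):
        return "Execute"
    if current == "Execute":
        if exit_code is None:
            raise ValueError("Execute needs an exit_code to pick the next node")
        # Assumption: any non-zero exit code counts as a failure.
        return "Finish" if exit_code == 0 else "Reflect"
    if current == "Finish":
        return "Finish"
    raise ValueError(f"unknown node: {current!r}")


def trace(exit_codes: Iterable[int]) -> List[str]:
    """Walk the workflow for a sequence of sandbox exit codes."""
    node, path = "Plan", ["Plan"]
    codes = iter(exit_codes)
    while node != "Finish":
        code = next(codes, None) if node == "Execute" else None
        node = expected_next_node(node, code)
        path.append(node)
    return path
```

For example, `trace([1, 0])` walks through one failed run and one successful run: Plan, Execute, Reflect, Execute, Finish.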

Usage

With Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Nanami14138/qwen3-4b-instruct-code-agent"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

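## 🛠️ Prompting Strategy

This model is designed as a ReAct-style code agent. To keep it on the Plan → Execute → Reflect → Finish state machine and emitting structured XML, **using the following system prompt at inference time is strongly recommended** (the prompt text is in Chinese):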

system_prompt = """你是一个专业的代码执行与Code Review智能Agent，遵循ReAct工作流。

## 输出格式
你的每一次回复都必须严格使用以下XML格式：
<agent_response>
  <node>当前节点</node>
  <next_node>下一个节点</next_node>
  <content>输出内容</content>
</agent_response>

## 节点定义
### Plan（规划）
- 触发：收到用户任务后立即进入
- <content>：分析任务需求，以 Markdown 格式输出解决方案规划
- <next_node>：Execute

### Execute（执行）
- 触发：Plan 或 Reflect 之后进入
- <content>：输出 {"tool_name": "python_sandbox", "arguments": {"code": "你的代码"}}
- <next_node>：Execute（等待执行结果）

### Reflect（反思）
- 触发：Execute 执行失败（exit_code=1）后进入
- <content>：分析失败原因，定位根因，给出修正方向
- <next_node>：Execute（修正后重新执行）

### Finish（完成）
- 触发：Execute 执行成功（exit_code=0）后进入
- <content>：输出任务总结
- <next_node>：Finish

## 标准工作流
Plan → Execute → (失败 → Reflect → Execute → ...) → Finish"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "任务：Write a Python function to check if a number is prime.\n\n当前状态：Start"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
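    # do_sample=True makes the temperature/top_p settings explicit rather than
    # relying on the checkpoint's generation config.
    output = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1, top_p=0.95)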

response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

With Unsloth (Faster Inference)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Nanami14138/qwen3-4b-instruct-code-agent",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
# Then use the same message format as above

Training Details

Data

Trained on m-a-p/Code-Feedback, a multi-turn code conversation dataset with ~66K examples. The data was processed into three pools totalling 47,416 training samples:

Pool	Description	Train Samples	Ratio
Pool A (Base SFT)	Single-turn code Q&A, plain text	117	0.2%
Pool B (Code Review)	Multi-turn debug/review → ReAct XML format	29,562	62.3%
Pool C (Discussion)	Multi-turn code discussion → ReAct XML format	17,737	37.4%

The system prompt is injected at training time (not stored in the data) to ensure consistent behavior.
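The ratios in the pool table can be checked from the sample counts. A small sketch; the counts are copied from the table and the function name is hypothetical.

```python
def pool_ratios(counts: dict) -> dict:
    """Share of each pool in percent, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Train sample counts as listed in the pool table.
pools = {"Pool A": 117, "Pool B": 29_562, "Pool C": 17_737}

print(sum(pools.values()))
print(pool_ratios(pools))
```

The three counts add up to 47,416, which matches the ~47K train samples quoted in the model description.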

Hyperparameters

Parameter	Value
LoRA rank (r)	32
LoRA alpha	32
LoRA target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Learning rate	2e-4 (cosine schedule)
Warmup ratio	0.1
Batch size	4 × 4 (gradient accumulation) = 16 effective
Max sequence length	4096
Precision	LoRA 4-bit (training), bfloat16 (merged)
Optimizer	AdamW 8-bit
Epochs	3 planned (stopped early at step 620/8892, ~7.0% of the schedule)
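The total of 8892 steps is consistent with the batch settings above if the training set holds 47,416 samples (the sum of the three pools, which is an assumption here). The sketch uses a simple ceiling-based count; the trainer's exact formula may differ slightly, and the function name is hypothetical.

```python
import math


def total_optimizer_steps(num_samples: int, per_device_batch: int,
                          grad_accum: int, epochs: int) -> int:
    """Optimizer steps for a single-device run with gradient accumulation."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return steps_per_epoch * epochs


# Assumed sample count: the sum of the three pool sizes.
steps = total_optimizer_steps(47_416, per_device_batch=4, grad_accum=4, epochs=3)
progress = 620 / steps  # fraction of the schedule completed at the merged checkpoint
print(steps, f"{progress:.1%}")
```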

Training Curve

Step	Train Loss	Eval Loss
20	1.927	1.905
100	0.649	0.573
200	0.463	0.454
300	0.412	0.422
400	0.413	0.409
500	0.374	0.401
600	0.383	0.397

Eval loss decreased from 1.905 (step 20) to 0.397 (step 600), and train and eval loss stayed close throughout, so there is no sign of overfitting so far. The checkpoint at step 620 was merged for this release.

Hardware

8× NVIDIA L20 (48 GB each) available; the LoRA run used a single GPU

Evaluation

HumanEval (10-problem subset)

Metric	Score
Pass@1	62.6%
Pass@2	71.14%
Pass@3	75.61%
Avg tokens/problem	215.2

Evaluation was conducted on a 10-problem subset of HumanEval. Full 164-problem evaluation is planned.
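The card does not state how Pass@k was computed or how many samples were drawn per problem. For reference, the sketch below shows the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. The function names and any inputs are illustrative, not the author's evaluation script.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k draw contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average pass@k over problems, given the correct-sample count of each."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

With n = 3 samples per problem, a problem with one correct sample contributes 1/3 to Pass@1 and 2/3 to Pass@2.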

Limitations

Early checkpoint: This model was merged at step 620 out of 8892 total steps (~7.0% of the planned training). Performance will likely improve with continued training.
English-centric data: The training data (Code-Feedback) is predominantly in English, so output quality on Chinese-language coding tasks may be lower.
XML format dependency: The model is trained to output structured XML. Without the system prompt, it may not follow the expected format.
No real execution: The training data simulates tool responses; the model has not been trained with actual code execution feedback.
Limited language coverage: The training data covers multiple programming languages, but Python is heavily overrepresented.
Hallucination risk: Like all LLMs, the model may generate plausible but incorrect code, especially for complex algorithms or domain-specific tasks.

Ethical Considerations

The model should not be used to generate malicious code or exploit vulnerabilities.
Generated code should always be reviewed by a human before deployment in production systems.
The model may reproduce biases present in the training data (e.g., coding style preferences, library choices).

Citation

If you use this model, please cite the base model and training dataset:

@article{qwen3,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025}
}

@misc{code-feedback,
  title={Code-Feedback: Multi-turn Code Conversation Dataset},
  author={m-a-p},
  url={https://huggingface.co/datasets/m-a-p/Code-Feedback}
}
