Initialize project; model provided by the ModelHub XC community

Model: Nanami14138/qwen3-4b-instruct-code-agent Source: Original Platform
2026-06-02 06:20:18 +08:00
commit fb1edb3de5
16 changed files with 152592 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,270 @@
+---
+license: apache-2.0
+language:
+  - en
+  - zh
+datasets:
+  - m-a-p/Code-Feedback
+metrics:
+  - pass@k
+base_model: Qwen/Qwen3-4B-Instruct
+tags:
+  - code
+  - agent
+  - react
+  - code-review
+  - lora
+  - unsloth
+  - qwen3
+library_name: transformers
+pipeline_tag: text-generation
+model-index:
+  - name: Qwen3-4B-CodeAgent
+    results:
+      - task:
+          type: text-generation
+          name: Code Generation
+        dataset:
+          name: HumanEval (10-problem subset)
+          type: openai/openai_humaneval
+        metrics:
+          - name: Pass@1
+            type: pass@1
+            value: 62.6
+          - name: Pass@3
+            type: pass@3
+            value: 75.61
+---
+
+# Qwen3-4B-CodeAgent
+
+A code-execution and code-review agent fine-tuned from Qwen3-4B-Instruct. It is trained to follow a structured ReAct-style workflow (Plan → Execute → Reflect → Finish) and to emit XML-formatted responses.
+
+## Model Description
+
+This model is a LoRA fine-tuned version of [Qwen3-4B-Instruct](https://huggingface.co/Qwen/Qwen3-4B) designed to function as an autonomous coding agent. It generates structured XML responses that can be parsed by an orchestration framework to execute code, review results, and iteratively debug.
+
+| Attribute | Value |
+|-----------|-------|
+| Base Model | Qwen3-4B-Instruct (4.0B params, 3.6B non-embedding) |
+| Architecture | Qwen3ForCausalLM, 36 layers, 2560 hidden size, GQA (32 heads / 8 KV heads) |
+| Fine-tuning Method | QLoRA-style: LoRA (r=32, alpha=32) on a 4-bit quantized base |
+| Framework | Unsloth + TRL SFTTrainer |
+| Training Data | m-a-p/Code-Feedback (~47K train samples) |
+| Context Length | 4096 tokens (max sequence length used in fine-tuning) |
+| Precision | bfloat16 (merged weights) |
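
The GQA row is what keeps the KV cache small at the 4096-token context. A rough sizing sketch, assuming `head_dim = 128` as in the public Qwen3-4B config (this card does not list it, and it cannot be derived from the hidden size, since 32 × 128 ≠ 2560) and a bfloat16 cache:

```python
# Values from the architecture table above.
NUM_LAYERS = 36
NUM_Q_HEADS = 32
NUM_KV_HEADS = 8
# Assumption: head_dim = 128 as in the public Qwen3-4B config (not listed in this card).
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # bfloat16


def kv_cache_bytes(seq_len: int, num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for one sequence across all layers."""
    # Factor 2: one cached tensor for keys and one for values.
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE * seq_len


gqa = kv_cache_bytes(4096)
mha = kv_cache_bytes(4096, num_kv_heads=NUM_Q_HEADS)  # hypothetical: full multi-head attention
print(f"GQA: {gqa / 2**20:.0f} MiB, MHA: {mha / 2**20:.0f} MiB")  # GQA: 576 MiB, MHA: 2304 MiB
```

Because only 8 of the 32 heads carry keys and values, the cache is 4× smaller than full multi-head attention would need at the same length.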
+
+## Intended Use
+
+This model is designed for building code agent systems that need structured, parseable output. It is suitable for:
+
+- Automated code generation with execution feedback loops
+- Code review and iterative debugging pipelines
+- Tool-augmented LLM applications with sandbox execution
+- Educational coding assistants
+
+## Output Format
+
+The model outputs XML-structured responses following a ReAct workflow:
+
+```xml
+<agent_response>
+  <node>Plan</node>
+  <next_node>Execute</next_node>
+  <content>
+    ## Analysis
+    The task requires implementing a binary search algorithm.
+    
+    ## Plan
+    1. Define the function signature
+    2. Implement iterative binary search
+    3. Handle edge cases (empty array, target not found)
+  </content>
+</agent_response>
+```
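
A minimal parsing sketch for this format. It uses a regular expression instead of an XML parser because `<content>` carries free-form Markdown and code, where characters such as `<` and `&` would make a strict XML parser reject the response. `AgentResponse` and `parse_agent_response` are illustrative names, not part of any released package.

```python
import re
from dataclasses import dataclass


@dataclass
class AgentResponse:
    node: str
    next_node: str
    content: str


# Non-greedy groups with DOTALL so multi-line Markdown/code inside <content> is captured.
_PATTERN = re.compile(
    r"<agent_response>\s*"
    r"<node>(?P<node>.*?)</node>\s*"
    r"<next_node>(?P<next_node>.*?)</next_node>\s*"
    r"<content>(?P<content>.*?)</content>\s*"
    r"</agent_response>",
    re.DOTALL,
)


def parse_agent_response(text: str) -> AgentResponse:
    """Parse the first <agent_response> block in the model output."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError("no well-formed <agent_response> block found")
    return AgentResponse(
        node=match["node"].strip(),
        next_node=match["next_node"].strip(),
        content=match["content"].strip(),
    )
```

A response that breaks the format raises `ValueError`, which an orchestrator can treat as a signal to re-prompt.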
+
+### Node Types
+
+| Node | Trigger | Content | Next Node |
+|------|---------|---------|-----------|
+| **Plan** | User sends a task | Markdown-formatted solution plan | Execute |
+| **Execute** | After Plan or Reflect | `{"tool_name": "python_sandbox", "arguments": {"code": "..."}}` | Execute (waits for the execution result) |
+| **Reflect** | Execute fails (exit_code=1) | Root cause analysis and fix direction | Execute |
+| **Finish** | Execute succeeds (exit_code=0) | Task summary | Finish |
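
The Execute row's tool call can be dispatched with a sketch like the following. It assumes the orchestrator runs the code with the local Python interpreter in a child process; this is **not** an isolated sandbox, so use a container or similar isolation for untrusted code. An uncaught Python exception exits with status 1, which matches the `exit_code=1` trigger for Reflect. `run_tool_call` is a hypothetical helper.

```python
import json
import subprocess
import sys


def run_tool_call(content: str, timeout: float = 10.0) -> dict:
    """Run the JSON tool call found in an Execute node's <content>.

    WARNING: the code runs in a plain child process, without isolation.
    """
    call = json.loads(content)
    if call.get("tool_name") != "python_sandbox":
        raise ValueError(f"unsupported tool: {call.get('tool_name')!r}")
    code = call["arguments"]["code"]
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Report a timeout as a failure so the agent moves on to Reflect.
        return {"exit_code": 1, "stdout": "", "stderr": f"timed out after {timeout}s"}
    return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}
```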
+
+### Standard Workflow
+
+```
+Plan → Execute → (failure → Reflect → Execute → ...) → Finish
+```
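
The turn-to-turn transitions of this workflow can be written as a small state function, for example to validate recorded trajectories. This is a sketch based on the Node Types table; note that it returns the node of the *next model turn*, whereas an Execute response's own `<next_node>` tag is `Execute` while it waits for the result. `next_node` and `is_valid_trajectory` are illustrative names.

```python
from typing import Optional


def next_node(current: str, exit_code: Optional[int] = None) -> str:
    """Node the model should emit on its next turn.

    `current` is "Start" before the first turn. After an Execute turn, pass the
    exit code that the orchestrator got from running the code.
    """
    if current == "Start":
        return "Plan"
    if current in ("Plan", "Reflect"):
        return "Execute"
    if current == "Execute":
        if exit_code is None:
            raise ValueError("Execute needs an exit code to choose the next node")
        return "Finish" if exit_code == 0 else "Reflect"
    if current == "Finish":
        return "Finish"
    raise ValueError(f"unknown node: {current!r}")


def is_valid_trajectory(nodes: list[str], exit_codes: list[int]) -> bool:
    """Check emitted nodes against the workflow; one exit code per Execute turn."""
    codes = iter(exit_codes)
    state = "Start"
    for node in nodes:
        code = next(codes, None) if state == "Execute" else None
        try:
            expected = next_node(state, code)
        except ValueError:
            return False  # e.g. an Execute turn with no recorded exit code
        if node != expected:
            return False
        state = node
    return state == "Finish"
```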
+
+## Usage
+
+### With Transformers
+
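+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "Nanami14138/qwen3-4b-instruct-code-agent"
+model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+```
+
+### 🛠️ Prompting Strategy (System Prompt)
+
+This model is designed as a ReAct-based Code Agent. For the model to follow the state machine (Plan -> Execute -> Reflect -> Finish) strictly and emit structured XML, **we strongly recommend using the following system prompt at inference time**:
+
+```python
+# The prompt is in Chinese; it restates the output format and the Node Types table above. Use it verbatim.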
+system_prompt = """你是一个专业的代码执行与Code Review智能Agent，遵循ReAct工作流。
+
+## 输出格式
+你的每一次回复都必须严格使用以下XML格式：
+<agent_response>
+  <node>当前节点</node>
+  <next_node>下一个节点</next_node>
+  <content>输出内容</content>
+</agent_response>
+
+## 节点定义
+### Plan（规划）
+- 触发：收到用户任务后立即进入
+- <content>：分析任务需求，以 Markdown 格式输出解决方案规划
+- <next_node>：Execute
+
+### Execute（执行）
+- 触发：Plan 或 Reflect 之后进入
+- <content>：输出 {"tool_name": "python_sandbox", "arguments": {"code": "你的代码"}}
+- <next_node>：Execute（等待执行结果）
+
+### Reflect（反思）
+- 触发：Execute 执行失败（exit_code=1）后进入
+- <content>：分析失败原因，定位根因，给出修正方向
+- <next_node>：Execute（修正后重新执行）
+
+### Finish（完成）
+- 触发：Execute 执行成功（exit_code=0）后进入
+- <content>：输出任务总结
+- <next_node>：Finish
+
+## 标准工作流
+Plan → Execute → (失败 → Reflect → Execute → ...) → Finish"""
+
+messages = [
+    {"role": "system", "content": system_prompt},
+    # "任务" = "Task"; "当前状态：Start" = "Current state: Start" (the first turn of the workflow).
+    {"role": "user", "content": "任务：Write a Python function to check if a number is prime.\n\n当前状态：Start"}
+]
+
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+with torch.no_grad():
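+    # do_sample=True is needed for temperature/top_p to take effect; without it decoding is greedy.
+    output = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1, top_p=0.95)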
+
+response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+print(response)
+```
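
The example above produces a single turn. A multi-turn driver could look like the sketch below. The README does not document how execution results or follow-up turns are fed back to the model, so the `Execution result: ...` and `Current state: ...` messages here are **hypothetical**. `generate` and `run_code` are injected callables (for instance, a wrapper around `apply_chat_template` and `model.generate` from above, and a subprocess or container runner).

```python
import json
import re
from typing import Callable

Message = dict[str, str]


def extract(reply: str, tag: str) -> str:
    """Return the text inside <tag>...</tag>, or "" if the tag is missing."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", reply, re.DOTALL)
    return match.group(1).strip() if match else ""


def run_agent(
    task: str,
    system_prompt: str,
    generate: Callable[[list[Message]], str],
    run_code: Callable[[str], dict],
    max_turns: int = 8,
) -> list[Message]:
    """Drive Plan -> Execute -> (Reflect -> Execute)* -> Finish until Finish or max_turns."""
    messages: list[Message] = [
        {"role": "system", "content": system_prompt},
        # Same first-turn format as the example above ("Task: ...", "Current state: Start").
        {"role": "user", "content": f"任务：{task}\n\n当前状态：Start"},
    ]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        node = extract(reply, "node")
        if node == "Finish":
            return messages
        if node == "Execute":
            # The Execute node's <content> is a JSON tool call.
            call = json.loads(extract(reply, "content"))
            result = run_code(call["arguments"]["code"])
            # Hypothetical observation format; not documented in this README.
            feedback = f"Execution result: {json.dumps(result)}"
        else:
            # Hypothetical follow-up after Plan/Reflect naming the requested next node.
            feedback = f"Current state: {extract(reply, 'next_node')}"
        messages.append({"role": "user", "content": feedback})
    raise RuntimeError(f"no Finish node within {max_turns} turns")
```

Capping the loop with `max_turns` keeps a model that never reaches Finish from running indefinitely.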
+
+### With Unsloth (Faster Inference)
+
+```python
+from unsloth import FastLanguageModel
+
+model, tokenizer = FastLanguageModel.from_pretrained(
+    model_name="Nanami14138/qwen3-4b-instruct-code-agent",
+    max_seq_length=4096,
+    load_in_4bit=True,
+)
+FastLanguageModel.for_inference(model)
+# Then use the same message format as above
+```
+
+## Training Details
+
+### Data
+
+Trained on [m-a-p/Code-Feedback](https://huggingface.co/datasets/m-a-p/Code-Feedback), a multi-turn code conversation dataset with ~66K examples. The data was processed into three pools:
+
+| Pool | Description | Train Samples | Ratio |
+|------|-------------|---------------|-------|
+| Pool A (Base SFT) | Single-turn code Q&A, plain text | 117 | 0.2% |
+| Pool B (Code Review) | Multi-turn debug/review → ReAct XML format | 29,562 | 62.3% |
+| Pool C (Discussion) | Multi-turn code discussion → ReAct XML format | 17,737 | 37.4% |
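
The Ratio column is each pool's share of the combined training set. A quick check of that arithmetic, using only the sample counts in the table:

```python
# Train-sample counts from the table above.
pools = {
    "Pool A (Base SFT)": 117,
    "Pool B (Code Review)": 29_562,
    "Pool C (Discussion)": 17_737,
}

total = sum(pools.values())
print(f"Total: {total}")  # Total: 47416
for name, count in pools.items():
    # Share of the combined training set, rounded like the Ratio column.
    print(f"{name}: {round(100 * count / total, 1)}%")
# Pool A (Base SFT): 0.2%
# Pool B (Code Review): 62.3%
# Pool C (Discussion): 37.4%
```

The pools sum to 47,416 samples, the ~47K figure given in the Model Description.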
+
+The system prompt is injected at training time (not stored in the data) to ensure consistent behavior.
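
A minimal sketch of that injection step, assuming each sample stores its turns in a `messages` list of `{"role", "content"}` dicts (the field name is an assumption, not confirmed by this card) and that any system turn already present in a sample is replaced:

```python
def with_system_prompt(sample: dict, system_prompt: str) -> dict:
    """Return a copy of `sample` whose messages start with the given system prompt.

    Assumes a hypothetical schema: sample["messages"] is a list of
    {"role": ..., "content": ...} dicts. Existing system turns are dropped.
    """
    turns = [m for m in sample["messages"] if m["role"] != "system"]
    return {**sample, "messages": [{"role": "system", "content": system_prompt}, *turns]}


# Illustrative sample; real Code-Feedback records may be shaped differently.
sample = {"messages": [
    {"role": "user", "content": "Fix this loop."},
    {"role": "assistant", "content": "<agent_response>...</agent_response>"},
]}
print([m["role"] for m in with_system_prompt(sample, "SYSTEM PROMPT")["messages"]])
# ['system', 'user', 'assistant']
```

With Hugging Face `datasets`, this could be applied as `dataset.map(lambda ex: with_system_prompt(ex, system_prompt))` before the chat template is rendered.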
+
+### Hyperparameters
+
+| Parameter | Value |
+|-----------|-------|
+| LoRA rank (r) | 32 |
+| LoRA alpha | 32 |
+| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
+| Learning rate | 2e-4 (cosine schedule) |
+| Warmup ratio | 0.1 |
+| Batch size | 4 per device × 4 gradient-accumulation steps = 16 effective |
+| Max sequence length | 4096 |
+| Precision | 4-bit base weights with LoRA adapters (training); bfloat16 (merged release) |
+| Optimizer | AdamW 8-bit |
+| Epochs | 3 planned; stopped early at step 620 of 8,892 (~7.0% of training, ~0.21 epochs) |
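
The step counts in this table can be reproduced from the pool sizes. A sketch of the arithmetic, assuming single-GPU training (see Hardware), that a partial final batch or accumulation group still counts as one optimizer step, and that warmup spans `ceil(0.1 × total_steps)` steps:

```python
import math

TRAIN_SAMPLES = 117 + 29_562 + 17_737  # pool sizes from the Data table
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
EPOCHS = 3
WARMUP_RATIO = 0.1
STOP_STEP = 620

batches_per_epoch = math.ceil(TRAIN_SAMPLES / PER_DEVICE_BATCH)  # 11,854 micro-batches
steps_per_epoch = math.ceil(batches_per_epoch / GRAD_ACCUM)      # 2,964 optimizer steps
total_steps = steps_per_epoch * EPOCHS                           # 8,892
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)             # 890

print(f"total steps: {total_steps}")                                   # total steps: 8892
print(f"progress: {100 * STOP_STEP / total_steps:.1f}%")               # progress: 7.0%
print(f"epochs completed: {STOP_STEP / steps_per_epoch:.2f}")          # epochs completed: 0.21
print(f"stopped during warmup: {STOP_STEP < warmup_steps}")            # stopped during warmup: True
```

Under these assumptions, the step-620 checkpoint was taken before the ~890-step warmup ended, so the learning rate had not yet reached its 2e-4 peak.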
+
+### Training Curve
+
+| Step | Train Loss | Eval Loss |
+|------|-----------|-----------|
+| 20 | 1.927 | 1.905 |
+| 100 | 0.649 | 0.573 |
+| 200 | 0.463 | 0.454 |
+| 300 | 0.412 | 0.422 |
+| 400 | 0.413 | 0.409 |
+| 500 | 0.374 | 0.401 |
+| 600 | 0.383 | 0.397 |
+
+Eval loss fell from 1.905 (step 20) to 0.397 (step 600) and stayed close to train loss throughout, with no sign of overfitting over this short run. The checkpoint at step 620 was merged for this release.
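
The overfitting reading can be checked directly against the table: eval loss should keep falling and should not pull away from train loss. A small sketch over the seven logged points (it says nothing about behavior beyond step 600):

```python
# Logged values from the Training Curve table.
steps = [20, 100, 200, 300, 400, 500, 600]
train_loss = [1.927, 0.649, 0.463, 0.412, 0.413, 0.374, 0.383]
eval_loss = [1.905, 0.573, 0.454, 0.422, 0.409, 0.401, 0.397]

# Eval loss never increases between consecutive logged steps.
eval_monotonic = all(later <= earlier for earlier, later in zip(eval_loss, eval_loss[1:]))

# Generalization gap (eval minus train) at each logged step.
gaps = {s: round(e - t, 3) for s, t, e in zip(steps, train_loss, eval_loss)}
worst_step = max(gaps, key=gaps.get)

print(eval_monotonic)                # True
print(worst_step, gaps[worst_step])  # 500 0.027
```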
+
+### Hardware
+
+- Node with 8× NVIDIA L20 (48 GB each); LoRA training ran on a single GPU
+
+## Evaluation
+
+### HumanEval (10-problem subset)
+
+| Metric | Score |
+|--------|-------|
+| Pass@1 | 62.6% |
+| Pass@2 | 71.14% |
+| Pass@3 | 75.61% |
+| Avg tokens/problem | 215.2 |
+
+Evaluation was conducted on a 10-problem subset of HumanEval. Full 164-problem evaluation is planned.
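
The card does not state how many samples were drawn per problem or which estimator produced these numbers. A common choice, assumed here, is the unbiased pass@k estimator from the HumanEval/Codex paper (Chen et al., 2021): with n samples per problem of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) for each problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Illustrative counts only (not this model's results): (samples, correct) per problem.
example = [(5, 5), (5, 0), (5, 2)]
print(round(mean_pass_at_k(example, k=1), 3))  # 0.467
```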
+
+## Limitations
+
+- **Early checkpoint**: This model was merged at step 620 out of 8892 total steps (~7.0% of training). Performance will likely improve with continued training.
+- **English-centric data**: The training data (Code-Feedback) is predominantly in English. Output quality on Chinese-language coding tasks may be lower.
+- **XML format dependency**: The model is trained to output structured XML. Without the system prompt, it may not follow the expected format.
+- **No real execution**: The training data simulates tool responses; the model has not been trained with actual code execution feedback.
+- **Limited programming-language coverage**: Although the training data covers multiple programming languages, Python is heavily overrepresented.
+- **Hallucination risk**: Like all LLMs, the model may generate plausible but incorrect code, especially for complex algorithms or domain-specific tasks.
+
+## Ethical Considerations
+
+- The model should not be used to generate malicious code or exploit vulnerabilities.
+- Generated code should always be reviewed by a human before deployment in production systems.
+- The model may reproduce biases present in the training data (e.g., coding style preferences, library choices).
+
+## Citation
+
+If you use this model, please cite the base model and training dataset:
+
+```bibtex
+@misc{qwen3,
+  title={Qwen3 Technical Report},
+  author={Qwen Team},
+  year={2025}
+}
+
+@misc{code-feedback,
+  title={Code-Feedback: Multi-turn Code Conversation Dataset},
+  author={m-a-p},
+  url={https://huggingface.co/datasets/m-a-p/Code-Feedback}
+}
+```