Initialize project; model provided by the ModelHub XC community

Model: IcyFish/Qwen3-4B-EnvTuning Source: Original Platform
2026-04-25 15:17:02 +08:00
commit 6290c9bb8f
12 changed files with 152515 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,193 @@
+---
+library_name: transformers
+license: apache-2.0
+license_link: https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507/blob/main/LICENSE
+base_model: Qwen/Qwen3-4B-Instruct-2507
+pipeline_tag: text-generation
+language:
+- en
+tags:
+- qwen3
+- text-generation
+- continued-pretraining
+- agent
+- tool-use
+---
+
+<div align="center">
+
+# Qwen3-4B-EnvTuning
+
+[![Hugging Face Model](https://img.shields.io/badge/Hugging%20Face-Model-orange)](https://huggingface.co/IcyFish/Qwen3-4B-EnvTuning)
+[![Base Model](https://img.shields.io/badge/Base%20Model-Qwen3--4B--Instruct--2507-blue)](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
+[![arXiv](https://img.shields.io/badge/arXiv-2510.10197-b31b1b.svg)](https://arxiv.org/abs/2510.10197)
+[![OpenReview](https://img.shields.io/badge/OpenReview-ICLR%202026%20Poster-8c1aff)](https://openreview.net/forum?id=nzodtGccEM)
+
+</div>
+
+## Overview
+
+`Qwen3-4B-EnvTuning` is a continued-training checkpoint built on top of [`Qwen/Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
+
+This model follows the approach of the paper **Don't Just Fine-tune the Agent, Tune the Environment**, which shifts agent learning from static trajectory imitation to **environment-based exploration**. The core idea is to improve agent capability by tuning the learning environment itself, rather than only fine-tuning the policy on pre-collected demonstrations.
+
+- Base model: `Qwen/Qwen3-4B-Instruct-2507`
+- Released model: [`IcyFish/Qwen3-4B-EnvTuning`](https://huggingface.co/IcyFish/Qwen3-4B-EnvTuning)
+- Model type: Causal Language Model
+- Training style: continued training based on the Environment Tuning paradigm
+
+## Introduction
+
+The paper studies agent training under **extreme data scarcity**. In multi-turn tool-use settings, plain SFT on synthetic trajectories often overfits, while direct RL tends to suffer from cold-start and unstable optimization. Environment Tuning addresses this by redesigning the interaction loop between agent and environment so that exploration becomes more learnable.
+
+The method centers on three ingredients:
+
+- **Structured curriculum**: train the agent from easy skills to harder multi-turn tool-use behaviors.
+- **Actionable environment augmentation**: replace vague failures with corrective hints that reveal tool dependencies and constraints.
+- **Fine-grained progress rewards**: provide denser turn-level learning signals instead of only sparse episode-level success.
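To make the contrast between sparse and dense signals concrete, here is a minimal sketch. The function names and the scoring rule (fraction of turns whose check passes) are illustrative assumptions, not the paper's exact reward definition.

```python
from typing import List


def episode_reward(turn_passed: List[bool]) -> float:
    """Sparse signal: 1.0 only if every turn succeeded, else 0.0."""
    return 1.0 if turn_passed and all(turn_passed) else 0.0


def progress_reward(turn_passed: List[bool]) -> float:
    """Dense signal (hypothetical rule): fraction of turns that succeeded."""
    if not turn_passed:
        return 0.0
    return sum(turn_passed) / len(turn_passed)


# An episode in which the agent handles 3 of 4 turns correctly.
turns = [True, True, False, True]
print(episode_reward(turns))   # 0.0
print(progress_reward(turns))  # 0.75
```

Under the sparse reward, this nearly correct episode is indistinguishable from a complete failure; the turn-level variant still gives the optimizer a gradient toward partial progress.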
+
+The paper reports that this paradigm can train competitive agents from only a small number of problem instances, with better out-of-distribution generalization than pure SFT baselines.
+
+The original paper includes an introduction figure illustrating the difference between static SFT, standard RL, and Environment Tuning. To keep this Hugging Face repository lightweight and push-friendly, the figure is not embedded as a local binary asset here.
+
+## Training Pipeline
+
+This checkpoint is a Qwen3-4B-based release inspired by the training pipeline proposed in the paper. At a high level, the recipe consists of:
+
+1. Start from a strong instruction-tuned base model.
+2. Train with a staged curriculum rather than optimizing the full task from the beginning.
+3. Use augmented environment feedback in the middle stages to turn failed tool interactions into useful supervision.
+4. Use fine-grained progress rewards to stabilize long-horizon learning.
+5. Remove the extra environment assistance in the final stage to better match real evaluation conditions.
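The staged recipe above can be written down as a small schedule. This is only a sketch: the number of stages, the stage names, and the per-stage flags are assumptions chosen for illustration, not the configuration used for this checkpoint.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str                 # hypothetical label
    augmented_feedback: bool  # environment returns corrective hints on failure
    progress_reward: bool     # turn-level reward instead of episode-only reward


def build_curriculum() -> List[Stage]:
    """Illustrative schedule; stage count and names are assumptions."""
    return [
        Stage("basic_skills", augmented_feedback=False, progress_reward=True),
        Stage("multi_turn_with_hints", augmented_feedback=True, progress_reward=True),
        Stage("final_no_assistance", augmented_feedback=False, progress_reward=True),
    ]


for stage in build_curriculum():
    print(stage.name, stage.augmented_feedback, stage.progress_reward)
```

The property that matters is the last entry: the final stage disables the extra hints so that training conditions match evaluation conditions.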
+
+The paper also provides a pipeline figure showing the curriculum stages, augmented feedback, and the agent learning loop. This repository keeps the README text-only for compatibility with the current Hugging Face push restrictions on binary assets.
+
+## Training Setup and Evaluation
+
+This checkpoint was **not** evaluated in the original paper. It is a follow-up model release that keeps the same training philosophy and core method, but uses a different concrete training setup.
+
+- Training data used for this checkpoint: **400 BFCL V3 training instances**
+- Evaluation setting: tested on **400 unseen BFCL V3 instances** that were not used for training
+
+| Category | Correct | Total | Accuracy |
+| --- | ---: | ---: | ---: |
+| `multi_turn_base` | 69 | 100 | 69.00% |
+| `multi_turn_long_context` | 65 | 100 | 65.00% |
+| `multi_turn_miss_func` | 64 | 100 | 64.00% |
+| `multi_turn_miss_param` | 56 | 100 | 56.00% |
+| `OVERALL` | 254 | 400 | 63.50% |
+
+These numbers should be understood as the evaluation results of **this released checkpoint**, rather than results reported in the original paper.
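As a quick sanity check, the overall row can be recomputed from the four per-category rows. The dictionary below simply restates the table; the variable names are arbitrary.

```python
# (correct, total) per category, copied from the table above.
results = {
    "multi_turn_base": (69, 100),
    "multi_turn_long_context": (65, 100),
    "multi_turn_miss_func": (64, 100),
    "multi_turn_miss_param": (56, 100),
}

correct = sum(c for c, _ in results.values())
total = sum(t for _, t in results.values())
overall = 100 * correct / total

print(correct, total, f"{overall:.2f}%")  # 254 400 63.50%
```

Because every category has the same number of instances, the overall accuracy equals the unweighted mean of the four category accuracies.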
+
+## Model Details
+
+Unless otherwise noted, this checkpoint keeps the same underlying architecture as `Qwen3-4B-Instruct-2507`:
+
+- Architecture: `Qwen3ForCausalLM`
+- Parameters: 4.0B
+- Non-embedding parameters: 3.6B
+- Layers: 36
+- Attention heads: 32 for Q and 8 for KV
+- Native context length: 262,144
+
+For the original architecture and upstream model information, please refer to:
+
+- Qwen blog: https://qwenlm.github.io/blog/qwen3/
+- Qwen GitHub: https://github.com/QwenLM/Qwen3
+- Qwen documentation: https://qwen.readthedocs.io/en/latest/
+
+## Quick Start
+
+Use a recent version of `transformers` (4.51.0 or later). With `transformers<4.51.0`, loading the model fails with:
+
+```text
+KeyError: 'qwen3'
+```
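One way to fail early with a clearer message is to check the installed version before loading. The helper below is a hypothetical sketch using only the standard library; it compares the first three numeric components and ignores pre-release suffixes such as `.dev0`.

```python
import re


def supports_qwen3(version: str, minimum=(4, 51, 0)) -> bool:
    """Return True if a transformers version string is at least 4.51.0."""
    parts = re.findall(r"\d+", version)[:3]
    numbers = tuple(int(p) for p in parts) + (0,) * (3 - len(parts))
    return numbers >= minimum


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        print(supports_qwen3(version("transformers")))
    except PackageNotFoundError:
        print("transformers is not installed")
```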
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name = "IcyFish/Qwen3-4B-EnvTuning"
+
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto",
+)
+
+messages = [
+    {"role": "user", "content": "Give me a short introduction to large language models."}
+]
+
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=16384,
+)
+
+output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
+content = tokenizer.decode(output_ids, skip_special_tokens=True)
+print(content)
+```
+
+For serving, launch the model with SGLang or vLLM:
+
+```bash
+python -m sglang.launch_server \
+  --model-path IcyFish/Qwen3-4B-EnvTuning \
+  --context-length 262144
+```
+
+```bash
+vllm serve IcyFish/Qwen3-4B-EnvTuning --max-model-len 262144
+```
+
+If you encounter out-of-memory issues, consider reducing the context length (`--context-length` for SGLang, `--max-model-len` for vLLM), for example to `32768`.
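A rough estimate shows why context length dominates memory use. The sketch below computes the key/value cache size for a single sequence from the layer and KV-head counts listed under Model Details. The head dimension of 128 and 2 bytes per value (bf16/fp16) are assumptions; check them against the checkpoint's `config.json`.

```python
def kv_cache_bytes(tokens: int, layers: int = 36, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence.

    head_dim and bytes_per_value are assumed values, not taken from this README.
    """
    # Factor 2 accounts for storing both keys and values.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


for ctx in (262144, 32768):
    gib = kv_cache_bytes(ctx) / 2**30
    print(ctx, round(gib, 1), "GiB")
# 262144 36.0 GiB
# 32768 4.5 GiB
```

These figures cover the cache for one sequence only; model weights, activations, and the serving framework's own preallocation come on top.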
+
+## Notes
+
+- This repository releases a **derived checkpoint**, not the original upstream Qwen release.
+- This checkpoint follows the **same method family** as the paper, but it is **not** one of the exact models reported in the paper's main experiments.
+- The figures from the paper are referenced conceptually in the README, but local binary image assets are intentionally omitted to keep the repository easy to publish on Hugging Face.
+- The BFCL V3 results reported above are model-specific numbers for this checkpoint and should not be confused with either upstream Qwen3 results or the original paper's reported models.
+
+## License
+
+This model is released under the Apache 2.0 license, following the license of the upstream Qwen checkpoint:
+
+- https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507/blob/main/LICENSE
+
+Please review the upstream license terms before downstream use.
+
+## Citation
+
+If you use this model, please consider citing both the Environment Tuning paper and the original Qwen3 technical report.
+
+```bibtex
+@article{lu2025dont,
+  title={Don't Just Fine-tune the Agent, Tune the Environment},
+  author={Lu, Siyuan and Wang, Zechuan and Zhang, Hongxuan and Wu, Qintong and Gan, Leilei and Zhuang, Chenyi and Gu, Jinjie and Lin, Tao},
+  journal={arXiv preprint arXiv:2510.10197},
+  year={2025},
+  url={https://arxiv.org/abs/2510.10197}
+}
+```
+
+```bibtex
+@misc{qwen3technicalreport,
+  title={Qwen3 Technical Report},
+  author={Qwen Team},
+  year={2025},
+  eprint={2505.09388},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2505.09388}
+}
+```