ModelHub XC 4fbe6ba033 初始化项目,由ModelHub XC社区提供模型
Model: tclf90/QwenLong-L1-32B-AWQ
Source: Original Platform
2026-05-25 16:32:14 +08:00

library_name, pipeline_tag, tags, base_model, base_model_relation
library_name pipeline_tag tags base_model base_model_relation
transformers text-generation
QwenLong
AWQ
量化修复
vLLM
iic/QwenLong-L1-32B
quantized

通义智文-QwenLong-L1-32B-AWQ

基础型 iic/QwenLong-L1-32B

【模型更新日期】

2025-05-28
1. 首次commit

【依赖】

vllm==0.8.5
transformers==4.51.3
### 【💡新版 VLLM 注意事项💡

1. 需使用V0推理模式

启动vllm之前先设置环境变量

export VLLM_USE_V1=0

【模型列表】

文件大小 最近更新时间
19GB 2025-05-21

【模型下载】

from modelscope import snapshot_download
snapshot_download('tclf90/QwenLong-L1-32B-AWQ', cache_dir="本地路径")

【介绍】

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning


License arXiv GitHub ModelScope HuggingFace

Fanqi Wan, Weizhou Shen, Shengyi Liao, Yingcheng Shi, Chenliang Li,

Ziyi Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan

Tongyi Lab, Alibaba Group


🎉 News

  • May 26, 2025: 🔥 We release 🤗 QwenLong-L1-32B, which is the first long-context LRM trained with reinforcement learniing for long-context reasoning. Experiments on seven long-context DocQA benchmarks demonstrate that QwenLong-L1-32B outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B, achieving performance on par with Claude-3.7-Sonnet-Thinking, demonstrating leading performance among state-of-the-art LRMs.

  • May 26, 2025: 🔥 We release 🤗 DocQA-RL-1.6K, which is a specialized RL training dataset comprising 1.6K document question answering (DocQA) problems spanning mathematical, logical, and multi-hop reasoning domains.

📚 Introduction

In this work, we propose QwenLong-L1, a novel reinforcement learning (RL) framework designed to facilitate the transition of LRMs from short-context proficiency to robust long-context generalization. In our preliminary experiments, we illustrate the differences between the training dynamics of short-context and long-context reasoning RL.


Our framework enhances short-context LRMs through progressive context scaling during RL training. The framework comprises three core components: a warm-up supervised fine-tuning (SFT) phase to initialize a robust policy, a curriculum-guided RL phase that facilitates stable adaptation from short to long contexts, and a difficulty-aware retrospective sampling mechanism that adjusts training complexity across stages to incentivize policy exploration. Leveraging recent RL algorithms, including GRPO and DAPO, our framework integrates hybrid reward functions combining rule-based and model-based binary outcome rewards to balance precision and recall. Through strategic utilization of group relative advantages during policy optimization, it guides LRMs to learn effective reasoning patterns essential for robust long-context grounding and superior reasoning capabilities.


🎯 Model Release

We release 🤗 QwenLong-L1-32B, which is the first long-context LRM trained with reinforcement learniing for long-context reasoning. Experiments on seven long-context DocQA benchmarks demonstrate that QwenLong-L1-32B outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B, achieving performance on par with Claude-3.7-Sonnet-Thinking, demonstrating leading performance among state-of-the-art LRMs.

Here are the evaluation results.


📝 Citation

If you find this work is relevant with your research or applications, please feel free to cite our work!

@article{wan2025qwenlongl1,
  title={QwenLong-L1: : Towards Long-Context Large Reasoning Models with Reinforcement Learning},
  author={Fanqi Wan, Weizhou Shen, Shengyi Liao, Yingcheng Shi, Chenliang Li, Ziyi Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan},
  journal={arXiv preprint arXiv:2505.17667},
  year={2025}
}
Description
Model synced from source: tclf90/QwenLong-L1-32B-AWQ
Readme 7.9 MiB