Go to file

ModelHub XC 4fbe6ba033 初始化项目，由ModelHub XC社区提供模型

Model: tclf90/QwenLong-L1-32B-AWQ
Source: Original Platform

2026-05-25 16:32:14 +08:00

assets

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

.gitattributes

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

config.json

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

configuration.json

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

generation_config.json

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

model-00001-of-00005.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

model-00002-of-00005.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

model-00003-of-00005.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

model-00004-of-00005.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

model-00005-of-00005.safetensors

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

model.safetensors.index.json

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

README.md

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

special_tokens_map.json

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

tokenizer_config.json

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

tokenizer.json

初始化项目，由ModelHub XC社区提供模型

2026-05-25 16:32:14 +08:00

README.md

library_name, pipeline_tag, tags, base_model, base_model_relation

library_name

pipeline_tag

通义智文-QwenLong-L1-32B-AWQ

基础型 iic/QwenLong-L1-32B

【模型更新日期】

2025-05-28
1. 首次commit

【依赖】

vllm==0.8.5
transformers==4.51.3

### 【💡新版 VLLM 注意事项💡】

1. 需使用V0推理模式

启动vllm之前，先设置环境变量

export VLLM_USE_V1=0

【模型列表】

文件大小	最近更新时间
`19GB`	`2025-05-21`

【模型下载】

from modelscope import snapshot_download
snapshot_download('tclf90/QwenLong-L1-32B-AWQ', cache_dir="本地路径")

【介绍】

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Fanqi Wan, Weizhou Shen, Shengyi Liao, Yingcheng Shi, Chenliang Li,

Ziyi Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan

Tongyi Lab, Alibaba Group

🎉 News

May 26, 2025: 🔥 We release 🤗 QwenLong-L1-32B, which is the first long-context LRM trained with reinforcement learniing for long-context reasoning. Experiments on seven long-context DocQA benchmarks demonstrate that QwenLong-L1-32B outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B, achieving performance on par with Claude-3.7-Sonnet-Thinking, demonstrating leading performance among state-of-the-art LRMs.
May 26, 2025: 🔥 We release 🤗 DocQA-RL-1.6K, which is a specialized RL training dataset comprising 1.6K document question answering (DocQA) problems spanning mathematical, logical, and multi-hop reasoning domains.

📚 Introduction

In this work, we propose QwenLong-L1, a novel reinforcement learning (RL) framework designed to facilitate the transition of LRMs from short-context proficiency to robust long-context generalization. In our preliminary experiments, we illustrate the differences between the training dynamics of short-context and long-context reasoning RL.

Our framework enhances short-context LRMs through progressive context scaling during RL training. The framework comprises three core components: a warm-up supervised fine-tuning (SFT) phase to initialize a robust policy, a curriculum-guided RL phase that facilitates stable adaptation from short to long contexts, and a difficulty-aware retrospective sampling mechanism that adjusts training complexity across stages to incentivize policy exploration. Leveraging recent RL algorithms, including GRPO and DAPO, our framework integrates hybrid reward functions combining rule-based and model-based binary outcome rewards to balance precision and recall. Through strategic utilization of group relative advantages during policy optimization, it guides LRMs to learn effective reasoning patterns essential for robust long-context grounding and superior reasoning capabilities.

🎯 Model Release

We release 🤗 QwenLong-L1-32B, which is the first long-context LRM trained with reinforcement learniing for long-context reasoning. Experiments on seven long-context DocQA benchmarks demonstrate that QwenLong-L1-32B outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B, achieving performance on par with Claude-3.7-Sonnet-Thinking, demonstrating leading performance among state-of-the-art LRMs.

Here are the evaluation results.

📝 Citation

If you find this work is relevant with your research or applications, please feel free to cite our work!

@article{wan2025qwenlongl1,
  title={QwenLong-L1: : Towards Long-Context Large Reasoning Models with Reinforcement Learning},
  author={Fanqi Wan, Weizhou Shen, Shengyi Liao, Yingcheng Shi, Chenliang Li, Ziyi Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan},
  journal={arXiv preprint arXiv:2505.17667},
  year={2025}
}

README.md Unescape Escape

通义智文-QwenLong-L1-32B-AWQ

【模型更新日期】

【依赖】

1. 需使用V0推理模式

【模型列表】

【模型下载】

【介绍】

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

🎉 News

📚 Introduction

🎯 Model Release

📝 Citation

README.md