Initialize project; model provided by the ModelHub XC community
Model: SEGAgentRL/LLDS-A-GSPO-Qwen2.5-3B-Ins Source: Original Platform
4
.gitattributes
vendored
Normal file
@@ -0,0 +1,4 @@
*.bin filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.json filter=lfs diff=lfs merge=lfs -text
75
README.md
Normal file
@@ -0,0 +1,75 @@
---
license: apache-2.0
language:
- en
metrics:
- accuracy
base_model:
- Qwen/Qwen2.5-3B-Ins
pipeline_tag: reinforcement-learning
tags:
- Search
- QuestionAnswering
library_name: transformers
---

<h1 align="center">On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral</h1>

<p align="center">
📃 <a href="https://arxiv.org/abs/2512.04220" target="_blank">Paper</a> | 🤗 <a href="https://huggingface.co/SEGAgentRL" target="_blank">LLDS-Huggingface</a> | 🐙 <a href="https://github.com/vengdeng/LLDS-On-Group-Relative-Policy-Optimization-Collapse-in-Search-R1" target="_blank">GitHub</a>
</p>

## ⚡ Introduction

**LLDS** is a lightweight, likelihood-preserving regularizer designed to stabilize **tool-integrated reinforcement learning** (e.g., GRPO / Search-R1 style training).
It prevents training collapse by regularizing **only when** the likelihood of a (good) action decreases, and **only on** the tokens responsible for the decrease.

- We identify **Lazy Likelihood Displacement (LLD)** as a key mechanism behind collapse in tool-integrated GRPO training.
- LLDS activates **selectively**: it penalizes likelihood reduction on a *preserving set* (e.g., non-negative-advantage actions).
- We release our **LLDS-tuned Qwen2.5-3B-INS** checkpoint for search-integrated reasoning and QA.
- **A refers to the action-level gate** and R refers to the response-level gate; **the action-level (A) gate achieves the best performance**.
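
The selective behaviour described above can be illustrated with a small sketch. This is not the released training code: the function name `llds_penalty`, its arguments, and the exact form of the penalty (a sum of per-token log-probability drops) are assumptions made for illustration. It only shows the gating idea for a single action: skip actions outside the preserving set, skip actions whose total log-likelihood did not drop, and otherwise charge only the tokens whose log-probability fell.

```python
from typing import Sequence


def llds_penalty(
    old_logps: Sequence[float],
    new_logps: Sequence[float],
    advantage: float,
) -> float:
    """Hypothetical action-level LLDS-style penalty for one action.

    old_logps / new_logps are per-token log-probabilities of the same
    action under the old and the updated policy (illustrative inputs).
    """
    # Preserving set: only non-negative-advantage actions are protected.
    if advantage < 0:
        return 0.0
    # Action-level gate: inactive unless the action's total
    # log-likelihood actually decreased.
    if sum(new_logps) >= sum(old_logps):
        return 0.0
    # Token-level selectivity: only tokens whose log-prob fell contribute.
    return sum(max(0.0, o - n) for o, n in zip(old_logps, new_logps))


if __name__ == "__main__":
    old = [-1.0, -2.0, -0.5]
    new = [-1.5, -1.8, -0.9]
    print(llds_penalty(old, new, advantage=1.0))
```

Under the same assumptions, a response-level (R) variant would apply the gate to the log-likelihood of the whole response rather than to each action separately.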

## 🔍 Tool-Integrated Search Inference (Search-R1 style)

We support tool-integrated inference using the same workflow as **[Search-R1](https://github.com/PeterGriffinJin/Search-R1)**, where the LLM interacts with a local retrieval server for multi-step reasoning.

The pipeline consists of two parts:

1. Launch a local retriever server
2. Run inference with the LLDS model

---

### 1️⃣ Launch the local retrieval server

Search-R1 recommends running the retriever in a separate environment.

```bash
conda activate retriever
bash retrieval_launch.sh
```

### 2️⃣ Run inference with LLDS-A-GSPO-Qwen2.5-3B-Ins

```bash
conda activate searchr1

# Set these in infer.py before running (Python, not shell):
#   MODEL_NAME = "<YOUR_ORG>/<YOUR_MODEL_NAME>"  # e.g. my-org/LLDS-A-GSPO-Qwen2.5-3B-Ins
#   question = "Your question here"
python infer.py
```
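
To make the multi-step loop more concrete, here is a minimal sketch of how an inference script might decide what to do after each generation step. It assumes the model wraps retrieval requests in `<search>...</search>` and final answers in `<answer>...</answer>`, as in the Search-R1 prompt template; the helper `next_step` and its return labels are hypothetical and are not taken from `infer.py`.

```python
import re
from typing import Optional, Tuple

# Tag names are an assumption based on the Search-R1 prompt format.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def next_step(generated: str) -> Tuple[str, Optional[str]]:
    """Classify a newly generated segment (hypothetical helper).

    Returns ("answer", text) if a final answer is present,
    ("search", query) if the model asked for retrieval,
    and ("continue", None) otherwise.
    """
    answer = ANSWER_RE.search(generated)
    if answer:
        return "answer", answer.group(1).strip()
    search = SEARCH_RE.search(generated)
    if search:
        return "search", search.group(1).strip()
    return "continue", None


if __name__ == "__main__":
    segment = "<think>I need a fact.</think><search> capital of France </search>"
    print(next_step(segment))
```

In a full loop, a `"search"` result would be sent to the local retrieval server, and the retrieved passages appended to the prompt before generation resumes. The function is meant to be called on the newly generated segment only, so tags from earlier turns are not matched again.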

## 📖 Citation
```
@article{deng2025grpo,
  title={On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral},
  author={Deng, Wenlong and Li, Yushu and Gong, Boying and Ren, Yi and Thrampoulidis, Christos and Li, Xiaoxiao},
  journal={arXiv preprint arXiv:2512.04220},
  year={2025}
}
```
BIN
added_tokens.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
config.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
generation_config.json
(Stored with Git LFS)
Normal file
Binary file not shown.
151388
merges.txt
Normal file
File diff suppressed because it is too large
3
model-00001-of-00003.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5346f346ba200f8b1a005f9ff48c3b56e48d4e6a8af5f3ae0ed0a2a3f68cba05
size 4982131536
3
model-00002-of-00003.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a05fe89ac0907a242dd9d0a3e5a150f218d6bf5b70ee63a4ff65024f4e968599
size 4932949336
3
model-00003-of-00003.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7383875e850448a048e7c2def5400cd3091099d7b3e850eb240ed15f47d04aab
size 3673383040
BIN
model.safetensors.index.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
special_tokens_map.json
(Stored with Git LFS)
Normal file
Binary file not shown.
3
tokenizer.json
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
size 11421896
BIN
tokenizer_config.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
vocab.json
(Stored with Git LFS)
Normal file
Binary file not shown.