[Doc][Misc] Refactor skill documentation and add Claude support instructions (#6817)

### What this PR does / why we need it?
This PR refactors the documentation for vLLM Ascend skills.
- It renames and moves the `vllm-ascend-model-adapter` skill's README to
serve as a new top-level README for the `.agents` directory.
- It adds instructions on how to use the Ascend skills with Claude,
including a new README in the `.claude` directory.
- It updates `.gitignore` to exclude skills copied for Claude's use.
- Add main2main skill

This improves the documentation structure, making it more organized and
providing clear instructions for developers using these skills with
different tools.

### Does this PR introduce _any_ user-facing change?
No, this PR contains only documentation and repository configuration
changes. It does not affect any user-facing code functionality.

### How was this patch tested?
These changes are documentation-only and do not require specific
testing. The correctness of the instructions is being verified through
this review.

- vLLM version: v0.15.0
- vLLM main:
83b47f67b1

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
This commit is contained in:
wangxiyuan
2026-02-26 14:42:59 +08:00
committed by GitHub
parent e76b69b9ef
commit c9d05d10aa
4 changed files with 317 additions and 6 deletions

View File

@@ -1,9 +1,20 @@
# vLLM Ascend Model Adapter Skill
# vLLM Ascend skills
This directory contains the skills for vLLM Ascend.
Note: Please copy the skills directory `.agents/skills` to `.claude/skills` if you want to use the skills in this repo with Claude code.
## Table of Contents
- [vLLM Ascend Model Adapter Skill](#vllm-ascend-model-adapter-skill)
- [vLLM Ascend main2main Skill](#vllm-ascend-main2main-skill)
## vLLM Ascend Model Adapter Skill
Adapt and debug models for vLLM on Ascend NPU — covering both already-supported
architectures and new models not yet registered in vLLM.
## What it does
### What it does
This skill guides an AI agent through a deterministic workflow to:
@@ -12,7 +23,7 @@ This skill guides an AI agent through a deterministic workflow to:
3. Validate via a two-stage gate (dummy fast gate + real-weight mandatory gate).
4. Deliver one signed commit with code, test config, and tutorial doc.
## File layout
### File layout
| File | Purpose |
| ---- | ------- |
@@ -23,24 +34,43 @@ This skill guides an AI agent through a deterministic workflow to:
| `references/multimodal-ep-aclgraph-lessons.md` | VL, EP, and ACLGraph patterns |
| `references/deliverables.md` | Required outputs and commit discipline |
## Quick start
### Quick start
1. Open a conversation with the AI agent inside the vllm-ascend dev container.
2. Invoke the skill (e.g. `/vllm-ascend-model-adapter`).
3. Provide the model path (default `/models/<model-name>`) and the originating issue number.
4. The agent follows the playbook in `SKILL.md` and produces a ready-to-merge commit.
## Key constraints
### Key constraints
- Never upgrade `transformers`.
- Start `vllm serve` from `/workspace` (direct command, port 8000).
- Dummy-only evidence is not sufficient — real-weight validation is mandatory.
- Final delivery is exactly one signed commit in the current repo.
## Two-stage validation
### Two-stage validation
- **Stage A (dummy)**: fast architecture / operator / API path check with `--load-format dummy`.
- **Stage B (real)**: real-weight loading, fp8/quant path, KV sharding, runtime stability.
Both stages require request-level verification (`/v1/models` + at least one chat request),
not just startup success.
## vLLM Ascend main2main Skill
Migrate changes from the main vLLM repository to the vLLM Ascend repository, ensuring compatibility and performance optimizations for Ascend NPUs.
### What it does
This skill facilitates the process of:
1. Identifying changes in the main vLLM repository.
2. Applying necessary modifications for Ascend support.
3. Validating the changes in an Ascend environment.
4. Delivering a ready-to-merge commit with optimized code and configurations.
### Quick start
1. Open a conversation with the AI agent inside the vllm-ascend dev container.
2. Invoke the skill (e.g. `/main2main`).
3. The agent follows the playbook and produces a ready-to-merge commit.

View File

@@ -0,0 +1,277 @@
---
name: main2main
description: "The main2main skill guides an AI agent to adapt the latest vLLM main branch code for vLLM Ascend project."
---
# main2main Skill
This skill guides AI agents to adapt the latest vLLM main branch code for the vLLM Ascend project.
## Workflow
### 1. Get Current vLLM Version Information for vLLM Ascend
Find the vLLM version information for the **main branch** in `docs/source/community/versioning_policy.md` under the `Release compatibility matrix` section:
- **Current adapted vLLM commit**: Format like `83b47f67b1dfad505606070ae4d9f83e50ad4ebd, v0.15.0 tag`
- **Compatible vLLM version**: From the table, e.g., `v0.15.0`
### 2. Get the Latest vLLM Code
Retrieve the latest commit from the local vLLM git repository:
```bash
# The vLLM git repository is typically located in the parent directory
cd ../vllm
git log -1 --format="%H %s"
```
If the vLLM repository is not found at the default location, prompt the user to specify the exact path to the vLLM git repository.
### 3. Compare vLLM Changes
Compare the differences between the vLLM commit currently adapted by vLLM Ascend and the latest commit:
```bash
# View file changes between two commits
git diff <old_commit> <new_commit> --name-only
# View detailed code changes
git log --oneline <old_commit>..<new_commit>
```
### 4. Analyze vLLM Changes and Generate Change Report
Create a file named `vllm_changes.md` to save the list of changes in vLLM that are relevant to vLLM Ascend. This file will be used to guide the adaptation process and should be removed after all work is done.
#### 4.1 Identify Key vLLM Source Files
Focus on vLLM source files under `vllm/vllm/` directory, especially:
```bash
# Get changed files in vLLM source code
git diff <old_commit> <new_commit> --name-only | grep -E "^vllm/" | head -200
# Count total changes
git diff <old_commit> <new_commit> --name-only | wc -l
```
#### 4.2 Categorize Changes by Priority
When analyzing changes, categorize them into the following priority levels:
| Priority | Category | Description |
|----------|----------|-------------|
| **P0** | Breaking Changes | API changes that will cause runtime errors if not adapted |
| **P1** | Important Changes | Changes that affect functionality or performance |
| **P2** | Moderate Changes | Changes that may need review for compatibility |
| **P3** | Model Changes | New models or model updates |
| **P4** | Minor Changes | Configuration, documentation, or minor refactoring |
#### 4.3 Key Areas to Focus On
When analyzing vLLM changes, pay special attention to these areas that typically require vLLM Ascend adaptation:
1. **Platform Interface** (`vllm/platforms/`)
- New abstract methods that must be implemented
- Method signature changes
- New platform features
2. **MoE (Mixture of Experts)** (`vllm/model_executor/layers/fused_moe/`)
- FusedMoE layer changes
- Activation function changes
- Router changes
3. **Attention** (`vllm/model_executor/layers/attention/`)
- Attention backend changes
- New parameters or interfaces
- MLA (Multi-Head Latent Attention) updates
4. **Speculative Decoding** (`vllm/v1/worker/gpu/spec_decode/`, `vllm/config/speculative.py`)
- Import path changes
- Config field changes
- New speculative methods
5. **Distributed** (`vllm/distributed/`)
- Parallel state changes
- KV transfer changes
- Device communicator updates
6. **Models** (`vllm/model_executor/models/`)
- New model architectures
- Model interface changes
7. **Worker/Model Runner** (`vllm/v1/worker/gpu/model_runner.py`)
- New worker methods
- Model runner changes
8. **Quantization** (`vllm/model_executor/layers/quantization/`)
- Quantization config changes
- compress-tensor method changes
#### 4.4 vllm_changes.md Template
Use the following template structure for `vllm_changes.md`:
```markdown
# vLLM Changes Relevant to vLLM Ascend
# Generated: <DATE>
# Old commit: <OLD_COMMIT_HASH> (<OLD_VERSION>)
# New commit: <NEW_COMMIT_HASH>
# Total commits: <COUNT>
================================================================================
## P0 - Breaking Changes (Must Adapt)
================================================================================
### <INDEX>. <CHANGE_TITLE>
FILE: <VLLM_FILE_PATH>
CHANGE: <DESCRIPTION_OF_CHANGE>
IMPACT: <WHAT_BREAKS_IF_NOT_ADAPTED>
VLLM_ASCEND_FILES:
- <PATH_TO_ASCEND_FILE_1>
- <PATH_TO_ASCEND_FILE_2>
================================================================================
## P1 - Important Changes (Should Adapt)
================================================================================
...
================================================================================
## P2 - Moderate Changes (Review Needed)
================================================================================
...
================================================================================
## P3 - Model Changes
================================================================================
...
================================================================================
## P4 - Configuration/Minor Changes
================================================================================
...
================================================================================
## Files/Directories Renamed
================================================================================
<LIST_OF_RENAMED_FILES>
================================================================================
## END OF CHANGES
================================================================================
```
#### 4.5 Commands to Analyze Specific Changes
```bash
# Check for breaking changes in commit messages
git log --oneline <old_commit>..<new_commit> | grep -iE "(refactor|breaking|api|rename|remove|deprecate)"
# View specific file changes
git diff <old_commit> <new_commit> -- <FILE_PATH>
# Check for renamed/moved files
git diff <old_commit> <new_commit> --name-status | grep -E "^R"
# Check platform interface changes
git diff <old_commit> <new_commit> -- vllm/platforms/
# Check MoE changes
git diff <old_commit> <new_commit> -- vllm/model_executor/layers/fused_moe/
# Check attention changes
git diff <old_commit> <new_commit> -- vllm/model_executor/layers/attention/
# Check speculative decoding changes
git diff <old_commit> <new_commit> -- vllm/v1/worker/gpu/spec_decode/ vllm/config/speculative.py
```
### 5. Adapt vLLM Ascend Project
For each related change in vLLM from the file `vllm_changes.md`, evaluate whether adaptation in vLLM Ascend is needed:
#### 5.1 Internal Architecture Changes
- Check internal interfaces of vLLM core modules (scheduler, executor, model runner, etc.)
- Update vLLM Ascend's Ascend-specific implementations (e.g., NPU worker/model runner, custom attention、custom ops)
- Preserve vLLM Ascend specific modifications (e.g., code under `vllm_ascend/`)
#### 5.2 Dependency Changes
- Check for dependency version changes in `pyproject.toml` or `setup.py`
- Update dependency declarations in vLLM Ascend
### 5. Test and Verify
- Run vLLM Ascend's CI/CD pipeline
- Verify core functionality (text generation, batching, NPU memory management)
- Ensure backward compatibility: test compatibility with older vLLM versions
## Key File Locations
| Project | Path |
|---------|------|
| vLLM Ascend version compatibility | `docs/source/community/versioning_policy.md` |
| vLLM Ascend source code | `vllm_ascend/` |
| **Core Modules** | |
| Ascend-specific attention | `vllm_ascend/attention/` |
| Ascend-specific executor | `vllm_ascend/worker/` |
| Ascend-specific ops | `vllm_ascend/ops/` |
| **Specialized Implementations** | |
| Ascend 310P specific | `vllm_ascend/_310p/` |
| EPLB load balancing | `vllm_ascend/eplb/` |
| XLite compiler | `vllm_ascend/xlite/` |
| **Compilation & Fusion** | |
| Graph fusion pass manager | `vllm_ascend/compilation/` |
| Compilation passes | `vllm_ascend/compilation/passes/` |
| **Quantization** | |
| Quantization methods | `vllm_ascend/quantization/` |
| ModelSlim integration | `vllm_ascend/quantization/methods/modelslim/` |
| **Distributed & KV Cache** | |
| KV transfer | `vllm_ascend/distributed/kv_transfer/` |
| Device communicators | `vllm_ascend/distributed/device_communicators/` |
| **Speculative Decoding** | |
| MTP proposer | `vllm_ascend/spec_decode/mtp_proposer.py` |
| Eagle proposer | `vllm_ascend/spec_decode/eagle_proposer.py` |
| **Utility Modules** | |
| Common utilities | `vllm_ascend/utils.py` |
| Ascend config | `vllm_ascend/ascend_config.py` |
| Platform detection | `vllm_ascend/platform.py` |
| Environment variables | `vllm_ascend/envs.py` |
## Important Notes
1. **Version Checking**: vLLM Ascend uses version checking to maintain compatibility with multiple vLLM versions. Preserve or update related logic when adapting.
2. **Test Verification**: After adaptation, tests must verify:
- Compatibility with the latest vLLM version
- Backward compatibility with older vLLM versions
- Ascend NPU functionality works correctly
3. **Documentation Sync**: If vLLM documentation has significant changes, update vLLM Ascend's documentation accordingly.
4. **Backward Compatibility**:
- Maintain compatibility from the version currently adapted by vLLM Ascend to the latest version
- Use version checking to handle code branches for different versions:
```python
from vllm_ascend.utils import vllm_version_is
if vllm_version_is("0.15.0"):
# Use API for v0.15.0
else:
# Use API for other versions
```
5. Do not forget to update the vLLM version is `.github` for CI files.
6. **Change Logging**: After adaptation, clearly document in the commit message:
- The range of adapted vLLM commits
- Main changes made
- Test results
7. the vLLM python code is under `vllm/vllm` folder.
## Reference
- [Versioning Policy](../../../docs/source/community/versioning_policy.md) - vLLM Ascend versioning strategy

1
.claude/README.md Normal file
View File

@@ -0,0 +1 @@
If you want to use the skills in this repo with Claude code, please copy the skills directory `.agents/skills` to this directory.

3
.gitignore vendored
View File

@@ -210,3 +210,6 @@ kernel_meta/
# generated by CANN
fusion_result.json
csrc/output/
# claude code skills
.claude/skills/*