Compare commits

136 Commits

Author SHA1 Message Date
starkwj
389030a8f8 add env vars & misc 2026-02-11 06:27:58 +00:00
starkwj
739d074b0c update other platforms' Dockerfile 2026-01-23 03:24:25 +00:00
starkwj
2a571d8bc8 support multi npu partially 2026-01-09 04:36:39 +00:00
starkwj
fa0fb46853 fix reload return value 2026-01-07 07:42:30 +00:00
074ae28d6e Update README.md 2026-01-05 20:33:31 +08:00
starkwj
caf0289e1a add Dockerfile and readme 2026-01-05 11:31:07 +00:00
starkwj
135cc0a505 vllm-ascend vnpu v1 2025-12-26 07:37:35 +00:00
zhangyiming
2f1aed98cc [Doc] Update version policy to the latest. (#5071)
### What this PR does / why we need it?
[Doc] Update version policy to the latest.

Signed-off-by: menogrey <1299267905@qq.com>
2025-12-16 15:24:46 +08:00
zzzzwwjj
8c41770f1f [bugfix] fix fp32 trans nz (#5068)
### What this PR does / why we need it?
Fix the fp32 trans-NZ error by disabling trans-NZ for the fp32 dtype.

Signed-off-by: zzzzwwjj <1183291235@qq.com>
2025-12-16 15:04:31 +08:00
wangxiyuan
11e6d6c291 [doc] update developer guide (#5060)
Update the developer doc for v0.11.0-dev. This PR mainly picks the developer doc
from main to v0.11.0-dev. All related features already work with 0.11.0.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-16 14:09:52 +08:00
zhangyiming
e07abfaa75 [Doc] Add new contributors. (#5066)
### What this PR does / why we need it?
[Doc] Add new contributors.

Signed-off-by: menogrey <1299267905@qq.com>
2025-12-16 12:47:40 +08:00
zhangxinyuehfad
ca0823f238 [0.11.0][Bugfix] fix fastapi version (#5052)
### What this PR does / why we need it?
fix fastapi version

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-12-16 11:34:11 +08:00
Shanshan Shen
303c08aec9 [Doc] Update structured output doc with upstream link (#5058)
### What this PR does / why we need it?

Cherry-pick from main
https://github.com/vllm-project/vllm-ascend/pull/4015.

Currently, the structured output feature is used in vllm-ascend exactly the same
way as in vLLM.

Thus, IMO, it's better to remove this doc and point to the upstream doc directly,
to avoid the case where the upstream doc changes and we don't update our
doc in time, which can be misleading to users.

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-12-16 11:32:53 +08:00
Clorist33
2b5b309133 [Bugfix]Fix precision issues in moe_mlp (vllm-ascend v0.11.0-dev) (#5023)
### What this PR does / why we need it?
Replace `group_diff[0]` with `group_list[0]` in the `cumsum_group_list` function
(moe_mlp.py), correcting the logic that converts a cumulative sum into per-group
counts.

### Does this PR introduce _any_ user-facing change?
No

Signed-off-by: tanqingshan (A)  <50050625@china.huawei.com>
Co-authored-by: tanqingshan (A) <50050625@china.huawei.com>
2025-12-16 08:40:03 +08:00
zhangxinyuehfad
87c0cfafa3 [0.11.0][Bugfix] fix fastapi version (#5048)
### What this PR does / why we need it?
fix fastapi version <0.124.0

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-12-15 23:51:38 +08:00
wangxiyuan
01a13a9b77 fix nz for quantization (#4943)
Quantization ops rely on the NZ format by force, so the NZ check should be removed for them.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-12 14:54:41 +08:00
sunchendd
5932abc446 [Bugfix] Fix the Eagle3 inference failure issue. (#4721)
### What this PR does / why we need it?
Fix the Eagle3 inference failure issue.
error message: "EngineCore encountered an issue. See stack trace (above)
for the root cause."

Fixes https://github.com/vllm-project/vllm-ascend/issues/4323

### How was this patch tested?
```
vllm serve /nfs/1_AscendPackage/05_weights_public/Qwen3-32B \
    --served-model-name Qwen3-32B \
    -tp 4 \
    --host "0.0.0.0" \
    --port "8000" \
    --trust-remote-code \
    --speculative-config '{"method":"eagle3","model":"/home/scd/qwen3_32b_eagle3/","num_speculative_tokens":4,"draft_tensor_parallel_size":1}' \
    --max-num-batched-tokens 4096 \
    --max-model-len 4096
```

```
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen3-32B",
        "prompt": "hi, where is the capital of France?",
        "max_tokens": 10,
        "temperature": 0
    }' | python3 -m json.tool
```

vLLM version: v0.11.0
vLLM-ascend version: v0.11.0rc2

Signed-off-by: 17764591921 <sunchend@outlook.com>
2025-12-12 14:52:29 +08:00
Clorist33
4f0dddc9ee [Bugfix] bugfix for moe_mlp in vllm-ascend/v0.11.0-dev (#4885)
### What this PR does / why we need it?
This PR fixes a bug in the moe_mlp module by correcting the arguments
passed to the torch_npu.npu_dequant_swiglu_quant function. It properly
converts group_list from a cumulative sum to per-group counts for the
group_index parameter.
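
As a rough illustration of that conversion (a minimal sketch, not the actual vllm-ascend code; names are assumptions), a cumulative-sum group list can be turned into per-group counts by differencing adjacent entries:

```python
import torch

def cumsum_to_counts(group_list: torch.Tensor) -> torch.Tensor:
    # e.g. cumsum [3, 7, 12] -> counts [3, 4, 5]
    counts = group_list.clone()
    counts[1:] = group_list[1:] - group_list[:-1]  # difference of adjacent cumsum entries
    return counts

print(cumsum_to_counts(torch.tensor([3, 7, 12])))  # tensor([3, 4, 5])
```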

### Does this PR introduce _any_ user-facing change?
No


- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/main

---------

Signed-off-by: tanqingshan (A)  <50050625@china.huawei.com>
Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
Co-authored-by: tanqingshan (A) <50050625@china.huawei.com>
Co-authored-by: Mercykid-bash <ruanche0218@gmail.com>
2025-12-12 14:51:47 +08:00
Slightwind
9c0ad46c1a [0.11.0][Bugfix] Remove the ZMQ communication setup on the D node (#4916)
In the PD separation scenario, the D node does not need to perform get
operations, and therefore does not need to create ZeroMQ (ZMQ)
communication.
---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
2025-12-12 14:37:49 +08:00
1092626063
ceadc2788d Revert "[refactor]support gatingtopk operator generalization (#4356)" (#4873)
This reverts commit c4a11a745a.

The npu_gating_top_k op caused a precision problem with Qwen3-30B, so it is reverted.

Signed-off-by: 1092626063 <1092626063@qq.com>
2025-12-10 15:45:20 +08:00
linfeng-yuan
9a144bc7be [Docs][0.11.0] delete AIV env variables in DSV32 documentation (#4833)
### What this PR does / why we need it?
Delete the incorrect AIV environment variable configuration from the DeepSeek V3.2 documentation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
NA.

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-12-09 15:53:53 +08:00
Mercykid-bash
8f45f9ce29 BugFix: Resolve shape mismatch in eplb update and calculation issues in quant_apply_mlp (#4777)
## Description
This PR addresses two key issues in the MoE module when redundant
experts are enabled, and fixes a calculation precision bug in the
forward inference of quantized MLP:

### 1. Shape Mismatch in EPLB Expert Map Update
- **Root Cause**: 
When redundant experts are turned on, a shape inconsistency occurs
during the expert map update in `vllm_adaptor`:
- The shape of `self.expert_map_per_layer[layer_id]` is
`[num_physical_experts,]` (aligned with physical expert count).
- The shape of `updated_expert_map` is `[num_logical_experts,]` (aligned
with logical expert count).
- Indices in `self.expert_map_per_layer[layer_id]` that exceed the
logical expert count cannot be properly mapped, leading to tensor shape
mismatch errors.
- The same shape mismatch exists in the `log2phy` map update (between
`self.log2phy_map_per_layer[layer_id]` and `updated_log2phy_map`).

- **Fix**:
- Fix the shape initialization of `expert_map_per_layer` and
`log2phy_map_per_layer` to be consistently set to
`[num_physical_experts,]` across the module lifecycle.
- Align the shape of `updated_expert_map` and `updated_log2phy_map` with
the pre-initialized physical-expert-sized tensors during update
operations, ensuring shape consistency for index mapping (see the sketch below).
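
A minimal sketch of that shape alignment (sizes and the -1 filler are assumptions for illustration, not the actual vllm-ascend code):

```python
import torch

num_logical_experts, num_physical_experts = 4, 6

# The per-layer map is sized by the physical expert count across its lifecycle.
expert_map_per_layer = torch.full((num_physical_experts,), -1, dtype=torch.long)

# The incoming update is sized by the logical expert count and must be padded first.
updated_expert_map = torch.tensor([2, 0, 3, 1])  # shape [num_logical_experts]
aligned = torch.full((num_physical_experts,), -1, dtype=torch.long)
aligned[:num_logical_experts] = updated_expert_map

expert_map_per_layer.copy_(aligned)  # shapes now match, no mismatch error
print(expert_map_per_layer)          # tensor([ 2,  0,  3,  1, -1, -1])
```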

### 2. Calculation Precision Issue in Quantized MoE MLP Forward
Inference
- **Root Cause**:
In the forward pass of `moe_mlp`, the
`torch_npu.npu_dequant_swiglu_quant` operator only accepts group lists
in **Count format** as input. However, the group list provided by
`quant_apply_mlp` was in **Cumsum format**, which caused operator input
format mismatch and degraded calculation precision.

- **Fix**:
- Convert the cumsum-formatted group list from `quant_apply_mlp` to
Count format before passing it to `torch_npu.npu_dequant_swiglu_quant`.
- Ensure the input format of the dequantization operator meets its
requirements, restoring the expected calculation precision for quantized
MoE MLP layers.

## Impact
- Resolves shape mismatch errors in EPLB expert/log2phy map updates when
redundant experts are enabled, ensuring stable expert routing.
- Fixes quantized MoE MLP forward precision issues on NPU, aligning
operator input formats with NPU kernel requirements.
- No breaking changes to existing interfaces; the fixes are
backward-compatible for scenarios without redundant experts enabled.

---------

Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: Mercykid-bash <ruanche0218@gmail.com>
Co-authored-by: Che Ruan <cr623@ic.ac.uk>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-09 15:46:58 +08:00
linfeng-yuan
695e5c9ebc [0.11.0][ops] npu_top_k_top_p supports k and p only (#4153)
### What this PR does / why we need it?
With CANN 8.3 and corresponding PTA 2.7.1, `npu_top_k_top_p` supports
passing only k (1<=k<=1024) and p separately.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
E2E performance test with only `top_k` and `p` separately. This PR gains a
0.2 ms improvement in TPOT with `batch_size=16`.

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-12-09 15:45:40 +08:00
Li Wang
4588d1f215 [CI] Use arm node for unit tests (#4819)
### What this PR does / why we need it?
Use arm node for unit tests

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-09 15:45:14 +08:00
linfeng-yuan
e0757dc376 [0.11.0]fix the configuration conflicts in documentation (#4824)
### What this PR does / why we need it?
Fix configuration errors in our documentation.

### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
NA.

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-12-09 15:37:06 +08:00
zhangxinyuehfad
033e3557cc [cherry-pick]fix qwen3vl mrope op (#4484) (#4811)
### What this PR does / why we need it?
The Qwen2.5-VL mrope precision problem will be solved once this PR is
merged.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Test on G8600 with textVQA dataset

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: shaopeng-666 <lishaopeng21@huawei.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-09 11:07:32 +08:00
Levi
9862a23985 【0.11.0-dev】optimization of kimi-k2 in cann8.3 (#4555)
### What this PR does / why we need it?
In CANN 8.3, the npu_moe_gating_top_k operator supports 384 experts, so
Kimi can use the operator to get better performance.
---------

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
2025-12-09 08:49:15 +08:00
zhangxinyuehfad
0d094531b4 [bugfix] Fixed the bug in retrieving the quantization method for mlp.… (#4797)
When retrieving the quantization method for MoE (e.g., the quantization
file of DeepSeek V3.2-Exp does not match the model's naming convention in
eager mode), a KeyError is raised: "model.layers.3.mlp.experts.weight
not in self.quant_description". However, the quantization file looks like:
```bash
  "model.layers.3.mlp.experts.255.gate_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.gate_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.gate_proj.weight_offset": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight_offset": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight_offset": "W8A8_DYNAMIC",
```
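
A hypothetical sketch of the mismatch (the quant_description contents come from the snippet above; the lookup helper and its prefix fallback are assumptions, not the actual fix):

```python
quant_description = {
    "model.layers.3.mlp.experts.255.gate_proj.weight": "W8A8_DYNAMIC",
    "model.layers.3.mlp.experts.255.up_proj.weight": "W8A8_DYNAMIC",
}

def get_quant_type(name):
    if name in quant_description:              # exact hit works for dense layers
        return quant_description[name]
    prefix = name.removesuffix(".weight") + "."
    for key, quant_type in quant_description.items():
        if key.startswith(prefix):             # any per-expert key under this MoE layer
            return quant_type
    return None                                # previously this case raised a KeyError

print(get_quant_type("model.layers.3.mlp.experts.weight"))  # W8A8_DYNAMIC
```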

Co-Authored-By: yangqinghao-cmss <yangqinghao_yewu@cmss.chinamobile.com>

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: yangqinghao-cmss <yangqinghao_yewu@cmss.chinamobile.com>
2025-12-09 08:47:19 +08:00
Levi
4e728f1f40 [Bugfix] fix qwen3-vl-moe shape ERROR during the _prepare_inputs phase under high concurrency. (#4658)
### What this PR does / why we need it?
Earlier we fixed a similar issue for qwen2.5-vl
(https://github.com/vllm-project/vllm-ascend/issues/4430), and all the
multimodal models in vLLM v0.11.0 likely have this problem. Here, we
propose a fix specifically for qwen3-vl-moe.

---------

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
2025-12-08 19:30:16 +08:00
Wang Yixuan
d412565ec9 [Cherry-pick]bmm_transpose to v011dev (#3995)
### What this PR does / why we need it?
Add a custom op to accelerate the deepseek model. The fused op combines
bmm and transpose together and is applied to the MLA module.
Cherry-picked from commit c68ddc11ce.

### Does this PR introduce _any_ user-facing change?
No

---------

Signed-off-by: hust17yixuan <303660421@qq.com>
2025-12-08 19:22:14 +08:00
Angazenn
6391f0625f [v0.11.0-dev][bugfix] Add branch for stream up-lifting in update_attn_params (#4437)
### What this PR does / why we need it?
#3985 moved the stream context initialization before the for-loops to improve
performance. However, we found that this might cause a potential accuracy
drop when used with PD disaggregation. Thus we partly revert this change
when using PD disaggregation, and we shall fix this bug in the future.

### Does this PR introduce _any_ user-facing change?
No.


---------

Signed-off-by: Angazenn <supperccell@163.com>
2025-12-08 08:54:46 +08:00
Li Wang
2598124e67 [Image] Correcting the vllm tag of the openeuler image on the A2 device. (#4745)
### What this PR does / why we need it?
Corrected the vllm tag, which should have been v0.11.0.


Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-06 10:55:22 +08:00
offline893
350999c4ef [Bugfix]Fix eplb enable when using mtp float weights. (#4576)
### What this PR does / why we need it?
Fix enabling EPLB when using MTP float weights. This workaround will be removed
once EPLB supports MTP and float weights.

### How was this patch tested?
Deepseek-V3 + MTP + EPLB in A3.
---------

Signed-off-by: offline0806 <3337230449@qq.com>
Signed-off-by: offline893 <158537145+offline893@users.noreply.github.com>
Co-authored-by: offline0806 <3337230449@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-05 21:15:32 +08:00
1092626063
c4a11a745a [refactor]support gatingtopk operator generalization (#4356)
### What this PR does / why we need it?
This pr is cherry-pick from :
https://github.com/vllm-project/vllm-ascend/pull/2958 and
https://github.com/vllm-project/vllm-ascend/pull/4340

Past:
npu_moe_gating_top_k could only support the 'group_count=256' pattern

Now:
1. npu_moe_gating_top_k supports all sizes of group_count
2. the functionality of `torch_npu.npu_moe_gating_top_k_softmax` is
included in `torch_npu.npu_moe_gating_top_k`

CANN: depends on 8.3.RC1

Performance:
1. GLM4.5-w8a8, TPS improve 6%
2. Qwen3, the same as before

---------

Signed-off-by: 1092626063 <1092626063@qq.com>
2025-12-04 20:10:13 +08:00
LI SHENGYONG
593a96056c 【EPLB】Eplb Redundant Experts Bugfix (#4232)
### What this PR does / why we need it?
Redundant experts bugfix
The calculation logic for redundant experts has been fixed, allowing the
correct number of redundant experts to be calculated using the map.
Therefore, there is no longer a need to set the redundant expert
parameter when passing the map.

### Does this PR introduce _any_ user-facing change?
After configuring the path for experts_map, users do not need to
configure init_redundancy_expert.

### How was this patch tested?
The accuracy of EPLB was tested with and without the use of redundant
experts.

---------

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
2025-12-03 12:00:05 +08:00
Mengqing Cao
b6d63bbd52 [v0.11.0-dev][CI] Fix ngram lacking of input arg dummy_compute_logits error (#4648)
### What this PR does / why we need it?
Fix ngram lacking of input arg `dummy_compute_logits` error

### How was this patch tested?
CI passed with existing test.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-12-03 09:22:07 +08:00
Levi
865f1f7fc8 [Bugfix] Resolve the interface compatibility issue of get_input_embeddings in MM (#4638)
### What this PR does / why we need it?
Resolve the interface compatibility issue of get_input_embeddings in MM,
because the get_input_embeddings function of other models does not have the
is_multimodal parameter.

---------

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
2025-12-02 22:21:47 +08:00
Levi
3b4cb23616 [Bugfix] fix qwen2.5-vl-72b shape ERROR during the _prepare_inputs phase under high concurrency. (#4553)
### What this PR does / why we need it?
qwen2.5-vl-72b reports a shape ERROR during the _prepare_inputs phase
under high concurrency (issue
https://github.com/vllm-project/vllm-ascend/issues/4430).

This PR fixes it.

The related PR in the main branch:
https://github.com/vllm-project/vllm-ascend/pull/3612

The related commit in vllm:
17c540a993/vllm/model_executor/models/interfaces.py

(The _get_text_embeddings function has been refactored into
interfaces.py in vllm.)

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
2025-12-02 14:20:45 +08:00
Zetong Li
52abd47f8c [Bugfix][SHM] Use writer lock by default and remove redundant env (#4117)
### What this PR does / why we need it?
This PR aims to remove env introduced by #3988 and use lock by default.
As described in https://github.com/vllm-project/vllm/issues/27858, we
have tested the writer lock method in various scenarios and the
performance is almost unaffected. Therefore, we believe that it would be
safe to enable the lock by default and remove the redundant env
`SHM_BARRIER` now.

After discussion, we decided to preserve the env and set it to true by
default.

### Does this PR introduce _any_ user-facing change?
`SHM_BARRIER` is set as true by default.

### How was this patch tested?
by ci

---------

Signed-off-by: Zetong Li <slippersss@126.com>
2025-12-01 22:27:01 +08:00
Li Wang
76d0ba4342 [Image][Build] Cherry pick #4062 from main (#4506)
### What this PR does / why we need it?
This patch aims to integrate the mooncake
[v0.3.7.2.post2](https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.7.post2)
to vllm-ascend images

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-01 11:39:40 +08:00
zouyida2052
2b4f7a5016 [cherry-pick pr-4254] bugfix for mtp>1 when lm_head_tp>1 (#4360)
### What this PR does / why we need it?
Previously, the dummy run executed compute_logits only once, regardless
of num_speculative_tokens. This caused execute_model to hang on
compute_logits when lm head tensor parallelism exceeded 1. The fix
ensures compute_logits executes correctly during dummy run, matching
num_speculative_tokens.
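
A schematic of that behavior (a sketch under assumptions; names do not come from the actual code): the dummy run should issue one compute_logits call per speculative position so that lm-head tensor-parallel ranks perform the same collectives as a real step.

```python
def dummy_compute_logits(model, hidden_states, num_speculative_tokens):
    # One call per speculative position, matching num_speculative_tokens,
    # so ranks with lm_head_tp > 1 stay in lockstep instead of hanging.
    for _ in range(num_speculative_tokens):
        model.compute_logits(hidden_states)
```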

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
2025-12-01 11:11:15 +08:00
LI SHENGYONG
cd9f5c0611 [bugfix] dep ineffective (#4416)
### What this PR does / why we need it?
The expert mapping table and weights of dynamic EPLB were not being
updated, so although accuracy was correct, the rebalancing never took
effect. This bug has now been fixed.

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
2025-11-29 15:19:11 +08:00
henryxuxu0716
71acc8ddeb For nz unset in bf16&fp16 (#4495)
### What this PR does / why we need it?
Disable NZ for the float-weight case. This is only a quick fix for the dev
branch.

For the main branch, we'll consider more cases to make it more general.


### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
qwen2.5 32B
<img width="441" height="221" alt="image"
src="https://github.com/user-attachments/assets/7ae18ffd-1ce2-43d9-9960-be45250ad0da"
/>

---------

Signed-off-by: 刘哲续 <liuzhexu1@huawei.com>
Co-authored-by: 刘哲续 <liuzhexu1@huawei.com>
2025-11-28 17:32:25 +08:00
Zhu Yi Lin
96c362361e [0.11.0][TEST] Delete Comment (#4428)
### What this PR does / why we need it?
Delete a Chinese comment.
Picked from https://github.com/vllm-project/vllm-ascend/pull/4427

### Does this PR introduce _any_ user-facing change?
no

Signed-off-by: GDzhu01 <809721801@qq.com>
2025-11-25 21:39:36 +08:00
zhangxinyuehfad
a686f2962a [0.11.0][Bugfix] fix e2e full test (#4424)
### What this PR does / why we need it?
Pin the transformers version to 4.57.1 to fix "'dict' object has no attribute
'model_type'".

https://github.com/vllm-project/vllm-ascend/actions/runs/19660859460/job/56306822464

picked from https://github.com/vllm-project/vllm-ascend/pull/4423


Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-11-25 21:21:42 +08:00
Shanshan Shen
cdaf7f4a51 [MM][Bugfix] Minor fix for VL model verification (#4385)
### What this PR does / why we need it?

To fix the ops test, where `model_config` has been set to `None` and doesn't
have the `hf_config` attribute, we have added a check for `model_config` to
guarantee it is not `NoneType`.

cherry-pick from main:
https://github.com/vllm-project/vllm-ascend/pull/4384.


Signed-off-by: shen-shanshan <467638484@qq.com>
2025-11-25 20:36:32 +08:00
wujinyuan1
386a85eccc [Bugfix]Fix the hang issue of multimodal model when running with DP>1 (#4393)
### What this PR does / why we need it?
When cudagraph_mode is set to FULL_DECODE_ONLY and dp > 1, the dummy-run
process is triggered. When calling the update_attn_params function,
the num_tokens parameter needs to be passed, and this value was obtained
from positions.shape[0]. However, the multimodal model uses mRoPE
(multi-dimensional rotary positional embeddings), which makes the positions
tensor two-dimensional, so the value obtained from positions.shape[0] is
incorrect. We solve this problem by replacing positions.shape[0] with
num_tokens.
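
A small illustration of the shape issue (values are made up, not from the PR): under mRoPE the positions tensor is two-dimensional, one row per rope section, so its first dimension is not the token count.

```python
import torch

num_tokens = 8
positions_text = torch.arange(num_tokens)                       # 1-D, shape [8]
positions_mrope = torch.zeros(3, num_tokens, dtype=torch.long)  # 2-D, shape [3, 8]

print(positions_text.shape[0])   # 8 -> equals the token count
print(positions_mrope.shape[0])  # 3 -> number of rope sections, not tokens
```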

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: wujinyuan1 <wjy9595@qq.com>
Co-authored-by: wujinyuan1 <wjy9595@qq.com>
2025-11-25 09:32:22 +08:00
weichen
a3164ac372 [v0.11.0][Bugfix][MoE] enable force_load_balance in aclgraph (#4367)
### What this PR does / why we need it?
Enable force_load_balance in aclgraph, solving OOM issues.
pick from https://github.com/vllm-project/vllm-ascend/pull/4366
### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
e2e & ut

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-11-25 09:16:57 +08:00
mazhixin000
75452abe1e [Doc][v11.0-dev][cherry-pick]Add single node PD disaggregation instructions (#4370)
### What this PR does / why we need it?

add single node PD disaggregation instructions for Qwen 2.5VL model.


### Does this PR introduce _any_ user-facing change?
no


---------

Signed-off-by: mazhixin <mazhixin7@huawei.com>
Signed-off-by: mazhixin000 <mazhixinkorea@163.com>
Co-authored-by: mazhixin <mazhixin7@huawei.com>
2025-11-24 17:23:11 +08:00
wangxiyuan
a2e4c3fe78 Revert "[cherry-pick][refactor]support gatingtopk operator generalization (#4050)" (#4352)
This reverts commit c87a77e8b4.

It breaks the ops e2e test.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-21 23:03:20 +08:00
SILONG ZENG
5ad0ccdc31 [v0.11.0]Upgrade cann to 8.3.rc2 (#4332)
### What this PR does / why we need it?
Upgrade CANN to 8.3.rc2

Signed-off-by: MrZ20 <2609716663@qq.com>
2025-11-21 22:48:57 +08:00
LI SHENGYONG
0f9025cceb [EPLB] Eplb Verify Fix (#4334)
### What this PR does / why we need it?
Eplb Verify Fix
---------

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
Signed-off-by: LI SHENGYONG <49200266+shenchuxiaofugui@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-21 18:18:15 +08:00
Ting FU
97ffb9120f [CI] Defaultly compile vllm with multimodal audio feature in dockerfile (#4324) (#4341)
### What this PR does / why we need it?
For better usability, compile vLLM with the multimodal audio feature in the
Dockerfile by default.

The image size will increase by only 2.x MB.

Signed-off-by: Ting FU <futing10@huawei.com>
2025-11-21 17:53:00 +08:00
Li Wang
218bc70f6f [CI] Remove redundant workflows (#4335)
### What this PR does / why we need it?
Remove redundant workflows; maintain a single workflow, set up on the main
branch, that controls execution for each branch instead of each branch running
its own, thus reducing resource waste.


Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-21 16:48:35 +08:00
Shanshan Shen
70f076331f [MM][Bugfix] Add error log for VL models when enabling FLASHCOMM (#4222)
### What this PR does / why we need it?

Add error log for VL models when enabling
`VLLM_ASCEND_ENABLE_FLASHCOMM1=1` or `VLLM_ASCEND_ENABLE_FLASHCOMM=1`
(for backward compatibility).

This is a temporary fix for
https://github.com/vllm-project/vllm-ascend/issues/4132.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-11-21 15:04:35 +08:00
LI SHENGYONG
c94b38c82e [Readme] EPLB Support Scenarios (#4315)
### What this PR does / why we need it?
Add information on the scope of EPLB support.

---------

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
2025-11-21 14:25:39 +08:00
Angazenn
9c6d0b422c [v0.11.0-dev][misc]change default capture size for Qwen3-MoE when using full dp (#4205)
### What this PR does / why we need it?
This is the dev-branch version of #4199.
Currently, the default `cudagraph_capture_size` in vLLM is `[1, 2, 4, 8,
16, 24, ..., max_capture_size]`. However, this is not always the best
choice in different situations. This PR changes the default
setting when running Qwen3-MoE in the full-dp (`dp_size > 1` && `tp_size ==
1`) setting, which is usually applied in large-scale EP.
old:
`[1, 2, 4, 8, 16, 24, ..., max_capture_size]`
new:
`[1, 2, 5, 10, 15, 16, 24, ..., max_capture_size]`
This is mainly because the performance of the `_npu_paged_attention` op
degrades dramatically with the old settings. We hope to provide better
performance when users do not set a specific `cudagraph_capture_size`.
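
For reference, a sketch of how a user would pin capture sizes explicitly (assuming vLLM's `compilation_config` accepts a `cudagraph_capture_sizes` list; the model name is a placeholder), in which case the new defaults are not applied:

```python
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen3-30B-A3B",  # placeholder model name
    compilation_config={"cudagraph_capture_sizes": [1, 2, 5, 10, 15, 16, 24]},
)
```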
### Does this PR introduce _any_ user-facing change?
The default `cudagraph_capture_size` is modified in the above cases.
However, if `cudagraph_capture_size` has already been set by the user, this PR
won't have any influence on it.

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

---------

Signed-off-by: Angazenn <supperccell@163.com>
2025-11-21 11:19:11 +08:00
shaopeng-666
b6d59bdea2 cherry pick from pr 4270 (#4285)
### What this PR does / why we need it?
avoid mrope fusion op when running qwen25vl on x86 machine

---------

Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
2025-11-19 22:32:02 +08:00
MengLong Chen
277670730c [Bugfix][Aclgraph] failed to update graph task (#4282)
### What this PR does / why we need it?
Fix the error of the full-graph aclgraph failing to update the graph task.


Signed-off-by: chenmenglong <chenmenglong1@huawei.com>
2025-11-19 21:30:48 +08:00
1092626063
c87a77e8b4 [cherry-pick][refactor]support gatingtopk operator generalization (#4050)
### What this PR does / why we need it?
pick from : https://github.com/vllm-project/vllm-ascend/pull/2958
Past:
npu_moe_gating_top_k could only support the 'group_count=256' pattern

Now:
1. npu_moe_gating_top_k supports all sizes of group_count
2. the functionality of `torch_npu.npu_moe_gating_top_k_softmax` is
included in `torch_npu.npu_moe_gating_top_k`

CANN: depends on 8.3.RC1

Performance:
1. GLM4.5-w8a8, TPS improve 6%
2. Qwen3, the same as before


Signed-off-by: 1092626063 <1092626063@qq.com>
2025-11-19 10:39:28 +08:00
liziyu
ddf3e75800 [Cherry-pick] [0.11.0] pd proxy support ipv6 and fix proxy (#4242)
### What this PR does / why we need it?
The PD proxy supports IPv6; the mooncake connector checks whether an IPv6 address
is used and notifies the user.

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
2025-11-18 16:33:00 +08:00
Icey
378e92a2a2 [Cherry-pick][0.11.0] Adapted to torch_npu.npu_fused_infer_attention_score (#4202)
### What this PR does / why we need it?
Fixes a compatibility bug with torch_npu.npu_fused_infer_attention_score
which is described in
https://github.com/vllm-project/vllm-ascend/issues/4020.
@momo609 told us this solution.
cherry-pick: https://github.com/vllm-project/vllm-ascend/pull/4025

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with new added/existing test.

Signed-off-by: Icey <1790571317@qq.com>
2025-11-17 10:56:23 +08:00
zhangyiming
a7eb42cf0a [v0.11.0-dev][Bugfix][cherry-pick]bugfix for weight load of kimi-k2 (#4190)
### What this PR does / why we need it?
This is a cherry-pick from #3798.

Fix the kimi-k2 startup bug (weight load
error): https://github.com/vllm-project/vllm-ascend/issues/3785

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Signed-off-by: menogrey <1299267905@qq.com>
Co-authored-by: Levi <54832289+Levi-JQ@users.noreply.github.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: zhaozx-cn <zhaozx2116@163.com>
2025-11-14 15:43:22 +08:00
weichen
51e5806d76 [0.11.0-dev][Bugfix][EPLB] Quick fix for missing log2phy conversion (#4150)
### What this PR does / why we need it?
Quick fix for missing log2phy conversion in MC2 token_dispatcher, which
has been already fixed in main branch
https://github.com/vllm-project/vllm-ascend/pull/3512.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
e2e & ut

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-11-13 14:32:40 +08:00
zhaozx-cn
cd652acb65 [BugFix] Fix kv_no_split not contiguous (#3711)
allgather needs contiguous data, but the split operation returns non-contiguous
data.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: zhaozx-cn <zhaozx2116@163.com>
2025-11-13 11:29:37 +08:00
Angazenn
28a15299ea [cherry-pick][v0.11.0-dev][bugfix] Change seq_lens in dummy attn_metadata to max_query_len (#4099)
### What this PR does / why we need it?
This is a cherry-pick from #4097.
Currently, we set `seq_lens` in the dummy attn_metadata to
`max_model_len` to get the max workspace for attention during capturing.
However, setting it consistently to `max_model_len` causes dummy_run
to execute a long attention when running actual inference. For example,
if there is a single req with `seq_lens` of [8] but `max_model_len` is
131072, the whole process is slowed down by dummy_run as it executes a
fake long-seq attention. Therefore, we instead set it to max_query_len,
which is also consistent with the vLLM GPU implementation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

---------

Signed-off-by: Angazenn <supperccell@163.com>
2025-11-12 20:32:50 +08:00
zhangxinyuehfad
7732a89fd9 [v0.11.0][UT][Fixbug] Fix UT test (#4151)
### What this PR does / why we need it?
Fix UT test
Backport: https://github.com/vllm-project/vllm-ascend/pull/4116

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-11-12 16:55:18 +08:00
zhaomingyu13
650ce8ad19 [0.11.0][Bugfix] Fix ngram precision issue and open e2e ngram test (#4092)
### What this PR does / why we need it?
Fix ngram precision issue and open e2e ngram test
---------

Signed-off-by: Icey <1790571317@qq.com>
Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>
Signed-off-by: zhaomingyu13 <zhaomingyu13@h-partners.com>
Co-authored-by: Icey <1790571317@qq.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-11-11 09:58:03 +08:00
Angazenn
2069bef449 [v0.11.0-dev][bugfix] Fix a bug in wrongly set npu_stream (#4106)
### What this PR does / why we need it?
This PR fixes a bug introduced in #3985, which set the wrong npu_stream
(possibly by mistake during the cherry-pick). I correct it and make
`update_attn_params` consistent with the main branch.

### Does this PR introduce _any_ user-facing change?
No.

Signed-off-by: Angazenn <supperccell@163.com>
2025-11-11 09:16:41 +08:00
Icey
c5fe179cef [0.11.0] [Cherry-pick #4058] Fixes Qwen3-Next enable nz accuracy problem (#4056)
### What this PR does / why we need it?
- Fixes Qwen3-Next enable nz accuracy problem

---------

Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Icey <1790571317@qq.com>
2025-11-10 20:56:39 +08:00
rjg-lyh
ebd45b6596 [V0.11.0][Core] Restore scheduling logic under default configuration (#4094)
### What this PR does / why we need it?
Cherry-pick #3967 from main branch. This PR reverts the changes
introduced in PR #2894. Initially, due to performance issues with the
older version of the chunked prefill ops, the default behavior was to
use the Ascend scheduler to disable the chunked prefill feature.
However, with the improvements in the performance of the new chunked
prefill ops, this interception strategy has been removed. This change
also aligns with the community's default configuration behavior.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with new added/existing test.

Signed-off-by: rjg-lyh <1318825571@qq.com>
2025-11-10 20:02:23 +08:00
XiaoxinWang
c3c9138719 [Perf] Move attention update stream out of loop to optimize performance (#3985)
### What this PR does / why we need it?
In the `update_*attn_params` functions, the
`torch.npu.stream(update_stream)` context manager was previously located
inside the for-loop that updates parameters for each layer. This
resulted in redundant stream initiations for every layer, adding
unnecessary overhead.

This commit refactors the code by moving the stream context manager to
wrap the entire for-loop. This ensures that the update stream is
initiated only once per function call, rather than for each layer. This
change reduces latency by about 90 us in each decode.
Update stream in every layer: [profiling screenshot attached to the PR]

Update stream moved out of the per-layer loop: [profiling screenshot attached to the PR]

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
2025-11-10 17:18:45 +08:00
zhangxinyuehfad
d913f9474b [0.11.0][Fix] Fix Qwen2-Audio-7B-Instruct accuracy test (#4018)
### What this PR does / why we need it?

Fix Qwen2-Audio-7B-Instruct accuracy test

Backport:https://github.com/vllm-project/vllm-ascend/pull/4017

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-11-10 11:54:30 +08:00
hucong
7ea17fbee3 [0.11.0][BugFix] Improve the performance of prefixcache features (#4021)
### What this PR does / why we need it?
cherry-pick from https://github.com/vllm-project/vllm-ascend/pull/4022

The code bug caused an idle bubble. When the npu_paged_cache_load
operator was called, it forcibly transferred seq_len2 to the device,
which triggered synchronization and interrupted the CPU operator
launch stream.


---------

Signed-off-by: underfituu <hzhucong@163.com>
2025-11-10 11:51:34 +08:00
wangxiaoteng888
c2d58c0655 [P/D][BugFix][v0.11.0-dev]Fix proxy format processing errors & Layerwise connector performance optimization (#4069)
### What this PR does / why we need it?
1. Fix proxy format processing errors.
2. Layer-wise connector performance optimization.

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
2025-11-09 09:55:10 +08:00
wangx700
55e37f5041 [v0.11.0][Bugfix] fix sleepmode level2 e2e test (#4023)
### What this PR does / why we need it?
fix sleepmode level2 e2e test

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
use e2e tests

Signed-off-by: wangx700 <wangxin700@huawei.com>
2025-11-08 14:11:15 +08:00
tingfu
f9842560cb [0.11.0][Perf] Add padding vision tower for Qwen2_5_Omni (#4041)
### What this PR does / why we need it?
This PR replaces the vision tower in the Qwen2.5-Omni-Thinker model,
Qwen2_5_VisionTransformer, with AscendQwen2_5_VisionTransformer, which
uses QKV padding for better performance.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: Ting FU <futing10@huawei.com>
2025-11-08 13:56:05 +08:00
zxr2333
d4e2a44307 [Cherry Pick from pr#3981][0.11.0][P/D]Make kv-transfer env variable take effect & Fix load-balance proxy (#3983)
### What this PR does / why we need it?
Make kv-transfer env variable take effect & Fix load-balance proxy.
Cherry Pick from #3981

---------
Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
2025-11-08 13:52:33 +08:00
offline893
8e72758645 [BugFix]Fix grouplist type of mc2. (#4049)
### What this PR does / why we need it?
Fix the accuracy problem of EPLB caused by the PTA upgrade. This is a backport
of #4047.

### How was this patch tested?
Main:
baseline:
| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 87.50 |

   EPLB:

| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 87.50 |
- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: offline0806 <3337230449@qq.com>
Co-authored-by: offline0806 <3337230449@qq.com>
2025-11-07 17:43:23 +08:00
lilinsiman
016337eaec [v0.11.0][UT] Add new ut case for aclgraph enable (#4038)
### What this PR does / why we need it?
add new ut case for aclgraph enable

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
2025-11-07 11:35:24 +08:00
Angazenn
f9494d978a [cherry-pick][v0.11.0-dev][bugfix] Fix a rare bug triggered by _npu_paged_attention in FULL_DECODE_ONLY mode (#3987)
### What this PR does / why we need it?
This is a cherry-pick from #3986.

This PR fixes a bug where the workspace of `_npu_paged_attention` at
setup time is smaller than at execution time. In the current implementation of
FULL_DECODE_ONLY with `_npu_paged_attention`, we use
`_npu_paged_attention_get_workspace` when capturing, with `max_model_len`
as `seq_lens`. This assumes that PA with larger `seq_lens` inputs should
need a larger workspace than with smaller `seq_lens`. However, there are rare
cases where PA with smaller `seq_lens` needs more space. So I add
`get_workspace` directly into `update_attn_params`.
This change might introduce a slight (≈1%) performance degradation for
small num_tokens (such as 1) in the decode phase, and there are no other known
memory issues, so I think this change is acceptable. We can remove this
if a new attention op (such as `npu_fused_infer_attention_score`) does not
have such problems.


Signed-off-by: Angazenn <supperccell@163.com>
2025-11-06 23:08:57 +08:00
Shanshan Shen
27547a10e6 [MM][Bugfix] Add MoE verification for multi-modal models (#3897) (#4027)
### What this PR does / why we need it?

Fix #3891.

The empty `moe_comm_method` in the above issue is due to the wrong
check for MoE models. To be specific, the method `is_moe_model` only
checks whether a text-only model is a MoE model, without considering
multi-modal models, e.g., `VL` and `Omni`.

Check the config dict recursively to find whether it has a key that contains
"expert", without checking the model architecture.

It is worth noting that, we can't verify a model by if it contains
`FusedMoE` module because `is_moe_model` is called somewhere before the
model loading, e.g., it's called when updating the ACLGraph config in
platform initialization.
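
A minimal sketch of such a recursive check (assumed helper name, not the actual `is_moe_model` implementation):

```python
def has_expert_key(config):
    # Recursively look for any key that contains "expert" in the HF config dict.
    for key, value in config.items():
        if "expert" in key:
            return True
        if isinstance(value, dict) and has_expert_key(value):
            return True
    return False

print(has_expert_key({"text_config": {"num_experts": 64}}))  # True (multi-modal MoE)
print(has_expert_key({"hidden_size": 4096}))                 # False (dense model)
```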

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-11-06 20:30:40 +08:00
zzzzwwjj
3db53d117e [0.11.0][doc] add aclgraph developer guide (#3947)
### What this PR does / why we need it?
Add aclgraph developer guide.

Signed-off-by: zzzzwwjj <1183291235@qq.com>
2025-11-06 09:54:38 +08:00
wangxiyuan
7ee0b0b5d8 [cherry-pick]Upgrade CANN to 8.3.rc1 (#3945) (#3962)
This PR upgrades CANN from 8.2.RC1 to 8.3.RC1 and removes the CANN version
check logic.

TODO: we noticed that UT runs fail with the CANN 8.3 image, so the base
image for UT is still 8.2. We'll fix it later.

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-06 09:05:08 +08:00
Zetong Li
66b67f9cf2 [Bugfix][SHM] Fix weak memory ordering problem in share memory (#3988)
### What this PR does / why we need it?
This PR aims to fix a weak-memory-ordering problem in shared memory by
patching the message queue with an additional lock. The detailed issue can
be found at https://github.com/vllm-project/vllm/issues/27858. The key
point is to use the writer lock to enforce a memory fence before the ready
flag `metadata_buffer[0] = 1` is set.
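
A schematic of that pattern (buffer layout and names are assumptions, not the actual message-queue code): the writer takes a lock around the payload write, so the lock release acts as a fence before the ready flag is raised.

```python
import multiprocessing as mp

writer_lock = mp.Lock()

def publish(metadata_buffer, payload_buffer, data):
    with writer_lock:                      # acquire/release provides the memory fence
        payload_buffer[:len(data)] = data  # write the payload first
        metadata_buffer[0] = 1             # ready flag set only after the payload write
```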

This is a temporary solution, and you can use it by setting env
`SHM_BARRIER=true`. By default, we disable this modification.

### Does this PR introduce _any_ user-facing change?
`SHM_BARRIER=true` enables this change while `SHM_BARRIER=false`
disables this change. The latter is the default choice.

### How was this patch tested?
by ci

---------

Signed-off-by: Zetong Li <slippersss@126.com>
2025-11-04 23:07:23 +08:00
zxr2333
954dab64fb [v0.11.0][P/D]Set adxl as default backend and update readme (#3771)
### What this PR does / why we need it?
Set adxl engine as the default Mooncake backend, because Ascend
Transport is no longer maintained.
Update the README to include instructions for installing Mooncake with the adxl
backend.

### Does this PR introduce _any_ user-facing change?
Users need to compile and install the mooncake backend for adxl
according to the revised README instructions.

### How was this patch tested?
By CI.

---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
2025-11-04 16:06:58 +08:00
leo-pony
0cead5c1ee Quality enhancement: Immediately interrupt execution when allocate NPU memory OOM (#3944)
### What this PR does / why we need it?
Guard against the point where the problem first occurs: execution should
be interrupted when the NPU memory allocation fails, rather than
waiting until an illegal address is accessed.


### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
NA
- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-11-04 08:55:22 +08:00
Mengqing Cao
7cc6208029 [0.11.0][MTP][Aclgraph] Fix the support aclgraph with MTP (#3912)
### What this PR does / why we need it?
Fix 2 breakages of aclgraph with MTP:
1. deepseekmtp in vllm 0.11.0 does not support aclgraph and lacks the
`support_torch_compile` decorator
2. There is a d2h synchronization in the original forward of the MTP
predictor. The fix PR in vllm:
https://github.com/vllm-project/vllm/pull/27643

As we'll fix it in vllm main, this fix pr is only needed in branch
v0.11.0-dev

The profiling shows that MTP replays in aclgraph now: [profiling screenshot attached to the PR]

### How was this patch tested?

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-11-03 14:25:37 +08:00
wangxiyuan
8a7154001e [0.11.0]Chery pick pta upgrade change (#3940)
This PR cherry-picks two commits from main to upgrade torch-npu to the 2.7.1
official release.

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-31 22:14:26 +08:00
rjg-lyh
3d81ea03ed [v0.11.0-dev][bugfix] fix valueError in static_forward_context when prefix is empty (#3929)
### What this PR does / why we need it?
This PR temporarily bypasses the scenario where some models in vLLM
trigger a `ValueError` during the process of storing values in
`static_forward_context` when no `prefix` is specified for the linear
layers, which is a bug in some models in vLLM. The official fix will be
addressed by submitting a PR to the vLLM community that specifies a
prefix for the linear layers in each model.

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

### How was this patch tested?
CI passed with new added/existing test.

Signed-off-by: rjg-lyh <1318825571@qq.com>
2025-10-31 15:45:06 +08:00
Nagisa125
9f7de45b75 [Bugfix] fix MTP support for lmhead_tensor_parallel_size (#3921)
### What this PR does / why we need it?
Fix the issue where enabling MTP and setting
lmhead_tensor_parallel_size=16 causes the inference to hang.


Signed-off-by: wyh145 <1987244901@qq.com>
2025-10-31 14:34:28 +08:00
lilinsiman
ee2e55e602 [v0.11.0][Test] Add new test model for aclgraph single_request v0.11.0 (#3889)
### What this PR does / why we need it?
add new test model for aclgraph single_request v0.11.0

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
2025-10-31 11:23:55 +08:00
zouyida2052
90aca84e60 fix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_len (#3909)
### What this PR does / why we need it?
1. Revert [bugfix for mtp in
fullgraph](0948483642)
and re-support it once vllm supports it
2. Raise an error when cudagraph_capture_sizes is not an integer multiple
of uniform_decode_query_len
3. Bugfix for max_num_seqs=14 in the mtp=2 scenario

---------

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
2025-10-31 09:25:06 +08:00
lilinsiman
387ce1cc5b add new e2e tests case for aclgraph memory to v0.11.0 (#3880)
### What this PR does / why we need it?
add new e2e tests case for aclgraph memory to v0.11.0

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
2025-10-31 09:17:09 +08:00
wangxiaoteng888
38afd2c9cb [bugfix_v0.11.0]cancel tokenize for layerwise_proxy (#3913)
### What this PR does / why we need it?
cancel tokenize for layerwise_proxy
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
by ci

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
2025-10-30 23:55:04 +08:00
wangxiaoteng888
af7a56550b [bugfix_v0.11.0-dev] layerwise D first plan (#3907)
### What this PR does / why we need it?
Refactored the layerwise code to send to the D node first, preventing
P-node hangs due to communication timeouts when DP > 1.
---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
2025-10-30 22:21:11 +08:00
offline893
d5a9aba03f [BugFix]Fix group list type of mc2. (#3890)
### What this PR does / why we need it?
Fix the precision issue caused by the inconsistency between the group
list type used by mc2 and that of eplb.

---------

Signed-off-by: offline0806 <3337230449@qq.com>
2025-10-30 21:44:14 +08:00
weichen
c506ba60fb [v0.11.0] [Bugfix] [MoE]fix error in deepseek when using allgather (#3827)
### What this PR does / why we need it?
After refactoring vllm_ascend/models and FusedMoE, we are unable to pass
`gate` from deepseekv2.py to `AscendFusedMoE.forward`, which results
in an error when running deepseek v3/r1 with allgather.
Hence, this PR removes `gate`-related computations from the FusedMoE module
in eager/aclgraph mode.
### Does this PR introduce _any_ user-facing change?
`rm_router_logits` is deprecated in eager/aclgraph.
### How was this patch tested?
e2e & ut

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-10-30 14:59:46 +08:00
whx
211d4b9da4 [BugFix] Fix mlapo accuracy problem related with weight processing. (#3857)
This PR fixes an mlapo accuracy problem related to weight processing.
Furthermore, it modifies the mlapo-related e2e test to use a quantized deepseek
model to make it effective.

Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-10-30 00:35:50 +08:00
zouyida2052
d9249c968e bugfix for mtp in fullgraph (#3878)
### What this PR does / why we need it?
bugfix for mtp in fullgraph

### Does this PR introduce _any_ user-facing change?
no

---------

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
2025-10-29 23:52:20 +08:00
fems14
19f49ecb5f [0.11.0][Bugfix]fix_mulit_connector_bug (#3332) (#3882)
### What this PR does / why we need it?
When using the multi connector, the multi connector does not define
get_finished_count, which will cause the kv cache to be released.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19


Signed-off-by: baxingpiaochong <771405853@qq.com>
Co-authored-by: baxingpiaochong <771405853@qq.com>
2025-10-29 23:44:52 +08:00
liziyu
e5b938c5fe [v0.11.0] [P/D] force with_prefill true after allreduce in kv producer (#3835)
### What this PR does / why we need it?
force with_prefill true after allreduce in kv producer. This is a backport of #3768 and #3849

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
2025-10-29 23:14:00 +08:00
Wang Yixuan
b323be9fe4 deepseek torchair adapt for torch_npu version (#3876)
### What this PR does / why we need it?
To adapt the torch_npu version to avoid the precision problem of
torchair deepseek. The torch_npu version may result in the different
branches in the ops register, the rms_norm ops has two branches
according to the verson_check, this pr unify the rms_norm in torchair by
patch method. #3862

Signed-off-by: hust17yixuan <303660421@qq.com>
2025-10-29 22:44:44 +08:00
realliujiaxu
29bd9235ed [v0.11.0][Perf] Delete redundant operations in model_runner and forward_context (#3775)

cherry pick https://github.com/vllm-project/vllm-ascend/pull/3677

Remove redundant operations from `model_runner` and `forward_context`.
This optimization can significantly reduce the idle time (bubble) before
decoding when running models with small parameter counts (e.g.,
Qwen/Qwen2.5-0.5B).

Testing on 800I A2, the bubble is reduced from 3.8 ms to 2.8 ms:
Before: [profiling screenshot attached to the PR]

After: [profiling screenshot attached to the PR]

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

---------

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
2025-10-29 15:58:53 +08:00
zhangxinyuehfad
75de3fa172 [v0.11.0][Doc] Update doc (#3852)
### What this PR does / why we need it?
Update doc


Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-10-29 11:32:12 +08:00
ZYang6263
6188450269 [v0.11.0][Bugfix]Avoid using the fusion operator in the MOE model (#3837)
### What this PR does / why we need it?
The current MatmulReduceScatter operator experiences performance
degradation in small-shape scenarios, so it determines whether to use
this operator by judging the size of the shape.


---------

Signed-off-by: ZYang6263 <zy626375@gmail.com>
2025-10-28 23:31:19 +08:00
Shirley125
e48ca0b6ec [bugfix][0.11]fix proxy decode bug (#3751)
### What this PR does / why we need it?
fix proxy decode bug while parsing non-UTF-8 characters.

---------

Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>
2025-10-27 16:56:50 +08:00
Yizhou
43276fd822 [v0.11.0][Fix] Prevent memory leak in MLA decode graph (#3743) (#3774)
### What this PR does / why we need it?
The cache for MLA decode graph parameters was holding strong references
to tensors, preventing them from being garbage collected and leading to
increased memory usage.

This change wraps the cached tensors in weak references, allowing them
to be deallocated when no longer in use and reducing overall memory
pressure.
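
An illustrative sketch of caching through weak references (cache structure and names are assumptions, not the actual MLA decode-graph code):

```python
import weakref
import torch

cache = {}

def remember(key, tensor):
    cache[key] = weakref.ref(tensor)  # the cache holds no strong reference

def lookup(key):
    ref = cache.get(key)
    return ref() if ref is not None else None  # None once the tensor was freed

t = torch.zeros(4)
remember("decode_params", t)
print(lookup("decode_params") is t)  # True while a strong reference exists
del t
print(lookup("decode_params"))       # typically None after the last strong reference is gone
```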

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-10-27 16:00:20 +08:00
Ruri
825fdfb197 [v0.11.0][Feat] Prefetching Attention QKV Linear Weight With AddRmsNormQuant Custom Op (#3649)
### What this PR does / why we need it?

- `qkv_proj.weight` prefetching was implemented with the `Quant` op;
when `AddRmsNormQuant` is enabled (#3465), `qkv_proj.weight` prefetching
won't work
- Implement `qkv_proj.weight` prefetching with `AddRmsNormQuant`, which
has been merged on the `main` branch (#3517)

### Does this PR introduce _any_ user-facing change?

None.

### How was this patch tested?

Tested on `Qwen3-235B-A22B-W8A8`
<img width="1868" height="109" alt="image"

src="https://github.com/user-attachments/assets/0bc28082-0287-4d5c-b8f6-f907c3134d36"
/>


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------


Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>
2025-10-27 09:42:09 +08:00
Mengqing Cao
1b16c01afd [v0.11.0-dev][Installation] limit opencv-python-headless version to resolve numpy version conflict (#3767)
### What this PR does / why we need it?
vllm requires opencv-python-headless >= 4.11.0, which in turn requires
numpy >= 2 and < 2.3.0, but vllm-ascend's numpy version must be less than
2.0.0, so limiting opencv-python-headless to below 4.11.0.86 fixes this
conflict.

backport of
afc58184ec

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
Co-authored-by: 22dimensions <waitingwind@foxmail.com>
2025-10-25 18:18:28 +08:00
whx
a58ff9e92f [Cherry-pick] Port MoE multi-stream fix to v0.11.0-dev (#3753)
This PR moves the communication operation of shared experts out of the
extra stream, because it might cause rtMemcpy-related errors when running
shared-expert multi-stream with aclgraph.

Furthermore, a global variable is used as the extra stream object to avoid
allocating a stream for each layer in full-graph mode.
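
A minimal sketch of the global-stream idea (hypothetical names, assuming torch_npu mirrors torch.cuda's stream API): keep one module-level extra stream and reuse it instead of allocating a new stream per layer.

```python
import torch
import torch_npu  # noqa: F401  -- registers the torch.npu device namespace

_EXTRA_STREAM = None

def get_extra_stream() -> "torch.npu.Stream":
    # lazily create a single shared stream and reuse it for every layer
    global _EXTRA_STREAM
    if _EXTRA_STREAM is None:
        _EXTRA_STREAM = torch.npu.Stream()
    return _EXTRA_STREAM
```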

Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-10-25 15:51:43 +08:00
Yizhou
1bc61031e5 [v0.11.0][Fix] Cap max tokens to prevent potential OOM (#3720) (#3744)
### What this PR does / why we need it?
Caps the calculated maximum number of tokens at 512.

This prevents allocating an excessively large buffer when a cudagraph
capture size is not specified, mitigating the risk of out-of-memory
errors.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-10-25 15:46:56 +08:00
fems14
99e154dc84 [0.11.0] cherry-pick from #3747 (#3746)
cherry-pick from #3747

correct _register function place for mooncacke

Signed-off-by: fems14 <1804143737@qq.com>
2025-10-25 14:21:30 +08:00
shaopeng-666
fed8145aea [cherry-pick][Feat] Add mrope fusion op#3708 (#3735)
### What this PR does / why we need it?
Add mrope fusion op for Qwen2.5-VL. This mrope operator doesn't support
Qwen3-VL currently, so it only takes effect in Qwen2.5-VL.
Cherry-picked from 39b994a987

CI passed with existing test

Signed-off-by: shaopeng666 <shaopeng666@noreply.gitcode.com>
Co-authored-by: shaopeng666 <shaopeng666@noreply.gitcode.com>
2025-10-25 11:41:23 +08:00
whx
0644113c35 [BugFix] cherry-pick PR 3736 to v0.11.0-dev (#3737)
This PR comments out the newly added VLM e2e test for the Ascend scheduler
scenario, because it gets stuck when running multi-batch. It needs to be
added back after this issue is resolved.

Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-10-25 10:35:14 +08:00
whx
5a2c5be229 [BugFix][Cherry-pick] Cherry-pick PR 3675 to v0.11.0-dev (#3732)
This PR cherry-picks the bugfix related to running multi-modal models
with AscendScheduler to v0.11.0-dev

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
2025-10-25 09:41:51 +08:00
hucong
12bc78d252 [v0.11.0][BugFix][P/D] Modify the recalculation logic to prevent waiting requests from filling up the D node KVCache (#3686)
### What this PR does / why we need it?
Modify the recalculation logic to prevent waiting requests from filling
up the D node KVCache

Signed-off-by: underfituu <hzhucong@163.com>
2025-10-25 09:15:42 +08:00
ZYang6263
5c0a23f98b [0.11.0][Perf] Add fused matmul/reduce-scatter kernel for performance optimization. (#3725)
### What this PR does / why we need it?
This PR boosts performance by introducing a fused kernel for the matrix
matmul and reduce scatter operations. It supports both unquantized
(e.g., BFloat16) and W8A8 quantized models.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
<!--
- Please clarify what changes you are proposing. The purpose of this
section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster
reviews in your PR.

- Please clarify why the changes are needed. For instance, the use case
and bug description.

- Fixes #
-->

### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->

### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->

Signed-off-by: ZYang6263 <zy626375@gmail.com>
2025-10-25 08:20:43 +08:00
fems14
17dd9ae42c [0.11.0][bugfix]look up multi_tp key (#3699) (#3723)
### What this PR does / why we need it?
In multi-Tensor Parallel (TP) scenarios, the KV pool only queries the
first GPU card. When keys on other cards are released, the query result
still returns as successful, introducing accuracy issues. This PR
modifies the KV pool's query logic to check all cards, resolving this
problem.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: fems14 <1804143737@qq.com>
2025-10-24 18:22:45 +08:00
fems14
f0eb3e1d97 [v0.11.0][bugfix]kvpool sync load (#3698) (#3722)
### What this PR does / why we need it?
In certain scenarios, the performance of synchronously loading data from
the pool is better than that of asynchronously loading data. Therefore,
a control logic (or switch) for asynchronous loading from the pool has
been added.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

Signed-off-by: fems14 <1804143737@qq.com>
2025-10-24 18:21:46 +08:00
何必问
33514a4cc2 [Bugfix] The server fails to locate the request, leading to the server hanging. (#3721)
### What this PR does / why we need it?
fix bug: in the mooncake pooling scenario, when the client closes the
request, the server fails to locate the request, leading to the server
hanging.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Pull up the PD-separated pooling service, send requests using aisbench,
press CTRL+C twice, and check whether the vllm_ascend service exits.

---------

Signed-off-by: linhebiwen <linhebiwen@gmail.com>
2025-10-24 17:41:29 +08:00
offline893
4e21b1537e [BugFix] Check all expert maps when using multi instance. (#3662)
### What this PR does / why we need it?
Check all expert maps when using multiple instances.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
Qwen 235B on double A3.
Case 1: master has an expert map, slave has no expert map.
Case 2: master has an expert map, slave has an incorrect expert map.
Case 3: master has an expert map, slave has the correct expert map.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: offline0806 <3337230449@qq.com>
Co-authored-by: offline0806 <3337230449@qq.com>
2025-10-24 17:10:31 +08:00
wangxiyuan
b321e3846a [cherry-pick]【main】patch sched_yield (#3648) (#3687)
### What this PR does / why we need it?
On Arm systems, os.sched_yield() does not take effect, causing the GIL
(Global Interpreter Lock) to remain unrelinquished and resulting in
CPU-bound issues. This PR applies a patch to sched_yield in vLLM, making
the process execute time.sleep(0) instead to release the GIL.

### Does this PR introduce _any_ user-facing change?

Signed-off-by: fems14 <1804143737@qq.com>
Co-authored-by: fems14 <74094523+fems14@users.noreply.github.com>
2025-10-24 00:24:58 +08:00
Wang Yixuan
d0086d432a fix deepseek torchair recompile (#3679)
### What this PR does / why we need it?
PR #3624 fixed the precision of deepseek torchair, but didn't consider
the limitation of torch compile, which results in recompilation. This PR
fixes this problem. PR to main: #3678


### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->

Signed-off-by: hust17yixuan <303660421@qq.com>
2025-10-23 22:53:13 +08:00
Slightwind
d2d19a4c3c [v0.11.0][bugfix] Add 'layer_type' param to get_pergroup_param() for compatibility (#3684)
Resolves a `TypeError: got an unexpected keyword argument 'layer_type'`.

A recent change (PR #3311) started passing the `layer_type` argument
when calling `get_pergroup_param()`. This specific implementation does
not use this parameter, causing the error.

This patch adds `layer_type=None` to the method signature to maintain
API compatibility and ignore the unused argument.
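
A minimal sketch of this compatibility pattern (the class name and the other parameter are placeholders, not the actual vllm-ascend signature):

```python
class SomeAscendQuantScheme:
    def get_pergroup_param(self, prefix: str, layer_type=None) -> dict:
        # layer_type is accepted only so newer callers (PR #3311) don't raise
        # TypeError; this implementation intentionally ignores it.
        return {}
```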

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
2025-10-23 21:26:50 +08:00
liziyu
f3ea657e93 [0.11.0][Bugfix] fix delay free prefill req & D node support prefix cache (#3609)
### What this PR does / why we need it?
Fix mooncake connector. In scenarios where TP sizes are not equal, when the
prefill TP size is less than the number of key-value heads,
_get_remote_tp_ranks_for_req returns a list of np.arrays. Performing a
membership check such as `int in <list of np.arrays>` raises an error.
Converting the list of np.arrays into a single np.array resolves this
issue.
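
An illustrative numpy sketch of the failure mode described above (values are made up):

```python
import numpy as np

remote_ranks = [np.array([0, 1]), np.array([2, 3])]  # list of np.arrays
# `2 in remote_ranks` raises "ValueError: The truth value of an array with
# more than one element is ambiguous", because list membership compares the
# int against each whole array.
flat_ranks = np.concatenate(remote_ranks)  # single np.array
assert 2 in flat_ranks                     # membership check now works
```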

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
qwen235B
P tp16, D tp1
P tp8, D tp1
P tp4, D tp1
P tp8, D tp2


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
2025-10-23 20:39:35 +08:00
ZYang6263
6975d46627 [v0.11.0][Perf] Eliminating the zerolike operator through patch (#3632)
### What this PR does / why we need it?
There is a zero-like operator before the attention operation in each
decoding stage. After analysis, this operator can be eliminated. The
purpose of this PR is to remove this operator and improve performance.

---------

Signed-off-by: ZYang6263 <zy626375@gmail.com>
2025-10-23 14:49:28 +08:00
rjg-lyh
74903af460 [v0.11.0][refactor] refactor SequenceRowParallelOp forward (#3654)
### What this PR does / why we need it?
This PR refactors SequenceRowParallelOp forward. In order to further
expand the operator inclusion scope in dynamic judgment scenarios, this
PR customizes the entire matmul computation and communication as a
custom operator masking. With this refactor, it will support directly
writing code such as common operation fusion into the
SequenceRowParallelOp class's member function matmul_and_reduce, without
the need to register more redundant custom masking operators.

### How was this patch tested?
CI passed with new added/existing test.

Signed-off-by: rjg-lyh <1318825571@qq.com>
2025-10-23 14:45:49 +08:00
Yizhou
54bd531db8 [v0.11.0][Fix] Fix attention metadata handling for profiling and MLA (#3636) (#3643)
### What this PR does / why we need it?
This is a port PR of #3636 .

Move the creation of dummy attention metadata to occur after the ACL
graph runtime mode is determined. This ensures the metadata is
initialized with the correct configuration during a profile run.

Additionally, remove the `attn_metadata` existence check before updating
MLA attention parameters. This change prevents the update from being
skipped when metadata is not yet available, ensuring parameters are set
correctly.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
None.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-10-23 10:29:30 +08:00
whx
6464c97ff9 [BugFix][v0.11.0] Fix quantization related mtp bug with patch (#3619)
vLLM 0.11.0 didn't include PR
(https://github.com/vllm-project/vllm/pull/25805), thus missing the
prefix of MTP's SharedHead. This PR fixes this bug with a patch to
vLLM's deepseek_mtp.

---------

Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-10-22 23:06:09 +08:00
Zetong Li
6e72bfdc50 [v0.11.0] cherry-pick Fix performance degradation when mtp>1 (#3597) (#3630)
### What this PR does / why we need it?
cherry-pick Fix performance degradation when mtp>1 (#3597)

This PR aims to fix performance degradation when mtp>1. Since mtp>1 may
result in more tokens (i.e. a larger batch size) than the ACL graph maximum
batch size, the draft model would otherwise run in eager mode.

### How was this patch tested?
by ci

---------

Signed-off-by: Zetong Li <slippersss@126.com>
2025-10-22 22:07:39 +08:00
zouyida2052
a989fef5de unify logic between aclgraph and torchair (#3602)
### What this PR does / why we need it?
unify logic between aclgraph and torchair. This is a cherry-pick of https://github.com/vllm-project/vllm-ascend/pull/3560

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
2025-10-22 21:55:06 +08:00
Wang Yixuan
edccd46d74 fix deepseek torchair precision (#3635)
### What this PR does / why we need it?
The precision of deepseek torchair was broken by #3465, due to the original patch of rmsnorm in torchair. This PR fixes the precision of deepseek torchair.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->

Signed-off-by: hust17yixuan <303660421@qq.com>
2025-10-22 20:20:32 +08:00
Yizhou
984efdc0d0 [v0.11.0][Fix] Fixes attribute error in MLA implementation (#3617)
### What this PR does / why we need it?
Corrects the attribute access for retrieving the device from `q_a_proj`
to `q_proj`. This prevents an `AttributeError` as `q_a_proj` does not
exist on the class instance.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
Need MLAPO tests.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-10-22 15:49:18 +08:00
wangxiyuan
a0c3b8dd2d [v0.11.0]cherry-pick fix ut (#3608) (#3614)
cherry-pick fix ut (#3608)

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-22 14:14:15 +08:00
offline893
726bc8aa2a [CI]fix test nightly workflow. (#3604)
Add the nightly test back; it was deleted by mistake.

Co-authored-by: offline0806 <3337230449@qq.com>
2025-10-22 10:34:03 +08:00
1511 changed files with 69835 additions and 226881 deletions


@@ -1,115 +0,0 @@
# vLLM Ascend skills
This directory contains the skills for vLLM Ascend.
Note: Please copy the skills directory `.agents/skills` to `.claude/skills` if you want to use the skills in this repo with Claude code.
## Table of Contents
- [vLLM Ascend Model Adapter Skill](#vllm-ascend-model-adapter-skill)
- [vLLM Ascend main2main Skill](#vllm-ascend-main2main-skill)
- [vLLM Ascend Release Note Writer Skill](#vllm-ascend-release-note-writer-skill)
## vLLM Ascend Model Adapter Skill
Adapt and debug models for vLLM on Ascend NPU — covering both already-supported
architectures and new models not yet registered in vLLM.
### What it does
This skill guides an AI agent through a deterministic workflow to:
1. Triage a model checkpoint (architecture, quant type, multimodal capability).
2. Implement minimal code changes in `/vllm-workspace/vllm` and `/vllm-workspace/vllm-ascend`.
3. Validate via a two-stage gate (dummy fast gate + real-weight mandatory gate).
4. Deliver one signed commit with code, test config, and tutorial doc.
### File layout
| File | Purpose |
| ---- | ------- |
| `SKILL.md` | Skill definition, constraints, and execution playbook |
| `references/workflow-checklist.md` | Step-by-step commands and templates |
| `references/troubleshooting.md` | Symptom-action pairs for common failures |
| `references/fp8-on-npu-lessons.md` | FP8 checkpoint handling on Ascend |
| `references/multimodal-ep-aclgraph-lessons.md` | VL, EP, and ACLGraph patterns |
| `references/deliverables.md` | Required outputs and commit discipline |
### Quick start
1. Open a conversation with the AI agent inside the vllm-ascend dev container.
2. Invoke the skill (e.g. `/vllm-ascend-model-adapter`).
3. Provide the model path (default `/models/<model-name>`) and the originating issue number.
4. The agent follows the playbook in `SKILL.md` and produces a ready-to-merge commit.
### Key constraints
- Never upgrade `transformers`.
- Start `vllm serve` from `/workspace` (direct command, port 8000).
- Dummy-only evidence is not sufficient — real-weight validation is mandatory.
- Final delivery is exactly one signed commit in the current repo.
### Two-stage validation
- **Stage A (dummy)**: fast architecture / operator / API path check with `--load-format dummy`.
- **Stage B (real)**: real-weight loading, fp8/quant path, KV sharding, runtime stability.
Both stages require request-level verification (`/v1/models` + at least one chat request),
not just startup success.
## vLLM Ascend main2main Skill
Migrate changes from the main vLLM repository to the vLLM Ascend repository, ensuring compatibility and performance optimizations for Ascend NPUs.
### What it does
This skill facilitates the process of:
1. Identifying changes in the main vLLM repository.
2. Applying necessary modifications for Ascend support.
3. Validating the changes in an Ascend environment.
4. Delivering a ready-to-merge commit with optimized code and configurations.
### Quick start
1. Open a conversation with the AI agent inside the vllm-ascend dev container.
2. Invoke the skill (e.g. `/main2main`).
3. The agent follows the playbook and produces a ready-to-merge commit.
## vLLM Ascend Release Note Writer Skill
You just need to say: `Please help me write a 0.13.0 release note based on commits from v0.11.0 and releases/v0.13.0`
### What it does
This skill guides you through a structured workflow to:
1. Fetch commits between two versions using the provided script.
2. Analyze and categorize each commit in a CSV workspace.
3. Draft highlights and write polished release notes.
4. Generate release notes organized by category (Features, Hardware Support, Performance, Dependencies, etc.).
### File layout
| File | Purpose |
| ---- | ------- |
| `SKILL.md` | Skill definition, workflow, and writing guidelines |
| `references/ref-past-release-notes-highlight.md` | Style and category reference for release notes |
| `scripts/fetch_commits-optimize.py` | Script to fetch commits between versions |
### Quick start
1. Open a conversation with the AI agent.
2. Invoke the skill (e.g. `/vllm-ascend-release-note-writer`).
3. Follow the workflow steps:
- Fetch commits between versions
- Analyze commits in CSV format
- Draft and edit highlights
4. Output files are saved to `vllm-ascend-release-note/output/$version`
### Key guidelines
- Use one-level headings (###) for sections in a specific order: Highlights, Features, Hardware and Operator Support, Performance, Dependencies, Deprecation & Breaking Changes, Documentation, Others.
- Focus on user-facing impact and include context for practical usage.
- Verify details by checking linked PRs (use GitHub API for descriptions if needed).
- Keep notes concise and avoid unnecessary technical details.


@@ -1,277 +0,0 @@
---
name: main2main
description: "The main2main skill guides an AI agent to adapt the latest vLLM main branch code for vLLM Ascend project."
---
# main2main Skill
This skill guides AI agents to adapt the latest vLLM main branch code for the vLLM Ascend project.
## Workflow
### 1. Get Current vLLM Version Information for vLLM Ascend
Find the vLLM version information for the **main branch** in `docs/source/community/versioning_policy.md` under the `Release compatibility matrix` section:
- **Current adapted vLLM commit**: Format like `83b47f67b1dfad505606070ae4d9f83e50ad4ebd, v0.15.0 tag`
- **Compatible vLLM version**: From the table, e.g., `v0.15.0`
### 2. Get the Latest vLLM Code
Retrieve the latest commit from the local vLLM git repository:
```bash
# The vLLM git repository is typically located in the parent directory
cd ../vllm
git log -1 --format="%H %s"
```
If the vLLM repository is not found at the default location, prompt the user to specify the exact path to the vLLM git repository.
### 3. Compare vLLM Changes
Compare the differences between the vLLM commit currently adapted by vLLM Ascend and the latest commit:
```bash
# View file changes between two commits
git diff <old_commit> <new_commit> --name-only
# View detailed code changes
git log --oneline <old_commit>..<new_commit>
```
### 4. Analyze vLLM Changes and Generate Change Report
Create a file named `vllm_changes.md` to save the list of changes in vLLM that are relevant to vLLM Ascend. This file will be used to guide the adaptation process and should be removed after all work is done.
#### 4.1 Identify Key vLLM Source Files
Focus on vLLM source files under `vllm/vllm/` directory, especially:
```bash
# Get changed files in vLLM source code
git diff <old_commit> <new_commit> --name-only | grep -E "^vllm/" | head -200
# Count total changes
git diff <old_commit> <new_commit> --name-only | wc -l
```
#### 4.2 Categorize Changes by Priority
When analyzing changes, categorize them into the following priority levels:
| Priority | Category | Description |
|----------|----------|-------------|
| **P0** | Breaking Changes | API changes that will cause runtime errors if not adapted |
| **P1** | Important Changes | Changes that affect functionality or performance |
| **P2** | Moderate Changes | Changes that may need review for compatibility |
| **P3** | Model Changes | New models or model updates |
| **P4** | Minor Changes | Configuration, documentation, or minor refactoring |
#### 4.3 Key Areas to Focus On
When analyzing vLLM changes, pay special attention to these areas that typically require vLLM Ascend adaptation:
1. **Platform Interface** (`vllm/platforms/`)
- New abstract methods that must be implemented
- Method signature changes
- New platform features
2. **MoE (Mixture of Experts)** (`vllm/model_executor/layers/fused_moe/`)
- FusedMoE layer changes
- Activation function changes
- Router changes
3. **Attention** (`vllm/model_executor/layers/attention/`)
- Attention backend changes
- New parameters or interfaces
- MLA (Multi-Head Latent Attention) updates
4. **Speculative Decoding** (`vllm/v1/worker/gpu/spec_decode/`, `vllm/config/speculative.py`)
- Import path changes
- Config field changes
- New speculative methods
5. **Distributed** (`vllm/distributed/`)
- Parallel state changes
- KV transfer changes
- Device communicator updates
6. **Models** (`vllm/model_executor/models/`)
- New model architectures
- Model interface changes
7. **Worker/Model Runner** (`vllm/v1/worker/gpu/model_runner.py`)
- New worker methods
- Model runner changes
8. **Quantization** (`vllm/model_executor/layers/quantization/`)
- Quantization config changes
   - compressed-tensors method changes
#### 4.4 vllm_changes.md Template
Use the following template structure for `vllm_changes.md`:
```markdown
# vLLM Changes Relevant to vLLM Ascend
# Generated: <DATE>
# Old commit: <OLD_COMMIT_HASH> (<OLD_VERSION>)
# New commit: <NEW_COMMIT_HASH>
# Total commits: <COUNT>
================================================================================
## P0 - Breaking Changes (Must Adapt)
================================================================================
### <INDEX>. <CHANGE_TITLE>
FILE: <VLLM_FILE_PATH>
CHANGE: <DESCRIPTION_OF_CHANGE>
IMPACT: <WHAT_BREAKS_IF_NOT_ADAPTED>
VLLM_ASCEND_FILES:
- <PATH_TO_ASCEND_FILE_1>
- <PATH_TO_ASCEND_FILE_2>
================================================================================
## P1 - Important Changes (Should Adapt)
================================================================================
...
================================================================================
## P2 - Moderate Changes (Review Needed)
================================================================================
...
================================================================================
## P3 - Model Changes
================================================================================
...
================================================================================
## P4 - Configuration/Minor Changes
================================================================================
...
================================================================================
## Files/Directories Renamed
================================================================================
<LIST_OF_RENAMED_FILES>
================================================================================
## END OF CHANGES
================================================================================
```
#### 4.5 Commands to Analyze Specific Changes
```bash
# Check for breaking changes in commit messages
git log --oneline <old_commit>..<new_commit> | grep -iE "(refactor|breaking|api|rename|remove|deprecate)"
# View specific file changes
git diff <old_commit> <new_commit> -- <FILE_PATH>
# Check for renamed/moved files
git diff <old_commit> <new_commit> --name-status | grep -E "^R"
# Check platform interface changes
git diff <old_commit> <new_commit> -- vllm/platforms/
# Check MoE changes
git diff <old_commit> <new_commit> -- vllm/model_executor/layers/fused_moe/
# Check attention changes
git diff <old_commit> <new_commit> -- vllm/model_executor/layers/attention/
# Check speculative decoding changes
git diff <old_commit> <new_commit> -- vllm/v1/worker/gpu/spec_decode/ vllm/config/speculative.py
```
### 5. Adapt vLLM Ascend Project
For each related change in vLLM from the file `vllm_changes.md`, evaluate whether adaptation in vLLM Ascend is needed:
#### 5.1 Internal Architecture Changes
- Check internal interfaces of vLLM core modules (scheduler, executor, model runner, etc.)
- Update vLLM Ascend's Ascend-specific implementations (e.g., NPU worker/model runner, custom attention, custom ops)
- Preserve vLLM Ascend specific modifications (e.g., code under `vllm_ascend/`)
#### 5.2 Dependency Changes
- Check for dependency version changes in `pyproject.toml` or `setup.py`
- Update dependency declarations in vLLM Ascend
### 6. Test and Verify
- Run vLLM Ascend's CI/CD pipeline
- Verify core functionality (text generation, batching, NPU memory management)
- Ensure backward compatibility: test compatibility with older vLLM versions
## Key File Locations
| Project | Path |
|---------|------|
| vLLM Ascend version compatibility | `docs/source/community/versioning_policy.md` |
| vLLM Ascend source code | `vllm_ascend/` |
| **Core Modules** | |
| Ascend-specific attention | `vllm_ascend/attention/` |
| Ascend-specific executor | `vllm_ascend/worker/` |
| Ascend-specific ops | `vllm_ascend/ops/` |
| **Specialized Implementations** | |
| Ascend 310P specific | `vllm_ascend/_310p/` |
| EPLB load balancing | `vllm_ascend/eplb/` |
| XLite compiler | `vllm_ascend/xlite/` |
| **Compilation & Fusion** | |
| Graph fusion pass manager | `vllm_ascend/compilation/` |
| Compilation passes | `vllm_ascend/compilation/passes/` |
| **Quantization** | |
| Quantization methods | `vllm_ascend/quantization/` |
| ModelSlim integration | `vllm_ascend/quantization/methods/modelslim/` |
| **Distributed & KV Cache** | |
| KV transfer | `vllm_ascend/distributed/kv_transfer/` |
| Device communicators | `vllm_ascend/distributed/device_communicators/` |
| **Speculative Decoding** | |
| MTP proposer | `vllm_ascend/spec_decode/mtp_proposer.py` |
| Eagle proposer | `vllm_ascend/spec_decode/eagle_proposer.py` |
| **Utility Modules** | |
| Common utilities | `vllm_ascend/utils.py` |
| Ascend config | `vllm_ascend/ascend_config.py` |
| Platform detection | `vllm_ascend/platform.py` |
| Environment variables | `vllm_ascend/envs.py` |
## Important Notes
1. **Version Checking**: vLLM Ascend uses version checking to maintain compatibility with multiple vLLM versions. Preserve or update related logic when adapting.
2. **Test Verification**: After adaptation, tests must verify:
- Compatibility with the latest vLLM version
- Backward compatibility with older vLLM versions
- Ascend NPU functionality works correctly
3. **Documentation Sync**: If vLLM documentation has significant changes, update vLLM Ascend's documentation accordingly.
4. **Backward Compatibility**:
- Maintain compatibility from the version currently adapted by vLLM Ascend to the latest version
- Use version checking to handle code branches for different versions:
```python
from vllm_ascend.utils import vllm_version_is
if vllm_version_is("0.15.0"):
    # Use the API for v0.15.0
    ...
else:
    # Use the API for other versions
    ...
```
5. Do not forget to update the vLLM version in the `.github` CI files.
6. **Change Logging**: After adaptation, clearly document in the commit message:
- The range of adapted vLLM commits
- Main changes made
- Test results
7. The vLLM Python code is under the `vllm/vllm` folder.
## Reference
- [Versioning Policy](../../../docs/source/community/versioning_policy.md) - vLLM Ascend versioning strategy


@@ -1,140 +0,0 @@
---
name: vllm-ascend-model-adapter
description: "Adapt and debug existing or new models for vLLM on Ascend NPU. Implement in /vllm-workspace/vllm and /vllm-workspace/vllm-ascend, validate via direct vllm serve from /workspace, and deliver one signed commit in the current repo."
---
# vLLM Ascend Model Adapter
## Overview
Adapt Hugging Face or local models to run on `vllm-ascend` with minimal changes, deterministic validation, and single-commit delivery. This skill is for both already-supported models and new architectures not yet registered in vLLM.
## Read order
1. Start with `references/workflow-checklist.md`.
2. Read `references/multimodal-ep-aclgraph-lessons.md` (feature-first checklist).
3. If startup/inference fails, read `references/troubleshooting.md`.
4. If checkpoint is fp8-on-NPU, read `references/fp8-on-npu-lessons.md`.
5. Before handoff, read `references/deliverables.md`.
## Hard constraints
- Never upgrade `transformers`.
- Primary implementation roots are fixed by Dockerfile:
- `/vllm-workspace/vllm`
- `/vllm-workspace/vllm-ascend`
- Start `vllm serve` from `/workspace` with direct command by default.
- Default API port is `8000` unless user explicitly asks otherwise.
- Feature-first default: try best to validate ACLGraph / EP / flashcomm1 / MTP / multimodal out-of-box.
- `--enable-expert-parallel` and flashcomm1 checks are MoE-only; for non-MoE models mark as not-applicable with evidence.
- If any feature cannot be enabled, keep evidence and explain reason in final report.
- Do not rely on `PYTHONPATH=<modified-src>:$PYTHONPATH` unless debugging fallback is strictly needed.
- Keep code changes minimal and focused on the target model.
- Final deliverable commit must be one single signed commit in the current working repo (`git commit -sm ...`).
- Keep final docs in Chinese and compact.
- **Dummy-first is encouraged for speed, but dummy is NOT fully equivalent to real weights.**
- **Never sign off adaptation using dummy-only evidence; real-weight gate is mandatory.**
## Execution playbook
### 1) Collect context
- Confirm model path (default `/models/<model-name>`; if environment differs, confirm with user explicitly).
- Confirm implementation roots (`/vllm-workspace/vllm`, `/vllm-workspace/vllm-ascend`).
- Confirm delivery root (the current git repo where the final commit is expected).
- Confirm runtime import path points to `/vllm-workspace/*` install.
- Use default expected feature set: ACLGraph + EP + flashcomm1 + MTP + multimodal (if model has VL capability).
- User requirements extend this baseline, not replace it.
### 2) Analyze model first
- Inspect `config.json`, processor files, modeling files, tokenizer files.
- Identify architecture class, attention variant, quantization type, and multimodal requirements.
- Check state-dict key prefixes (and safetensors index) to infer mapping needs (see the sketch after this list).
- Decide whether support already exists in `vllm/model_executor/models/registry.py`.
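A minimal sketch of the key-prefix check mentioned above (the model path is a placeholder; adjust it to your checkpoint): peek at the safetensors index to see how checkpoint keys are prefixed before writing any weight-remap rules.
```python
import collections
import json
import pathlib

index_path = pathlib.Path("/models/<model-name>/model.safetensors.index.json")
weight_map = json.loads(index_path.read_text())["weight_map"]
# count the top-level prefixes of all checkpoint keys
prefixes = collections.Counter(key.split(".")[0] for key in weight_map)
print(prefixes.most_common(10))
```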
### 3) Choose adaptation strategy (new-model capable)
- Reuse existing vLLM architecture if compatible.
- If architecture is missing or incompatible, implement native support:
- add model adapter under `vllm/model_executor/models/`;
- add processor under `vllm/transformers_utils/processors/` when needed;
- register architecture in `vllm/model_executor/models/registry.py`;
- implement explicit weight loading/remap rules (including fp8 scale pairing, KV/QK norm sharding, rope variants).
- If remote code needs newer transformers symbols, do not upgrade dependency.
- If unavoidable, copy required modeling files from sibling transformers source and keep scope explicit.
- If failure is backend-specific (kernel/op/platform), patch minimal required code in `/vllm-workspace/vllm-ascend`.
### 4) Implement minimal code changes (in implementation roots)
- Touch only files required for this model adaptation.
- Keep weight mapping explicit and auditable.
- Avoid unrelated refactors.
### 5) Two-stage validation on Ascend (direct run)
#### Stage A: dummy fast gate (recommended first)
- Run from `/workspace` with `--load-format dummy`.
- Goal: fast validate architecture path / operator path / API path.
- Do not treat `Application startup complete` as pass by itself; request smoke is mandatory.
- Require at least:
- startup readiness (`/v1/models` 200),
- one text request 200,
- if VL model, one text+image request 200,
- ACLGraph evidence where expected.
#### Stage B: real-weight mandatory gate (must pass before sign-off)
- Remove `--load-format dummy` and validate with real checkpoint.
- Goal: validate real-only risks:
- weight key mapping,
- fp8/fp4 dequantization path,
- KV/QK norm sharding with real tensor shapes,
- load-time/runtime stability.
- Require HTTP 200 and non-empty output before declaring success.
- Do not pass Stage B on startup-only evidence.
### 6) Validate inference and features
- Send `GET /v1/models` first.
- Send at least one OpenAI-compatible text request.
- For multimodal models, require at least one text+image request.
- Validate architecture registration and loader path with logs (no unresolved architecture, no fatal missing-key errors).
- Try feature-first validation: EP + ACLGraph path first; eager path as fallback/isolation.
- If startup succeeds but first request crashes (false-ready), treat as runtime failure and continue root-cause isolation.
- For `torch._dynamo` + `interpolate` + `NPU contiguous` failures on VL paths, try `TORCHDYNAMO_DISABLE=1` as diagnostic/stability fallback.
- For multimodal processor API mismatch (for example `skip_tensor_conversion` signature mismatch), use text-only isolation (`--limit-mm-per-prompt` set image/video/audio to 0) to separate processor issues from core weight loading issues.
- Capacity baseline by default (single machine): `max-model-len=128k` + `max-num-seqs=16`.
- Then expand concurrency (e.g., 32/64) if requested or feasible.
### 7) Backport, generate artifacts, and commit in delivery repo
- If implementation happened in `/vllm-workspace/*`, backport minimal final diff to current working repo.
- Generate test config YAML at `tests/e2e/models/configs/<ModelName>.yaml` following the schema of existing configs (must include `model_name`, `hardware`, `tasks` with accuracy metrics, and `num_fewshot`). Use accuracy results from evaluation to populate metric values.
- Generate tutorial markdown at `docs/source/tutorials/models/<ModelName>.md` following the standard template (Introduction, Supported Features, Environment Preparation with docker tabs, Deployment with serve script, Functional Verification with curl example, Accuracy Evaluation, Performance). Fill in model-specific details: HF path, hardware requirements, TP size, max-model-len, served-model-name, sample curl, and accuracy table.
- Update `docs/source/tutorials/models/index.md` to include the new tutorial.
- Confirm test config YAML and tutorial doc are included in the staged files.
- Commit code changes once (single signed commit).
### 8) Prepare handoff artifacts
- Write comprehensive Chinese analysis report.
- Write compact Chinese runbook for server startup and validation commands.
- Include feature status matrix (supported / unsupported / checkpoint-missing / not-applicable).
- Include dummy-vs-real validation matrix and explicit non-equivalence notes.
- Include changed-file list, key logs, and final commit hash.
- Post the SKILL.md content (or a link to it) as a comment on the originating GitHub issue to document the AI-assisted workflow.
## Quality gate before final answer
- Service starts successfully from `/workspace` with direct command.
- OpenAI-compatible inference request succeeds (not startup-only).
- Key feature set is attempted and reported: ACLGraph / EP / flashcomm1 / MTP / multimodal.
- Capacity baseline (`128k + bs16`) result is reported, or explicit reason why not feasible.
- **Dummy stage evidence is present (if used), and real-weight stage evidence is present (mandatory).**
- Test config YAML exists at `tests/e2e/models/configs/<ModelName>.yaml` and follows the established schema (`model_name`, `hardware`, `tasks`, `num_fewshot`).
- Tutorial doc exists at `docs/source/tutorials/models/<ModelName>.md` and follows the standard template (Introduction, Supported Features, Environment Preparation, Deployment, Functional Verification, Accuracy Evaluation, Performance).
- Tutorial index at `docs/source/tutorials/models/index.md` includes the new model entry.
- Exactly one signed commit contains all code changes in current working repo.
- Final response includes commit hash, file paths, key commands, known limits, and failure reasons where applicable.


@@ -1,47 +0,0 @@
# Deliverables
## Required outputs in current repo
1. One final signed commit (`git commit -sm ...`) containing the adaptation changes.
2. Chinese analysis report (concise but complete):
- model architecture summary
- incompatibility root causes
- code changes and rationale
- startup and inference verification evidence
- feature status matrix (supported / unsupported / checkpoint-missing / not-applicable)
- max model len: config theoretical vs runtime practical
- dummy-vs-real validation matrix (what dummy proved / what only real proved)
- false-ready cases and final resolution path (if any)
- fallback ladder evidence (which fallback was tried, what changed)
3. Chinese compact runbook:
- how to start server in `/workspace` (direct command, default `:8000`)
- how to run OpenAI-compatible validation
- optional eager fallback command
- optional `TORCHDYNAMO_DISABLE=1` fallback command (if relevant)
4. Test config YAML at `tests/e2e/models/configs/<ModelName>.yaml` — must include `model_name`, `hardware`, `tasks` with accuracy metrics (name + value), and `num_fewshot`. Use accuracy results from evaluation to populate metric values. Follow the schema of existing configs (e.g. `Qwen3-8B.yaml`).
5. Tutorial doc at `docs/source/tutorials/models/<ModelName>.md` — must follow the standard template: Introduction, Supported Features, Environment Preparation (with docker tabs for A2/A3), Deployment (with serve script), Functional Verification (with curl example), Accuracy Evaluation, Performance. Fill in model-specific details (HF path, hardware requirements, TP size, max-model-len, served-model-name, sample curl, accuracy table).
6. Post SKILL.md content or AI-assisted workflow summary as a comment on the originating GitHub issue.
## Commit discipline
- Keep one signed commit for code changes in the current working repo.
- If implementation occurred in `/vllm-workspace/*`, backport minimal final diff to current repo before commit.
- Keep diff scoped to target model adaptation.
## Validation discipline
- Always provide log file paths for key claims.
- Keep docs synchronized with latest successful test mode (do not leave stale command variants as default).
- Final report must include pass/fail reason for each key feature attempt: ACLGraph / EP / flashcomm1 / MTP / multimodal.
- EP and flashcomm1 are MoE-only checks; for non-MoE models mark as not-applicable with evidence.
- Final report should include baseline capacity result (`128k + bs16`) or explicit reason if not feasible.
- Dummy-first can be used to speed up iterations, but real-weight gate is mandatory before final sign-off.
- Startup-only evidence is insufficient; include first-request smoke results.
## Suggested final response structure
- What changed
- What went well / what went wrong
- Validation performed
- Commit hash and changed files
- Optional next step


@@ -1,57 +0,0 @@
# FP8-on-NPU Lessons
## 1) Recommended debug order
1. Start with `--load-format dummy` to quickly verify architecture path.
2. Run with real weights to validate weight mapping and load-time stability.
3. If blocked by fp8 execution limits on NPU, use fp8->bf16 dequantization loading path.
4. Validate `/v1/models`, then one text request, then one VL request (if multimodal).
## 2) FP8 checkpoint on NPU
Common symptom:
- `fp8 quantization is currently not supported in npu`.
Recommended pattern:
- do not force fp8 execution kernels on NPU;
- dequantize fp8 weights to bf16 during loading using paired tensors:
- `*.weight`
- `*.weight_scale_inv`
- keep strict unpaired scale/weight checks to avoid silent corruption.
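A minimal load-time sketch of the pairing above (function name is an assumption, and block-wise scale tiling is ignored for brevity; it assumes the checkpoint's convention is dequant = weight * weight_scale_inv):
```python
import torch

def dequant_fp8_state_dict(sd: dict) -> dict:
    out = {}
    for name, tensor in sd.items():
        if name.endswith(".weight_scale_inv"):
            continue  # consumed together with its paired weight below
        if name.endswith(".weight") and tensor.dtype == torch.float8_e4m3fn:
            scale_name = name[: -len(".weight")] + ".weight_scale_inv"
            if scale_name not in sd:
                # strict pairing check: fail loudly instead of silently corrupting
                raise ValueError(f"unpaired fp8 weight without scale: {name}")
            scale = sd[scale_name].to(torch.float32)
            out[name] = (tensor.to(torch.float32) * scale).to(torch.bfloat16)
        else:
            out[name] = tensor
    return out
```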
## 3) Typical real-only risks (dummy may not expose)
- missing fp8 scale keys during real shard loading;
- wrong weight remap path only triggered by real checkpoints;
- KV/QK norm sharding mismatch under TP + replicated KV heads.
## 4) KV replication + TP pitfalls
Typical symptom:
- shape mismatch like `128 vs 64` when `tp_size > num_key_value_heads`.
Recommended pattern:
- detect KV-head replication explicitly;
- use local norm/shard loader path for replicated KV heads;
- avoid assuming uniform divisibility for all head dimensions.
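A minimal sketch of detecting KV-head replication explicitly (variable names are assumptions, not the actual vllm-ascend code):
```python
def source_kv_head(tp_rank: int, tp_size: int, num_key_value_heads: int) -> int:
    if tp_size > num_key_value_heads:
        # Replicated case: several TP ranks share a copy of the same KV head.
        replication = tp_size // num_key_value_heads
        return tp_rank // replication
    # Sharded case: each rank owns a contiguous slice of KV heads.
    heads_per_rank = num_key_value_heads // tp_size
    return tp_rank * heads_per_rank
```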
## 5) ACLGraph stability for fp8-origin checkpoints
Recommended pattern:
- prefer `HCCL_OP_EXPANSION_MODE=AIV` when using graph mode;
- keep practical capture sizes and re-test from small, stable shapes;
- use `--enforce-eager` only as temporary isolation fallback.
## 6) Reporting discipline
Always report both:
- what dummy validated (fast gate), and
- what only real weights validated (mandatory gate).
Do not sign off fp8-on-NPU adaptation with dummy-only evidence.


@@ -1,64 +0,0 @@
# Multimodal + EP + ACLGraph Lessons
This note captures practical patterns that repeatedly matter for VL checkpoints on Ascend.
## 1) Out-of-box feature expectation
Try best to validate key features by default:
- ACLGraph
- MTP
- multimodal (if model supports VL)
- EP (MoE models only)
- flashcomm1 (MoE models only)
If any feature fails, keep logs and explain the reason in the final report.
For non-MoE models, EP/flashcomm1 should be marked not-applicable.
## 2) Validate in this order
1. Single text request success (`/v1/models` + `/v1/chat/completions`).
2. Single text+image request success.
3. Graph evidence (`Replaying aclgraph`) when graph mode is expected.
4. Capacity baseline: `128k + bs16`.
5. Concurrency expansion if needed (`32/64` suggested).
## 3) EP + graph startup expectations
- Startup latency is much higher than eager due to:
- compile warmup
- graph capture rounds
- multimodal encoder profiling
- Do not treat slow startup as failure unless logs show hard errors.
## 4) Always distinguish two max lengths
- **Theoretical max**: from model config (`max_position_embeddings`).
- **Practical max**: largest value that actually starts and serves on current hardware + TP/EP settings.
Report both values explicitly.
## 5) Multimodal testing with temporary layer reduction
- Reducing `num_hidden_layers` can speed smoke tests.
- This does **not** remove ViT structure itself.
- Still require one full-layer validation before final sign-off.
## 6) Feature-status semantics
Use four categories:
- ✅ supported and verified
- ❌ framework-level unsupported
- ⚠️ checkpoint missing (weights/config do not provide feature)
- N/A not-applicable (for example EP/flashcomm1 on non-MoE models)
Typical examples:
- flashcomm1 on non-MoE VL models is often N/A or ❌ depending on framework gate.
- MTP may be ⚠️ checkpoint missing even if framework has code paths.
## 7) Keep docs and defaults aligned with latest success path
- If EP+graph is validated and requested/expected, it should be the default runbook path.
- Eager mode should be documented as fallback/troubleshooting only.


@@ -1,229 +0,0 @@
# Troubleshooting
## Direct run doesn't pick your code changes
Symptoms:
- `vllm serve` behavior still old after code edits.
Actions:
1. Check runtime import path:
```bash
python - <<'PY'
import vllm
print(vllm.__file__)
PY
```
2. Ensure edits were made under `/vllm-workspace/vllm` and/or `/vllm-workspace/vllm-ascend`.
3. Avoid PYTHONPATH-overlay workflow unless as temporary debugging fallback.
## Server fails to bind on `:8000` or fails with HCCL bind errors
Symptoms:
- Port bind fail on startup.
- HCCL error like `Communication_Error_Bind_IP_Port(EJ0003)`.
Actions:
1. Kill stale `vllm serve` processes.
2. Ensure `:8000` is free.
3. Retry clean startup before changing code.
## Startup appears "stuck" in graph mode
Symptoms:
- Process alive, but `curl /v1/models` not ready yet.
- Logs show compile/graph capture messages for a long time.
Actions:
1. Keep waiting until graph capture completes.
2. Look for `Capturing CUDA graphs ...` and `Graph capturing finished`.
3. Only declare failure after an explicit error or timeout window.
## False-ready: startup succeeds but first request crashes
Symptoms:
- `Application startup complete` exists.
- `GET /v1/models` may return 200.
- First text or VL request crashes workers/engine.
Actions:
1. Always run at least one text smoke request immediately after ready.
2. For VL models, always run one text+image smoke request as well.
3. Treat first-request crash as runtime failure (do not mark as success).
4. Capture first runtime error signature and branch to targeted fallback.
## Architecture not recognized
Symptoms:
- `ValueError` or log shows unresolved architecture.
Actions:
1. Verify `architectures` in model `config.json`.
2. Add mapping to `vllm/model_executor/models/registry.py`.
3. Ensure module and class names exactly match.
## Remote code import fails on transformers symbols
Symptoms:
- Missing class/function in current `transformers`.
Actions:
1. Do not upgrade `transformers`.
2. Prefer native vLLM implementation.
3. If unavoidable, copy required modeling files from sibling transformers source.
## Weight loading key mismatch
Symptoms:
- Missing/unexpected key warnings during load.
Actions:
1. Inspect checkpoint key prefixes.
2. Add explicit mapping logic.
3. Keep mapping minimal and auditable.
4. Re-test with full shards, not only tiny-layer smoke runs.
## FP8 checkpoint on Ascend A2/A3 (must dequant to bf16)
Symptoms:
- fp8 kernels unsupported or unstable on Ascend A2/A3.
Actions:
1. Do not force fp8 quantization kernels on Ascend.
2. Use load-time fp8->bf16 dequantization path (weight + scale pairing).
3. Add strict unpaired scale/weight checks to avoid silent corruption.
## QK norm mismatch (KV heads / TP / head divisibility)
Symptoms:
- Shape mismatch like `128 vs 64` when `tp_size > num_key_value_heads`.
- Similar mismatch when head topology is not cleanly divisible.
Actions:
1. Detect KV-head replication case.
2. Use local `k_norm` shard path for replicated KV heads.
3. Avoid assumptions that all head dimensions split evenly under current TP.
4. Validate both normal and edge topology cases explicitly.
## MLA attention runtime failures after ready
Symptoms:
- First request fails with signatures like `AtbRingMLAGetWorkspaceSize` / `AtbRingMLA`.
- May also show `aclnnFusedInferAttentionScoreV3 ... error code 561002`.
Actions:
1. Reproduce with one minimal text request (deterministic payload).
2. Try eager isolation (`--enforce-eager`) once to verify whether issue is graph-only.
3. If eager still fails, prioritize model/backend code fix path (not runtime flags only).
4. Check `vllm-ascend` MLA/rope/platform implementation used by known-good runs.
## VL + TorchDynamo interpolate contiguous failure
Symptoms:
- `torch._dynamo.exc.TorchRuntimeError`.
- Stack contains `torch.nn.functional.interpolate`.
- Error contains `NPU contiguous operator only supported contiguous memory format`.
Actions:
1. Add `TORCHDYNAMO_DISABLE=1` and retry with same serve args.
2. Validate both text and text+image after startup.
3. If this stabilizes startup and inference, record it as current fallback path.
4. Keep code-level fix exploration as next step, but do not block delivery if fallback is accepted.
## Multimodal processor signature mismatch (`skip_tensor_conversion`)
Symptoms:
- Early failure before engine ready.
- `convert_to_tensors() got an unexpected keyword argument 'skip_tensor_conversion'`.
Actions:
1. Identify processor compatibility mismatch (HF remote processor vs current transformers API).
2. Use text-only isolation (`--limit-mm-per-prompt '{"image":0,"video":0,"audio":0}'`) only to separate layers, not as final fix.
3. Expect potential follow-up core failures after bypassing processor path; keep logs for both layers.
4. Align to known-good model dispatch and processor compatibility implementation.
## Text-only isolation triggers meta tensor load errors
Symptoms:
- `NotImplementedError: Cannot copy out of meta tensor; no data!`
- May occur after disabling multimodal prompt items.
Actions:
1. Treat as secondary failure signature (after bypassing earlier MM-processor failure).
2. Do not assume text-only isolation is universally safe for all VL models.
3. Return to model-specific code-fix path with captured signatures.
## Config max length works on paper but not in runtime
Symptoms:
- `max_position_embeddings` is large, but service fails or OOM with that value.
Actions:
1. Record config max (theoretical).
2. Find practical max by successful startup + serving under target TP/EP setup.
3. Report both values explicitly in docs.
## flashcomm1 / MTP confusion on VL checkpoints
Symptoms:
- flashcomm1 enabled but startup fails.
- MTP expected but no effect.
Actions:
1. Only validate flashcomm1 for MoE models; non-MoE mark as not-applicable.
2. Verify MTP from both config and weight index (`mtp/nextn` keys).
3. Mark unsupported vs checkpoint-missing clearly.
## ACL graph capture fails (507903)
Symptoms:
- `AclmdlRICaptureEnd ... 507903`
- `rtStreamEndCapture ... invalidated stream capture sequence`
Actions:
1. Prefer `HCCL_OP_EXPANSION_MODE=AIV` for graph capture stability.
2. Reduce shape pressure (`--max-model-len`) and retry.
3. Temporarily fallback `--enforce-eager` for isolation.
## API reachable but output quality odd
Symptoms:
- `/v1/models` works but output has template artifacts.
Actions:
1. Use deterministic request (`temperature=0`, bounded `max_tokens`).
2. Verify endpoint (`/v1/chat/completions` vs `/v1/completions`) matches model template.
3. Confirm non-empty output and HTTP 200 before success declaration.


@@ -1,255 +0,0 @@
# Workflow Checklist
## 0) Environment prerequisites
Set these once per session. Defaults match the official vllm-ascend Docker image.
```bash
# --- configurable paths (adjust if your layout differs) ---
VLLM_SRC=/vllm-workspace/vllm # vLLM source root
VLLM_ASCEND_SRC=/vllm-workspace/vllm-ascend # vllm-ascend source root
WORK_DIR=/workspace # directory to run vllm serve from
MODEL_ROOT=/models # parent directory of model checkpoints
```
Expected environment:
- Hardware: Ascend A2 or A3 server
- Software: official vllm-ascend Docker image (see `./Dockerfile` for full contents)
- TP=16 typical for A3 (16-NPU), TP=8 typical for A2 (8-NPU)
## 1) Fast triage commands
```bash
MODEL_PATH=${MODEL_ROOT}/<model-name>
echo "MODEL_PATH=$MODEL_PATH"
# model inventory
ls -la "$MODEL_PATH"
# architecture + quant hints
rg -n "architectures|model_type|quantization_config|torch_dtype|max_position_embeddings|num_nextn_predict_layers|version|num_attention_heads|num_key_value_heads|num_experts" "$MODEL_PATH/config.json"
# state-dict key layout hints (if index exists)
ls -la "$MODEL_PATH"/*index*.json 2>/dev/null || true
# model custom code (if exists)
ls -la "$MODEL_PATH"/*.py 2>/dev/null || true
```
## 2) Confirm implementation and delivery roots
```bash
# implementation roots (fixed by Dockerfile)
cd "$VLLM_SRC" && git status -s
cd "$VLLM_ASCEND_SRC" && git status -s
# runtime import source check (expect vllm-workspace path)
python - <<'PY'
import vllm
print(vllm.__file__)
PY
# direct-run working directory
cd "$WORK_DIR" && pwd
# delivery root (current repo)
cd <current-repo>
git status -s
```
## 3) Session hygiene (before rerun)
```bash
# stop stale servers
pkill -f "vllm serve|api_server|EngineCore" || true
# confirm port 8000 is free
netstat -ltnp 2>/dev/null | rg ':8000' || true
```
When user explicitly requests reset:
```bash
cd "$VLLM_SRC" && git reset --hard && git clean -fd
cd "$VLLM_ASCEND_SRC" && git reset --hard && git clean -fd
```
## 4) New model onboarding checklist
```bash
# architecture mapping check in vLLM
rg -n "<ArchitectureClass>|registry" "$VLLM_SRC"/vllm/model_executor/models/registry.py
# optional: inspect model config and weight index quickly
cat "$MODEL_PATH/config.json"
cat "$MODEL_PATH"/*index*.json 2>/dev/null || true
```
If architecture is missing/incompatible, minimally do:
1. Add model adapter under `$VLLM_SRC/vllm/model_executor/models/<new_model>.py`.
2. Add processor under `$VLLM_SRC/vllm/transformers_utils/processors/<new_model>.py` when needed.
3. Register architecture in `$VLLM_SRC/vllm/model_executor/models/registry.py`.
4. Add explicit loader/remap rules for checkpoint key patterns (qkv/norm/rope/fp8 scales).
5. Touch `$VLLM_ASCEND_SRC` only when backend-specific errors are confirmed.
## 5) Typical implementation touch points
- `$VLLM_SRC/vllm/model_executor/models/<new_model>.py`
- `$VLLM_SRC/vllm/transformers_utils/processors/<new_model>.py`
- `$VLLM_SRC/vllm/model_executor/models/registry.py`
- `$VLLM_ASCEND_SRC/vllm_ascend/...` (only if backend behavior requires it)
## 6) Syntax sanity checks
```bash
python -m py_compile \
"$VLLM_SRC"/vllm/model_executor/models/<new_model>.py
python -m py_compile \
"$VLLM_SRC"/vllm/transformers_utils/processors/<new_model>.py 2>/dev/null || true
```
## 7) Two-stage serve templates (direct run, default `:8000`)
### Stage A: dummy fast gate (first try)
```bash
cd "$WORK_DIR"
MODEL_PATH=${MODEL_ROOT}/<model-name>
HCCL_OP_EXPANSION_MODE=AIV \
VLLM_ASCEND_ENABLE_FLASHCOMM1=0 \
vllm serve "$MODEL_PATH" \
--served-model-name <served-name> \
--trust-remote-code \
--dtype bfloat16 \
--max-model-len <practical-max-len-or-131072> \
--tensor-parallel-size <TP-size> \
--max-num-seqs 16 \
--load-format dummy \
--port 8000
```
### Stage B: real-weight mandatory gate
```bash
# remove this from Stage A:
--load-format dummy
```
> Note: dummy is not equivalent to real weights. Real gate is mandatory before sign-off.
### EP + ACLGraph (feature-first, MoE only)
```bash
# add to Stage B when model is MoE and validating EP:
--enable-expert-parallel
```
### flashcomm1 check (MoE only)
```bash
# only evaluate flashcomm1 when model is MoE
VLLM_ASCEND_ENABLE_FLASHCOMM1=1
```
### Eager fallback (isolation)
```bash
# add to command for isolation only:
--enforce-eager
```
### TorchDynamo fallback (for VL interpolate-contiguous failures)
```bash
# add env var when logs contain:
# torch._dynamo.exc.TorchRuntimeError + interpolate +
# "NPU contiguous operator only supported contiguous memory format"
TORCHDYNAMO_DISABLE=1
```
## 8) Readiness + smoke checks (must verify true-ready)
```bash
# readiness
for i in $(seq 1 200); do
curl -sf http://127.0.0.1:8000/v1/models >/tmp/models.json && break
sleep 3
done
# text smoke (required)
curl -s http://127.0.0.1:8000/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"<served-name>","messages":[{"role":"user","content":"say hi"}],"temperature":0,"max_tokens":16}'
# VL smoke (required for multimodal models)
# send one text+image OpenAI-compatible request and require non-empty choices (sketch after this block).
```
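A hedged VL smoke sketch using the standard OpenAI-compatible multimodal message format; the image URL is a placeholder, substitute any reachable image or a base64 data URL:
```bash
curl -s http://127.0.0.1:8000/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"<served-name>",
"messages":[{"role":"user","content":[
{"type":"text","text":"describe this image in one sentence"},
{"type":"image_url","image_url":{"url":"<http-or-data-url-of-test-image>"}}]}],
"temperature":0,"max_tokens":32}'
# require HTTP 200 and non-empty choices[0].message.content
```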
> `Application startup complete` alone is not success. If the first request crashes, treat it as a runtime failure (false-ready).
## 9) Feature validation checklist (default out-of-box)
1. `GET /v1/models` returns 200.
2. Text request returns 200 and non-empty output.
3. If VL model: text+image request returns 200.
4. ACLGraph evidence exists (`Replaying aclgraph`) where expected (log-check sketch after this list).
5. EP path is validated only for MoE models; non-MoE must be marked not-applicable.
6. flashcomm1 is validated only for MoE models; non-MoE must be marked not-applicable.
7. MTP status verified from config + weight index (enabled vs checkpoint-missing).
8. Dummy-vs-real differences are explicitly reported (if any).
9. Any false-ready case is explicitly marked as failure (with log signature).
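A minimal log-check sketch for item 4, assuming the serve output was captured to a file (the `serve.log` path is an assumption about how the server was launched):
```bash
# expect at least one replay line when ACLGraph is active
rg -c "Replaying aclgraph" serve.log || echo "no ACLGraph replay evidence found"
```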
## 10) Fallback ladder (recommended order)
1. Keep same params and reproduce once to ensure deterministic failure signature.
2. Add `--enforce-eager` to isolate graph-capture influence.
3. For VL + dynamo/interpolate/contiguous failures, add `TORCHDYNAMO_DISABLE=1`.
4. For multimodal-processor suspicion, isolate text-only by:
- `--limit-mm-per-prompt '{"image":0,"video":0,"audio":0}'`
- then check whether failure moves from processor layer to model core.
5. If issue persists, map failure signature to known-good implementation and patch minimal code.
## 11) Capacity baseline + sweep
- Baseline (single machine): **`max-model-len=128k` + `max-num-seqs=16`**.
- If baseline passes, expand to `max-num-seqs=32/64` when requested.
- If the baseline cannot pass due to hardware/runtime limits, report the explicit root cause.
## 12) Delivery checklist
```bash
# in current working repo (delivery root)
git add <changed-files>
git commit -sm "<message>"
```
Confirm:
- one signed commit only
- Chinese analysis + Chinese runbook present
- feature status matrix included with pass/fail reason
- dummy stage and real stage validation evidence included
- false-ready cases (if any) documented with final fallback status
### Test config generation
- Generate `tests/e2e/models/configs/<ModelName>.yaml` using accuracy results from evaluation.
- Must include: `model_name` (HF path), `hardware` (e.g. "Atlas A2 Series"), `tasks` (list with `name` and `metrics` containing `name` + `value`), `num_fewshot`.
- Follow the schema of existing configs (e.g. `Qwen3-8B.yaml`); a hedged skeleton is sketched below.
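A hedged skeleton following the field list above; every value is a placeholder to replace with the actual evaluation results:
```bash
cat > tests/e2e/models/configs/<ModelName>.yaml <<'YAML'
model_name: <hf-org>/<ModelName>
hardware: "Atlas A2 Series"
tasks:
  - name: <lm-eval-task-name>
    metrics:
      - name: <metric-name>
        value: <measured-value>
num_fewshot: <n>
YAML
```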
### Tutorial doc generation
- Generate `docs/source/tutorials/models/<ModelName>.md` from the standard template.
- Fill in model-specific details: HF path, hardware requirements, TP size, max-model-len, served-model-name, sample curl request, accuracy table.
- Must include sections: Introduction, Supported Features, Environment Preparation (with docker tabs for A2/A3), Deployment (with serve script), Functional Verification (with curl example), Accuracy Evaluation, Performance.
- Update `docs/source/tutorials/models/index.md` to include the new tutorial entry.
### GitHub issue comment
- Post SKILL.md content or AI-assisted workflow summary as a comment on the originating GitHub issue.
Confirm both test config YAML and tutorial doc are included in the signed commit.

View File

@@ -1,79 +0,0 @@
---
name: vLLM Ascend Release Note Writer
description: You are a release note writer for vLLM Ascend project (vllm-project/vllm-ascend). You are responsible for writing release notes for vLLM Ascend.
---
# vLLM Ascend Release Note Writer Skill
## Overview
You should use `ref-past-release-notes-highlight.md` as the style and category reference. Always read it first.
## When to use this skill
When a new version of vLLM Ascend is released, you should use this skill to write the release notes.
## How to use it
0. All output files should be saved under the `vllm-ascend-release-note/output/$version` folder.
1. Use the `fetch_commits-optimize.py` script to fetch the commits between the previous and current version.
```bash
uv run python fetch_commits-optimize.py --base-tag $LAST_TAG --head-tag $NEW_TAG --output 0-current-raw-commits.md
```
`0-current-raw-commits.md` is your raw data input.
2. Use the `commit-analysis-draft.csv` tool to analyze the commits and put them into the correct section.
`1-commit-analysis-draft.csv` is your workspace for the commit-by-commit analysis of which commit goes into which section, whether it can be ignored, and why. You can create auxiliary files in the `tmp` folder.
* You should check each commit. They are put into rows in the CSV file.
* The CSV should have headers `title`, `pr number`, `user facing impact/summary`, `category`, `decision`, `reason`. Please brainstorm other fields as you see fit.
3. Draft the highlights note, and save it to `2-highlights-note-draft.md`.
4. Edit the draft highlights note in `2-highlights-note-draft.md`, and save it to `3-highlights-note-edit.md`. You should double- and triple-check against the raw commits and the analysis. You can leave any uncertainties and doubts in the file, and we will discuss them together.
5. Use the format `This is the $NUMBER release candidate of $VERSION for vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started.`.
## Writing style
1. To keep it simple, you should use only one level of headings, starting with ###, which may include the following categories in the order below:
### Highlights
### Features
### Hardware and Operator Support
### Performance
### Dependencies
### Deprecation & Breaking Changes
### Documentation
### Others
2. Additional Inclusion Criteria
* User experience improvements (CLI enhancements, better error messages, configuration flexibility)
* Core features (PD Disaggregation, KV cache, Graph mode, CP/SP, quantization)
* Breaking changes and deprecations (always include with clear impact description)
* Significant infrastructure changes (elastic scaling, distributed serving, hardware support)
* Major dependency updates (CANN/torch_npu/triton-ascend/MoonCake/Ray/transformers versions, critical library updates)
* Binary/deployment improvements (size reductions, Docker enhancements)
* Default behavior changes (default models, configuration changes that affect all users)
* Hardware compatibility expansions (310P, A2, A3, A5 support)
In the end, we don't want to miss any important changes, but we also don't want to spam the notes with unnecessary details.
3. Section Organization Guidelines
* **Model Support first**: Most immediately visible to users, should lead the highlights
* **Group by user impact**: Hardware/performance should focus on what users experience, not internal optimizations
* **Provide usage context**: Include relevant flags, configuration options, and practical usage information
* **Technical detail level**: Explain what features enable rather than just listing technical changes
4. Writing Tips
* Look up the PR if you are not sure about the details. The PR number at the end (#12345) can be looked up via vllm-project/vllm#12345. To get the description, you just need to call <https://api.github.com/repos/vllm-project/vllm/pulls/12345> and look at the body field.
* When writing the highlights, don't be too verbose. Focus exclusively on what users should know.

View File

@@ -1,198 +0,0 @@
## v0.14.0rc1 - 2026.01.26
This is the first release candidate of v0.14.0 for vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started. This release includes all the changes in v0.13.0rc2, so we just list the differences from v0.13.0rc2. If you are upgrading from v0.13.0rc1, please read both the v0.14.0rc1 and v0.13.0rc2 release notes.
### Highlights
- 310P support is back now. In this release, only basic dense and vl models are supported with eager mode. We'll keep improving and maintaining the support for 310P. [#5776](https://github.com/vllm-project/vllm-ascend/pull/5776)
- Support compressed tensors moe w8a8-int8 quantization. [#5718](https://github.com/vllm-project/vllm-ascend/pull/5718)
- Support Medusa speculative decoding. [#5668](https://github.com/vllm-project/vllm-ascend/pull/5668)
- Support Eagle3 speculative decoding for Qwen3vl. [#4848](https://github.com/vllm-project/vllm-ascend/pull/4848)
### Features
- Xlite Backend supports Qwen3 MoE now. [#5951](https://github.com/vllm-project/vllm-ascend/pull/5951)
- Support DSA-CP for PD-mix deployment case. [#5702](https://github.com/vllm-project/vllm-ascend/pull/5702)
- Add support of new W4A4_LAOS_DYNAMIC quantization method. [#5143](https://github.com/vllm-project/vllm-ascend/pull/5143)
### Performance
- The performance of Qwen3-next has been improved. [#5664](https://github.com/vllm-project/vllm-ascend/pull/5664) [#5984](https://github.com/vllm-project/vllm-ascend/pull/5984) [#5765](https://github.com/vllm-project/vllm-ascend/pull/5765)
- The CPU bind logic and performance has been improved. [#5555](https://github.com/vllm-project/vllm-ascend/pull/5555)
- Merge Q/K split to simplify AscendApplyRotaryEmb for better performance. [#5799](https://github.com/vllm-project/vllm-ascend/pull/5799)
- Add Matmul Allreduce Rmsnorm fusion Pass. It's disabled by default. Set `fuse_allreduce_rms=True` in `--additional_config` to enable it. [#5034](https://github.com/vllm-project/vllm-ascend/pull/5034)
- Optimize rope embedding with triton kernel for huge performance gain. [#5918](https://github.com/vllm-project/vllm-ascend/pull/5918)
- support advanced apply_top_k_top_p without top_k constraint. [#6098](https://github.com/vllm-project/vllm-ascend/pull/6098)
- Parallelize Q/K/V padding in AscendMMEncoderAttention for better performance. [#6204](https://github.com/vllm-project/vllm-ascend/pull/6204)
### Others
- Model runner v2 supports the triton kernel for penalty. [#5854](https://github.com/vllm-project/vllm-ascend/pull/5854)
- Model runner v2 supports eagle spec decoding. [#5840](https://github.com/vllm-project/vllm-ascend/pull/5840)
- Fix multi-modal inference OOM issues by setting `expandable_segments:True` by default. [#5855](https://github.com/vllm-project/vllm-ascend/pull/5855)
- `VLLM_ASCEND_ENABLE_MLAPO` is set to `True` by default. It's enabled automatically on decode node in PD deployment case. Please note that this feature will cost more memory. If you are memory sensitive, please set it to False. [#5952](https://github.com/vllm-project/vllm-ascend/pull/5952)
- SSL config can be set to kv_extra_config for PD deployment with mooncake layerwise connector. [#5875](https://github.com/vllm-project/vllm-ascend/pull/5875)
- support `--max_model_len=auto`. [#6193](https://github.com/vllm-project/vllm-ascend/pull/6193)
### Dependencies
- torch-npu is upgraded to 2.9.0 [#6112](https://github.com/vllm-project/vllm-ascend/pull/6112)
### Deprecation & Breaking Changes
- EPLB config options are moved to `eplb_config` in [additional config](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/configuration/additional_config.html). The old ones are removed in this release.
- The profiler envs, such as `VLLM_TORCH_PROFILER_DIR` and `VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY` do not work with vLLM Ascend now. Please use vLLM `--profiler-config` parameters instead. [#5928](https://github.com/vllm-project/vllm-ascend/pull/5928)
### Known Issues
- If you hit the pickle error from `EngineCore` process sometimes, please cherry-pick the [PR](https://github.com/vllm-project/vllm/pull/32022) into your local vLLM code. This known issue will be fixed in vLLM in the next release.
## v0.13.0rc2 - 2026.01.24
This is the second release candidate of v0.13.0 for vLLM Ascend. In this rc release, we fixed lots of bugs and improved the performance of many models. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/v0.13.0/) to get started. Any feedback is welcome to help us to improve the final version of v0.13.0.
### Highlights
We mainly focus on quality and performance improvement in this release. The spec decode, graph mode, context parallel and EPLB have been improved significantly. A lot of bugs have been fixed and the performance has been improved for DeepSeek3.1/3.2, Qwen3 Dense/MOE models.
### Features
- implement basic framework for batch invariant [#5517](https://github.com/vllm-project/vllm-ascend/pull/5517)
- Eagle spec decode feature now works with full graph mode. [#5118](https://github.com/vllm-project/vllm-ascend/pull/5118)
- The Context Parallel (PCP & DCP) feature is more stable now, and it works for most cases. Please try it out.
- The MTP and Eagle spec decode features now work in most cases, and using them is suggested.
- The EPLB feature is more stable now. Many bugs have been fixed, and mix placement works now [#6086](https://github.com/vllm-project/vllm-ascend/pull/6086)
- Support kv nz feature for DeepSeek decode node in disagg-prefill scenario [#3072](https://github.com/vllm-project/vllm-ascend/pull/3072)
### Model Support
- LongCat-Flash is supported now.[#3833](https://github.com/vllm-project/vllm-ascend/pull/3833)
- minimax_m2 is supported now. [#5624](https://github.com/vllm-project/vllm-ascend/pull/5624)
- Support for cross-attention and whisper models [#5592](https://github.com/vllm-project/vllm-ascend/pull/5592)
### Performance
- Many custom ops and triton kernels are added in this release to speed up model performance, such as `RejectSampler`, `MoeInitRoutingCustom`, `DispatchFFNCombine` and so on.
- Improved the performance of Layerwise Connector [#5303](https://github.com/vllm-project/vllm-ascend/pull/5303)
### Others
- Basic support for Model Runner v2, the next-generation model runner of vLLM. It will be used by default in a future release. [#5210](https://github.com/vllm-project/vllm-ascend/pull/5210)
- Fixed a bug where zmq send/receive may fail [#5503](https://github.com/vllm-project/vllm-ascend/pull/5503)
- Supported to use full-graph with Qwen3-Next-MTP [#5477](https://github.com/vllm-project/vllm-ascend/pull/5477)
- Fix weight transpose in RL scenarios [#5567](https://github.com/vllm-project/vllm-ascend/pull/5567)
- Adapted SP to eagle3 [#5562](https://github.com/vllm-project/vllm-ascend/pull/5562)
- Context Parallel(PCP&DCP) support mlapo [#5672](https://github.com/vllm-project/vllm-ascend/pull/5672)
- GLM4.6 support mtp with fullgraph [#5460](https://github.com/vllm-project/vllm-ascend/pull/5460)
- Flashcomm2 now works with oshard generalized feature [#4723](https://github.com/vllm-project/vllm-ascend/pull/4723)
- Support setting tp=1 for the Eagle draft model [#5804](https://github.com/vllm-project/vllm-ascend/pull/5804)
- Flashcomm1 feature now works with qwen3-vl [#5848](https://github.com/vllm-project/vllm-ascend/pull/5848)
- Support fine-grained shared expert overlap [#5962](https://github.com/vllm-project/vllm-ascend/pull/5962)
### Dependencies
- CANN is upgraded to 8.5.0
- torch-npu is upgraded to 2.8.0.post1. Please note that the post version will not be installed by default. Please install it by hand from [pypi mirror](https://mirrors.huaweicloud.com/ascend/repos/pypi/torch-npu/).
- triton-ascend is upgraded to 3.2.0
### Deprecation & Breaking Changes
- `CPUOffloadingConnector` is deprecated. We'll remove it in the next release. It'll be replaced by CPUOffload feature from vLLM in the future.
- EPLB config options are moved to `eplb_config` in [additional config](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/configuration/additional_config.html). The old ones will be removed in the next release.
- `ProfileExecuteDuration` [feature](https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/performance_and_debug/profile_execute_duration.html) is deprecated. It's replaced by `ObservabilityConfig` from vLLM.
- The value of `VLLM_ASCEND_ENABLE_MLAPO` env will be set to True by default in the next release. It'll be enabled in decode node by default. Please note that this feature will cost more memory. If you are memory sensitive, please set it to False.
## v0.13.0rc1 - 2025.12.27
This is the first release candidate of v0.13.0 for vLLM Ascend. We landed lots of bug fix, performance improvement and feature support in this release. Any feedback is welcome to help us to improve vLLM Ascend. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/latest) to get started.
### Highlights
- Improved the performance of DeepSeek V3.2, please refer to [tutorials](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/DeepSeek-V3.2.html)
- Qwen3-Next MTP with chunked prefill is supported now [#4770](https://github.com/vllm-project/vllm-ascend/pull/4770), please refer to [tutorials](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/Qwen3-Next.html)
- [Experimental] Prefill Context Parallel and Decode Context Parallel are supported, but notice that it is an experimental feature now, welcome any feedback. please refer to [context parallel feature guide](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/context_parallel.html)
### Features
- Support openPangu Ultra MoE [4615](https://github.com/vllm-project/vllm-ascend/pull/4615)
- A new quantization method W8A16 is supported now. [#4541](https://github.com/vllm-project/vllm-ascend/pull/4541)
- Cross-machine Disaggregated Prefill is supported now. [#5008](https://github.com/vllm-project/vllm-ascend/pull/5008)
- Add UCMConnector for KV Cache Offloading. [#4411](https://github.com/vllm-project/vllm-ascend/pull/4411)
- Support async_scheduler and disable_padded_drafter_batch in eagle. [#4893](https://github.com/vllm-project/vllm-ascend/pull/4893)
- Support pcp + mtp in full graph mode. [#4572](https://github.com/vllm-project/vllm-ascend/pull/4572)
- Enhance all-reduce skipping logic for MoE models in NPUModelRunner [#5329](https://github.com/vllm-project/vllm-ascend/pull/5329)
### Performance
Some general performance improvements:
- Add l2norm triton kernel [#4595](https://github.com/vllm-project/vllm-ascend/pull/4595)
- Add new pattern for AddRmsnormQuant with SP, which could only take effect in graph mode. [#5077](https://github.com/vllm-project/vllm-ascend/pull/5077)
- Add async exponential while model executing. [#4501](https://github.com/vllm-project/vllm-ascend/pull/4501)
- Remove the transpose step after attention and switch to transpose_batchmatmul [#5390](https://github.com/vllm-project/vllm-ascend/pull/5390)
- To optimize the performance in small batch size scenario, an attention operator with flash decoding function is offered, please refer to item 22 in [FAQs](https://docs.vllm.ai/projects/ascend/en/latest/faqs.html) to enable it.
### Other
- The OOM error on VL models is fixed now. We'll keep observing it; if you hit the OOM problem again, please submit an issue. [#5136](https://github.com/vllm-project/vllm-ascend/pull/5136)
- Fixed an accuracy bug of Qwen3-Next-MTP when batched inferring. [#4932](https://github.com/vllm-project/vllm-ascend/pull/4932)
- Fix npu-cpu offloading interface change bug. [#5290](https://github.com/vllm-project/vllm-ascend/pull/5290)
- Fix MHA model runtime error in aclgraph mode [#5397](https://github.com/vllm-project/vllm-ascend/pull/5397)
- Fix unsuitable moe_comm_type under ep=1 scenario [#5388](https://github.com/vllm-project/vllm-ascend/pull/5388)
### Deprecation & Breaking Changes
- `VLLM_ASCEND_ENABLE_DENSE_OPTIMIZE` is removed; `VLLM_ASCEND_ENABLE_PREFETCH_MLP` is recommended as its replacement, since the two were always enabled together. [#5272](https://github.com/vllm-project/vllm-ascend/pull/5272)
- `VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP` is dropped now. [#5270](https://github.com/vllm-project/vllm-ascend/pull/5270)
- `VLLM_ASCEND_ENABLE_NZ` is disabled for float weight case, since we notice that the performance is not good in some float case. Feel free to set it to 2 if you make sure it works for your case. [#4878](https://github.com/vllm-project/vllm-ascend/pull/4878)
- `chunked_prefill_for_mla` in `additional_config` is dropped now. [#5296](https://github.com/vllm-project/vllm-ascend/pull/5296)
- `dump_config` in `additional_config` is renamed to `dump_config_path` and its type is changed from `dict` to `string`. [#5296](https://github.com/vllm-project/vllm-ascend/pull/5296)
### Dependencies
- vLLM version has been upgraded to 0.13.0 and drop 0.12.0 support. [#5146](https://github.com/vllm-project/vllm-ascend/pull/5146)
- Transformers version has been upgraded to >= 4.57.3 [#5250](https://github.com/vllm-project/vllm-ascend/pull/5250)
### Known Issues
- Qwen3-Next doesn't support long sequence scenarios, and `gpu-memory-utilization` should be limited according to the doc to run Qwen3-Next. We'll improve this in the next release.
- The functional break on Qwen3-Next when the input/output is around 3.5k/1.5k is fixed, but it introduces a performance regression. We'll fix it in the next release. [#5357](https://github.com/vllm-project/vllm-ascend/issues/5357)
- There is a precision issue with curl on ultra-short sequences in DeepSeek-V3.2. We'll fix it in the next release. [#5370](https://github.com/vllm-project/vllm-ascend/issues/5370)
## v0.11.0 - 2025.12.16
We're excited to announce the release of v0.11.0 for vLLM Ascend. This is the official release for v0.11.0. Please follow the [official doc](https://docs.vllm.ai/projects/ascend/en/v0.11.0) to get started. We'll consider releasing a post version in the future if needed. This release note only contains the important changes and notes since v0.11.0rc3.
### Highlights
- Improved the performance for deepseek 3/3.1. [#3995](https://github.com/vllm-project/vllm-ascend/pull/3995)
- Fixed the accuracy bug for qwen3-vl. [#4811](https://github.com/vllm-project/vllm-ascend/pull/4811)
- Improved the performance of sample. [#4153](https://github.com/vllm-project/vllm-ascend/pull/4153)
- Eagle3 is back now. [#4721](https://github.com/vllm-project/vllm-ascend/pull/4721)
### Other
- Improved the performance for kimi-k2. [#4555](https://github.com/vllm-project/vllm-ascend/pull/4555)
- Fixed a quantization bug for deepseek3.2-exp. [#4797](https://github.com/vllm-project/vllm-ascend/pull/4797)
- Fixed qwen3-vl-moe bug under high concurrency. [#4658](https://github.com/vllm-project/vllm-ascend/pull/4658)
- Fixed an accuracy bug for Prefill Decode disaggregation case. [#4437](https://github.com/vllm-project/vllm-ascend/pull/4437)
- Fixed some bugs for EPLB [#4576](https://github.com/vllm-project/vllm-ascend/pull/4576) [#4777](https://github.com/vllm-project/vllm-ascend/pull/4777)
- Fixed the version incompatibility issue for openEuler docker image. [#4745](https://github.com/vllm-project/vllm-ascend/pull/4745)
### Deprecation announcement
- The LLMdatadist connector has been deprecated; it'll be removed in v0.12.0rc1
- Torchair graph has been deprecated; it'll be removed in v0.12.0rc1
- The Ascend scheduler has been deprecated; it'll be removed in v0.12.0rc1
### Upgrade notice
- torch-npu is upgraded to 2.7.1.post1. Please note that the package is pushed to the [pypi mirror](https://mirrors.huaweicloud.com/ascend/repos/pypi/torch-npu/), so it cannot be resolved as an automatic dependency. Please install it yourself.
- CANN is upgraded to 8.3.rc2.
### Known Issues
- Qwen3-Next doesn't support the expert parallel and MTP features in this release, and it will OOM if the input is too long. We'll improve this in the next release.
- DeepSeek 3.2 only works with torchair graph mode in this release. We'll make it work with aclgraph mode in the next release.
- Qwen2-audio doesn't work by default. Temporary solution is to set `--gpu-memory-utilization` to a suitable value, such as 0.8.
- CPU bind feature doesn't work if more than one vLLM instance is running on the same node.

View File

@@ -1,26 +0,0 @@
BasedOnStyle: Google
UseTab: Never
IndentWidth: 2
ColumnLimit: 120
# Force pointers to the type for C++.
DerivePointerAlignment: false
PointerAlignment: Left
# Reordering #include statements can (and currently will) introduce errors
SortIncludes: false
# Style choices
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
IndentPPDirectives: BeforeHash
IncludeCategories:
- Regex: '^<'
Priority: 4
- Regex: '^"(llvm|llvm-c|clang|clang-c|mlir|mlir-c)/'
Priority: 3
- Regex: '^"(qoda|\.\.)/'
Priority: 2
- Regex: '.*'
Priority: 1

View File

@@ -1 +0,0 @@
If you want to use the skills in this repo with Claude code, please copy the skills directory `.agents/skills` to this directory.

View File

@@ -1,9 +1,6 @@
# https://developers.google.com/gemini-code-assist/docs/customize-gemini-behavior-github
have_fun: false # Just review the code
memory_config:
disabled: false
code_review:
comment_severity_threshold: HIGH # Reduce quantity of comments
pull_request_opened:
help: true # Add a help comment to the PR
summary: true # Summarize the PR in a separate comment
summary: false # Don't summarize the PR in a separate comment

View File

@@ -1,90 +0,0 @@
# Pull Request Summary Style Guide
## Output Instructions
**IMPORTANT**: When doing PR review, you MUST output them in markdown code blocks so users can easily copy them:
1. **PR Title**: Output the generated title in a code block with triple backticks
2. **PR Summary**: Output the generated summary in a markdown code block with triple backticks
This allows users to directly copy the content without manual formatting.
## Pull Request Summary Format
The summary should follow the format:
```markdown
### What this PR does / why we need it?
<!--
- Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster reviews in your PR.
- Please clarify why the changes are needed. For instance, the use case and bug description.
- Fixes #
-->
### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->
### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
If tests were not added, please describe why they were not added and/or why it was difficult to add.
-->
```
## Pull Request Title Format
The summary should also refresh the Pull Request Title to follow the format:
```txt
[Branch][Module][Action] Pull Request Title
```
- Branch: The branch name where the PR is based. If the base branch is main, this prefix can be omitted.
- Module: The module or component being changed. It includes but is not limited to the following:
- [Attention]
- [Ops]
- [Doc]
- [Test]
- [CI]
- [Benchmark]
- Action: The action being performed. It includes but is not limited to the following:
- [BugFix]
- [Feature]
- [Misc]
## Example Output Format
When providing a PR review, format your response like this:
**Suggested PR Title:**
```markdown
[Branch][Module][Action] Your generated title here
```
**Suggested PR Summary:**
```markdown
### What this PR does / why we need it?
Your analysis of what the PR does and why it's needed.
Fixes #issue_number
### Does this PR introduce _any_ user-facing change?
Your assessment of user-facing changes.
### How was this patch tested?
Your description of testing approach.
```
And please print your review suggestion in markdown format no matter the pull request description is empty or not.

View File

@@ -15,13 +15,13 @@
# This file is a part of the vllm-ascend project.
#
ARG PY_VERSION=3.11
FROM quay.io/ascend/manylinux:8.5.1-910b-manylinux_2_28-py${PY_VERSION}
FROM quay.io/ascend/manylinux:8.2.rc1-910b-manylinux_2_28-py${PY_VERSION}
ARG SOC_VERSION="ascend910b1"
ARG COMPILE_CUSTOM_KERNELS=1
# Define environments
ENV DEBIAN_FRONTEND=noninteractive
ENV SOC_VERSION=$SOC_VERSION
ENV COMPILE_CUSTOM_KERNELS=${COMPILE_CUSTOM_KERNELS}
RUN yum update -y && \
yum install -y python3-pip git vim wget net-tools gcc gcc-c++ make cmake numactl-devel && \
rm -rf /var/cache/yum
@@ -32,7 +32,7 @@ COPY . /workspace/vllm-ascend/
# Install req
RUN python3 -m pip install -r vllm-ascend/requirements.txt --extra-index https://download.pytorch.org/whl/cpu/ && \
python3 -m pip install twine attrs psutil
python3 -m pip install twine
# Install vllm-ascend
RUN source /usr/local/Ascend/ascend-toolkit/set_env.sh && \

View File

@@ -1,5 +1,5 @@
name: 📚 User Story
description: Apply for an user story to be displayed on https://docs.vllm.ai/projects/ascend/en/latest/community/user_stories/index.html
description: Apply for an user story to be displayed on https://vllm-ascend.readthedocs.io/en/latest/community/user_stories/index.html
title: "[User Story]: "
labels: ["user-story"]
@@ -18,7 +18,7 @@ body:
A brief introduction about the background of your use case, like your scenario, hardware size etc.
- type: textarea
attributes:
label: Business Challenges
label: Bussiness Challenges
description: >
Tell us how what kind of challenge you faced in this user story.
- type: textarea
@@ -30,7 +30,7 @@ body:
attributes:
label: Extra Info
description: >
Any extra information you want to include in this story
Any extra infomation you want to include in this story
- type: markdown
attributes:
value: >

View File

@@ -9,7 +9,7 @@ body:
value: >
#### Before submitting an issue, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue+sort%3Acreated-desc+).
#### We also highly recommend you read https://docs.vllm.ai/projects/ascend/en/latest/user_guide/supported_models.html first to know which model already supported.
#### We also highly recommend you read https://vllm-ascend.readthedocs.io/en/latest/user_guide/supported_models.html first to know which model already supported.
- type: textarea
attributes:
label: The model to consider.
@@ -21,7 +21,7 @@ body:
attributes:
label: The closest model vllm already supports.
description: >
Here is the list of models already supported by vllm: https://docs.vllm.ai/projects/ascend/en/latest/user_guide/supported_models.html . Which model is the most similar to the model you want to add support for?
Here is the list of models already supported by vllm: https://vllm-ascend.readthedocs.io/en/latest/user_guide/supported_models.html . Which model is the most similar to the model you want to add support for?
- type: textarea
attributes:
label: What's your difficulty of supporting the model you want?

View File

@@ -32,9 +32,9 @@ body:
- [ ] Add release note to docs/source/user_guide/release_notes.md
- [ ] Update release version in README.md and README.zh.md (Getting Started and Branch section)
- [ ] Update release version in README.md and README.zh.md
- [ ] Update version info in docs/source/community/versioning_policy.md(Release compatibility matrix, Release window and Branch states section)
- [ ] Update version info in docs/source/community/versioning_policy.md
- [ ] Update contributor info in docs/source/community/contributors.md

View File

@@ -1,18 +1,21 @@
self-hosted-runner:
# Labels of self-hosted runner in array of strings.
labels:
- linux-aarch64-a2-0
- linux-aarch64-a2-1
- linux-aarch64-a2-2
- linux-aarch64-a2-4
- linux-aarch64-a2-8
- linux-arm64-npu-static-8
- linux-aarch64-310p-1
- linux-aarch64-310p-2
- linux-aarch64-310p-4
- ubuntu-24.04-arm
- linux-aarch64-a3-1
- linux-aarch64-a3-2
- linux-aarch64-a3-4
- linux-aarch64-a3-8
- linux-amd64-cpu-0
- linux-amd64-cpu-8
- linux-amd64-cpu-16
- linux-aarch64-a3-0
- linux-amd64-cpu-8-hk
- linux-amd64-cpu-16-hk
- linux-aarch64-a2b3-0
- linux-aarch64-a2b3-1
- linux-aarch64-a2b3-2
- linux-aarch64-a2b3-4

View File

@@ -0,0 +1,59 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
# Adapted from vllm/.github/scripts/cleanup_pr_body.sh
#!/bin/bash
set -eux
# ensure 3 arguments are passed
if [ "$#" -ne 3 ]; then
echo "Usage: $0 <pr_number> <vllm_version> <vllm_commit>"
exit 1
fi
PR_NUMBER=$1
VLLM_VERSION=$2
VLLM_COMMIT=$3
OLD=/tmp/orig_pr_body.txt
NEW=/tmp/new_pr_body.txt
FINAL=/tmp/final_pr_body.txt
gh pr view --json body --template "{{.body}}" "${PR_NUMBER}" > "${OLD}"
cp "${OLD}" "${NEW}"
# Remove notes in pr description and add vLLM version and commit
sed -i '/<!--/,/-->/d' "${NEW}"
sed -i '/- vLLM .*$/d' "${NEW}"
{
echo ""
echo "- vLLM version: $VLLM_VERSION"
echo "- vLLM main: $VLLM_COMMIT"
} >> "${NEW}"
# Remove redundant empty lines
uniq "${NEW}" > "${FINAL}"
# Run this only if ${NEW} is different than ${OLD}
if ! cmp -s "${OLD}" "${FINAL}"; then
echo
echo "Updating PR body:"
echo
cat "${NEW}"
gh pr edit --body-file "${FINAL}" "${PR_NUMBER}"
else
echo "No changes needed"
fi

View File

@@ -8,8 +8,8 @@ documentation:
ci/build:
- changed-files:
- any-glob-to-any-file:
- '.github/actions/*.yaml'
- '.github/workflows/*.yaml'
- '.github/actions/*.yml'
- '.github/workflows/*.yml'
'module:tests':
- changed-files:

View File

@@ -0,0 +1,175 @@
name: 'accuracy test'
on:
workflow_call:
inputs:
vllm:
required: true
type: string
vllm-ascend:
required: false
type: string
default: main
runner:
required: true
type: string
image:
required: true
type: string
model_name:
required: true
type: string
upload:
required: false
type: boolean
default: false
jobs:
accuracy_tests:
runs-on: ${{ inputs.runner }}
name: ${{ inputs.model_name }} accuracy
container:
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.3.rc2-910b-ubuntu22.04-py3.11
env:
VLLM_USE_MODELSCOPE: True
# 1. If version specified (work_dispatch), do specified branch accuracy test
# 2. If no version (labeled PR), do accuracy test by default ref:
# The branch, tag or SHA to checkout. When checking out the repository that
# triggered a workflow, this defaults to the reference or SHA for that event.
# Otherwise, uses the default branch.
GHA_VLLM_ASCEND_VERSION: ${{ inputs.vllm-ascend }}
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set model name as output
id: set_output
run: |
echo "model_name=${{ inputs.model_name }}" >> $GITHUB_OUTPUT
- name: Config mirrors
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt install git -y
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty pip install -e .
- name: Resolve vllm-ascend version
run: |
VERSION_INPUT="${{ inputs.vllm-ascend }}"
if [[ "$VERSION_INPUT" == "latest" ]]; then
TAGS=$(git ls-remote --tags --sort=-v:refname https://github.com/vllm-project/vllm-ascend "v*" | cut -f2 | sed 's|refs/tags/||')
LATEST_TAG=$(echo "$TAGS" | head -n1)
if [[ -z "$LATEST_TAG" ]]; then
RESOLVED_VERSION="main"
else
RESOLVED_VERSION="$LATEST_TAG"
fi
else
RESOLVED_VERSION="$VERSION_INPUT"
fi
echo "GHA_VLLM_ASCEND_VERSION=$RESOLVED_VERSION" >> $GITHUB_ENV
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm-ascend
path: ./vllm-ascend
ref: ${{ env.GHA_VLLM_ASCEND_VERSION }}
- name: Install vllm-project/vllm-ascend
working-directory: ./vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install -r requirements-dev.txt
pip install -v -e .
- name: Get vLLM commit hash and URL
working-directory: ./vllm-empty
run: |
VLLM_COMMIT=$(git rev-parse --short=7 HEAD)
echo "VLLM_COMMIT=$VLLM_COMMIT" >> $GITHUB_ENV
- name: Get vLLM-Ascend commit hash and URL
working-directory: ./vllm-ascend
run: |
VLLM_ASCEND_COMMIT=$(git rev-parse --short=7 HEAD)
echo "VLLM_ASCEND_COMMIT=$VLLM_ASCEND_COMMIT" >> $GITHUB_ENV
- name: Collect version info
run: |
for dir in /usr/local/Ascend/ascend-toolkit/*; do
dname=$(basename "$dir")
if [ "$dname" != "latest" ]; then
TOOLKIT_DIR="$dname"
break
fi
done
INFO_FILE="/usr/local/Ascend/ascend-toolkit/${TOOLKIT_DIR}/$(uname -i)-linux/ascend_toolkit_install.info"
GHA_CANN_VERSION=$(grep "version=" "$INFO_FILE" \
| head -n1 \
| cut -d'=' -f2 \
| tr -d '"')
{
echo "GHA_CANN_VERSION=$GHA_CANN_VERSION"
pip show torch | grep "Version:" | awk '{print "GHA_TORCH_VERSION="$2}'
pip show torch_npu | grep "Version:" | awk '{print "GHA_TORCH_NPU_VERSION="$2}'
pip show vllm | grep "Version:" | awk '{print "GHA_VLLM_VERSION="$2}' | sed 's/+.*//'
} >> "$GITHUB_ENV"
- name: Run accuracy test
id: report
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
VLLM_USE_MODELSCOPE: True
VLLM_VERSION: ${{ env.GHA_VLLM_VERSION }}
VLLM_COMMIT: ${{ env.VLLM_COMMIT }}
VLLM_ASCEND_VERSION: ${{ env.GHA_VLLM_ASCEND_VERSION || github.ref }}
VLLM_ASCEND_COMMIT: ${{ env.VLLM_ASCEND_COMMIT }}
CANN_VERSION: ${{ env.GHA_CANN_VERSION }}
TORCH_VERSION: ${{ env.GHA_TORCH_VERSION }}
TORCH_NPU_VERSION: ${{ env.GHA_TORCH_NPU_VERSION }}
run: |
model_base_name=$(basename ${{ inputs.model_name }})
markdown_name="${model_base_name}"
echo "markdown_name=$markdown_name" >> $GITHUB_OUTPUT
mkdir -p ./benchmarks/accuracy
pytest -sv ./tests/e2e/models/test_lm_eval_correctness.py \
--config ./tests/e2e/models/configs/${{ inputs.model_name }}.yaml
- name: Generate step summary
if: ${{ always() }}
run: |
cat ./benchmarks/accuracy/${{ steps.report.outputs.markdown_name }}.md >> $GITHUB_STEP_SUMMARY
- name: Upload Report
if: ${{ inputs.upload == true }}
uses: actions/upload-artifact@v4
with:
name: "report-${{ env.GHA_VLLM_ASCEND_VERSION }}-${{ steps.report.outputs.markdown_name }}"
path: ./benchmarks/accuracy/${{ steps.report.outputs.markdown_name }}.md
if-no-files-found: warn
retention-days: 90
overwrite: true

View File

@@ -0,0 +1,199 @@
name: 'e2e test'
on:
workflow_call:
inputs:
vllm:
required: true
type: string
runner:
required: true
type: string
image:
required: true
type: string
type:
required: true
type: string
jobs:
e2e:
name: singlecard
runs-on: ${{ inputs.runner }}-1
container:
image: ${{ inputs.image }}
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
steps:
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt install git -y
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
fetch-depth: 1
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty pip install -e .
- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install -r requirements-dev.txt
pip install -v -e .
- name: Run vllm-project/vllm-ascend test
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
VLLM_USE_MODELSCOPE: True
PYTORCH_NPU_ALLOC_CONF: max_split_size_mb:256
if: ${{ inputs.type == 'light' }}
run: |
pytest -sv tests/e2e/singlecard/test_aclgraph.py
pytest -sv tests/e2e/singlecard/test_quantization.py
pytest -sv tests/e2e/singlecard/test_vlm.py::test_multimodal_vl
- name: Run e2e test
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
VLLM_USE_MODELSCOPE: True
PYTORCH_NPU_ALLOC_CONF: max_split_size_mb:256
if: ${{ inputs.type == 'full' }}
run: |
# We found that if running aclgraph tests in batch, it will cause AclmdlRICaptureBegin error. So we run
# the test separately.
pytest -sv tests/e2e/singlecard/test_aclgraph.py
pytest -sv tests/e2e/singlecard/test_aclgraph_mem.py
pytest -sv tests/e2e/singlecard/test_ascend_scheduler.py
pytest -sv tests/e2e/singlecard/test_bge_model.py
pytest -sv tests/e2e/singlecard/test_camem.py
pytest -sv tests/e2e/singlecard/test_chunked.py
pytest -sv tests/e2e/singlecard/test_embedding.py
pytest -sv tests/e2e/singlecard/test_embedding_aclgraph.py
pytest -sv tests/e2e/singlecard/test_guided_decoding.py
pytest -sv tests/e2e/singlecard/test_ilama_lora.py
pytest -sv tests/e2e/singlecard/test_profile_execute_duration.py
pytest -sv tests/e2e/singlecard/test_quantization.py
pytest -sv tests/e2e/singlecard/test_sampler.py
pytest -sv tests/e2e/singlecard/test_vlm.py
# ------------------------------------ v1 spec decode test ------------------------------------ #
pytest -sv tests/e2e/singlecard/spec_decode_v1/test_v1_mtp_correctness.py
pytest -sv tests/e2e/singlecard/spec_decode_v1/test_v1_mtp_torchair_correctness.py
# Fix me: test_eagle_correctness OOM error
pytest -sv tests/e2e/singlecard/spec_decode_v1/test_v1_spec_decode.py
pytest -sv tests/e2e/singlecard/ops/
e2e-2-cards:
name: multicard
runs-on: ${{ inputs.runner }}-2
container:
image: ${{ inputs.image }}
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
steps:
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt install git -y
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
fetch-depth: 1
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty pip install -e .
- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install -r requirements-dev.txt
pip install -v -e .
- name: Run vllm-project/vllm-ascend test (light)
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
VLLM_USE_MODELSCOPE: True
if: ${{ inputs.type == 'light' }}
run: |
pytest -sv tests/e2e/multicard/test_qwen3_moe.py::test_models_distributed_Qwen3_MOE_TP2_WITH_EP
- name: Run vllm-project/vllm-ascend test (full)
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
VLLM_USE_MODELSCOPE: True
if: ${{ inputs.type == 'full' }}
run: |
pytest -sv tests/e2e/multicard/test_data_parallel.py
pytest -sv tests/e2e/multicard/test_expert_parallel.py
pytest -sv tests/e2e/multicard/test_external_launcher.py
pytest -sv tests/e2e/multicard/test_single_request_aclgraph.py
pytest -sv tests/e2e/multicard/test_fused_moe_allgather_ep.py
pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py
# To avoid oom, we need to run the test in a single process.
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_QwQ
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek_multistream_moe
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_Qwen3_W8A8
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_Qwen3_W4A8DYNAMIC_new_version
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_Qwen3_W4A8DYNAMIC_old_version
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek_W4A8DYNAMIC
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_sp_for_qwen3_moe
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_Qwen_Dense_with_flashcomm_v1
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_Qwen_Dense_with_prefetch_mlp_weight
pytest -sv tests/e2e/multicard/test_pipeline_parallel.py
pytest -sv tests/e2e/multicard/test_prefix_caching.py
pytest -sv tests/e2e/multicard/test_qwen3_moe.py
pytest -sv tests/e2e/multicard/test_torchair_graph_mode.py

View File

@@ -0,0 +1,72 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
# This test will be triggered:
# - PR labeled with: 'accuracy-test' & 'ready-for-test'
name: ascend test / accuracy
on:
pull_request:
branches:
- 'main'
- '*-dev'
types: [ labeled, synchronize ]
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
# only cancel in-progress runs of the same workflow
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
run:
name: ""
strategy:
matrix:
# Only top series models should be listed in here
include:
- runner: a2-1
model_name: Qwen3-8B
- runner: a2-1
model_name: Qwen2.5-VL-7B-Instruct
- runner: a2-1
model_name: Qwen2-Audio-7B-Instruct
- runner: a2-2
model_name: Qwen3-30B-A3B
- runner: a2-2
model_name: Qwen3-VL-30B-A3B-Instruct
- runner: a2-2
model_name: DeepSeek-V2-Lite
fail-fast: false
# test will be triggered when tag 'accuracy-test' & 'ready-for-test'
if: >-
${{
contains(github.event.pull_request.labels.*.name, 'accuracy-test') &&
contains(github.event.pull_request.labels.*.name, 'ready-for-test')
}}
uses: ./.github/workflows/_accuracy_test.yaml
with:
vllm: v0.11.0
runner: linux-aarch64-${{ matrix.runner }}
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.3.rc2-910b-ubuntu22.04-py3.11
model_name: ${{ matrix.model_name }}

View File

@@ -0,0 +1,57 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: format / pr body
on:
# The PR updated when PR opened and push new commits
pull_request_target:
types: [opened, synchronize]
branches:
- 'main'
permissions:
pull-requests: write
jobs:
update-description:
name: update vLLM version
runs-on: ubuntu-latest
steps:
- name: Get vLLM version
run: |
VLLM_COMMIT=v0.11.0
echo "VLLM_COMMIT=https://github.com/vllm-project/vllm/commit/$VLLM_COMMIT" >> $GITHUB_ENV
- name: Checkout repository
uses: actions/checkout@ff7abcd0c3c05ccf6adc123a8cd1fd4fb30fb493 # v4.2.2
- name: Set up Python
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
- name: Get vLLM release version
run: |
VLLM_VERSION=$(python3 docs/source/conf.py | jq .ci_vllm_version | tr -d '"')
echo "VLLM_VERSION=$VLLM_VERSION" >> $GITHUB_ENV
- name: Update PR description
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
bash .github/format_pr_body.sh "${{ github.event.number }}" "${{ env.VLLM_VERSION }}" "${{ env.VLLM_COMMIT }}"

View File

@@ -0,0 +1,135 @@
name: 'image / openEuler / 310p'
# This is a docker build check and publish job:
# 1. PR Triggered docker image build check
# - is for image build check
# - Enable on main/*-dev branch
# - push: ${{ github.event_name != 'pull_request' }} ==> false
# 2. branches push trigger image publish
# - is for branch/dev/nightly image
# - commits are merge into main/*-dev ==> vllm-ascend:main-310p-openeuler / vllm-ascend:*-dev-310p-openeuler
# 3. tags push trigger image publish
# - is for final release image
# - Publish when tag with v* (pep440 version) ===> vllm-ascend:v1.2.3-310p-openeuler / vllm-ascend:v1.2.3rc1-310p-openeuler
on:
pull_request:
branches:
- 'main'
- '*-dev'
paths:
- '.github/workflows/image_310p_openeuler.yml'
- 'Dockerfile.310p.openEuler'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
types: [ labeled ]
push:
# Publish image when tagging, the Dockerfile in tag will be build as tag image
branches:
- 'main'
- '*-dev'
tags:
- 'v*'
paths:
- '.github/workflows/image_310p_openeuler.yml'
- 'Dockerfile.310p.openEuler'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
# only cancel in-progress runs of the same workflow
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
name: vllm-ascend image build
# Only arm64 build on openEuler arm64, only amd64 build on Ubuntu amd64
# Push event or PR with both 'ready' and 'ready-for-test' labels
runs-on: >-
${{
github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
'ubuntu-latest' ||
'ubuntu-24.04-arm'
}}
if: ${{ github.event_name == 'push' || (contains(github.event.pull_request.labels.*.name, 'ready') && contains(github.event.pull_request.labels.*.name, 'ready-for-test')) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
persist-credentials: false
- name: Print
run: |
lscpu
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
# TODO(yikun): add more hub image and a note on release policy for container image
images: |
quay.io/ascend/vllm-ascend
# Note for test case
# https://github.com/marketplace/actions/docker-metadata-action#typeref
# 1. branch job publishes per main/*-dev branch commits
# 2. main and dev pull_request is build only, so the tag pr-N-310p-openeuler is fine
# 3. only pep440 matched tag will be published:
# - v0.7.1 --> v0.7.1-310p-openeuler
# - pre/post/dev: v0.7.1rc1-310p-openeuler/v0.7.1rc1-310p-openeuler/v0.7.1rc1.dev1-310p-openeuler/v0.7.1.post1-310p-openeuler, no latest
# which follow the rule from vLLM with prefix v
# TODO(yikun): the post release might be considered as latest release
tags: |
type=ref,event=branch,suffix=-310p-openeuler
type=ref,event=pr,suffix=-310p-openeuler
type=pep440,pattern={{raw}},suffix=-310p-openeuler
flavor:
latest=false
- name: Free up disk space
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
docker-images: false
- name: Build - Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Build - Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Publish - Login to Quay Container Registry
if: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ vars.QUAY_USERNAME }}
password: ${{ secrets.QUAY_PASSWORD }}
- name: Build and push 310p
uses: docker/build-push-action@v6
with:
platforms: >-
${{
github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
'linux/amd64,linux/arm64' ||
'linux/arm64'
}}
# use the current repo path as the build context, ensure .git is contained
context: .
# only trigger when tag, branch/main push
push: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
labels: ${{ steps.meta.outputs.labels }}
tags: ${{ steps.meta.outputs.tags }}
file: Dockerfile.310p.openEuler
build-args: |
PIP_INDEX_URL=https://pypi.org/simple
provenance: false

View File

@@ -0,0 +1,131 @@
name: 'image / Ubuntu / 310p'
# This is a docker build check and publish job:
# 1. PR Triggered docker image build check
# - is for image build check
# - Enable on main/*-dev branch
# - push: ${{ github.event_name != 'pull_request' }} ==> false
# 2. branches push trigger image publish
# - is for branch/dev/nightly image
# - commits are merge into main/*-dev ==> vllm-ascend:main-310p / vllm-ascend:*-dev-310p
# 3. tags push trigger image publish
# - is for final release image
# - Publish when tag with v* (pep440 version) ===> vllm-ascend:v1.2.3-310p / vllm-ascend:v1.2.3rc1-310p
on:
pull_request:
branches:
- 'main'
- '*-dev'
paths:
- '.github/workflows/image_310p_ubuntu.yml'
- 'Dockerfile.310p'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
types: [ labeled ]
push:
# Publish image when tagging, the Dockerfile in tag will be build as tag image
branches:
- 'main'
- '*-dev'
tags:
- 'v*'
paths:
- '.github/workflows/image_310p_ubuntu.yml'
- 'Dockerfile.310p'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
# only cancel in-progress runs of the same workflow
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
name: vllm-ascend image build
# Only arm64 build on openEuler arm64, only amd64 build on Ubuntu amd64
# Push event or PR with both 'ready' and 'ready-for-test' labels
runs-on: ubuntu-latest
if: ${{ github.event_name == 'push' || (contains(github.event.pull_request.labels.*.name, 'ready') && contains(github.event.pull_request.labels.*.name, 'ready-for-test')) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
persist-credentials: false
- name: Print
run: |
lscpu
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
# TODO(yikun): add more hub image and a note on release policy for container image
images: |
quay.io/ascend/vllm-ascend
# Note for test case
# https://github.com/marketplace/actions/docker-metadata-action#typeref
# 1. branch job publishes per main/*-dev branch commits
# 2. main and dev pull_request is build only, so the tag pr-N is fine
# 3. only pep440 matched tag will be published:
# - v0.7.1 --> v0.7.1-310p
# - pre/post/dev: v0.7.1rc1-310p/v0.7.1rc1-310p/v0.7.1rc1.dev1-310p/v0.7.1.post1-310p, no latest
# which follow the rule from vLLM with prefix v
# TODO(yikun): the post release might be considered as latest release
tags: |
type=ref,event=branch,suffix=-310p
type=ref,event=pr,suffix=-310p
type=pep440,pattern={{raw}},suffix=-310p
flavor:
latest=false
- name: Free up disk space
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
docker-images: false
- name: Build - Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Build - Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Publish - Login to Quay Container Registry
if: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ vars.QUAY_USERNAME }}
password: ${{ secrets.QUAY_PASSWORD }}
- name: Build and push 310p
uses: docker/build-push-action@v6
with:
platforms: >-
${{
github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
'linux/amd64,linux/arm64' ||
'linux/amd64'
}}
# use the current repo path as the build context, ensure .git is contained
context: .
file: Dockerfile.310p
# only trigger when tag, branch/main push
push: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
labels: ${{ steps.meta.outputs.labels }}
tags: ${{ steps.meta.outputs.tags }}
build-args: |
PIP_INDEX_URL=https://pypi.org/simple
provenance: false


@@ -0,0 +1,135 @@
name: 'image / openEuler / a3'
# This is a docker build check and publish job:
# 1. PR Triggered docker image build check
# - is for image build check
# - Enable on main/*-dev branch
# - push: ${{ github.event_name != 'pull_request' }} ==> false
# 2. branches push trigger image publish
# - is for branch/dev/nightly image
# - commits are merged into main/*-dev ==> vllm-ascend:main / vllm-ascend:*-dev
# 3. tags push trigger image publish
# - is for final release image
# - Publish when tagged with v* (pep440 version) ===> vllm-ascend:v1.2.3-a3-openeuler / vllm-ascend:v1.2.3rc1-a3-openeuler
on:
pull_request:
branches:
- 'main'
- '*-dev'
paths:
- '.github/workflows/image_a3_openeuler.yml'
- 'Dockerfile.a3.openEuler'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
types: [ labeled ]
push:
# Publish image when tagging, the Dockerfile in the tag will be built as the tag image
branches:
- 'main'
- '*-dev'
tags:
- 'v*'
paths:
- '.github/workflows/image_a3_openeuler.yml'
- 'Dockerfile.a3.openEuler'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
# only cancel in-progress runs of the same workflow
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
name: vllm-ascend image build
# Only arm64 build on openEuler arm64, only amd64 build on Ubuntu amd64
# Push event or PR with both 'ready' and 'ready-for-test' labels
runs-on: >-
${{
github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
'ubuntu-latest' ||
'ubuntu-24.04-arm'
}}
if: ${{ github.event_name == 'push' || (contains(github.event.pull_request.labels.*.name, 'ready') && contains(github.event.pull_request.labels.*.name, 'ready-for-test')) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
persist-credentials: false
- name: Print
run: |
lscpu
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
# TODO(yikun): add more hub image and a note on release policy for container image
images: |
quay.io/ascend/vllm-ascend
# Note for test case
# https://github.com/marketplace/actions/docker-metadata-action#typeref
# 1. branch job publishes per main/*-dev branch commits
# 2. main and dev pull_request is build only, so the tag pr-N-a3-openeuler is fine
# 3. only pep440 matched tag will be published:
# - v0.7.1 --> v0.7.1-a3-openeuler
# - pre/post/dev: v0.7.1rc1-a3-openeuler/v0.7.1rc1-a3-openeuler/v0.7.1rc1.dev1-a3-openeuler/v0.7.1.post1-a3-openeuler, no latest
# which follow the rule from vLLM with prefix v
# TODO(yikun): the post release might be considered as latest release
tags: |
type=ref,event=branch,suffix=-a3-openeuler
type=ref,event=pr,suffix=-a3-openeuler
type=pep440,pattern={{raw}},suffix=-a3-openeuler
flavor:
latest=false
- name: Free up disk space
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
docker-images: false
- name: Build - Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Build - Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Publish - Login to Quay Container Registry
if: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ vars.QUAY_USERNAME }}
password: ${{ secrets.QUAY_PASSWORD }}
- name: Build and push a3
uses: docker/build-push-action@v6
with:
platforms: >-
${{
github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
'linux/amd64,linux/arm64' ||
'linux/arm64'
}}
# use the current repo path as the build context, ensure .git is contained
context: .
# only trigger when tag, branch/main push
push: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
labels: ${{ steps.meta.outputs.labels }}
tags: ${{ steps.meta.outputs.tags }}
file: Dockerfile.a3.openEuler
build-args: |
PIP_INDEX_URL=https://pypi.org/simple
provenance: false


@@ -0,0 +1,131 @@
name: 'image / Ubuntu / a3'
# This is a docker build check and publish job:
# 1. PR Triggered docker image build check
# - is for image build check
# - Enable on main/*-dev branch
# - push: ${{ github.event_name != 'pull_request' }} ==> false
# 2. branches push trigger image publish
# - is for branch/dev/nightly image
# - commits are merged into main/*-dev ==> vllm-ascend:main / vllm-ascend:*-dev
# 3. tags push trigger image publish
# - is for final release image
# - Publish when tagged with v* (pep440 version) ===> vllm-ascend:v1.2.3-a3|vllm-ascend:v1.2.3rc1-a3
on:
pull_request:
branches:
- 'main'
- '*-dev'
paths:
- '.github/workflows/image_a3_ubuntu.yml'
- 'Dockerfile.a3'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
types: [ labeled ]
push:
# Publish image when tagging, the Dockerfile in the tag will be built as the tag image
branches:
- 'main'
- '*-dev'
tags:
- 'v*'
paths:
- '.github/workflows/image_a3_ubuntu.yml'
- 'Dockerfile.a3'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
# only cancel in-progress runs of the same workflow
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
name: vllm-ascend image build
# Only arm64 build on openEuler arm64, only amd64 build on Ubuntu amd64
# Push event or PR with both 'ready' and 'ready-for-test' labels
runs-on: ubuntu-latest
if: ${{ github.event_name == 'push' || (contains(github.event.pull_request.labels.*.name, 'ready') && contains(github.event.pull_request.labels.*.name, 'ready-for-test')) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
persist-credentials: false
- name: Print
run: |
lscpu
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
# TODO(yikun): add more hub image and a note on release policy for container image
images: |
quay.io/ascend/vllm-ascend
# Note for test case
# https://github.com/marketplace/actions/docker-metadata-action#typeref
# 1. branch job publishes per main/*-dev branch commits
# 2. main and dev pull_request is build only, so the tag pr-N-a3 is fine
# 3. only pep440 matched tag will be published:
# - v0.7.1 --> v0.7.1-a3
# - pre/post/dev: v0.7.1rc1-a3/v0.7.1rc1-a3/v0.7.1rc1.dev1-a3/v0.7.1.post1-a3, no latest
# which follow the rule from vLLM with prefix v
# TODO(yikun): the post release might be considered as latest release
tags: |
type=ref,event=branch,suffix=-a3
type=ref,event=pr,suffix=-a3
type=pep440,pattern={{raw}},suffix=-a3
flavor:
latest=false
- name: Free up disk space
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
docker-images: false
- name: Build - Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Build - Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Publish - Login to Quay Container Registry
if: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ vars.QUAY_USERNAME }}
password: ${{ secrets.QUAY_PASSWORD }}
- name: Build and push a3
uses: docker/build-push-action@v6
with:
platforms: >-
${{
github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
'linux/amd64,linux/arm64' ||
'linux/amd64'
}}
# use the current repo path as the build context, ensure .git is contained
context: .
file: Dockerfile.a3
# only trigger when tag, branch/main push
push: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
labels: ${{ steps.meta.outputs.labels }}
tags: ${{ steps.meta.outputs.tags }}
build-args: |
PIP_INDEX_URL=https://pypi.org/simple
provenance: false


@@ -0,0 +1,134 @@
name: 'image / openEuler'
# This is a docker build check and publish job:
# 1. PR Triggered docker image build check
# - is for image build check
# - Enable on main/*-dev branch
# - push: ${{ github.event_name != 'pull_request' }} ==> false
# 2. branches push trigger image publish
# - is for branch/dev/nightly image
# - commits are merged into main/*-dev ==> vllm-ascend:main-openeuler / vllm-ascend:*-dev-openeuler
# 3. tags push trigger image publish
# - is for final release image
# - Publish when tagged with v* (pep440 version) ===> vllm-ascend:v1.2.3-openeuler / vllm-ascend:v1.2.3rc1-openeuler
on:
pull_request:
branches:
- 'main'
- '*-dev'
paths:
- '.github/workflows/image_openeuler.yml'
- 'Dockerfile.openEuler'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
types: [ labeled ]
push:
# Publish image when tagging, the Dockerfile in the tag will be built as the tag image
branches:
- 'main'
- '*-dev'
tags:
- 'v*'
paths:
- '.github/workflows/image_openeuler.yml'
- 'Dockerfile.openEuler'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
# only cancel in-progress runs of the same workflow
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
name: vllm-ascend image build
# Only arm64 build on openEuler arm64, only amd64 build on Ubuntu amd64
# Push event or PR with both 'ready' and 'ready-for-test' labels
runs-on: >-
${{
github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
'ubuntu-latest' ||
'ubuntu-24.04-arm'
}}
if: ${{ github.event_name == 'push' || (contains(github.event.pull_request.labels.*.name, 'ready') && contains(github.event.pull_request.labels.*.name, 'ready-for-test')) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
persist-credentials: false
- name: Print
run: |
lscpu
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
# TODO(yikun): add more hub image and a note on release policy for container image
images: |
quay.io/ascend/vllm-ascend
# Note for test case
# https://github.com/marketplace/actions/docker-metadata-action#typeref
# 1. branch job publishes per main/*-dev branch commits
# 2. main and dev pull_request is build only, so the tag pr-N-openeuler is fine
# 3. only pep440 matched tag will be published:
# - v0.7.1 --> v0.7.1-openeuler
# - pre/post/dev: v0.7.1rc1-openeuler/v0.7.1rc1-openeuler/v0.7.1rc1.dev1-openeuler/v0.7.1.post1-openeuler, no latest
# which follow the rule from vLLM with prefix v
# TODO(yikun): the post release might be considered as latest release
tags: |
type=ref,event=branch,suffix=-openeuler
type=ref,event=pr,suffix=-openeuler
type=pep440,pattern={{raw}},suffix=-openeuler
flavor:
latest=true
- name: Free up disk space
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
docker-images: false
- name: Build - Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Build - Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Publish - Login to Quay Container Registry
if: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ vars.QUAY_USERNAME }}
password: ${{ secrets.QUAY_PASSWORD }}
- name: Build and push 910b
uses: docker/build-push-action@v6
with:
platforms: >-
${{
github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
'linux/amd64,linux/arm64' ||
'linux/arm64'
}}
# use the current repo path as the build context, ensure .git is contained
context: .
# only trigger when tag, branch/main push
push: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
labels: ${{ steps.meta.outputs.labels }}
tags: ${{ steps.meta.outputs.tags }}
file: Dockerfile.openEuler
build-args: |
PIP_INDEX_URL=https://pypi.org/simple
provenance: false


@@ -0,0 +1,131 @@
name: 'image / Ubuntu'
# This is a docker build check and publish job:
# 1. PR Triggered docker image build check
# - is for image build check
# - Enable on main/*-dev branch
# - push: ${{ github.event_name != 'pull_request' }} ==> false
# 2. branches push trigger image publish
# - is for branch/dev/nightly image
# - commits are merged into main/*-dev ==> vllm-ascend:main / vllm-ascend:*-dev
# 3. tags push trigger image publish
# - is for final release image
# - Publish when tagged with v* (pep440 version) ===> vllm-ascend:v1.2.3 / vllm-ascend:v1.2.3rc1
on:
pull_request:
branches:
- 'main'
- '*-dev'
paths:
- '.github/workflows/image_ubuntu.yml'
- 'Dockerfile'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
types: [ labeled ]
push:
# Publish image when tagging, the Dockerfile in the tag will be built as the tag image
branches:
- 'main'
- '*-dev'
tags:
- 'v*'
paths:
- '.github/workflows/image_ubuntu.yml'
- 'Dockerfile'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
# only cancel in-progress runs of the same workflow
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
name: vllm-ascend image build
# Only arm64 build on openEuler arm64, only amd64 build on Ubuntu amd64
# Push event or PR with both 'ready' and 'ready-for-test' labels
runs-on: ubuntu-latest
if: ${{ github.event_name == 'push' || (contains(github.event.pull_request.labels.*.name, 'ready') && contains(github.event.pull_request.labels.*.name, 'ready-for-test')) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
persist-credentials: false
- name: Print
run: |
lscpu
- name: Docker meta
id: meta
uses: docker/metadata-action@v5
with:
# TODO(yikun): add more hub image and a note on release policy for container image
images: |
quay.io/ascend/vllm-ascend
# Note for test case
# https://github.com/marketplace/actions/docker-metadata-action#typeref
# 1. branch job publishes per main/*-dev branch commits
# 2. main and dev pull_request is build only, so the tag pr-N is fine
# 3. only pep440 matched tag will be published:
# - v0.7.1 --> v0.7.1, latest
# - pre/post/dev: v0.7.1rc1/v0.7.1rc1/v0.7.1rc1.dev1/v0.7.1.post1, no latest
# which follow the rule from vLLM with prefix v
# TODO(yikun): the post release might be considered as latest release
tags: |
type=ref,event=branch
type=ref,event=pr
type=pep440,pattern={{raw}}
flavor:
latest=true
- name: Free up disk space
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
docker-images: false
- name: Build - Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Build - Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Publish - Login to Quay Container Registry
if: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ vars.QUAY_USERNAME }}
password: ${{ secrets.QUAY_PASSWORD }}
- name: Build and push 910b
uses: docker/build-push-action@v6
with:
platforms: >-
${{
github.event_name == 'push' && github.repository_owner == 'vllm-project' &&
'linux/amd64,linux/arm64' ||
'linux/amd64'
}}
# use the current repo path as the build context, ensure .git is contained
context: .
file: Dockerfile
# only trigger when tag, branch/main push
push: ${{ github.event_name == 'push' && github.repository_owner == 'vllm-project' }}
labels: ${{ steps.meta.outputs.labels }}
tags: ${{ steps.meta.outputs.tags }}
build-args: |
PIP_INDEX_URL=https://pypi.org/simple
provenance: false


@@ -1,4 +1,4 @@
name: Merge Conflict Labeler
name: "Merge Conflict Labeler"
on:
# So that PRs touching the same files as the push are updated
push:


@@ -0,0 +1,18 @@
name: Pull Request Labeler
on: pull_request_target
jobs:
label:
name: Label
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write
steps:
- name: Label the PR
uses: actions/labeler@v6
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
configuration-path: .github/labeler.yml
sync-labels: true
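The referenced `.github/labeler.yml` maps label names to matching rules. A minimal sketch of one entry, written in the regex-list style that the labeler configuration shown later in this comparison uses (the label name and pattern below are only illustrative), would be:

```yaml
# Hypothetical labeler.yml entry: PRs matching the pattern receive the "aclgraph" label.
aclgraph:
  - '/(aclgraph)/i'
```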


@@ -0,0 +1,17 @@
{
"problemMatcher": [
{
"owner": "ruff",
"pattern": [
{
"regexp": "^(.+?):(\\d+):(\\d+): (\\w+): (.+)$",
"file": 1,
"line": 2,
"column": 3,
"code": 4,
"message": 5
}
]
}
]
}
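For reference, the regexp above is shaped to capture diagnostics of the following hypothetical form (whether ruff emits exactly this layout depends on its output settings):

```text
vllm_ascend/platform.py:42:7: F401: 'os' imported but unused
```

Here group 1 is the file (`vllm_ascend/platform.py`), groups 2 and 3 are the line (42) and column (7), group 4 is the code (`F401`), and the remainder is the message.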


@@ -0,0 +1,118 @@
name: 'e2e test / multi-dp'
on:
schedule:
- cron: "0 */4 * * *"
workflow_dispatch:
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
# only cancel in-progress runs of the same workflow
# and ignore the lint / 8 cards test type
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e:
# This is a runner with no NPU for k8s controller
runs-on: linux-aarch64-a3-0
container:
image: m.daocloud.io/quay.io/ascend/cann:8.3.rc2-a3-ubuntu22.04-py3.11
env:
KUBECONFIG: /tmp/kubeconfig
KUBECTL: /root/.cache/.kube/kubectl
NAMESPACE: vllm-project
LEADER_POD: vllm-0
steps:
- name: Install system dependencies
run: |
# configure apt and pip source
sed -i 's|ports.ubuntu.com|mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
apt-get update -y && apt-get install -y git curl
TOKEN=`echo -n "x-access-token:${{ secrets.ADMIN_PTA }}" | base64`
git config --global http.https://gh-proxy.test.osinfra.cn/.extraheader "AUTHORIZATION: basic $TOKEN"
- name: Install kubectl
run: |
install -o root -g root -m 0755 $KUBECTL /usr/local/bin/kubectl
# get kubeconfig from secret
echo "${{ secrets.KUBECONFIG_B64 }}" | base64 -d > $KUBECONFIG
- name: Checkout code
uses: actions/checkout@v4
- name: Prepare scripts
run: |
# prepare for lws entrypoint scripts
install -D tests/e2e/multi_node/scripts/run.sh /root/.cache/tests/run.sh
- name: Launch cluster
run: |
kubectl apply -f tests/e2e/multi_node/scripts/lws.yaml
- name: Waiting for pod ready
run: |
echo "waiting for Pod [$LEADER_POD] in namespace [$NAMESPACE] to Ready..."
while true; do
# get pod status
READY_STATUS=$(kubectl get pod "$LEADER_POD" -n "$NAMESPACE" -o jsonpath='{.status.containerStatuses[*].ready}')
if [[ "$READY_STATUS" == "true" ]]; then
echo "✅ Pod [$LEADER_POD] is Ready!"
break
else
echo "Pod [$LEADER_POD] not ready, waiting..."
sleep 3
fi
done
- name: Stream logs and monitor pod health
run: |
set -euo pipefail
echo "🚀 Start streaming logs for Pod [$LEADER_POD] ..."
kubectl logs -f "$LEADER_POD" -n "$NAMESPACE" &
LOG_PID=$!
echo "Start monitoring Pod [$LEADER_POD] status ..."
while true; do
STATUS=$(kubectl get pod "$LEADER_POD" -n "$NAMESPACE" -o jsonpath='{.status.phase}')
if [[ "$STATUS" != "Running" && "$STATUS" != "Succeeded" ]]; then
echo "❌ Pod [$LEADER_POD] exited abnormally with status: $STATUS"
kubectl describe pod "$LEADER_POD" -n "$NAMESPACE" || true
kubectl logs "$LEADER_POD" -n "$NAMESPACE" --previous --all-containers || true
kill $LOG_PID || true
exit 1
fi
sleep 5
done &
MONITOR_PID=$!
wait $LOG_PID || true
kill $MONITOR_PID || true
- name: Generate summary
if: always()
run: |
if [ -f "/root/.cache/test_summary.md" ]; then
cat /root/.cache/test_summary.md >> "$GITHUB_STEP_SUMMARY"
else
echo "No summary file found." >> "$GITHUB_STEP_SUMMARY"
fi
- name: Post process
if: always()
run: |
kubectl get pods -n $NAMESPACE
kubectl delete -f tests/e2e/multi_node/scripts/lws.yaml


@@ -15,7 +15,7 @@
# limitations under the License.
#
name: Performance Schedule Test
name: 'ascend test / performance'
# This workflow runs nightly benchmarks for vllm-ascend.
on:
@@ -46,16 +46,17 @@ jobs:
test:
if: ${{ contains(github.event.pull_request.labels.*.name, 'performance-test') && contains(github.event.pull_request.labels.*.name, 'ready-for-test') || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' }}
name: Benchmarks/vLLM=${{ matrix.vllm_branch }}, vLLM-Ascend=${{ matrix.vllm_ascend_branch }}
name: Benchmarks/vLLM=${{ matrix.vllm_branch }}, vLLM-Ascend=${{ matrix.vllm_ascend_branch }}, use_v1=${{ matrix.vllm_use_v1 }}
runs-on: 'linux-arm64-npu-static-8'
strategy:
matrix:
include:
- vllm_branch: v0.18.0
- vllm_branch: v0.11.0
vllm_ascend_branch: main
vllm_use_v1: 1
max-parallel: 1
container:
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-910b-ubuntu22.04-py3.11
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.3.rc2-910b-ubuntu22.04-py3.11
volumes:
- /usr/local/dcmi:/usr/local/dcmi
- /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
@@ -72,6 +73,7 @@ jobs:
VLLM_USE_MODELSCOPE: True
ES_OM_DOMAIN: ${{ secrets.ES_OM_DOMAIN }}
ES_OM_AUTHORIZATION: ${{ secrets.ES_OM_AUTHORIZATION }}
VLLM_USE_V1: ${{ matrix.vllm_use_v1 }}
steps:
- name: Check npu and CANN info
run: |
@@ -95,12 +97,12 @@ jobs:
git config --global url."https://gh-proxy.test.osinfra.cn/https://github.com/".insteadOf https://github.com/
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v6
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
path: ./vllm-empty
@@ -130,11 +132,11 @@ jobs:
- name: Generate step summary
if: github.event_name != 'schedule' && github.event_name != 'workflow_dispatch'
run: |
cat ./benchmarks/results/benchmark_results.md >> "$GITHUB_STEP_SUMMARY"
cat ./benchmarks/results/benchmark_results.md >> $GITHUB_STEP_SUMMARY
- name: Upload benchmark artifacts
if: github.event_name != 'schedule' && github.event_name != 'workflow_dispatch'
uses: actions/upload-artifact@v7
uses: actions/upload-artifact@v4
with:
name: "benchmark-performance-${{ matrix.vllm_branch }}-${{ matrix.vllm_ascend_branch }}-report"
path: ./benchmarks/results/benchmark_results.md
@@ -172,9 +174,9 @@ jobs:
commit_id=${line%% *}
commit_title=${line#* }
git checkout "$commit_id"
commit_time=$(git show -s --format=%cd "$commit_id" --date=iso-strict)
commit_time_no_tz="${commit_time::19}"
git checkout $commit_id
commit_time=$(git show -s --format=%cd $commit_hash --date=iso-strict)
commit_time_no_tz=${commit_time::19}
pip install -e .
echo "------------------------"
@@ -191,13 +193,14 @@ jobs:
ERROR_MSG="Benchmark failed to run"
fi
# send the result to es
escli add --vllm_branch "${{ matrix.vllm_branch }}" \
--vllm_ascend_branch "${{ matrix.vllm_ascend_branch }}" \
--commit_id "$commit_id" \
escli add --vllm_branch ${{ matrix.vllm_branch }} \
--vllm_ascend_branch ${{ matrix.vllm_ascend_branch }} \
--commit_id $commit_id \
--commit_title "$commit_title" \
--created_at "$commit_time_no_tz" \
--res_dir ./benchmarks/results \
--error "$ERROR_MSG" \
--extra_feat '{"VLLM_USE_V1": "${{ matrix.vllm_use_v1 }}"}'
rm -rf ./benchmarks/results
cd -
done < commit_log.txt


@@ -0,0 +1,43 @@
name: pre-commit
on:
workflow_call:
inputs:
vllm:
required: true
type: string
permissions:
contents: read
jobs:
pre-commit:
runs-on: ubuntu-latest
steps:
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4
- uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: "3.11"
- run: echo "::add-matcher::.github/workflows/matchers/actionlint.json"
- run: echo "::add-matcher::.github/workflows/matchers/mypy.json"
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
path: ./vllm-empty
ref: ${{ inputs.vllm }}
- name: Install vllm
working-directory: vllm-empty
run: |
pip install -r requirements/build.txt --extra-index-url https://download.pytorch.org/whl/cpu
VLLM_TARGET_DEVICE=empty pip install .
- name: Install vllm-ascend dev
run: |
pip install -r requirements-dev.txt --extra-index-url https://download.pytorch.org/whl/cpu
- uses: pre-commit/action@2c7b3805fd2a0fd8c1884dcaebf91fc102a13ecd # v3.0.1
env:
SHELLCHECK_OPTS: "--exclude=SC2046,SC2006,SC2086" # Exclude SC2046, SC2006, SC2086 for actionlint
with:
extra_args: --all-files --hook-stage manual
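Assuming the repository's `.pre-commit-config.yaml` is unchanged, contributors can roughly reproduce this job locally with the standard pre-commit CLI; this is a sketch, not part of the workflow:

```bash
pip install pre-commit
# Mirror the workflow's shellcheck exclusions, then run all hooks including manual-stage ones.
SHELLCHECK_OPTS="--exclude=SC2046,SC2006,SC2086" \
  pre-commit run --all-files --hook-stage manual
```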


@@ -0,0 +1,75 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: build / sdist
on:
pull_request:
branches:
- 'main'
- '*-dev'
paths:
- '.github/workflows/release_code.yml'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
push:
tags:
- 'v*'
jobs:
build:
name: release code
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.11"]
steps:
- uses: actions/checkout@ff7abcd0c3c05ccf6adc123a8cd1fd4fb30fb493 # v4.2.2
- name: Print
run: |
lscpu
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python3 -m pip install twine setuptools_scm
- name: Generate tar.gz
run: |
python3 setup.py sdist
ls dist
- name: Archive tar.gz
uses: actions/upload-artifact@v4
with:
name: vllm-ascend-src
path: dist/*
- name: Release
if: startsWith(github.ref, 'refs/tags/')
run: |
python3 -m twine upload dist/* -u __token__ -p ${{ secrets.PYPI_TOKEN }}


@@ -0,0 +1,125 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: build / wheel
on:
schedule:
# Runs at 23:00 UTC (7:00 AM Beijing) every day
- cron: '0 23 * * *'
pull_request:
branches:
- 'main'
- '*-dev'
paths:
- '.github/workflows/release_whl.yml'
- '.github/Dockerfile.buildwheel'
- 'vllm_ascend/**'
- 'setup.py'
- 'pyproject.toml'
- 'requirements.txt'
- 'cmake/**'
- 'CMakeLists.txt'
- 'csrc/**'
push:
tags:
- 'v*'
jobs:
build:
name: build and release wheel
strategy:
matrix:
os: [ubuntu-24.04, ubuntu-24.04-arm]
# PR only trigger latest version
python-version: ${{ fromJSON(
(github.event_name == 'pull_request' && '["3.11"]') ||
'["3.9", "3.10", "3.11"]'
) }}
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@ff7abcd0c3c05ccf6adc123a8cd1fd4fb30fb493 # v4.2.2
- name: Print
run: |
lscpu
- name: Free up disk space
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
docker-images: false
- name: Build wheel
run: |
ls
docker build -f ./.github/Dockerfile.buildwheel \
--build-arg PY_VERSION=${{ matrix.python-version }} \
-t wheel:v1 .
docker run --rm \
-u $(id -u):$(id -g) \
-v $(pwd):/outpwd \
wheel:v1 \
bash -c "cp -r /workspace/vllm-ascend/dist /outpwd"
ls dist
- name: Set up Python ${{ matrix.python-version }}
if: startsWith(github.ref, 'refs/tags/')
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: ${{ matrix.python-version }}
- name: Repair wheels with auditwheel
run: |
python3 -m pip install auditwheel
python3 -m pip install patchelf
mkdir -p dist/repaired
for whl in dist/*.whl; do
auditwheel repair "$whl" -w dist/repaired/ \
--exclude libplatform.so \
--exclude libregister.so \
--exclude libge_common_base.so \
--exclude libc10.so \
--exclude libc_sec.so \
--exclude "libascend*.so" \
--exclude "libtorch*.so" \
--exclude "liberror_manager.so"
done
rm -f dist/*.whl
mv dist/repaired/*.whl dist/
rmdir dist/repaired
ls dist
- name: Verify automatic platform tags
run: |
cd dist
for wheel in *.whl; do
echo "verification file: $wheel"
auditwheel show "$wheel"
done
- name: Archive wheel
uses: actions/upload-artifact@v4
with:
name: vllm-ascend-${{ matrix.os }}-py${{ matrix.python-version }}-wheel
path: dist/*
- name: Release
if: startsWith(github.ref, 'refs/tags/')
run: |
python3 -m pip install twine
python3 -m twine upload --verbose dist/* -u __token__ -p ${{ secrets.PYPI_TOKEN }}


@@ -0,0 +1,26 @@
name: PR Reminder Comment Bot
permissions:
pull-requests: write
on:
pull_request_target:
types: [opened]
jobs:
pr_reminder:
runs-on: ubuntu-latest
steps:
- name: Remind to run full CI on PR
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:
script: |
github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.issue.number,
body: '👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:\n\n' +
'- A PR should do only one thing, smaller PRs enable faster reviews.\n' +
'- Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.\n' +
'- Write the commit message by fulfilling the PR description to help reviewer and future developers understand.\n\n' +
'If CI fails, you can run linting and testing checks locally according to [Contributing](https://vllm-ascend.readthedocs.io/zh-cn/latest/developer_guide/contribution/index.html) and [Testing](https://vllm-ascend.readthedocs.io/zh-cn/latest/developer_guide/contribution/testing.html).'
})
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}


@@ -0,0 +1,100 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: 'e2e test / a3-test'
on:
workflow_call:
pull_request:
types: [ labeled ]
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
# only cancel in-progress runs of the same workflow
# and ignore the lint / 8 cards test type
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e:
# only trigger e2e test after lint passed and the change is e2e related with pull request.
if: ${{ contains(github.event.pull_request.labels.*.name, 'dist-test') && contains(github.event.pull_request.labels.*.name, 'ready-for-test') || github.event_name == 'workflow_dispatch' }}
strategy:
matrix:
os: [linux-aarch64-a3-8]
vllm_version: [v0.11.0]
name: vLLM Ascend test
runs-on: ${{ matrix.os }}
container:
image: m.daocloud.io/quay.io/ascend/cann:8.3.rc2-a3-ubuntu22.04-py3.11
env:
DEBIAN_FRONTEND: noninteractive
steps:
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -i 's|ports.ubuntu.com|mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
apt-get update -y
apt install git -y
git config --global url."https://gh-proxy.test.osinfra.cn/https://github.com/".insteadOf https://github.com/
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
ref: ${{ matrix.vllm_version }}
path: ./vllm-empty
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty pip install -e .
- name: Install vllm-project/vllm-ascend
run: |
export PIP_EXTRA_INDEX_URL=https://mirrors.huaweicloud.com/ascend/repos/pypi
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/devlib
pip install -r requirements-dev.txt
pip install -v -e .
- name: Run vllm-project/vllm-ascend test for V1 Engine
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
VLLM_USE_MODELSCOPE: True
run: |
# TODO: enable more tests
pytest -sv tests/e2e/multicard/test_offline_inference_distributed.py::test_models_distributed_DeepSeek_multistream_moe


@@ -15,7 +15,7 @@
# This file is a part of the vllm-ascend project.
#
name: Doc Test
name: 'ascend test / doctest'
on:
workflow_dispatch:
@@ -23,13 +23,15 @@ on:
branches:
- 'main'
- '*-dev'
- 'releases/v*'
paths:
# If we are changing the doctest we should do a PR test
- '.github/workflows/labled_doctest.yaml'
- '.github/workflows/vllm_ascend_doctest.yaml'
- 'tests/e2e/doctests/**'
- 'tests/e2e/common.sh'
- 'tests/e2e/run_doctests.sh'
schedule:
# Runs every 12 hours
- cron: '0 */12 * * *'
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
@@ -44,11 +46,11 @@ jobs:
# Each version should be tested
fail-fast: false
matrix:
vllm_version: [releases-v0.13.0, releases-v0.13.0-openeuler, main, main-openeuler]
vllm_verison: [v0.9.1-dev, v0.9.1-dev-openeuler, main, main-openeuler]
name: vLLM Ascend test
runs-on: linux-aarch64-a2b3-1
runs-on: linux-aarch64-a2-1
container:
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:${{ matrix.vllm_version }}
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:${{ matrix.vllm_verison }}
steps:
- name: Check NPU/CANN and git info
run: |
@@ -64,7 +66,7 @@ jobs:
git --no-pager log -1 || true
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
uses: actions/checkout@v4
- name: Run vllm-ascend/tests/e2e/run_doctests.sh
run: |


@@ -15,15 +15,16 @@
# This file is a part of the vllm-ascend project.
#
name: E2E-Light
name: 'ascend test'
on:
push:
branches:
- 'main'
pull_request:
branches:
- 'main'
- '*-dev'
- 'releases/v*'
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
@@ -39,29 +40,23 @@ concurrency:
jobs:
lint:
uses: ./.github/workflows/_pre_commit.yml
uses: ./.github/workflows/pre-commit.yml
with:
vllm: v0.18.0
vllm: v0.11.0
changes:
runs-on: linux-aarch64-a2b3-0
runs-on: ubuntu-latest
outputs:
e2e_tracker: ${{ steps.filter.outputs.e2e_tracker }}
ut_tracker: ${{ steps.filter.outputs.ut_tracker }}
_310_tracker: ${{ steps.filter.outputs._310_tracker }}
steps:
- name: Setup git proxy
run: |
git config --global --add safe.directory "$GITHUB_WORKSPACE"
git config --global url."https://gh-proxy.test.osinfra.cn/https://github.com/".insteadOf https://github.com/
# NOTE: Do not update the version of checkout; there are some issues on self-hosted runners with higher versions
- uses: actions/checkout@v6
- uses: actions/checkout@v4
- uses: dorny/paths-filter@v3
id: filter
with:
filters: |
e2e_tracker:
- '.github/workflows/pr_test_light.yaml'
- '.github/workflows/_e2e_test.yaml'
- '.github/workflows/vllm_ascend_test.yaml'
- 'vllm_ascend/**'
- 'csrc/**'
- 'cmake/**'
@@ -74,35 +69,74 @@ jobs:
- 'packages.txt'
ut_tracker:
- 'tests/ut/**'
- '.github/workflows/pr_test_light.yaml'
_310_tracker:
- 'vllm_ascend/_310p/**'
- 'tests/e2e/310p/**'
- 'vllm_ascend/worker/model_runner_v1.py'
- 'vllm_ascend/attention/attention_v1.py'
- 'vllm_ascend/ops/fused_moe/**'
- 'CMakeLists.txt'
ut:
needs: [lint, changes]
name: unit test
# only trigger unit test after lint passed and the change is e2e and ut related.
if: ${{ needs.lint.result == 'success' && (needs.changes.outputs.e2e_tracker == 'true' || needs.changes.outputs.ut_tracker == 'true') }}
runs-on: ubuntu-22.04-arm
container:
image: quay.io/ascend/cann:8.2.rc1-910b-ubuntu22.04-py3.11
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
strategy:
matrix:
vllm_version: [v0.18.0]
uses: ./.github/workflows/_unit_test.yaml
with:
vllm: ${{ matrix.vllm_version }}
runner: linux-amd64-cpu-8-hk
image: quay.nju.edu.cn/ascend/cann:8.5.1-910b-ubuntu22.04-py3.11
type: pr
vllm_version: [v0.11.0]
steps:
- name: Install packages
run: |
apt-get update -y
apt-get install -y python3-pip git vim wget net-tools gcc g++ cmake libnuma-dev curl gnupg2
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
ref: ${{ matrix.vllm_version }}
path: ./vllm-empty
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty python3 -m pip install .
python3 -m pip uninstall -y triton
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4
- name: Install vllm-project/vllm-ascend
run: |
export PIP_EXTRA_INDEX_URL=https://mirrors.huaweicloud.com/ascend/repos/pypi
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/arm64-linux/devlib
python3 -m pip install -r requirements-dev.txt
python3 -m pip install -v .
- name: Run unit test
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
TORCH_DEVICE_BACKEND_AUTOLOAD: 0
run: |
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/arm64-linux/devlib
pytest -sv --cov --cov-report=xml:unittests-coverage.xml tests/ut \
--ignore tests/ut/attention/test_attention_v1.py
- name: Upload coverage to Codecov
# only upload coverage when commits merged
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: codecov/codecov-action@v5
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
with:
flags: unittests
name: vllm-ascend
verbose: true
e2e-light:
name: e2e-light
strategy:
matrix:
vllm_version: [v0.18.0]
vllm_version: [v0.11.0]
# Note (yikun): If CI resource are limited we can split job into two chain jobs
needs: [lint, changes]
# only trigger e2e test after lint passed and the change is e2e related with pull request.
@@ -110,6 +144,6 @@ jobs:
uses: ./.github/workflows/_e2e_test.yaml
with:
vllm: ${{ matrix.vllm_version }}
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-910b-ubuntu22.04-py3.11
contains_310: ${{ needs.changes.outputs._310_tracker == 'true' }}
runner: linux-aarch64-a2
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.3.rc2-910b-ubuntu22.04-py3.11
type: light


@@ -0,0 +1,117 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: 'e2e test / 310p-test'
on:
push:
tags:
- 'v*'
schedule:
# Runs every 6 hours
- cron: '0 */6 * * *'
pull_request:
types: [ labeled ]
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
# only cancel in-progress runs of the same workflow
# and ignore the lint / 1 card / 4 cards test type
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e:
# e2e-310p-test will be triggered when tag 'e2e-310p-test' & 'ready-for-test' or schedule job
if: >-
${{
(contains(github.event.pull_request.labels.*.name, 'e2e-310p-test')) &&
contains(github.event.pull_request.labels.*.name, 'ready-for-test') ||
github.event_name == 'schedule' || github.event_name == 'push'
}}
strategy:
max-parallel: 2
matrix:
os: [linux-aarch64-310p-1, linux-aarch64-310p-4]
vllm_version: [v0.11.0]
name: 310p e2e test
runs-on: ${{ matrix.os }}
container:
# TODO(yikun): Remove m.daocloud.io prefix when infra proxy ready
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.3.rc2-310p-ubuntu22.04-py3.11
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
steps:
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt install git -y
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install git vim wget net-tools gcc g++ cmake libnuma-dev curl gnupg2
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
ref: ${{ matrix.vllm_version }}
path: ./vllm-empty
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty pip install -e .
- name: Install vllm-project/vllm-ascend
run: |
export PIP_EXTRA_INDEX_URL=https://mirrors.huaweicloud.com/ascend/repos/pypi
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/devlib
export SOC_VERSION=ASCEND310P3
pip install -r requirements-dev.txt
pip install -v -e .
- name: Run e2e test
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
VLLM_USE_MODELSCOPE: True
PYTORCH_NPU_ALLOC_CONF: max_split_size_mb:256
run: |
if [[ "${{ matrix.os }}" == "linux-aarch64-310p-1" ]]; then
pytest -sv tests/e2e/310p/test_offline_inference_310p.py
else
pytest -sv tests/e2e/310p/test_offline_inference_parallel_310p.py
fi


@@ -14,14 +14,13 @@
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: E2E-Full
name: 'ascend test / full'
on:
pull_request:
branches:
- 'main'
- '*-dev'
- 'releases/v*'
types: [ labeled, synchronize ]
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
@@ -39,24 +38,19 @@ concurrency:
jobs:
changes:
runs-on: linux-aarch64-a2b3-0
runs-on: ubuntu-latest
if: ${{ contains(github.event.pull_request.labels.*.name, 'ready') && contains(github.event.pull_request.labels.*.name, 'ready-for-test') }}
outputs:
e2e_tracker: ${{ steps.filter.outputs.e2e_tracker }}
ut_tracker: ${{ steps.filter.outputs.ut_tracker }}
steps:
- name: Setup git proxy
run: |
git config --global --add safe.directory "$GITHUB_WORKSPACE"
git config --global url."https://gh-proxy.test.osinfra.cn/https://github.com/".insteadOf https://github.com/
# NOTE: Do not update the version of checkout; there are some issues on self-hosted runners with higher versions
- uses: actions/checkout@v6
- uses: actions/checkout@v4
- uses: dorny/paths-filter@v3
id: filter
with:
filters: |
e2e_tracker:
- '.github/workflows/pr_test_full.yaml'
- '.github/workflows/vllm_ascend_test.yaml'
- '.github/workflows/_e2e_test.yaml'
- 'vllm_ascend/**'
- 'csrc/**'
@@ -75,12 +69,12 @@ jobs:
name: e2e-full
strategy:
matrix:
vllm_version: [v0.18.0]
vllm_version: [v0.11.0]
needs: [changes]
if: ${{ needs.changes.outputs.e2e_tracker == 'true' || needs.changes.outputs.e2e_tracker == true }}
if: ${{ needs.changes.outputs.e2e_tracker == 'true' }}
uses: ./.github/workflows/_e2e_test.yaml
with:
vllm: ${{ matrix.vllm_version }}
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-910b-ubuntu22.04-py3.11
contains_310: false
runner: linux-aarch64-a2
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.3.rc2-910b-ubuntu22.04-py3.11
type: full


@@ -14,12 +14,12 @@
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: vLLM Main Schedule Test
name: 'ascend test / vllm main'
on:
# Run full e2e tests UTC+8: 10am, 16pm, 22pm, 4am
# Run 1-card and 2-cards e2e tests per 2h
schedule:
- cron: '0 2,8,14,20 * * *'
- cron: '0 */2 * * *'
workflow_dispatch:
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
@@ -29,11 +29,17 @@ defaults:
run:
shell: bash -el {0}
# only cancel in-progress runs of the same workflow
# and ignore the lint / 1 card / 4 cards test type
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e-test:
uses: ./.github/workflows/_e2e_test.yaml
with:
vllm: main
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-910b-ubuntu22.04-py3.11
contains_310: false
runner: linux-aarch64-a2
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.3.rc2-910b-ubuntu22.04-py3.11
type: full


@@ -0,0 +1,177 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
# This test will be triggered:
# 1. schedule
# 2. pull_request change the related files
# 3. workflow_dispatch with models input
name: ascend test / models
on:
schedule:
# Runs every 6 hours
- cron: '0 */6 * * *'
pull_request:
branches:
- 'main'
- '*-dev'
paths:
- '.github/workflows/vllm_ascend_test_models.yaml'
- 'tests/e2e/models/test_lm_eval_correctness.py'
workflow_dispatch:
inputs:
vllm-ascend-version:
description: 'vllm-ascend:'
required: true
type: choice
# Current supported vLLM versions
options:
- latest
- main
default: main
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
# only cancel in-progress runs of the same workflow
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
run:
strategy:
matrix:
include:
- model_name: Qwen3-8B
runner: a2-1
- model_name: Qwen2.5-VL-7B-Instruct
runner: a2-1
- model_name: Qwen2-Audio-7B-Instruct
runner: a2-1
- model_name: Qwen3-30B-A3B
runner: a2-2
- model_name: Qwen3-VL-30B-A3B-Instruct
runner: a2-2
- model_name: DeepSeek-V2-Lite
runner: a2-2
fail-fast: false
uses: ./.github/workflows/_accuracy_test.yaml
with:
vllm: v0.11.0
runner: linux-aarch64-${{ matrix.runner }}
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.3.rc2-910b-ubuntu22.04-py3.11
model_name: ${{ matrix.model_name }}
upload: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.vllm-ascend-version == 'latest' }}
create_pr:
runs-on: ubuntu-latest
needs: run
if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.vllm-ascend-version == 'latest' }}
env:
UPSTREAM_REPO: vllm-project/vllm-ascend
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
repository: vllm-ascend-ci/vllm-ascend
token: ${{ secrets.PAT_TOKEN }}
ref: main
- name: Add upstream remote
run: |
git remote add upstream https://github.com/${{ env.UPSTREAM_REPO }}.git
git fetch upstream
git remote -v
- name: Set Git user info dynamically
run: |
git config user.name "${{ github.actor }}"
git config user.email "${{ github.actor }}@users.noreply.github.com"
- name: Create or switch to branch
run: |
TIMESTAMP=$(date +%Y%m%d%H%M%S)
BRANCH_NAME="auto-pr/accuracy-report-${TIMESTAMP}"
echo "BRANCH_NAME=${BRANCH_NAME}" >> $GITHUB_ENV
git checkout -B "${BRANCH_NAME}" upstream/main
- name: Download only current run reports
uses: actions/download-artifact@v5
with:
path: ./docs/source/developer_guide/evaluation/accuracy_report
pattern: report-*
github-token: ${{ secrets.GITHUB_TOKEN }}
run-id: ${{ github.run_id }}
- name: Delete old report
run: |
find ./docs/source/developer_guide/evaluation/accuracy_report -maxdepth 1 -type f -name '*.md' ! -name 'index.md' -delete
find ./docs/source/developer_guide/evaluation/accuracy_report -mindepth 2 -type f -name '*.md' -exec mv -f {} ./docs/source/developer_guide/evaluation/accuracy_report \;
find ./docs/source/developer_guide/evaluation/accuracy_report -mindepth 1 -type d -empty -delete
- name: Update accuracy_report/index.md
run: |
REPORT_DIR="./docs/source/developer_guide/evaluation/accuracy_report"
INDEX_MD="$REPORT_DIR/index.md"
{
echo "# Accuracy Report"
echo ""
echo ":::{toctree}"
echo ":caption: Accuracy Report"
echo ":maxdepth: 1"
for report in "$REPORT_DIR"/*.md; do
filename="$(basename "$report" .md)"
if [ "$filename" != "index" ]; then
echo "$filename"
fi
done
echo ":::"
} > "$INDEX_MD"
- name: push accuracy report
env:
GITHUB_TOKEN: ${{ secrets.PAT_TOKEN }}
run: |
git add ./docs/source/developer_guide/evaluation/accuracy_report/*.md
git commit -s -m "[Doc] Update accuracy reports for ${{ env.BRANCH_NAME }}"
git push -f origin "${{ env.BRANCH_NAME }}"
- name: Create PR in upstream via API
uses: actions/github-script@v8
with:
github-token: ${{ secrets.PAT_TOKEN }}
script: |
const pr = await github.rest.pulls.create({
owner: 'vllm-project',
repo: 'vllm-ascend',
head: `vllm-ascend-ci:${{ env.BRANCH_NAME }}`,
base: 'main',
title: `[Doc] Update accuracy reports for ${{ env.BRANCH_NAME }}`,
body: `The accuracy results running on NPU Atlas A2 have changed, updating reports for: All models
- [Workflow run][1]
[1]: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}`
});
core.info(`Created PR #${pr.data.number}`);
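As an illustration of the "Update accuracy_report/index.md" step above: for a run that downloaded reports for, say, `DeepSeek-V2-Lite` and `Qwen3-8B` (hypothetical file names for this sketch), the regenerated `index.md` would look roughly like:

```markdown
# Accuracy Report

:::{toctree}
:caption: Accuracy Report
:maxdepth: 1
DeepSeek-V2-Lite
Qwen3-8B
:::
```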


@@ -0,0 +1,112 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# This file is a part of the vllm-ascend project.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
name: 'e2e test / pd-disaggregation'
on:
schedule:
# Runs at 23:00 UTC (7:00 AM Beijing) every day
- cron: '0 23 * * *'
pull_request:
types: [ labeled ]
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
# only 1 job can runs on static-8-01-cards
concurrency:
group: static-8-01-cards
cancel-in-progress: false
jobs:
prefilling-decoding-disaggregation:
# pd-test will be triggered when tag 'pd-test' & 'ready-for-test' or schedule job
if: ${{ contains(github.event.pull_request.labels.*.name, 'pd-test') && contains(github.event.pull_request.labels.*.name, 'ready-for-test') || github.event_name == 'schedule' }}
strategy:
matrix:
vllm_verison: [
main,
v0.9.1
]
name: vLLM Ascend prefilling decoding disaggregation test
runs-on: linux-arm64-npu-static-8
container:
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.3.rc2-910b-ubuntu22.04-py3.11
volumes:
- /usr/local/dcmi:/usr/local/dcmi
- /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
- /usr/local/Ascend/driver/:/usr/local/Ascend/driver/
# Use self-host cache speed up pip and model download
- /home/action/.cache:/github/home/.cache/
options: >-
--device /dev/davinci0
--device /dev/davinci1
--device /dev/davinci_manager
--device /dev/devmm_svm
--device /dev/hisi_hdc
env:
VLLM_USE_MODELSCOPE: True
steps:
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
# keep using tuna's proxy since linux-arm64-npu-static-8 is in another region
sed -i 's|ports.ubuntu.com|mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
apt-get update -y
apt install git -y
git config --global url."https://gh-proxy.test.osinfra.cn/https://github.com/".insteadOf https://github.com/
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
with:
repository: vllm-project/vllm
ref: ${{ matrix.vllm_verison }}
path: ./vllm-empty
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty pip install -e .
- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install -r requirements-dev.txt
pip install -v -e .
- name: Run vllm-project/vllm-ascend PD Disaggregation edge test
run: |
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
bash tests/e2e/pd_disaggreate/run_edge_case_test.sh

.github/CODEOWNERS

@@ -1,65 +0,0 @@
# See https://help.github.com/articles/about-codeowners/
# for more info about CODEOWNERS file
# Infra, CI
/.gemini @wangxiyuan @Yikun
/.github @wangxiyuan @Yikun
/tools @wangxiyuan @Yikun
/.gitignore @wangxiyuan
/.gitmodules @wangxiyuan @zzzzwwjj
/.pre-commit-config.yaml @wangxiyuan
/codecov.yml @wangxiyuan
/Dockerfile* @wangxiyuan
/format.sh @wangxiyuan
/mypy.ini @wangxiyuan
/requirements* @wangxiyuan
/setup.py @wangxiyuan
/typos.toml @wangxiyuan
# benchmark
/benchmarks @wangxiyuan
# docs
/docs @wangxiyuan @Yikun @LCAIZJ
/.readthedocs.yaml @wangxiyuan @Yikun
/README* @wangxiyuan @Yikun
# example
/examples @wangxiyuan
# tests
/tests @wangxiyuan
# c++ source code
/cmake @zzzzwwjj
/csrc @zzzzwwjj
/CMakeLists.txt @zzzzwwjj
# python source code
/vllm_ascend/attention @weijinqian0 @whx-sjtu
/vllm_ascend/compilation @yiz-liu
/vllm_ascend/core @wangxiyuan @MengqingCao
/vllm_ascend/device @weijinqian0 @zzzzwwjj
/vllm_ascend/device_allocator @wangxiyuan @weijinqian0
/vllm_ascend/distributed @MengqingCao @LCAIZJ
/vllm_ascend/eplb @wangxiyuan
/vllm_ascend/kv_offload @nalinaly
/vllm_ascend/lora @paulyu12
/vllm_ascend/model_loader @wangxiyuan
/vllm_ascend/ops @zzzzwwjj @realliujiaxu @whx-sjtu
/vllm_ascend/patch @wangxiyuan
/vllm_ascend/quantization @wangxiyuan
/vllm_ascend/sample @realliujiaxu @whx-sjtu
/vllm_ascend/spec_decode @wangxiyuan
/vllm_ascend/worker @MengqingCao
/vllm_ascend/xlite @wangxiyuan
/vllm_ascend/ascend_config.py @wangxiyuan
/vllm_ascend/ascend_forward_context.py @wangxiyuan
/vllm_ascend/batch_invariant.py @wangxiyuan
/vllm_ascend/cpu_binding.py @wangxiyuan
/vllm_ascend/envs.py @wangxiyuan
/vllm_ascend/flash_common3_context.py @wangxiyuan
/vllm_ascend/meta_registration.py @wangxiyuan
/vllm_ascend/platform.py @wangxiyuan
/vllm_ascend/profiling_config.py @wangxiyuan
/vllm_ascend/utils.py @wangxiyuan


@@ -1,74 +0,0 @@
core-features:
- '/((pd|(prefill[- ]?decode))\s+disaggregation|kv cache pool|aclgraph|async scheduler|cpu binding|quantization)/i'
pd-disaggregation:
- '/((pd|(prefill[- ]?decode))\s+disaggregation)/i'
kv-cache-pool:
- '/(kv cache pool)/i'
aclgraph:
- '/(aclgraph)/i'
async-scheduler:
- '/(async scheduler)/i'
cpu-binding:
- '/(cpu binding)/i'
quantization:
- '/(quantization)/i'
advanced_features:
- '/(long sequence|dpc|pcp|mtp|speculative decode)/i'
long-seq:
- '/(long sequence|dpc|pcp)/i'
mtp/speculative-decode:
- '/(mtp|speculative decode)/i'
eplb:
- '/(eplb)/i'
llm-model:
- '/(deepseek[- ]*(r1|v3(\.2)?)\S*|(kimi k2|kimik2|kimi-k2)(?!\.5)|glm5|qwen3-(?:235b|480b)\S*|Qwen3-(?:32B|8B|30B)\S*|qwen3 next|glm\s*4\.(?![^v\s]*v)\S*)/i'
deepseek:
- '/(deepseek[- ]*(r1|v3(\.2)?)\S*)/i'
kimi-k2:
- '/((kimi k2|kimik2|kimi-k2)(?!\.5))/i'
kimi-k2.5:
- '/((kimi k2\.5|kimik2\.5|kimi-k2\.5))/i'
glm5:
- '/(glm5)/i'
qwen3-moe:
- '/(Qwen3-(?:235B|480B)\S*)/i'
qwen3-dense:
- '/(Qwen3-(?:32B|8B|30B)\S*)/i'
qwen3-next:
- '/(qwen3-next)/i'
glm-4:
- '/(glm\s*4\.(?![^v\s]*v)\S*)/i'
multi-modality-generate:
- '/(seedance\S*|seedream\S*|wan\d[\d.]*|hunyuan\S*|fLux\S*|kimi k2\.5|kimi-k2\.5|kimik2\.5|minimax\S*|qwen-image\S*)/i'
seedance:
- '/(seedance\S*)/i'
seedream:
- '/(seedream\S*)/i'
wan:
- '/(wan\d[\d.]*)/i'
hunyuan:
- '/(hunyuan\S*)/i'
fLux:
- '/(fLux\S*)/i'
qwen-image:
- '/(qwen-image\S*)/i'
minimax:
- '/(minimax\S*)/i'
multimodal_understanding:
- '/(glm-?4\.\S*v\b|qwen3\.5\S*|deepseek-ocr\S*)/i'
glm-4v:
- '/(glm-?4\.\S*v\b)/i'
qwen-3.5:
- '/(qwen3\.5\S*)/i'
deepseek-ocr:
- '/(deepseek-ocr\S*)/i'
audio-model:
- '/(qwen3-tts\S*)/i'
omni-model:
- '/(qwen3-Omni\S*)/i'
multimodal-unified-autoregress:
- '/(hunyuan\S*|emu\S*)/i'
paddle:
- '/(paddle\S*)/i'
310p:
- '/(310p\S*)/i'


@@ -1,85 +0,0 @@
# E2E Test Workflow Guide
This document provides a guide on how to manage and extend the E2E test suite for `vllm-ascend`. It covers how to add new test cases and understand the automatic partitioning mechanism.
## 1. Adding a New Test Case
All E2E test cases are defined and managed in the `.github/workflows/scripts/config.yaml` file.
### Steps
1. **Prepare the Test Script**: Ensure your test script (`.py` file) is placed in the appropriate location under the `tests/e2e/` directory (e.g., `tests/e2e/singlecard/` or `tests/e2e/multicard/`).
2. **Modify `config.yaml`**:
Open `.github/workflows/scripts/config.yaml` and locate the corresponding test suite (e.g., `e2e-singlecard` or `e2e-multicard-2-cards`).
3. **Add Configuration Entry**:
Add a new entry under the corresponding list. Each entry contains the following fields:
* `name`: The relative path to the test file. If you only need to run a specific test function within the file, use `::` as a separator, e.g., `path/to/test.py::test_func`.
* `estimated_time`: The estimated time (in seconds) required to run the test. **This field is crucial** as it is used for automatic load balancing (partitioning).
* `is_skipped` (Optional): If set to `true`, the test will be skipped.
### Example
Suppose you want to add a new test named `tests/e2e/singlecard/test_new_feature.py` with an estimated runtime of 120 seconds:
```yaml
suites:
e2e-singlecard:
# ... other existing tests ...
- name: tests/e2e/singlecard/test_new_feature.py
estimated_time: 120
```
To add a specific test function:
```yaml
- name: tests/e2e/singlecard/test_new_feature.py::test_specific_case
estimated_time: 60
```
## 2. Automatic Partitioning Mechanism
To speed up CI execution, we support splitting large test suites into multiple parallel Jobs (partitions). The partitioning logic is primarily implemented in the `auto_partition` function in `.github/workflows/scripts/run_suite.py`.
### Principle
The partitioning algorithm uses a greedy approach to achieve load balancing, aiming to make the total estimated runtime of each partition as equal as possible (a minimal sketch follows the steps below).
1. **Read Configuration**: The script reads all non-skipped test cases and their `estimated_time` from `config.yaml`.
2. **Sort (Balanced Assignment)**: Test cases are sorted by `estimated_time` in descending order. This ensures that the heaviest tasks are distributed first to achieve optimal load balancing across partitions.
3. **Assign**: Iterating through the sorted test cases, each case is assigned to the partition (Bucket) with the current minimum total time.
4. **Re-sort (Fast Feedback)**: Within each partition, tests are re-sorted by `estimated_time` in ascending order. This allows the CI to cover as many test cases as possible in the early stages.
> TIP: If you need to prioritize a new test case, you can temporarily set its estimated_time to 0 to ensure it runs first, then update it to the actual value later.
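For illustration, here is a minimal, self-contained sketch of the greedy assignment described above. It is not the actual `auto_partition` implementation from `run_suite.py`; only the `estimated_time` field and the rank/size semantics come from this guide, and the function and variable names are assumed for the example.
```python
# Minimal sketch of the greedy partitioning described above (not the real
# implementation in run_suite.py; names other than estimated_time are assumed).
from typing import Any


def auto_partition_sketch(files: list[dict[str, Any]], rank: int, size: int) -> list[dict[str, Any]]:
    """Assign test cases to `size` buckets and return the bucket for `rank`."""
    # 1. Heaviest tests first, so the largest items are spread across buckets.
    ordered = sorted(files, key=lambda f: f.get("estimated_time", 0), reverse=True)

    buckets: list[list[dict[str, Any]]] = [[] for _ in range(size)]
    totals = [0.0] * size
    for case in ordered:
        # 2. Always put the next test into the currently lightest bucket.
        idx = totals.index(min(totals))
        buckets[idx].append(case)
        totals[idx] += case.get("estimated_time", 0)

    # 3. Inside the chosen bucket, run the quickest tests first for fast feedback.
    return sorted(buckets[rank], key=lambda f: f.get("estimated_time", 0))


if __name__ == "__main__":
    cases = [
        {"name": "tests/e2e/singlecard/test_a.py", "estimated_time": 300},
        {"name": "tests/e2e/singlecard/test_b.py", "estimated_time": 120},
        {"name": "tests/e2e/singlecard/test_c.py", "estimated_time": 90},
        {"name": "tests/e2e/singlecard/test_d.py", "estimated_time": 60},
    ]
    for rank in range(2):
        print(rank, [c["name"] for c in auto_partition_sketch(cases, rank, size=2)])
```
With two partitions, the 300-second test lands alone in one bucket while the three lighter tests share the other, which is exactly the load-balancing behavior the steps above describe.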
### How to Modify Partitioning Logic
If you need to adjust the partitioning strategy, please modify the `.github/workflows/scripts/run_suite.py` file.
* **Algorithm Location**: `auto_partition` function.
* **Input Parameters**:
* `files`: List of test files (including `estimated_time`).
* `rank`: Index of the current partition (0 to size-1).
* `size`: Total number of partitions.
* **Invocation**:
CI workflows (e.g., `.github/workflows/_e2e_test.yaml`) call the script via command-line arguments:
```bash
python3 .github/workflows/scripts/run_suite.py --suite <suite_name> --auto-partition-id <index> --auto-partition-size <total_count>
```
### Notes
* **Accurate Estimated Time**: To achieve the best load balancing, please provide an accurate `estimated_time` in `config.yaml`. If a new test is very time-consuming but the estimated time is set too low, it may cause a specific partition to timeout.
* **Number of Partitions**: The number of partitions (`auto-partition-size`) is typically defined in the `strategy.matrix` of the GitHub Actions workflow definition file (e.g., `_e2e_test.yaml`).
## 3. Running Tests Locally
You can use the `run_suite.py` script to run test suites locally:
```bash
# Run the full e2e-singlecard suite
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard
# Simulate partitioned execution (e.g., partition 0 of 2)
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard --auto-partition-id 0 --auto-partition-size 2
```
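If you want to preview how a suite would be split before changing `auto-partition-size`, a small helper like the one below can be handy. It is a hypothetical script, not part of the repository; it assumes `config.yaml` follows the `suites:` / `name:` / `estimated_time:` layout shown in section 1 and reuses the greedy assignment sketched in section 2.
```python
# Hypothetical helper (not shipped with vllm-ascend): preview how a suite
# would be split across N partitions, assuming the config.yaml layout above.
import yaml  # pip install pyyaml


def preview_partitions(config_path: str, suite: str, size: int) -> None:
    with open(config_path, encoding="utf-8") as f:
        config = yaml.safe_load(f)

    # Drop skipped cases and distribute the heaviest ones first.
    cases = [c for c in config["suites"][suite] if not c.get("is_skipped")]
    cases.sort(key=lambda c: c.get("estimated_time", 0), reverse=True)

    buckets = [[] for _ in range(size)]
    totals = [0.0] * size
    for case in cases:
        idx = totals.index(min(totals))
        buckets[idx].append(case)
        totals[idx] += case.get("estimated_time", 0)

    for rank, bucket in enumerate(buckets):
        print(f"partition {rank}: ~{totals[rank]:.0f}s")
        for case in sorted(bucket, key=lambda c: c.get("estimated_time", 0)):
            print(f"  {case['name']} ({case.get('estimated_time', 0)}s)")


if __name__ == "__main__":
    preview_partitions(".github/workflows/scripts/config.yaml", "e2e-singlecard", size=2)
```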


@@ -1,331 +0,0 @@
name: 'e2e nightly test multi_node'
on:
workflow_call:
inputs:
soc_version:
required: true
type: string
description: use a2 or a3
runner:
required: false
type: string
default: linux-aarch64-a3-0
image:
required: false
type: string
description: base image for pods
default: "swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-910b-ubuntu22.04-py3.11"
config_file_path:
required: true
type: string
description: the model config for multi_node test
replicas:
required: false
default: "1"
type: string
description: replicas of the k8s cluster
size:
required: false
default: "2"
type: string
description: how many pods will be pulled up via lws.yaml; this indicates the number of nodes we need
vllm_version:
required: false
default: "v0.18.0"
type: string
description: vllm version to use
vllm_ascend_remote_url:
required: false
default: https://github.com/vllm-project/vllm-ascend.git
type: string
description: used for pr level tests
vllm_ascend_ref:
required: false
default: main
type: string
description: used for pr level tests
should_run:
required: true
type: boolean
secrets:
KUBECONFIG_B64:
required: true
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
# only cancel in-progress runs of the same workflow
# and ignore the lint / 8 cards test type
concurrency:
group: ascend-nightly-${{ github.workflow_ref }}-${{ github.ref }}-${{ inputs.soc_version }}-${{ inputs.config_file_path }}
cancel-in-progress: true
jobs:
e2e:
name: ${{ inputs.config_file_path }}
# This is the runner with no NPU for k8s controller
runs-on: ${{ inputs.runner }}
if: ${{ inputs.should_run }}
container:
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-cpu
env:
KUBECONFIG: /tmp/kubeconfig
NAMESPACE: vllm-project
steps:
- name: Decode kubeconfig from secrets
run: |
# Decode and save kubeconfig
if [ "${{ github.event_name }}" = "pull_request" ]; then
echo "PR test mode"
if [ "${{ inputs.soc_version }}" = "a3" ]; then
echo "Using A3 cached kubeconfig"
cp /root/.cache/.kube/kubeconfig.yaml "$KUBECONFIG"
else
echo "Using A2 cached kubeconfig"
cp /root/.cache/.kube/hk_001_kb.yaml "$KUBECONFIG"
fi
else
echo "Decoding kubeconfig from secrets"
echo "${{ secrets.KUBECONFIG_B64 }}" | base64 -d > "$KUBECONFIG"
fi
- name: Checkout code
uses: actions/checkout@v6
- name: Set job variables
run: |
# Derive a unique, valid k8s resource name from config_file_path.
# Strip .yaml extension, lowercase, replace dots/underscores with hyphens, cap at 50 chars.
config_file="${{ inputs.config_file_path }}"
lws_suffix=$(echo "$config_file" | sed 's/\.yaml$//' | tr '[:upper:]' '[:lower:]' | tr '._' '-' | cut -c1-50)
LWS_NAME="vllm-${lws_suffix}"
echo "LWS_NAME=${LWS_NAME}" >> $GITHUB_ENV
echo "LEADER_POD=${LWS_NAME}-0" >> $GITHUB_ENV
echo "Computed LWS_NAME=${LWS_NAME}"
- name: Prepare scripts
run: |
# prepare for lws entrypoint scripts
install -D tests/e2e/nightly/multi_node/scripts/run.sh /root/.cache/tests/run.sh
- name: Clear resources
run: |
set -euo pipefail
TIMEOUT=${TIMEOUT:-120}
SLEEP_INTERVAL=2
echo "Deleting leaderworkerset [$LWS_NAME] in namespace [$NAMESPACE]..."
kubectl delete leaderworkerset "$LWS_NAME" -n "$NAMESPACE" --ignore-not-found
kubectl delete service "${LWS_NAME}-leader" -n "$NAMESPACE" --ignore-not-found
echo "Waiting for pods of leaderworkerset [$LWS_NAME] to be deleted..."
START_TIME=$(date +%s)
while true; do
NOW=$(date +%s)
ELAPSED=$((NOW - START_TIME))
if [[ $ELAPSED -ge $TIMEOUT ]]; then
echo "Timeout reached ($TIMEOUT seconds), some pods still exist:"
kubectl get pods -n "$NAMESPACE" | grep "^${LWS_NAME}-" || true
exit 1
fi
PODS_EXIST=$(kubectl get pods -n "$NAMESPACE" -o jsonpath='{.items[*].metadata.name}' 2>/dev/null | tr ' ' '\n' | grep "^${LWS_NAME}-" || true)
if [[ -z "$PODS_EXIST" ]]; then
echo "All pods for [$LWS_NAME] deleted."
break
else
echo "Waiting for pods to be deleted: $PODS_EXIST"
sleep $SLEEP_INTERVAL
fi
done
- name: Launch cluster
id: launcher
run: |
set -e
size="${{ inputs.size }}"
replicas="${{ inputs.replicas }}"
image="${{ inputs.image }}"
config_file_path="${{ inputs.config_file_path }}"
fail_tag=FAIL_TAG_"${{ inputs.config_file_path }}"
is_pr_test="${{ github.event_name == 'pull_request' }}"
vllm_version="${{ inputs.vllm_version }}"
vllm_ascend_ref="${{ inputs.vllm_ascend_ref }}"
vllm_ascend_remote_url="${{ inputs.vllm_ascend_remote_url }}"
echo "FAIL_TAG=${fail_tag}" >> $GITHUB_ENV
required_params=("size" "replicas" "image" "config_file_path" "is_pr_test" "vllm_version" "vllm_ascend_ref" "vllm_ascend_remote_url")
for param in "${required_params[@]}"; do
if [ -z "${!param}" ]; then
echo "Error: Parameter '$param' is required but empty"
exit 1
fi
done
if [ "${{ inputs.soc_version }}" = "a3" ]; then
npu_per_node=16
TEMPLATE_FILE="tests/e2e/nightly/multi_node/scripts/lws.yaml.jinja2"
else
npu_per_node=8
TEMPLATE_FILE="tests/e2e/nightly/multi_node/scripts/lws-a2.yaml.jinja2"
fi
jinja2 $TEMPLATE_FILE \
-D lws_name="$LWS_NAME" \
-D size="$size" \
-D replicas="$replicas" \
-D image="$image" \
-D config_file_path="$config_file_path" \
-D npu_per_node="$npu_per_node" \
-D fail_tag="$fail_tag" \
-D is_pr_test="$is_pr_test" \
-D vllm_version="$vllm_version" \
-D vllm_ascend_ref="$vllm_ascend_ref" \
-D vllm_ascend_remote_url="$vllm_ascend_remote_url" \
--outfile lws.yaml
kubectl apply -f ./lws.yaml
- name: Waiting for pod ready
run: |
POD_PREFIX="${LWS_NAME}-0"
SIZE="${{ inputs.size }}"
TIMEOUT=1200 # default timeout 20 minutes
echo "Waiting for Pods in namespace [$NAMESPACE] to become Running and Ready (timeout ${TIMEOUT}s)..."
START_TIME=$(date +%s)
while true; do
NOW=$(date +%s)
ELAPSED=$((NOW - START_TIME))
if [[ $ELAPSED -ge $TIMEOUT ]]; then
echo "Timeout reached after ${ELAPSED}s"
echo "Dumping pod status for debugging:"
kubectl get pods -n "$NAMESPACE"
kubectl describe pod "$LEADER_POD" -n "$NAMESPACE"
exit 1
fi
# 1) check follower pods
ALL_FOLLOWERS_READY=true
for ((i=1; i<SIZE; i++)); do
POD="${POD_PREFIX}-${i}"
PHASE=$(kubectl get pod "$POD" -n "$NAMESPACE" -o jsonpath='{.status.phase}' 2>/dev/null || echo "NotFound")
READY=$(kubectl get pod "$POD" -n "$NAMESPACE" -o jsonpath='{.status.containerStatuses[*].ready}' 2>/dev/null)
echo "Follower [$POD] phase=$PHASE ready=$READY"
if [[ "$PHASE" != "Running" || "$READY" != "true" ]]; then
echo "Follower [$POD] not Ready yet..."
ALL_FOLLOWERS_READY=false
break
fi
done
# 2) check leader pod
LEADER_PHASE=$(kubectl get pod "$LEADER_POD" -n "$NAMESPACE" -o jsonpath='{.status.phase}' 2>/dev/null || echo "NotFound")
LEADER_READY=$(kubectl get pod "$LEADER_POD" -n "$NAMESPACE" -o jsonpath='{.status.containerStatuses[*].ready}' 2>/dev/null)
echo "Leader [$LEADER_POD] phase=$LEADER_PHASE ready=$LEADER_READY"
if [[ "$LEADER_PHASE" != "Running" || "$LEADER_READY" != "true" ]]; then
echo "Leader not Ready yet..."
ALL_FOLLOWERS_READY=false
fi
if [[ "$ALL_FOLLOWERS_READY" == "true" ]]; then
echo "All follower pods and leader pod are Running and Ready — continuing."
break
fi
sleep 2
done
- name: Stream logs
run: |
set -euo pipefail
size="${{ inputs.size }}"
pids=()
cleanup() {
echo "Cleaning up background log streams..."
for pid in "${pids[@]}"; do
kill "$pid" 2>/dev/null || true
done
}
trap cleanup EXIT
for i in $(seq 1 $((size - 1))); do
POD="${LWS_NAME}-0-${i}"
echo "==== Collecting logs from worker pod: $POD ===="
kubectl logs -f "$POD" -n "$NAMESPACE" \
> "/tmp/${POD}_logs.txt" 2>&1 &
pids+=($!)
done
echo "==== Streaming logs from leader pod: $LEADER_POD ===="
echo "Looking for logs containing: $FAIL_TAG"
kubectl logs -f "$LEADER_POD" -n "$NAMESPACE" | while IFS= read -r line; do
echo "$line"
if echo "$line" | grep -q "$FAIL_TAG"; then
exit 1
fi
done
- name: Upload logs
if: always()
uses: actions/upload-artifact@v7
with:
name: ${{ inputs.config_file_path }}-pod-logs
path: /tmp/vllm*_logs.txt
retention-days: 7
- name: Post process
if: always()
run: |
echo "Current pod status:"
kubectl get pods -n "$NAMESPACE" --ignore-not-found=true
echo "Deleting resources for [$LWS_NAME]..."
kubectl delete -f ./lws.yaml --ignore-not-found=true || true
echo "Waiting for pods of [$LWS_NAME] to fully terminate..."
TIMEOUT=300
SLEEP_INTERVAL=5
START_TIME=$(date +%s)
while true; do
NOW=$(date +%s)
ELAPSED=$((NOW - START_TIME))
if [[ $ELAPSED -ge $TIMEOUT ]]; then
echo "Timeout reached ($TIMEOUT seconds) waiting for termination, continuing anyway."
kubectl get pods -n "$NAMESPACE" | grep "^${LWS_NAME}-" || true
break
fi
PODS_EXIST=$(kubectl get pods -n "$NAMESPACE" -o jsonpath='{.items[*].metadata.name}' 2>/dev/null | tr ' ' '\n' | grep "^${LWS_NAME}-" || true)
if [[ -z "$PODS_EXIST" ]]; then
echo "All pods for [$LWS_NAME] have terminated."
break
else
echo "Waiting for pods to terminate: $PODS_EXIST"
sleep $SLEEP_INTERVAL
fi
done


@@ -1,224 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: 'e2e nightly test'
on:
workflow_call:
inputs:
runner:
required: true
type: string
image:
required: false
type: string
default: "swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-910b-ubuntu22.04-py3.11"
tests:
required: false
type: string
config_file_path:
required: false
type: string
name:
required: false
type: string
vllm_version:
required: false
type: string
default: "v0.18.0"
should_run:
required: true
type: boolean
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
# only cancel in-progress runs of the same workflow
# and ignore the lint / 1 card / 4 cards test type
concurrency:
group: ascend-nightly-${{ github.workflow_ref }}-${{ github.ref }}-${{ inputs.config_file_path || inputs.tests }}
cancel-in-progress: true
jobs:
e2e-nightly:
name: ${{ inputs.name || inputs.config_file_path || inputs.tests }}
runs-on: ${{ inputs.runner }}
if: ${{ inputs.should_run }}
timeout-minutes: 600
container:
image: ${{ inputs.image }}
env:
HF_HUB_OFFLINE: 1
VLLM_USE_MODELSCOPE: True
UV_INDEX_URL: http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
UV_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
UV_INDEX_STRATEGY: unsafe-best-match
UV_NO_CACHE: 1
UV_SYSTEM_PYTHON: 1
VLLM_ENGINE_READY_TIMEOUT_S: 1800
steps:
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
pip install uv
- name: uninstall vllm vllm-ascend and remove code (if pr test)
if: ${{ github.event_name == 'pull_request' }}
run: |
pip uninstall -y vllm vllm-ascend || true
cp -r /vllm-workspace/vllm-ascend/benchmark /tmp/aisbench-backup || true
rm -rf /vllm-workspace/vllm /vllm-workspace/vllm-ascend
- name: Checkout vllm-project/vllm repo
if: ${{ github.event_name == 'pull_request' }}
uses: actions/checkout@v6
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm_version }}
path: ./temp-vllm
fetch-depth: 1
- name: Checkout vllm-project/vllm-ascend repo
if: ${{ github.event_name == 'pull_request' }}
uses: actions/checkout@v6
with:
path: ./temp-vllm-ascend
fetch-depth: 1
- name: Move code to /vllm-workspace
if: ${{ github.event_name == 'pull_request' }}
run: |
mv ./temp-vllm /vllm-workspace/vllm
mv ./temp-vllm-ascend /vllm-workspace/vllm-ascend
ls -R /vllm-workspace
- name: Install vllm-project/vllm from source
if: ${{ github.event_name == 'pull_request' }}
working-directory: /vllm-workspace/vllm
run: |
VLLM_TARGET_DEVICE=empty uv pip install -e .
- name: Install vllm-project/vllm-ascend
if: ${{ github.event_name == 'pull_request' }}
working-directory: /vllm-workspace/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
git config --global --add safe.directory /vllm-workspace/vllm-ascend
pip install uc-manager
uv pip install -r requirements-dev.txt
uv pip install -v -e .
- name: Install aisbench
if: ${{ github.event_name == 'pull_request' }}
shell: bash -l {0}
run: |
cp -r /tmp/aisbench-backup /vllm-workspace/vllm-ascend/benchmark
cd /vllm-workspace/vllm-ascend/benchmark
pip install pytest asyncio pytest-asyncio
pip install -e . -r requirements/api.txt -r requirements/extra.txt
python3 -m pip cache purge
- name: Show vLLM and vLLM-Ascend version
working-directory: /vllm-workspace
run: |
echo "Installed vLLM-related Python packages:"
pip list | grep vllm || echo "No vllm packages found."
echo ""
echo "============================"
echo "vLLM Git information"
echo "============================"
cd vllm
if [ -d .git ]; then
echo "Branch: $(git rev-parse --abbrev-ref HEAD)"
echo "Commit hash: $(git rev-parse HEAD)"
echo "Author: $(git log -1 --pretty=format:'%an <%ae>')"
echo "Date: $(git log -1 --pretty=format:'%ad' --date=iso)"
echo "Message: $(git log -1 --pretty=format:'%s')"
echo "Tags: $(git tag --points-at HEAD || echo 'None')"
echo "Remote: $(git remote -v | head -n1)"
echo ""
else
echo "No .git directory found in vllm"
fi
cd ..
echo ""
echo "============================"
echo "vLLM-Ascend Git information"
echo "============================"
cd vllm-ascend
if [ -d .git ]; then
echo "Branch: $(git rev-parse --abbrev-ref HEAD)"
echo "Commit hash: $(git rev-parse HEAD)"
echo "Author: $(git log -1 --pretty=format:'%an <%ae>')"
echo "Date: $(git log -1 --pretty=format:'%ad' --date=iso)"
echo "Message: $(git log -1 --pretty=format:'%s')"
echo "Tags: $(git tag --points-at HEAD || echo 'None')"
echo "Remote: $(git remote -v | head -n1)"
echo ""
else
echo "No .git directory found in vllm-ascend"
fi
cd ..
- name: Install clang
shell: bash -l {0}
run: |
apt-get update && apt-get -y install clang-15
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-15 20
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-15 20
- name: Validate Inputs
run: |
if [[ -z "${{ inputs.tests }}" && -z "${{ inputs.config_file_path }}" ]]; then
echo "Error: Either 'tests' or 'config_file_path' must be provided."
exit 1
fi
- name: Run Pytest (py-driven)
if: ${{ inputs.tests != '' }}
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
VLLM_USE_MODELSCOPE: True
VLLM_CI_RUNNER: ${{ inputs.runner }}
working-directory: /vllm-workspace/vllm-ascend
run: |
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
echo "Running pytest with tests path: ${{ inputs.tests }}"
pytest -sv "${{ inputs.tests }}" \
--ignore=tests/e2e/nightly/single_node/ops/singlecard_ops/test_fused_moe.py
- name: Run Pytest (YAML-driven)
if: ${{ always() && inputs.config_file_path != '' }}
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
VLLM_USE_MODELSCOPE: True
VLLM_CI_RUNNER: ${{ inputs.runner }}
CONFIG_YAML_PATH: ${{ inputs.config_file_path }}
working-directory: /vllm-workspace/vllm-ascend
run: |
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
echo "Running YAML-driven test with config: ${{ inputs.config_file_path }}"
pytest -sv tests/e2e/nightly/single_node/models/scripts/test_single_node.py


@@ -1,241 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: 'e2e nightly models test'
on:
workflow_call:
inputs:
vllm:
required: true
type: string
vllm-ascend:
required: false
type: string
default: main
runner:
required: true
type: string
image:
required: true
type: string
model_list:
required: true
type: string
upload:
required: false
type: boolean
default: false
is_run:
required: true
type: boolean
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
# only cancel in-progress runs of the same workflow
# and ignore the lint / 1 card / 2 cards / 4 cards test type
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ inputs.runner }}-${{inputs.model_list}}
cancel-in-progress: true
jobs:
e2e-nightly:
name: ${{inputs.model_list}} accuracy test
runs-on: ${{ inputs.runner }}
if: ${{ inputs.is_run }}
container:
image: "${{ inputs.image }}"
env:
VLLM_USE_MODELSCOPE: True
GHA_VLLM_ASCEND_VERSION: ${{ inputs.vllm-ascend }}
UV_INDEX_URL: http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
UV_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
UV_INDEX_STRATEGY: unsafe-best-match
UV_NO_CACHE: 1
UV_SYSTEM_PYTHON: 1
steps:
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -i 's|ports.ubuntu.com|mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
apt-get update -y
apt install git -y
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
git config --global url."https://gh-proxy.test.osinfra.cn/https://github.com/".insteadOf https://github.com/
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev clang-15
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-15 20
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-15 20
pip install uv
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v6
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty uv pip install -e .
- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install uc-manager
uv pip install -r requirements-dev.txt
uv pip install -v -e .
- name: Install tensorflow (for Molmo-7B-D-0924)
if: ${{ inputs.runner == 'linux-aarch64-a2b3-1' && contains(inputs.model_list, 'Molmo-7B-D-0924') }}
shell: bash -l {0}
run: |
pip install tensorflow==2.19.1 --no-cache-dir
- name: Resolve vllm-ascend version
run: |
VERSION_INPUT="${{ inputs.vllm-ascend }}"
if [[ "$VERSION_INPUT" == "latest" ]]; then
TAGS=$(git ls-remote --tags --sort=-v:refname https://github.com/vllm-project/vllm-ascend "v*" | cut -f2 | sed 's|refs/tags/||')
LATEST_TAG=$(echo "$TAGS" | head -n1)
if [[ -z "$LATEST_TAG" ]]; then
RESOLVED_VERSION="main"
else
RESOLVED_VERSION="$LATEST_TAG"
fi
else
RESOLVED_VERSION="$VERSION_INPUT"
fi
echo "GHA_VLLM_ASCEND_VERSION=$RESOLVED_VERSION" >> $GITHUB_ENV
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
with:
repository: vllm-project/vllm-ascend
path: ./vllm-ascend
ref: ${{ env.GHA_VLLM_ASCEND_VERSION }}
- name: Get vLLM commit hash and URL
working-directory: ./vllm-empty
run: |
VLLM_COMMIT=$(git rev-parse --short=7 HEAD)
echo "VLLM_COMMIT=$VLLM_COMMIT" >> $GITHUB_ENV
- name: Get vLLM-Ascend commit hash and URL
working-directory: ./vllm-ascend
run: |
VLLM_ASCEND_COMMIT=$(git rev-parse --short=7 HEAD)
echo "VLLM_ASCEND_COMMIT=$VLLM_ASCEND_COMMIT" >> $GITHUB_ENV
- name: Collect version info
run: |
for dir in /usr/local/Ascend/ascend-toolkit/*; do
dname=$(basename "$dir")
if [ "$dname" != "latest" ]; then
TOOLKIT_DIR="$dname"
break
fi
done
INFO_FILE="/usr/local/Ascend/ascend-toolkit/${TOOLKIT_DIR}/$(uname -i)-linux/ascend_toolkit_install.info"
GHA_CANN_VERSION=$(grep "version=" "$INFO_FILE" \
| head -n1 \
| cut -d'=' -f2 \
| tr -d '"')
{
echo "GHA_CANN_VERSION=$GHA_CANN_VERSION"
pip show torch | grep "Version:" | awk '{print "GHA_TORCH_VERSION="$2}'
pip show torch_npu | grep "Version:" | awk '{print "GHA_TORCH_NPU_VERSION="$2}'
pip show vllm | grep "Version:" | awk '{print "GHA_VLLM_VERSION="$2}' | sed 's/+.*//'
} >> "$GITHUB_ENV"
- name: Run vllm-project/vllm-ascend accuracy test
id: report
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
HF_DATASETS_OFFLINE: True
VLLM_USE_MODELSCOPE: True
VLLM_CI_RUNNER: ${{ inputs.runner }}
VLLM_VERSION: ${{ env.GHA_VLLM_VERSION }}
VLLM_COMMIT: ${{ env.VLLM_COMMIT }}
VLLM_ASCEND_VERSION: ${{ env.GHA_VLLM_ASCEND_VERSION || github.ref }}
VLLM_ASCEND_COMMIT: ${{ env.VLLM_ASCEND_COMMIT }}
CANN_VERSION: ${{ env.GHA_CANN_VERSION }}
TORCH_VERSION: ${{ env.GHA_TORCH_VERSION }}
TORCH_NPU_VERSION: ${{ env.GHA_TORCH_NPU_VERSION }}
run: |
mkdir -p ./benchmarks/accuracy
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
echo "Received model_list: ${{ inputs.model_list }}"
models=$(echo '${{ inputs.model_list }}' | jq -r '.[]')
any_failure=0
for model in $models; do
echo "Running test for model: $model"
pytest -sv ./tests/e2e/models/test_lm_eval_correctness.py \
--config "./tests/e2e/models/configs/${model}.yaml" || {
echo "Test failed for model: $model"
any_failure=1
}
done
if [ $any_failure -ne 0 ]; then
exit 1
fi
- name: Generate step summary
if: ${{ always() }}
run: |
models=$(echo '${{ inputs.model_list }}' | jq -r '.[]')
for model in $models; do
echo "Processing model: $model"
model_base_name=$(basename "$model")
cat ./benchmarks/accuracy/${model_base_name}.md >> $GITHUB_STEP_SUMMARY
done
- name: Set artifact timestamp
id: ts
run: |
echo "artifact_ts=$(date -u +%Y%m%dT%H%M%SZ)" >> $GITHUB_OUTPUT
- name: Upload Report
if: ${{ inputs.upload == true }}
uses: actions/upload-artifact@v7
with:
name: report-${{ env.GHA_VLLM_ASCEND_VERSION }}-${{ steps.ts.outputs.artifact_ts }}
path: ./benchmarks/accuracy/
if-no-files-found: warn
retention-days: 90
overwrite: true


@@ -1,731 +0,0 @@
name: 'e2e test'
on:
workflow_call:
inputs:
vllm:
required: true
type: string
image:
required: true
type: string
type:
required: true
type: string
contains_310:
required: true
type: boolean
continue_on_error:
required: false
type: boolean
default: false
env:
UV_INDEX_URL: http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
UV_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
UV_INDEX_STRATEGY: unsafe-best-match
UV_NO_CACHE: 1
UV_SYSTEM_PYTHON: 1
jobs:
e2e-light:
name: singlecard-light
if: ${{ inputs.type == 'light' }}
runs-on: linux-aarch64-a2b3-1
strategy:
fail-fast: false
matrix:
part: [0]
container:
image: ${{ inputs.image }}
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
HF_HUB_OFFLINE: 1
steps:
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt install git -y
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev clang-15
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-15 20
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-15 20
pip install uv
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v6
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
fetch-depth: 1
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty uv pip install -e .
uv pip uninstall triton
- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install uc-manager
uv pip install -r requirements-dev.txt
uv pip install -v -e .
uv pip install https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/torch_npu-2.9.0.post1%2Bgitee7ba04-cp311-cp311-manylinux_2_28_aarch64.whl
uv pip install git+https://github.com/modelscope/modelscope.git@dbbcbf631fe6d10cc6446df2ad2fef24039fe7fe
- name: Run vllm-project/vllm-ascend test
env:
PYTORCH_NPU_ALLOC_CONF: max_split_size_mb:256
VLLM_WORKER_MULTIPROC_METHOD: spawn
shell: bash
run: |
set -o pipefail
if [ "${{ inputs.continue_on_error }}" = "true" ]; then
python3 .github/workflows/scripts/run_suite.py \
--suite e2e-singlecard-light \
--auto-partition-id "${{ matrix.part }}" \
--auto-partition-size 1 \
--auto-upgrade-estimated-times \
--continue-on-error \
2>&1 | tee /tmp/e2e-singlecard-light-part${{ matrix.part }}.log
else
python3 .github/workflows/scripts/run_suite.py \
--suite e2e-singlecard-light \
--auto-partition-id "${{ matrix.part }}" \
--auto-partition-size 1 \
2>&1 | tee /tmp/e2e-singlecard-light-part${{ matrix.part }}.log
fi
exit ${PIPESTATUS[0]}
- name: Summarize singlecard-light failure
if: ${{ always() }}
run: |
python3 .github/workflows/scripts/ci_log_summary.py \
--step-name "Run singlecard-light test" \
--log-file /tmp/e2e-singlecard-light-part${{ matrix.part }}.log \
--output "$GITHUB_STEP_SUMMARY"
- name: Upload timing data
uses: actions/upload-artifact@v4
if: ${{ inputs.continue_on_error == true && github.event_name != 'pull_request' }}
with:
name: timing-data-singlecard-light-part${{ matrix.part }}
path: test_timing_data.json
if-no-files-found: warn
retention-days: 5
e2e-full:
name: singlecard-full
if: ${{ inputs.type == 'full' }}
runs-on: linux-aarch64-a2b3-1
strategy:
fail-fast: false
matrix:
part: [0, 1]
container:
image: ${{ inputs.image }}
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
HF_HUB_OFFLINE: 1
MODELSCOPE_HUB_FILE_LOCK: False
steps:
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt install git -y
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev clang-15
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-15 20
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-15 20
pip install uv
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v6
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
fetch-depth: 1
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty uv pip install -e .
uv pip uninstall triton
- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install uc-manager
uv pip install -r requirements-dev.txt
uv pip install -v -e .
uv pip install https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/torch_npu-2.9.0.post1%2Bgitee7ba04-cp311-cp311-manylinux_2_28_aarch64.whl
uv pip install git+https://github.com/modelscope/modelscope.git@dbbcbf631fe6d10cc6446df2ad2fef24039fe7fe
- name: Run e2e test
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
PYTORCH_NPU_ALLOC_CONF: max_split_size_mb:256
shell: bash
run: |
set -o pipefail
if [ "${{ inputs.continue_on_error }}" = "true" ]; then
python3 .github/workflows/scripts/run_suite.py \
--suite e2e-singlecard \
--auto-partition-id "${{ matrix.part }}" \
--auto-partition-size 2 \
--auto-upgrade-estimated-times \
--continue-on-error \
2>&1 | tee /tmp/e2e-singlecard-full-part${{ matrix.part }}.log
else
python3 .github/workflows/scripts/run_suite.py \
--suite e2e-singlecard \
--auto-partition-id "${{ matrix.part }}" \
--auto-partition-size 2 \
2>&1 | tee /tmp/e2e-singlecard-full-part${{ matrix.part }}.log
fi
exit ${PIPESTATUS[0]}
- name: Summarize singlecard-full failure
if: ${{ always() }}
run: |
python3 .github/workflows/scripts/ci_log_summary.py \
--step-name "Run singlecard-full test" \
--log-file /tmp/e2e-singlecard-full-part${{ matrix.part }}.log \
--output "$GITHUB_STEP_SUMMARY"
- name: Upload timing data
uses: actions/upload-artifact@v4
if: ${{ inputs.continue_on_error == true && github.event_name != 'pull_request' }}
with:
name: timing-data-singlecard-full-part${{ matrix.part }}
path: test_timing_data.json
if-no-files-found: warn
retention-days: 5
e2e-2-cards-light:
name: multicard-2-light
if: ${{ inputs.type == 'light' }}
runs-on: linux-aarch64-a3-2
strategy:
fail-fast: false
matrix:
part: [0]
container:
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-a3-ubuntu22.04-py3.11
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
HCCL_BUFFSIZE: 1024
HF_HUB_OFFLINE: 1
steps:
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt install git -y
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev clang-15
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-15 20
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-15 20
pip install uv
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v6
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
fetch-depth: 1
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty uv pip install -e .
uv pip uninstall triton
- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install uc-manager
uv pip install -r requirements-dev.txt
uv pip install -v -e .
uv pip install https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/torch_npu-2.9.0.post1%2Bgitee7ba04-cp311-cp311-manylinux_2_28_aarch64.whl
uv pip install git+https://github.com/modelscope/modelscope.git@dbbcbf631fe6d10cc6446df2ad2fef24039fe7fe
- name: Run vllm-project/vllm-ascend test (light)
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
shell: bash
run: |
set -o pipefail
if [ "${{ inputs.continue_on_error }}" = "true" ]; then
python3 .github/workflows/scripts/run_suite.py \
--suite e2e-2card-light \
--auto-partition-id "${{ matrix.part }}" \
--auto-partition-size 1 \
--auto-upgrade-estimated-times \
--continue-on-error \
2>&1 | tee /tmp/e2e-2card-light-part${{ matrix.part }}.log
else
python3 .github/workflows/scripts/run_suite.py \
--suite e2e-2card-light \
--auto-partition-id "${{ matrix.part }}" \
--auto-partition-size 1 \
2>&1 | tee /tmp/e2e-2card-light-part${{ matrix.part }}.log
fi
exit ${PIPESTATUS[0]}
- name: Summarize multicard-2-light failure
if: ${{ always() }}
run: |
python3 .github/workflows/scripts/ci_log_summary.py \
--step-name "Run multicard-2-light test" \
--log-file /tmp/e2e-2card-light-part${{ matrix.part }}.log \
--output "$GITHUB_STEP_SUMMARY"
- name: Upload timing data
uses: actions/upload-artifact@v4
if: ${{ inputs.continue_on_error == true && github.event_name != 'pull_request' }}
with:
name: timing-data-2card-light-part${{ matrix.part }}
path: test_timing_data.json
if-no-files-found: warn
retention-days: 5
e2e-2-cards-full:
name: multicard-2-full
if: ${{ inputs.type == 'full' }}
runs-on: linux-aarch64-a3-2
strategy:
fail-fast: false
matrix:
part: [0]
container:
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-a3-ubuntu22.04-py3.11
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
HCCL_BUFFSIZE: 1024
HF_HUB_OFFLINE: 1
steps:
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt install git -y
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev clang-15
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-15 20
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-15 20
pip install uv
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v6
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
fetch-depth: 1
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty uv pip install -e .
uv pip uninstall triton
- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install uc-manager
uv pip install -r requirements-dev.txt
uv pip install -v -e .
uv pip install https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/torch_npu-2.9.0.post1%2Bgitee7ba04-cp311-cp311-manylinux_2_28_aarch64.whl
uv pip install git+https://github.com/modelscope/modelscope.git@dbbcbf631fe6d10cc6446df2ad2fef24039fe7fe
- name: Run vllm-project/vllm-ascend test (full)
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
shell: bash
run: |
set -o pipefail
if [ "${{ inputs.continue_on_error }}" = "true" ]; then
python3 .github/workflows/scripts/run_suite.py \
--suite e2e-multicard-2-cards \
--auto-partition-id "${{ matrix.part }}" \
--auto-partition-size 1 \
--auto-upgrade-estimated-times \
--continue-on-error \
2>&1 | tee /tmp/e2e-2card-full-part${{ matrix.part }}.log
else
python3 .github/workflows/scripts/run_suite.py \
--suite e2e-multicard-2-cards \
--auto-partition-id "${{ matrix.part }}" \
--auto-partition-size 1 \
2>&1 | tee /tmp/e2e-2card-full-part${{ matrix.part }}.log
fi
exit ${PIPESTATUS[0]}
- name: Summarize multicard-2-full failure
if: ${{ always() }}
run: |
python3 .github/workflows/scripts/ci_log_summary.py \
--step-name "Run multicard-2-full test " \
--log-file /tmp/e2e-2card-full-part${{ matrix.part }}.log \
--output "$GITHUB_STEP_SUMMARY"
- name: Upload timing data
uses: actions/upload-artifact@v4
if: ${{ inputs.continue_on_error == true && github.event_name != 'pull_request' }}
with:
name: timing-data-2card-full-part${{ matrix.part }}
path: test_timing_data.json
if-no-files-found: warn
retention-days: 5
- name: Run vllm-project/vllm-ascend test (non triton)
if: ${{ inputs.type == 'full' && matrix.part == 0 }}
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
shell: bash
run: |
set -o pipefail
python3 -m pip uninstall -y triton-ascend
pytest -sv --durations=0 tests/e2e/multicard/2-cards/test_aclgraph_capture_replay.py \
2>&1 | tee /tmp/e2e-non-triton.log
exit ${PIPESTATUS[0]}
- name: Summarize non-triton failure
if: ${{ always() && inputs.type == 'full' && matrix.part == 0 }}
run: |
python3 .github/workflows/scripts/ci_log_summary.py \
--step-name "Run multicard-2-full test (non triton)" \
--log-file /tmp/e2e-non-triton.log \
--output "$GITHUB_STEP_SUMMARY"
e2e-4-cards-full:
name: multicard-4-full
if: ${{ inputs.type == 'full' }}
runs-on: linux-aarch64-a3-4
strategy:
fail-fast: false
matrix:
part: [0]
container:
image: m.daocloud.io/quay.io/ascend/cann:8.5.1-a3-ubuntu22.04-py3.11
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
HF_HUB_OFFLINE: 1
steps:
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt install git -y
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev clang-15
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-15 20
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-15 20
pip install uv
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v6
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
fetch-depth: 1
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty uv pip install -e .
uv pip uninstall triton
- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install uc-manager
uv pip install -r requirements-dev.txt
uv pip install -v -e .
uv pip install https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/torch_npu-2.9.0.post1%2Bgitee7ba04-cp311-cp311-manylinux_2_28_aarch64.whl
uv pip install git+https://github.com/modelscope/modelscope.git@dbbcbf631fe6d10cc6446df2ad2fef24039fe7fe
- name: Run vllm-project/vllm-ascend test for V1 Engine
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
shell: bash
run: |
set -o pipefail
if [ "${{ inputs.continue_on_error }}" = "true" ]; then
python3 .github/workflows/scripts/run_suite.py \
--suite e2e-multicard-4-cards \
--auto-partition-id "${{ matrix.part }}" \
--auto-partition-size 1 \
--auto-upgrade-estimated-times \
--continue-on-error \
2>&1 | tee /tmp/e2e-4card-full-part${{ matrix.part }}.log
else
python3 .github/workflows/scripts/run_suite.py \
--suite e2e-multicard-4-cards \
--auto-partition-id "${{ matrix.part }}" \
--auto-partition-size 1 \
2>&1 | tee /tmp/e2e-4card-full-part${{ matrix.part }}.log
fi
exit ${PIPESTATUS[0]}
- name: Summarize multicard-4-full failure
if: ${{ always() }}
run: |
python3 .github/workflows/scripts/ci_log_summary.py \
--step-name "Run vllm-project/vllm-ascend test for V1 Engine" \
--log-file /tmp/e2e-4card-full-part${{ matrix.part }}.log \
--output "$GITHUB_STEP_SUMMARY"
- name: Upload timing data
uses: actions/upload-artifact@v4
if: ${{ inputs.continue_on_error == true && github.event_name != 'pull_request' }}
with:
name: timing-data-4card-full-part${{ matrix.part }}
path: test_timing_data.json
if-no-files-found: warn
retention-days: 5
e2e_310p:
name: 310p singlecard
runs-on: linux-aarch64-310p-1
if: ${{ inputs.contains_310 }}
container:
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-310p-ubuntu22.04-py3.11
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
HF_HUB_OFFLINE: 1
steps:
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt install git -y
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev
pip install uv
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v6
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
fetch-depth: 1
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty uv pip install -e .
uv pip uninstall triton
- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install uc-manager
uv pip install -r requirements-dev.txt
uv pip install -v -e .
uv pip install https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/torch_npu-2.9.0.post1%2Bgitee7ba04-cp311-cp311-manylinux_2_28_aarch64.whl
uv pip install git+https://github.com/modelscope/modelscope.git@dbbcbf631fe6d10cc6446df2ad2fef24039fe7fe
- name: Run vllm-project/vllm-ascend test
env:
PYTORCH_NPU_ALLOC_CONF: max_split_size_mb:256
VLLM_WORKER_MULTIPROC_METHOD: spawn
shell: bash
run: |
set -o pipefail
pytest -sv --durations=0 tests/e2e/310p/singlecard/test_dense_model_singlecard.py \
tests/e2e/310p/singlecard/test_vl_model_singlecard.py \
2>&1 | tee /tmp/e2e-310p-singlecard.log
exit ${PIPESTATUS[0]}
- name: Summarize 310p singlecard failure
if: ${{ always() && inputs.contains_310 }}
run: |
python3 .github/workflows/scripts/ci_log_summary.py \
--step-name "Run vllm-project/vllm-ascend test" \
--log-file /tmp/e2e-310p-singlecard.log \
--output "$GITHUB_STEP_SUMMARY"
e2e_310p-4cards:
name: 310p multicards 4cards
runs-on: linux-aarch64-310p-4
if: ${{ inputs.contains_310 }}
container:
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-310p-ubuntu22.04-py3.11
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
HF_HUB_OFFLINE: 1
steps:
- name: Check npu and CANN info
run: |
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
- name: Config mirrors
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt install git -y
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
- name: Install system dependencies
run: |
apt-get -y install `cat packages.txt`
apt-get -y install gcc g++ cmake libnuma-dev
pip install uv
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v6
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
fetch-depth: 1
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty uv pip install -e .
uv pip uninstall triton
- name: Install vllm-project/vllm-ascend
env:
PIP_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
run: |
pip install uc-manager
uv pip install -r requirements-dev.txt
uv pip install -v -e .
uv pip install https://vllm-ascend.obs.cn-north-4.myhuaweicloud.com/vllm-ascend/torch_npu-2.9.0.post1%2Bgitee7ba04-cp311-cp311-manylinux_2_28_aarch64.whl
uv pip install git+https://github.com/modelscope/modelscope.git@dbbcbf631fe6d10cc6446df2ad2fef24039fe7fe
- name: Run vllm-project/vllm-ascend test
env:
PYTORCH_NPU_ALLOC_CONF: max_split_size_mb:256
VLLM_WORKER_MULTIPROC_METHOD: spawn
shell: bash
run: |
set -o pipefail
pytest -sv --durations=0 \
tests/e2e/310p/multicard/test_dense_model_multicard.py \
tests/e2e/310p/multicard/test_moe_model_multicard.py \
tests/e2e/310p/multicard/test_vl_model_multicard.py \
2>&1 | tee /tmp/e2e-310p-4cards.log
exit ${PIPESTATUS[0]}
- name: Summarize 310p multicards failure
if: ${{ always() && inputs.contains_310 }}
run: |
python3 .github/workflows/scripts/ci_log_summary.py \
--step-name "Run vllm-project/vllm-ascend test" \
--log-file /tmp/e2e-310p-4cards.log \
--output "$GITHUB_STEP_SUMMARY"


@@ -1,71 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: 'Nightly image build'
on:
workflow_call:
inputs:
target:
required: true
type: string
description: "Build target: 'a2' or 'a3'"
secrets:
HW_USERNAME:
required: false
HW_TOKEN:
required: false
GITEE_TOKEN:
required: false
jobs:
build:
name: Build nightly-${{ inputs.target }} image
runs-on: ubuntu-22.04-arm
steps:
- uses: actions/checkout@v6
- name: Login to Huawei Cloud SWR
id: login-swr
if: ${{ env.HW_USERNAME != '' && env.HW_TOKEN != '' }}
env:
HW_USERNAME: ${{ secrets.HW_USERNAME }}
HW_TOKEN: ${{ secrets.HW_TOKEN }}
run: |
echo "$HW_TOKEN" | docker login -u "$HW_USERNAME" --password-stdin swr.cn-southwest-2.myhuaweicloud.com
- name: Build nightly-${{ inputs.target }} image
env:
GITEE_USERNAME: ${{ vars.GITEE_USERNAME }}
GITEE_TOKEN: ${{ secrets.GITEE_TOKEN }}
run: |
IMAGE="swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-${{ inputs.target }}"
docker build \
--network host \
--platform linux/arm64 \
-f .github/workflows/dockerfiles/Dockerfile.nightly.${{ inputs.target }} \
--build-arg CANN_VERSION="8.5.1" \
--build-arg UBUNTU_VERSION="22.04" \
--build-arg PYTHON_VERSION="3.11" \
--build-arg GITEE_USERNAME="${GITEE_USERNAME}" \
--build-arg GITEE_TOKEN="${GITEE_TOKEN}" \
-t "$IMAGE" .
- name: Push image to SWR
if: ${{ github.repository_owner == 'vllm-project' && steps.login-swr.conclusion == 'success' }}
run: |
docker push swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-${{ inputs.target }}


@@ -1,115 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: 'Parse nightly trigger'
on:
workflow_call:
inputs:
runner:
required: false
type: string
default: linux-aarch64-a2b3-0
outputs:
run:
description: "Whether nightly tests should run"
value: ${{ jobs.parse.outputs.run }}
filter:
description: "Comma-wrapped test name filter (e.g. ',name1,name2,'), or 'all'"
value: ${{ jobs.parse.outputs.filter }}
ref:
description: "The vllm-ascend ref (commit SHA for PRs, branch/tag name otherwise)"
value: ${{ jobs.parse.outputs.ref }}
jobs:
parse:
name: Parse trigger and determine test scope
runs-on: ${{ inputs.runner }}
outputs:
run: ${{ steps.parse.outputs.run }}
filter: ${{ steps.parse.outputs.filter }}
ref: ${{ steps.parse.outputs.ref }}
steps:
- name: Parse trigger
id: parse
uses: actions/github-script@v7
with:
script: |
const eventName = context.eventName;
function parseNightlyComment(body) {
if (!body) return null;
const match = body.trim().match(/^\/nightly(?:\s+(.+))?$/m);
if (!match) return null;
const args = (match[1] || '').trim();
if (!args || args === 'all') return 'all';
// Wrap with commas for exact-name matching: ",name1,name2,"
return ',' + args.split(/\s+/).join(',') + ',';
}
function getRef() {
if (eventName === 'pull_request') {
return context.payload.pull_request.head.sha;
}
return (context.ref || '').replace(/^refs\/(heads|tags)\//, '') || 'main';
}
core.setOutput('ref', getRef());
// 1. schedule / workflow_dispatch: run all tests with pre-built image
if (eventName === 'schedule' || eventName === 'workflow_dispatch') {
core.setOutput('run', 'true');
core.setOutput('filter', 'all');
return;
}
// 2. pull_request (labeled / synchronize)
if (eventName === 'pull_request') {
const labels = context.payload.pull_request.labels.map(l => l.name);
if (!labels.includes('nightly-test')) {
core.setOutput('run', 'false');
core.setOutput('filter', '');
return;
}
// Search comments for latest /nightly command
const prNumber = context.payload.pull_request.number;
const comments = await github.paginate(github.rest.issues.listComments, {
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
per_page: 100,
});
let filter = null;
for (let i = comments.length - 1; i >= 0; i--) {
const result = parseNightlyComment(comments[i].body);
if (result !== null) { filter = result; break; }
}
// No /nightly comment found: do not run any tests
if (filter === null) {
core.info('nightly-test label present but no /nightly comment found; skipping.');
core.setOutput('run', 'false');
core.setOutput('filter', '');
return;
}
core.setOutput('run', 'true');
core.setOutput('filter', filter);
return;
}
// Fallback
core.setOutput('run', 'false');
core.setOutput('filter', '');
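A minimal caller sketch (not part of the deleted file above), assuming the same comma-wrapped filter matching that the Nightly-A2/Nightly-A3 workflows further below use; the job name my-nightly-test, the runner, and the test name qwen3-32b are placeholders:
jobs:
  parse-trigger:
    uses: ./.github/workflows/_parse_trigger.yaml
  my-nightly-test:
    needs: [parse-trigger]
    runs-on: ubuntu-latest   # placeholder runner
    # Run only when parse-trigger says to run and this test name is selected
    if: >-
      needs.parse-trigger.outputs.run == 'true' && (
        needs.parse-trigger.outputs.filter == 'all' ||
        contains(needs.parse-trigger.outputs.filter, ',qwen3-32b,')
      )
    steps:
      - run: echo "testing ref ${{ needs.parse-trigger.outputs.ref }}"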

View File

@@ -1,86 +0,0 @@
name: pre-commit
on:
workflow_call:
inputs:
vllm:
required: true
type: string
permissions:
contents: read
jobs:
pre-commit:
runs-on: linux-amd64-cpu-8-hk
container:
# Build it from https://github.com/nv-action/vllm-benchmarks/blob/main/Dockerfile
image: quay.io/ascend-ci/vllm-ascend:lint
steps:
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
# With problem matchers in a container, $GITHUB_WORKSPACE and ${{ github.workspace }} resolve to different paths.
# So we just copy the matcher files into a temp path. See https://github.com/actions/runner/issues/2058
- name: cp problem matchers
run: |
cp .github/workflows/matchers/actionlint.json "$RUNNER_TEMP/actionlint.json"
cp .github/workflows/matchers/markdownlint.json "$RUNNER_TEMP/markdownlint.json"
cp .github/workflows/matchers/mypy.json "$RUNNER_TEMP/mypy.json"
- run: echo "::add-matcher::$RUNNER_TEMP/actionlint.json"
- run: echo "::add-matcher::$RUNNER_TEMP/markdownlint.json"
- run: echo "::add-matcher::$RUNNER_TEMP/mypy.json"
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v6
with:
repository: vllm-project/vllm
path: ./vllm-empty
ref: ${{ inputs.vllm }}
- uses: dorny/paths-filter@v3
id: filter
with:
filters: |
lint_tracker:
- 'requirements.txt'
- 'requirements-dev.txt'
- 'requirements-lint.txt'
- name: Install vllm-ascend dev (conditional)
if: steps.filter.outputs.lint_tracker == 'true'
env:
UV_INDEX_URL: http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
UV_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
UV_INDEX_STRATEGY: unsafe-best-match
UV_NO_CACHE: 1
UV_SYSTEM_PYTHON: 1
run: |
pip install uv
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
pip install uc-manager
uv pip install -r requirements-dev.txt --extra-index-url https://download.pytorch.org/whl/cpu
- name: Run pre-commit
env:
PRE_COMMIT_COLOR: always
FORCE_COLOR: "1"
TERM: xterm-256color
SHELLCHECK_OPTS: "--exclude=SC2046,SC2006,SC2086" # Exclude SC2046, SC2006, SC2086 for actionlint
run: |
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
pre-commit run --all-files --hook-stage manual --show-diff-on-failure
- name: Run mypy
run: |
PYTHONPATH="$PYTHONPATH:$(pwd)/vllm-empty"
export PYTHONPATH
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
# Run mypy for Python 3.10, 3.11, 3.12 manually
# Note: We are now separating mypy from pre-commit hooks for performance reasons.
for python_version in "3.10" "3.11" "3.12"; do
echo "============================"
tools/mypy.sh 1 "$python_version"
echo "============================"
done

View File

@@ -1,192 +0,0 @@
name: Image_oncall
on:
workflow_call:
inputs:
suffix:
description: 'The tag suffix to use'
required: true
type: string
should_push:
description: 'Whether to push the image'
required: false
type: boolean
default: False
dockerfile:
description: 'The Dockerfile to use'
required: false
type: string
quay_username:
description: 'Quay username for pushing images'
required: false
type: string
workflow_dispatch_tag:
description: 'The tag to use for workflow dispatch'
required: false
type: string
secrets:
QUAY_PASSWORD:
description: 'Quay password for pushing images'
required: false
jobs:
build-push-digest:
name: build
runs-on: ${{ matrix.runner }}
strategy:
matrix:
include:
- arch: linux/amd64
runner: ubuntu-latest
tag: amd64
- arch: linux/arm64
runner: ubuntu-22.04-arm
tag: arm64
steps:
- uses: actions/checkout@v6
if: ${{ github.event_name != 'workflow_dispatch' }}
with:
fetch-depth: 0
persist-credentials: false
ref: ${{ github.ref }}
- uses: actions/checkout@v6
if: ${{ github.event_name == 'workflow_dispatch' }}
with:
fetch-depth: 0
persist-credentials: false
ref: ${{ inputs.workflow_dispatch_tag }}
- name: Free up disk space
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
docker-images: false
- name: Publish - Login to Quay Container Registry
if: ${{ inputs.should_push }}
uses: docker/login-action@v4
with:
registry: quay.io
username: ${{ inputs.quay_username }}
password: ${{ secrets.QUAY_PASSWORD }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
with:
install: true
driver: docker-container
use: true
- name: Build and push
uses: docker/build-push-action@v7
id: build
with:
platforms: ${{ matrix.arch }}
# Use the current repo path as the build context, ensuring .git is included
context: .
file: ${{ inputs.dockerfile || 'Dockerfile' }}
# Only push for tag and branch (main/*-dev) pushes, controlled via should_push
push: ${{ inputs.should_push }}
outputs: type=image,name=quay.io/ascend/vllm-ascend,push-by-digest=true,name-canonical=true,push=${{ inputs.should_push }}
build-args: |
PIP_INDEX_URL=https://pypi.org/simple
provenance: false
- name: Export digest
run: |
mkdir -p ${{ runner.temp }}/digests
digest="${{ steps.build.outputs.digest }}"
touch "${{ runner.temp }}/digests/${digest#sha256:}"
- name: Upload digest
uses: actions/upload-artifact@v7
with:
name: digests-${{ inputs.suffix }}-${{ matrix.tag }}
path: ${{ runner.temp }}/digests/*
if-no-files-found: error
retention-days: 1
merge-image:
runs-on: ubuntu-latest
needs: build-push-digest
if: ${{ inputs.should_push }}
steps:
- name: Checkout
uses: actions/checkout@v6
with:
ref: ${{ github.ref }}
- name: Download arm64 digests
uses: actions/download-artifact@v8
with:
path: ${{ runner.temp }}/digests
pattern: digests-${{ inputs.suffix }}-arm64
merge-multiple: true
- name: Download amd64 digests
uses: actions/download-artifact@v8
with:
path: ${{ runner.temp }}/digests
pattern: digests-${{ inputs.suffix }}-amd64
merge-multiple: true
- name: Prepare suffix
id: suffix
run: |
if [ -n "${{ inputs.suffix }}" ]; then
echo "SUFFIX=-${{ inputs.suffix }}" >> $GITHUB_ENV
else
echo "SUFFIX=" >> $GITHUB_ENV
fi
- name: Docker meta
id: meta
uses: docker/metadata-action@v6
with:
# TODO(yikun): add more hub images and a note on the release policy for container images
images: |
quay.io/ascend/vllm-ascend
# Note for test case
# https://github.com/marketplace/actions/docker-metadata-action#typeref
# 1. branch jobs publish on each main/*-dev branch commit
# 2. main and dev pull_requests are build-only, so the tag pr-N-openeuler is fine
# 3. only PEP 440-matched tags will be published:
# - v0.7.1 --> v0.7.1-openeuler
# - pre/post/dev: v0.7.1rc1-openeuler/v0.7.1rc1-openeuler/v0.7.1rc1.dev1-openeuler/v0.7.1.post1-openeuler, no latest
# which follow the rule from vLLM with prefix v
# TODO(yikun): the post release might be considered as latest release
tags: |
type=ref,event=branch,prefix=nightly-,suffix=${{ env.SUFFIX }}
type=ref,event=pr,prefix=nightly-,suffix=${{ env.SUFFIX }}
type=pep440,pattern={{raw}},suffix=${{ env.SUFFIX }}
flavor:
latest=false
- name: Login to Quay
uses: docker/login-action@v4
with:
registry: quay.io
username: ${{ inputs.quay_username }}
password: ${{ secrets.QUAY_PASSWORD }}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
- name: Merge and push multi-arch image
env:
IMAGE: quay.io/ascend/vllm-ascend
TAGS: ${{ steps.meta.outputs.tags }}
run: |
DIGESTS=$(printf "$IMAGE@sha256:%s " $(ls ${{ runner.temp }}/digests))
echo "Digests: $DIGESTS"
echo "Current tags:"
echo "$TAGS"
for tag in $TAGS; do
echo "Creating tag $tag"
docker buildx imagetools create \
-t "$tag" \
$DIGESTS
done

View File

@@ -1,109 +0,0 @@
name: 'unit test'
on:
workflow_call:
inputs:
vllm:
required: true
type: string
runner:
required: true
type: string
image:
required: true
type: string
type:
required: true
type: string
jobs:
unit-test:
name: unit test
runs-on: ${{ inputs.runner }}
container:
image: ${{ inputs.image }}
env:
VLLM_LOGGING_LEVEL: ERROR
VLLM_USE_MODELSCOPE: True
SOC_VERSION: ascend910b1
MAX_JOBS: 4
COMPILE_CUSTOM_KERNELS: 0
UV_INDEX_URL: http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
UV_EXTRA_INDEX_URL: https://mirrors.huaweicloud.com/ascend/repos/pypi
UV_INDEX_STRATEGY: unsafe-best-match
UV_NO_CACHE: 1
UV_SYSTEM_PYTHON: 1
UV_PYTHON: python3
steps:
- name: Install packages
run: |
sed -Ei 's@(ports|archive).ubuntu.com@cache-service.nginx-pypi-cache.svc.cluster.local:8081@g' /etc/apt/sources.list
pip config set global.index-url http://cache-service.nginx-pypi-cache.svc.cluster.local/pypi/simple
pip config set global.trusted-host cache-service.nginx-pypi-cache.svc.cluster.local
apt-get update -y
apt-get install -y python3-pip git vim wget net-tools gcc g++ cmake libnuma-dev curl gnupg2
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
pip install uv
- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v6
with:
repository: vllm-project/vllm
ref: ${{ inputs.vllm }}
path: ./vllm-empty
- name: Install vllm-project/vllm from source
working-directory: ./vllm-empty
run: |
VLLM_TARGET_DEVICE=empty uv pip install . --extra-index-url https://download.pytorch.org/whl/cpu/
uv pip uninstall triton
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
- name: Install vllm-project/vllm-ascend
run: |
pip install uc-manager
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/devlib
uv pip install -v . --extra-index-url https://download.pytorch.org/whl/cpu/
uv pip install -r requirements-dev.txt --extra-index-url https://download.pytorch.org/whl/cpu/
- name: Run unit test
env:
VLLM_WORKER_MULTIPROC_METHOD: spawn
TORCH_DEVICE_BACKEND_AUTOLOAD: 0
shell: bash
run: |
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/x86_64-linux/devlib
set -o pipefail
pytest -sv --cov --cov-report=xml:unittests-coverage.xml tests/ut \
--ignore tests/ut/model_loader/netloader/test_netloader_elastic.py \
--ignore tests/ut/kv_connector/test_remote_prefill_lifecycle.py \
--ignore tests/ut/kv_connector/test_remote_decode_lifecycle.py \
--ignore tests/ut/core/test_scheduler_dynamic_batch.py \
--ignore tests/ut/kv_connector/test_mooncake_connector.py \
--ignore tests/ut/worker/test_worker_v1.py \
--ignore tests/ut/spec_decode/test_mtp_proposer.py \
--ignore tests/ut/kv_connector/test_mooncake_layerwise_connector.py \
2>&1 | tee /tmp/unit-test.log
exit ${PIPESTATUS[0]}
- name: Summarize unit test failure
if: ${{ always() }}
run: |
python3 .github/workflows/scripts/ci_log_summary.py \
--mode ut \
--step-name "Run unit test" \
--log-file /tmp/unit-test.log \
--output "$GITHUB_STEP_SUMMARY"
- name: Upload coverage to Codecov
# only upload coverage when commits merged
if: ${{ inputs.type == 'schedule' }}
uses: codecov/codecov-action@v5
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
with:
flags: unittests
name: vllm-ascend
verbose: true

View File

@@ -1,25 +0,0 @@
name: "Issue Create/Update Labeler"
on:
issues:
types: [opened, edited]
permissions:
issues: write
contents: read
jobs:
triage:
runs-on: ubuntu-latest
if: |
startsWith(github.event.issue.title, '[Bug]:') ||
startsWith(github.event.issue.title, '[Installation]:') ||
startsWith(github.event.issue.title, '[Usage]:') ||
startsWith(github.event.issue.title, '[Doc]:') ||
startsWith(github.event.issue.title, '[Misc]:')
steps:
- uses: github/issue-labeler@v3.4
with:
configuration-path: .github/issue-labeler.yml
enable-versioned-regex: 0
repo-token: ${{ secrets.GITHUB_TOKEN }}
include-title: 1

View File

@@ -1,113 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: PR Create
on:
# Runs when a PR against main is opened
pull_request_target:
types: [opened]
branches:
- 'main'
permissions:
pull-requests: write
jobs:
pr-create:
permissions:
contents: read
pull-requests: write
name: PR create action
runs-on: ubuntu-latest
steps:
- name: Get vLLM version
run: |
VLLM_COMMIT=v0.18.0
echo "VLLM_COMMIT=https://github.com/vllm-project/vllm/commit/$VLLM_COMMIT" >> "$GITHUB_ENV"
- name: Checkout repository
uses: actions/checkout@0c366fd6a839edf440554fa01a7085ccba70ac98 # v4.2.2
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
- name: Get vLLM release version
run: |
VLLM_VERSION=$(python3 docs/source/conf.py | jq .ci_vllm_version | tr -d '"')
echo "VLLM_VERSION=$VLLM_VERSION" >> "$GITHUB_ENV"
- name: Update PR description
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
PR_NUMBER=${{ github.event.number }}
VLLM_VERSION=${{ env.VLLM_VERSION }}
VLLM_COMMIT=${{ env.VLLM_COMMIT }}
OLD=/tmp/orig_pr_body.txt
NEW=/tmp/new_pr_body.txt
FINAL=/tmp/final_pr_body.txt
gh pr view --json body --template "{{.body}}" "${PR_NUMBER}" > "${OLD}"
cp "${OLD}" "${NEW}"
# Remove notes in pr description and add vLLM version and commit
sed -i '/<!--/,/-->/d' "${NEW}"
sed -i '/- vLLM .*$/d' "${NEW}"
{
echo ""
echo "- vLLM version: $VLLM_VERSION"
echo "- vLLM main: $VLLM_COMMIT"
} >> "${NEW}"
# Remove redundant empty lines
uniq "${NEW}" > "${FINAL}"
# Update the PR only if ${FINAL} differs from ${OLD}
if ! cmp -s "${OLD}" "${FINAL}"; then
echo
echo "Updating PR body:"
echo
cat "${NEW}"
gh pr edit --body-file "${FINAL}" "${PR_NUMBER}"
else
echo "No changes needed"
fi
- name: Label the PR
uses: actions/labeler@v6
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
configuration-path: .github/labeler.yml
sync-labels: true
- name: Remind to run full CI on PR
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:
script: |
github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.issue.number,
body: '👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:\n\n' +
'- A PR should do only one thing; smaller PRs enable faster reviews.\n' +
'- Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.\n' +
'- Write the commit message by filling in the PR description, to help reviewers and future developers understand.\n\n' +
'If CI fails, you can run linting and testing checks locally according [Contributing](https://docs.vllm.ai/projects/ascend/zh-cn/latest/developer_guide/contribution/index.html) and [Testing](https://docs.vllm.ai/projects/ascend/zh-cn/latest/developer_guide/contribution/testing.html).'
})
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -1,45 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
ARG PY_VERSION=3.11
FROM quay.io/ascend/manylinux:8.5.1-310p-manylinux_2_28-py${PY_VERSION}
ARG SOC_VERSION="ascend310p1"
# Define environments
ENV DEBIAN_FRONTEND=noninteractive
ENV SOC_VERSION=$SOC_VERSION
RUN yum update -y && \
yum install -y python3-pip git vim wget net-tools gcc gcc-c++ make cmake numactl-devel && \
rm -rf /var/cache/yum
WORKDIR /workspace
COPY . /workspace/vllm-ascend/
# Install req
RUN python3 -m pip install -r vllm-ascend/requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu/ && \
python3 -m pip install twine attrs psutil
# Install vllm-ascend
RUN source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
source /usr/local/Ascend/nnal/atb/set_env.sh && \
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib && \
cd vllm-ascend && \
python3 setup.py bdist_wheel && \
ls -l dist
CMD ["/bin/bash"]

View File

@@ -1,45 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
ARG PY_VERSION=3.11
FROM quay.io/ascend/manylinux:8.5.1-a3-manylinux_2_28-py${PY_VERSION}
ARG SOC_VERSION="ascend910_9391"
# Define environments
ENV DEBIAN_FRONTEND=noninteractive
ENV SOC_VERSION=$SOC_VERSION
RUN yum update -y && \
yum install -y python3-pip git vim wget net-tools gcc gcc-c++ make cmake numactl-devel && \
rm -rf /var/cache/yum
WORKDIR /workspace
COPY . /workspace/vllm-ascend/
# Install req
RUN python3 -m pip install -r vllm-ascend/requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu/ && \
python3 -m pip install twine attrs psutil
# Install vllm-ascend
RUN source /usr/local/Ascend/ascend-toolkit/set_env.sh && \
source /usr/local/Ascend/nnal/atb/set_env.sh && \
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib && \
cd vllm-ascend && \
python3 setup.py bdist_wheel && \
ls -l dist
CMD ["/bin/bash"]

View File

@@ -1,46 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
FROM ascendai/python:3.11-ubuntu22.04
ARG TARGETARCH
RUN apt-get update -y && \
apt-get install -y curl git gcc g++ cmake libnuma-dev jq wget xz-utils shellcheck && \
rm -rf /var/cache/apt/* && \
rm -rf /var/lib/apt/lists/*
ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
# For lint purposes only; ideally this should track vLLM main (main-to-main matching).
ARG VLLM_COMMIT=v0.18.0
RUN git clone $VLLM_REPO /vllm-workspace/vllm && \
cd /vllm-workspace/vllm && \
git checkout $VLLM_COMMIT
# Install vLLM common dependencies
RUN python3 -m pip install -r /vllm-workspace/vllm/requirements/common.txt --extra-index-url https://download.pytorch.org/whl/cpu/ && \
python3 -m pip uninstall -y triton && \
python3 -m pip cache purge
COPY . /vllm-workspace/vllm-ascend/
RUN pip install -r /vllm-workspace/vllm-ascend/requirements-dev.txt --extra-index-url https://download.pytorch.org/whl/cpu && \
pip cache purge && \
rm -fr /vllm-workspace/
CMD ["/bin/bash"]

View File

@@ -1,46 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
FROM quay.io/ascend/vllm-ascend:nightly-releases-v0.18.0
ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
ARG AIS_BENCH_TAG="v3.0-20250930-master"
ARG AIS_BENCH_URL="https://gitee.com/aisbench/benchmark.git"
ARG GITEE_USERNAME=""
ARG GITEE_TOKEN=""
# Define environments
ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /workspace
RUN pip config set global.index-url ${PIP_INDEX_URL}
# Install requirements-dev.txt for tests
RUN export PIP_EXTRA_INDEX_URL=https://mirrors.huaweicloud.com/ascend/repos/pypi && \
cd /vllm-workspace/vllm-ascend && \
python3 -m pip install -r requirements-dev.txt && \
python3 -m pip cache purge
# Install benchmark tools
RUN CLONE_URL=$(echo "${AIS_BENCH_URL}" | sed "s|https://|https://${GITEE_USERNAME}:${GITEE_TOKEN}@|") && \
git clone -b ${AIS_BENCH_TAG} --depth 1 "${CLONE_URL}" /vllm-workspace/vllm-ascend/benchmark && \
cd /vllm-workspace/vllm-ascend/benchmark && \
pip install -e . -r requirements/api.txt -r requirements/extra.txt && \
python3 -m pip cache purge
CMD ["/bin/bash"]

View File

@@ -1,46 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
FROM quay.io/ascend/vllm-ascend:nightly-releases-v0.18.0-a3
ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
ARG AIS_BENCH_TAG="v3.0-20250930-master"
ARG AIS_BENCH_URL="https://gitee.com/aisbench/benchmark.git"
ARG GITEE_USERNAME=""
ARG GITEE_TOKEN=""
# Define environments
ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /workspace
RUN pip config set global.index-url ${PIP_INDEX_URL}
# Install requirements-dev.txt for tests
RUN export PIP_EXTRA_INDEX_URL=https://mirrors.huaweicloud.com/ascend/repos/pypi && \
cd /vllm-workspace/vllm-ascend && \
python3 -m pip install -r requirements-dev.txt && \
python3 -m pip cache purge
# Install benchmark tools
RUN CLONE_URL=$(echo "${AIS_BENCH_URL}" | sed "s|https://|https://${GITEE_USERNAME}:${GITEE_TOKEN}@|") && \
git clone -b ${AIS_BENCH_TAG} --depth 1 "${CLONE_URL}" /vllm-workspace/vllm-ascend/benchmark && \
cd /vllm-workspace/vllm-ascend/benchmark && \
pip install -e . -r requirements/api.txt -r requirements/extra.txt && \
python3 -m pip cache purge
CMD ["/bin/bash"]

View File

@@ -1,87 +0,0 @@
name: 'model downloader'
on:
pull_request:
paths:
- '.github/workflows/misc/model_list.json'
- '.github/workflows/labled_download_model.yaml'
types: [labeled, synchronize]
defaults:
run:
shell: bash -el {0}
concurrency:
group: ascend-${{ github.workflow_ref }}
cancel-in-progress: true
jobs:
download-models:
if: contains(github.event.pull_request.labels.*.name, 'model-download')
name: Download models from ModelScope
runs-on: ${{ matrix.runner }}
strategy:
matrix:
runner: [linux-aarch64-a2b3-0, linux-aarch64-a3-0]
container:
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-cpu
steps:
- name: Install dependencies
run: |
apt-get update -y && apt-get install git jq -y
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install modelscope
- name: Show Current Disk Usage
run: |
df -h /root/.cache | grep -v Filesystem | \
awk '{print "Mount point: "$6, "Total: "$2, "Used: "$3, "Available: "$4, "Usage: "$5}'
- name: Checkout PR branch
uses: actions/checkout@v6
with:
fetch-depth: 0
- name: Extract new models from PR
id: diff
run: |
set -euo pipefail
git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
JSON_PATH=".github/workflows/misc/model_list.json"
git fetch origin main
git show origin/main:$JSON_PATH > /tmp/models_main.json || \
echo '{"models":[]}' > /tmp/models_main.json
cp $JSON_PATH /tmp/models_pr.json
jq -r '
(.models // []) as $pr
| input
| (.models // []) as $main
| ($pr - $main)[]
' /tmp/models_pr.json /tmp/models_main.json > /tmp/new_models.txt
echo "New models:"
cat /tmp/new_models.txt || true
- name: Download new models (CLI)
run: |
set -euo pipefail
if [ ! -s /tmp/new_models.txt ]; then
echo "No new models to download."
exit 0
fi
while read -r model; do
[ -z "$model" ] && continue
echo "▶ Downloading $model"
modelscope download "$model"
done < /tmp/new_models.txt
- name: Summary
run: |
echo "Downloaded models:"
cat /tmp/new_models.txt || echo "No new models"

View File

@@ -1,17 +0,0 @@
{
"problemMatcher": [
{
"owner": "markdownlint",
"pattern": [
{
"regexp": "^([^:]*):(\\d+):?(\\d+)?\\s([\\w-\\/]*)\\s(.*)$",
"file": 1,
"line": 2,
"column": 3,
"code": 4,
"message": 5
}
]
}
]
}

View File

@@ -1,248 +0,0 @@
{
"models": [
"AngelSlim/Qwen3-32B_eagle3",
"AngelSlim/Qwen3-a3B_eagle3",
"Anionex/Qwen3-1.7B-W4A8-V1",
"ArthurZ/ilama-3.2-1B",
"BAAI/bge-base-en-v1.5",
"BAAI/bge-large-zh-v1.5",
"BAAI/bge-m3",
"BAAI/bge-multilingual-gemma2",
"BAAI/bge-reranker-large",
"BAAI/bge-reranker-v2-m3",
"BAAI/bge-small-en-v1.5",
"BAAI/kernel_meta",
"ByteDance-Seed/BAGEL-7B-MoT",
"DeepSeek-ai/DeepSeek-OCR",
"DevQuasar/deepseek-ai.DeepSeek-V3.2-BF16",
"Eco-Tech/DeepSeek-V3.1-w8a8-mtp-QuaRot",
"Eco-Tech/Qwen3-30B-A3B-w8a8",
"Eco-Tech/Kimi-K2.5-W4A8",
"Howeee/Qwen2.5-1.5B-apeach",
"IntervitensInc/pangu-pro-moe-model",
"IntervitensInc/pangu-pro-moe-modelt",
"JackFram/llama-160m",
"JackFram/llama-68m",
"Kwai-Keye/Keye-VL-8B-Preview",
"LLM-Research/Llama-3.2-11B-Vision",
"LLM-Research/Llama-3.2-1B-Instruct",
"LLM-Research/Llama-3.2-3B-Instruct",
"LLM-Research/Meta-Llama-3-8B-Instruct",
"LLM-Research/Meta-Llama-3.1-8B-Instruct",
"LLM-Research/Molmo-7B-D-0924",
"LLM-Research/Phi-4-mini-instruct",
"LLM-Research/gemma-2-9b-it",
"LLM-Research/gemma-3-4b-it",
"LLM-Research/kernel_meta",
"OpenBMB/MiniCPM-2B-dpo-bf16",
"OpenBMB/MiniCPM-Llama3-V-2_5",
"OpenBMB/MiniCPM3-4B",
"OpenBMB/MiniCPM4-0.5B",
"OpenGVLab/InternVL2-8B",
"OpenGVLab/InternVL2_5-8B",
"OpenGVLab/InternVL3-78B",
"OpenGVLab/InternVL3-8B",
"OpenGVLab/InternVL3_5-8B",
"OpenGVLab/InternVL3_5-8B-hf",
"PaddlePaddle/ERNIE-4.5-21B-A3B-PT",
"PaddlePaddle/PaddleOCR-VL",
"QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ",
"Qwen/QwQ-32B",
"Qwen/QwQ-32B-AWQ",
"Qwen/Qwen",
"Qwen/Qwen-Image",
"Qwen/Qwen1.5-MoE-A2.7B",
"Qwen/Qwen2-1.5B-Instruct",
"Qwen/Qwen2-7B",
"Qwen/Qwen2-7B-Instruct",
"Qwen/Qwen2-7B-W8A8",
"Qwen/Qwen2-Audio-7B-Instruct",
"Qwen/Qwen2-VL-2B-Instruct",
"Qwen/Qwen2-VL-7B",
"Qwen/Qwen2-VL-7B-Instruct",
"Qwen/Qwen2.5-0.5B-Instruct",
"Qwen/Qwen2.5-0.5B-Instruct-AWQ",
"Qwen/Qwen2.5-1.5B-Instruct",
"Qwen/Qwen2.5-14B-Instruct",
"Qwen/Qwen2.5-32B-Instruct",
"Qwen/Qwen2.5-7B",
"Qwen/Qwen2.5-7B-Instruct",
"Qwen/Qwen2.5-7B-Instruct-1M",
"Qwen/Qwen2.5-7b-Instruct",
"Qwen/Qwen2.5-Math-PRM-7B",
"Qwen/Qwen2.5-Omni-3B",
"Qwen/Qwen2.5-Omni-7B",
"Qwen/Qwen2.5-VL-32B-Instruct",
"Qwen/Qwen2.5-VL-3B-Instruct",
"Qwen/Qwen2.5-VL-7B-Instruct",
"Qwen/Qwen2.7-7B",
"Qwen/Qwen3-0.6B",
"Qwen/Qwen3-0.6B-Base",
"Qwen/Qwen3-235B-A22B",
"Qwen/Qwen3-235B-A22B-Instruct-2507",
"Qwen/Qwen3-30B-A3B",
"Qwen/Qwen3-30B-A3B-Instruct-2507",
"Qwen/Qwen3-30B-A3B-W8A8",
"Qwen/Qwen3-32B",
"Qwen/Qwen3-32B-AWQ",
"Qwen/Qwen3-8B",
"Qwen/Qwen3-8B-A3B",
"Qwen/Qwen3-8B-Base",
"Qwen/Qwen3-8B-W8A8",
"Qwen/Qwen3-8B-w4a8",
"Qwen/Qwen3-8B-w8a8",
"Qwen/Qwen3-Base",
"Qwen/Qwen3-Coder-30B-A3B-Instruct",
"Qwen/Qwen3-Embedding-0.6B",
"Qwen/Qwen3-Embedding-8B",
"Qwen/Qwen3-Next-80B-A3B-Instruct",
"Qwen/Qwen3-Next-A3B-Instruct",
"Qwen/Qwen3-Omni-30B-A3B-Instruct",
"Qwen/Qwen3-Reranker-0.6B",
"Qwen/Qwen3-VL-235B-A22B-Instruct",
"Qwen/Qwen3-VL-2B-Instruct",
"Qwen/Qwen3-VL-30B-A3B-Instruct",
"Qwen/Qwen3-VL-32B-Instruct",
"Qwen/Qwen3-VL-8B-Instruct",
"Qwen/Qwen3.5-27B",
"Qwen/Qwen3.5-35B-A3B",
"RedHatAI/Qwen3-32B-speculator.eagle3",
"RedHatAI/Qwen3-8B-speculator.eagle3",
"Shanghai_AI_Laboratory/internlm--chat-7b",
"Shanghai_AI_Laboratory/internlm-7b",
"Shanghai_AI_Laboratory/internlm-7b-chat",
"Shanghai_AI_Laboratory/internlm-7bi-chat",
"Shanghai_AI_Laboratory/internlm-chat-7b",
"Tencent-Hunyuan/HunyuanOCR",
"Tengyunw/qwen3_8b_eagle3",
"Tongyi-MAI/Z-Image-Turbo",
"baichuan-inc/Baichuan2-7B-Chat",
"billy800/Qwen3-30B-A3B-Instruct-2507-AWQ",
"deepseek-ai/DeepSeek-OCR",
"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
"deepseek-ai/DeepSeek-V2",
"deepseek-ai/DeepSeek-V2-Lite",
"deepseek-ai/DeepSeek-V2-Lite-Chat",
"deepseek-ai/Deepseek-V2-Lite",
"dengcao/ms-marco-MiniLM-L6-v2",
"facebook/opt-125m",
"google/gemma-2-9b",
"google/gemma-3n-E2B-it",
"google/siglip2-base-patch16-224",
"hmellor/Ilama-3.2-1B",
"ibm-research/PowerMoE-3b",
"intfloat/multilingual-e5-small",
"jason9693/Qwen2.5-1.5B-apeach",
"jinaai/jina-embeddings-v3",
"jinaai/jina-embeddings-v4",
"jinaai/jina-embeddings-v4-vllm-code",
"jinaai/jina-embeddings-v4-vllm-retrieval",
"kernel_meta/kernel_meta_temp_2116872659434949099",
"llava-hf/LLaVA-NeXT-Video-7B-hf",
"llava-hf/llava-1.5-7b-hf",
"llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
"llava-hf/llava-v1.6-mistral-7b-hf",
"meta-llama/Llama-3.2-1B-Instruct",
"mistralai/Ministral-3-3B-Instruct-2512-BF16",
"mistralai/Ministral-3-8B-Instruct-2512-BF16",
"mistralai/Mistral-7B-Instruct-v0.1",
"mistralai/Mistral-Small-3.1-24B-Instruct-2503",
"mlx-community/DeepSeek-V3-3bit-bf16",
"moonshotai/Kimi-K2-Thinking",
"moonshotai/Kimi-Linear-48B-A3B-Instruct",
"neuralmagic/Qwen2.5-3B-quantized.w8a8",
"MNN/Qwen3-VL-8B-Instruct-Eagle3",
"nv-community/audio-flamingo-3",
"nv-community/audio-flamingo-3-hf",
"nvidia/audio-flamingo-3-hf",
"openbmb/MiniCPM-2B-sft-bf16",
"openbmb/MiniCPM-V-2_6",
"openbmb/MiniCPM-V-4_5",
"opendatalab/MinerU2.5-2509-1.2B",
"rhymes-ai/Aria",
"sentence-transformers/all-MiniLM-L12-v2",
"tencent/HunyuanOCR",
"unsloth/DeepSeek-V3.1-BF16",
"unsloth/Kimi-K2-Thinking-BF16",
"unsloth/gpt-oss-20b-BF16",
"vllm-ascend/DeepSeek-R1-0528-W8A8",
"vllm-ascend/DeepSeek-R1-W8A8",
"vllm-ascend/DeepSeek-R1-fa3-pruning",
"vllm-ascend/DeepSeek-R1-w4a8-pruning",
"vllm-ascend/DeepSeek-V2-Lite",
"vllm-ascend/DeepSeek-V2-Lite-W8A8",
"vllm-ascend/DeepSeek-V3-Pruning",
"vllm-ascend/DeepSeek-V3-W4A8-Pruing",
"vllm-ascend/DeepSeek-V3-W8A8",
"vllm-ascend/DeepSeek-V3.1",
"vllm-ascend/DeepSeek-V3.1-W4A8-puring",
"vllm-ascend/DeepSeek-V3.1-W8A8",
"vllm-ascend/DeepSeek-V3.2-W8A8",
"vllm-ascend/DeepSeek-V3.2-W8A8-Pruning",
"vllm-ascend/EAGLE-LLaMA3.1-Instruct-8B",
"vllm-ascend/EAGLE3-LLaMA3.1-Instruct-8B",
"vllm-ascend/Kimi-K2-Instruct-W8A8",
"vllm-ascend/Kimi-K2-Thinking-Pruning",
"vllm-ascend/Llama-2-7b-hf",
"vllm-ascend/Llama-3.2-3B-Instruct",
"vllm-ascend/Meta-Llama-3-8B-Instruct",
"vllm-ascend/QwQ-32B-W8A8",
"vllm-ascend/QwQ-32B-w8a8",
"vllm-ascend/Qwen2-7B-W8A8",
"vllm-ascend/Qwen2-VL-7B-W8A8",
"vllm-ascend/Qwen2.5-0.5B-Instruct-W8A8",
"vllm-ascend/Qwen2.5-0.5B-Instruct-W8A8-new",
"vllm-ascend/Qwen2.5-0.5B-Instruct-fa3",
"vllm-ascend/Qwen2.5-0.5B-Instruct-w8a8",
"vllm-ascend/Qwen2.5-Omni-7B",
"vllm-ascend/Qwen3-0.6B",
"vllm-ascend/Qwen3-0.6B-Instruct-W8A8",
"vllm-ascend/Qwen3-0.6B-W8A16",
"vllm-ascend/Qwen3-0.6B-W8A8",
"vllm-ascend/Qwen3-1.7B-W4A8-V1",
"vllm-ascend/Qwen3-235B-A22B",
"vllm-ascend/Qwen3-235B-A22B-W4A8",
"vllm-ascend/Qwen3-235B-A22B-W8A8",
"vllm-ascend/Qwen3-235B-A22B-w8a8",
"vllm-ascend/Qwen3-30B-A3B",
"vllm-ascend/Qwen3-a3B_eagle3",
"vllm-ascend/Qwen3-30B-A3B-Puring",
"vllm-ascend/Qwen3-30B-A3B-W8A8",
"vllm-ascend/Qwen3-30B-A3B-W8A8-Pruning",
"vllm-ascend/Qwen3-30B-A3B-W8A8-QuaRot",
"vllm-ascend/Qwen3-30B-A3B-Instruct-2507-quantized.w8a8",
"vllm-ascend/Qwen3-30B-A3B-Instruct-2507-quantized.w4a8",
"vllm-ascend/Qwen3-32B-W4A4",
"vllm-ascend/Qwen3-32B-W8A8",
"vllm-ascend/Qwen3-32B-W8A8-QuaRot",
"vllm-ascend/Qwen3-8B",
"vllm-ascend/Qwen3-8B-W4A8",
"vllm-ascend/Qwen3-8B-W8A8",
"vllm-ascend/Qwen3-Next-80B-A3B-Instruct-W8A8",
"vllm-ascend/Qwen3-Next-80B-A3B-Instruct-W8A8-Pruning",
"vllm-ascend/Qwen3-Omni-30B-A3B-Thinking",
"vllm-ascend/Qwen3-VL-8B-Instruct",
"vllm-ascend/Qwen3-VL-8B-Instruct-W8A8",
"vllm-ascend/TinyLlama-1.1B-Chat-v0.3",
"vllm-ascend/benchmark",
"vllm-ascend/ilama-3.2-1B",
"vllm-ascend/ilama-text2sql-spider",
"vllm-ascend/kernel_meta",
"vllm-ascend/llama-160m",
"vllm-ascend/llama-160m-accelerator",
"vllm-ascend/llama-2-7b-sql-lora-test",
"vllm-ascend/llama-68m",
"vllm-ascend/llama32-3b-text2sql-spider",
"vllm-ascend/pangu-pro-moe-pruing",
"vllm-ascend/self_cognition_Alice",
"vllm-ascend/self_cognition_Bob",
"vllm-ascend/tinyllama-colorist-lora",
"vllm-ascend/vllm-eagle-llama-68m-random",
"wemaster/deepseek_mtp_main_random_bf16",
"wemaster/deepseek_mtp_main_random_w8a8_part",
"xlangai/OpenCUA-7B",
"Eco-Tech/GLM-5-w4a8",
"Eco-Tech/GLM-4.7-W8A8-floatmtp",
"MiniMax/MiniMax-M2.5"
]
}

View File

@@ -1,44 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
# This workflow builds nightly images as a layer-cache warm-up at 20:00 Beijing time.
# The nightly test workflows (Nightly-A2, Nightly-A3) each rebuild the image fresh
# before running tests, so this schedule only serves to pre-populate the build cache.
name: Nightly Image Build Schedule
on:
workflow_dispatch:
# Next step: Add more inputs here if needed, e.g. vllm version, vllm-ascend version, image tag, etc.
jobs:
build-a2:
uses: ./.github/workflows/_nightly_image_build.yaml
with:
target: a2
secrets:
HW_USERNAME: ${{ secrets.HW_USERNAME }}
HW_TOKEN: ${{ secrets.HW_TOKEN }}
GITEE_TOKEN: ${{ secrets.GITEE_TOKEN }}
build-a3:
uses: ./.github/workflows/_nightly_image_build.yaml
with:
target: a3
secrets:
HW_USERNAME: ${{ secrets.HW_USERNAME }}
HW_TOKEN: ${{ secrets.HW_TOKEN }}
GITEE_TOKEN: ${{ secrets.GITEE_TOKEN }}
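The header comment above describes a 20:00 Beijing-time warm-up, but the trigger list only contains workflow_dispatch. A minimal sketch of the implied schedule trigger, assuming a daily warm-up is desired (20:00 UTC+8 is 12:00 UTC):
on:
  schedule:
    - cron: '0 12 * * *'   # 20:00 Beijing time (UTC+8)
  workflow_dispatch: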

View File

@@ -1,46 +0,0 @@
name: Cancel runs on PR close
on:
pull_request:
types: [closed]
permissions:
actions: write
contents: read
jobs:
cancel:
runs-on: ubuntu-latest
steps:
- uses: actions/github-script@v8
with:
github-token: ${{ github.token }}
script: |
const { owner, repo } = context.repo;
const branch = context.payload.pull_request.head.ref;
const statuses = ["in_progress", "queued", "waiting", "pending", "requested"];
for (const status of statuses) {
let page = 1;
while (true) {
const resp = await github.rest.actions.listWorkflowRunsForRepo({
owner, repo, branch, status, per_page: 100, page
});
const runs = resp.data.workflow_runs;
if (!runs.length) break;
for (const run of runs) {
if (run.id === context.runId) continue; // don't cancel this workflow
try {
await github.rest.actions.cancelWorkflowRun({ owner, repo, run_id: run.id });
core.info(`Cancel requested: ${run.html_url}`);
} catch (e) {
// common reasons: already completed (409) or insufficient permissions (403)
core.warning(`Failed to cancel ${run.html_url}: ${e.message}`);
}
}
if (runs.length < 100) break;
page++;
}
}

View File

@@ -1,42 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: Refresh codecov
on:
schedule:
# UTC+8: 08:00, 12:00, 16:00
- cron: '0 0,4,8 * * *'
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
jobs:
refresh-codecov:
name: refresh codecov
strategy:
matrix:
vllm_version: [v0.18.0]
uses: ./.github/workflows/_unit_test.yaml
with:
vllm: ${{ matrix.vllm_version }}
runner: linux-amd64-cpu-16-hk
image: quay.nju.edu.cn/ascend/cann:8.2.rc2-910b-ubuntu22.04-py3.11
type: schedule

View File

@@ -1,73 +0,0 @@
# This is a docker build check and publish job:
# 1. PR-triggered docker image build check
#    - is for image build check
#    - enabled on main/*-dev branches
#    - push: ${{ github.event_name != 'pull_request' }} ==> false
# 2. branch pushes trigger image publish
#    - is for branch/dev/nightly images
#    - commits merged into main/*-dev ==> vllm-ascend:main / vllm-ascend:*-dev
# 3. tag pushes trigger image publish
#    - is for the final release image
#    - published when tagged with v* (PEP 440 version) ==> vllm-ascend:v1.2.3 / vllm-ascend:v1.2.3rc1
name: Image Build and Push
on:
pull_request:
branches:
- 'releases/*'
paths:
- 'Dockerfile*'
- '.github/workflows/schedule_image_build_and_push.yaml'
types: [ labeled, synchronize ]
workflow_dispatch:
inputs:
tag:
description: 'Docker tag for build results'
default: main
required: true
type: choice
options:
- main
- v0.17.0rc1
- v0.16.0rc1
- v0.15.0rc1
- v0.14.0rc1
- v0.13.0rc3
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
image_build:
name: Image Build and Push
if: github.event_name != 'pull_request' || contains(github.event.pull_request.labels.*.name, 'image-build')
strategy:
matrix:
build_meta:
- name: A2 Ubuntu
dockerfile: Dockerfile
suffix: ''
- name: A2 openeuler
dockerfile: Dockerfile.openEuler
suffix: 'openeuler'
- name: A3 Ubuntu
dockerfile: Dockerfile.a3
suffix: 'a3'
- name: A3 openEuler
dockerfile: Dockerfile.a3.openEuler
suffix: 'a3-openeuler'
- name: 310P Ubuntu
dockerfile: Dockerfile.310p
suffix: '310p'
- name: 310P openEuler
dockerfile: Dockerfile.310p.openEuler
suffix: '310p-openeuler'
uses: ./.github/workflows/_schedule_image_build.yaml
with:
dockerfile: ${{ matrix.build_meta.dockerfile }}
suffix: ${{ matrix.build_meta.suffix }}
quay_username: ${{ vars.QUAY_USERNAME }}
should_push: ${{ github.repository_owner == 'vllm-project' && github.event_name != 'pull_request' }}
workflow_dispatch_tag: ${{ inputs.tag }}
secrets:
QUAY_PASSWORD: ${{ secrets.QUAY_PASSWORD }}

View File

@@ -1,89 +0,0 @@
name: 'Image build lint'
on:
schedule:
# Runs at 04:00 UTC+8 (20:00 UTC) every day
- cron: '0 20 * * *'
workflow_dispatch:
inputs:
vllm_hash:
description: 'vLLM base hash'
default: main
required: true
type: string
push:
paths:
- '.github/workflows/dockerfiles/Dockerfile.lint'
- 'requirements-lint.txt'
- 'requirements-dev.txt'
- 'requirements.txt'
# only cancel in-progress runs of the same workflow
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build:
name: vllm-ascend lint image build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
with:
fetch-depth: 0
persist-credentials: false
- name: Print
run: |
lscpu
- name: Docker meta
id: meta
uses: docker/metadata-action@v6
with:
images: |
quay.io/ascend-ci/vllm-ascend
tags: lint
flavor:
latest=false
- name: Build - Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Build - Set up Docker Buildx
uses: docker/setup-buildx-action@v4
- name: Publish - Login to Quay Container Registry
if: ${{ github.repository_owner == 'vllm-project' }}
uses: docker/login-action@v4
with:
registry: quay.io
username: ${{ vars.QUAY_CI_USERNAME }}
password: ${{ secrets.QUAY_CI_PASSWORD }}
- name: Build and push
if: ${{ github.event_name != 'workflow_dispatch' }}
uses: docker/build-push-action@v7
with:
# For now, we only build amd64 lint image
platforms: 'linux/amd64'
context: .
file: .github/workflows/dockerfiles/Dockerfile.lint
push: true
labels: ${{ steps.meta.outputs.labels }}
tags: ${{ steps.meta.outputs.tags }}
provenance: false
- name: Build and push
if: ${{ github.event_name == 'workflow_dispatch' }}
uses: docker/build-push-action@v7
with:
# For now, we only build amd64 lint image
platforms: 'linux/amd64'
context: .
file: .github/workflows/dockerfiles/Dockerfile.lint
push: true
labels: ${{ steps.meta.outputs.labels }}
tags: ${{ steps.meta.outputs.tags }}
provenance: false
build-args: |
VLLM_HASH=${{ inputs.vllm_hash }}

View File

@@ -1,246 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
# This workflow related to the resources atlas 800 A2
# We will not limit the concurrency of jobs on A2
name: Nightly-A2
on:
schedule:
# Run test at 23:45 Beijing time (UTC+8)
- cron: "45 15 * * *"
workflow_dispatch:
pull_request:
branches:
- 'main'
- '*-dev'
- 'releases/v*'
types: [labeled, synchronize]
permissions:
contents: read
pull-requests: read
issues: read
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
# only cancel in-progress runs of the same workflow
concurrency:
group: ascend-nightly-${{ github.ref }}-a2
cancel-in-progress: true
jobs:
parse-trigger:
name: Parse trigger and determine test scope
if: >-
github.event_name == 'schedule' ||
github.event_name == 'workflow_dispatch' ||
contains(github.event.pull_request.labels.*.name, 'nightly-test')
uses: ./.github/workflows/_parse_trigger.yaml
build-image:
name: Build nightly-a2 image
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
uses: ./.github/workflows/_nightly_image_build.yaml
with:
target: a2
secrets:
HW_USERNAME: ${{ secrets.HW_USERNAME }}
HW_TOKEN: ${{ secrets.HW_TOKEN }}
GITEE_TOKEN: ${{ secrets.GITEE_TOKEN }}
single-node-tests:
name: single-node
needs: [parse-trigger, build-image]
if: >-
always() &&
needs.parse-trigger.outputs.run == 'true' &&
(needs.build-image.result == 'success' || needs.build-image.result == 'skipped')
strategy:
fail-fast: false
matrix:
test_config:
# pytest-driven tests
- name: test_custom_op
os: linux-aarch64-a2b3-1
tests: tests/e2e/nightly/single_node/ops/singlecard_ops
- name: test_custom_op_multi_card
os: linux-aarch64-a2b3-4
tests: tests/e2e/nightly/single_node/ops/multicard_ops_a2/
# YAML-driven tests
- name: qwen3-32b
os: linux-aarch64-a2b3-4
config_file_path: Qwen3-32B.yaml
- name: qwen3-next-80b-a3b-instruct
os: linux-aarch64-a2b3-4
config_file_path: Qwen3-Next-80B-A3B-Instruct-A2.yaml
- name: qwen3-32b-int8
os: linux-aarch64-a2b3-4
config_file_path: Qwen3-32B-Int8-A2.yaml
uses: ./.github/workflows/_e2e_nightly_single_node.yaml
with:
runner: ${{ matrix.test_config.os }}
image: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-a2'
tests: ${{ matrix.test_config.tests }}
config_file_path: ${{ matrix.test_config.config_file_path }}
name: ${{ matrix.test_config.name }}
should_run: >-
${{
needs.parse-trigger.outputs.run == 'true' && (
needs.parse-trigger.outputs.filter == 'all' ||
contains(needs.parse-trigger.outputs.filter, format(',{0},', matrix.test_config.name))
)
}}
multi-node-tests:
name: multi-node
needs: [parse-trigger, build-image, single-node-tests]
if: >-
always() &&
needs.parse-trigger.outputs.run == 'true' &&
(needs.build-image.result == 'success' || needs.build-image.result == 'skipped')
strategy:
fail-fast: false
max-parallel: 2
matrix:
test_config:
- name: multi-node-deepseek-dp
config_file_path: DeepSeek-R1-W8A8-A2.yaml
size: 2
- name: multi-node-qwen3-235b-dp
config_file_path: Qwen3-235B-A22B-A2.yaml
size: 2
uses: ./.github/workflows/_e2e_nightly_multi_node.yaml
with:
soc_version: a2
runner: linux-amd64-cpu-8-hk
image: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-a2'
replicas: 1
size: ${{ matrix.test_config.size }}
config_file_path: ${{ matrix.test_config.config_file_path }}
vllm_ascend_ref: ${{ needs.parse-trigger.outputs.ref }}
should_run: >-
${{
needs.parse-trigger.outputs.run == 'true' && (
needs.parse-trigger.outputs.filter == 'all' ||
contains(needs.parse-trigger.outputs.filter, format(',{0},', matrix.test_config.name))
)
}}
secrets:
KUBECONFIG_B64: ${{ secrets.KUBECONFIG_HK_001_INTERNAL_B64 }}
single-node-accuracy-tests:
needs: [parse-trigger]
if: always() && needs.parse-trigger.outputs.run == 'true'
strategy:
fail-fast: false
matrix:
test_config:
- name: accuracy-group-1
os: linux-aarch64-a2b3-1
model_list:
- Qwen3-VL-8B-Instruct-W8A8
- Qwen3-8B
- Qwen2-Audio-7B-Instruct
- Qwen3-8B-W8A8
- Qwen3-VL-8B-Instruct
- Qwen2.5-Omni-7B
- name: accuracy-group-2
os: linux-aarch64-a2b3-1
model_list:
- ERNIE-4.5-21B-A3B-PT
- InternVL3_5-8B-hf
- Molmo-7B-D-0924
- Llama-3.2-3B-Instruct
- llava-onevision-qwen2-0.5b-ov-hf
- name: accuracy-group-3
os: linux-aarch64-a2b3-2
model_list:
- Qwen3-30B-A3B
- Qwen3-VL-30B-A3B-Instruct
- Qwen3-30B-A3B-W8A8
- name: accuracy-group-4
os: linux-aarch64-a2b3-4
model_list:
- Qwen3-Next-80B-A3B-Instruct
- Qwen3-Omni-30B-A3B-Instruct
uses: ./.github/workflows/_e2e_nightly_single_node_models.yaml
with:
vllm: v0.18.0
runner: ${{ matrix.test_config.os }}
model_list: ${{ toJson(matrix.test_config.model_list) }}
image: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-910b-ubuntu22.04-py3.11'
is_run: >-
${{
needs.parse-trigger.outputs.run == 'true' && (
needs.parse-trigger.outputs.filter == 'all' ||
contains(needs.parse-trigger.outputs.filter, format(',{0},', matrix.test_config.name))
)
}}
upload: false
doc-test:
name: doc-test
needs: [parse-trigger]
if: always() && (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
strategy:
# Each version should be tested
fail-fast: false
matrix:
vllm_version: [releases-v0.13.0, releases-v0.13.0-openeuler, main, main-openeuler]
runs-on: linux-aarch64-a2b3-1
container:
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:${{ matrix.vllm_version }}
steps:
- name: Check NPU/CANN and git info
run: |
echo "====> Print NPU/CANN info"
npu-smi info
cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
echo "====> Print vllm-ascend git info"
cd /vllm-workspace/vllm-ascend
git --no-pager log -1 || true
echo "====> Print vllm git info"
cd /vllm-workspace/vllm
git --no-pager log -1 || true
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v6
- name: Run vllm-ascend/tests/e2e/run_doctests.sh
run: |
# PWD: /__w/vllm-ascend/vllm-ascend
# Make sure e2e tests are latest
echo "Replacing /vllm-workspace/vllm-ascend/tests/e2e ..."
rm -rf /vllm-workspace/vllm-ascend/tests/e2e
mkdir -p /vllm-workspace/vllm-ascend/tests
# Overwrite e2e and examples
cp -r tests/e2e /vllm-workspace/vllm-ascend/tests/
cp -r examples /vllm-workspace/vllm-ascend/
# Simulate entering the container's default working directory
cd /workspace
# Run real test
echo "Test:"
/vllm-workspace/vllm-ascend/tests/e2e/run_doctests.sh

View File

@@ -1,236 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
# This workflow uses the Atlas 800 A3 resource pool
# **Please note**: the A3 resource pool currently allows at most 5*16 NPUs in use concurrently
# We limit the concurrency of jobs on A3 to avoid the risk of running out of resources
name: Nightly-A3
on:
schedule:
# Run test at 23:45 Beijing time (UTC+8)
- cron: "45 15 * * *"
workflow_dispatch:
pull_request:
branches:
- 'main'
- '*-dev'
- 'releases/v*'
types: [ labeled, synchronize ]
permissions:
contents: read
pull-requests: read
issues: read
# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
# declared as "shell: bash -el {0}" on steps that need to be properly activated.
# It's used to activate ascend-toolkit environment variables.
defaults:
run:
shell: bash -el {0}
concurrency:
group: ascend-nightly-${{ github.ref }}-a3
cancel-in-progress: true
jobs:
parse-trigger:
name: Parse trigger and determine test scope
if: >-
github.event_name == 'schedule' ||
github.event_name == 'workflow_dispatch' ||
contains(github.event.pull_request.labels.*.name, 'nightly-test')
uses: ./.github/workflows/_parse_trigger.yaml
build-image:
name: Build nightly-a3 image
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
uses: ./.github/workflows/_nightly_image_build.yaml
with:
target: a3
secrets:
HW_USERNAME: ${{ secrets.HW_USERNAME }}
HW_TOKEN: ${{ secrets.HW_TOKEN }}
GITEE_TOKEN: ${{ secrets.GITEE_TOKEN }}
multi-node-tests:
name: multi-node
needs: [parse-trigger, build-image]
if: >-
always() &&
needs.parse-trigger.outputs.run == 'true' &&
(needs.build-image.result == 'success' || needs.build-image.result == 'skipped')
strategy:
fail-fast: false
max-parallel: 2
matrix:
test_config:
- name: multi-node-deepseek-pd
config_file_path: DeepSeek-V3.yaml
size: 2
- name: multi-node-qwen3-dp
config_file_path: Qwen3-235B-A22B.yaml
size: 2
- name: multi-node-qwenw8a8-2node
config_file_path: Qwen3-235B-W8A8.yaml
size: 2
- name: multi-node-qwenw8a8-2node-eplb
config_file_path: Qwen3-235B-W8A8-EPLB.yaml
size: 2
- name: multi-node-dpsk3.2-2node
config_file_path: DeepSeek-V3_2-W8A8-A3-dual-nodes.yaml
size: 2
- name: multi-node-qwen3-dp-mooncake-layerwise
config_file_path: Qwen3-235B-A22B-Mooncake-Layerwise.yaml
size: 2
- name: multi-node-deepseek-r1-w8a8-longseq
config_file_path: DeepSeek-R1-W8A8-longseq.yaml
size: 2
- name: multi-node-qwenw8a8-2node-longseq
config_file_path: Qwen3-235B-W8A8-longseq.yaml
size: 2
- name: multi-node-qwen-disagg-pd
config_file_path: Qwen3-235B-disagg-pd.yaml
size: 2
- name: multi-node-qwen-vl-disagg-pd
config_file_path: Qwen3-VL-235B-disagg-pd.yaml
size: 2
- name: multi-node-kimi-k2-instruct-w8a8
config_file_path: Kimi-K2-Instruct-W8A8.yaml
size: 2
- name: multi-node-deepseek-v3.1
config_file_path: DeepSeek-V3.1-BF16.yaml
size: 2
- name: multi-node-deepseek-v3.2-W8A8-EP
config_file_path: DeepSeek-V3_2-W8A8-EP.yaml
size: 4
uses: ./.github/workflows/_e2e_nightly_multi_node.yaml
with:
soc_version: a3
runner: linux-aarch64-a3-0
image: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-a3'
replicas: 1
size: ${{ matrix.test_config.size }}
config_file_path: ${{ matrix.test_config.config_file_path }}
vllm_ascend_ref: ${{ needs.parse-trigger.outputs.ref }}
should_run: >-
${{
needs.parse-trigger.outputs.run == 'true' && (
needs.parse-trigger.outputs.filter == 'all' ||
contains(needs.parse-trigger.outputs.filter, format(',{0},', matrix.test_config.name))
)
}}
secrets:
KUBECONFIG_B64: ${{ secrets.KUBECONFIG_B64 }}
single-node-tests:
name: single-node
needs: [parse-trigger, build-image, multi-node-tests]
if: >-
always() &&
needs.parse-trigger.outputs.run == 'true' &&
(needs.build-image.result == 'success' || needs.build-image.result == 'skipped')
strategy:
fail-fast: false
matrix:
test_config:
# pytest-driven tests
- name: qwen3-30b-acc
os: linux-aarch64-a3-4
tests: tests/e2e/weekly/single_node/models/test_qwen3_30b_acc.py
- name: custom-multi-ops
os: linux-aarch64-a3-16
tests: tests/e2e/nightly/single_node/ops/multicard_ops_a3/
# YAML-driven tests
- name: deepseek-r1-0528-w8a8
os: linux-aarch64-a3-16
config_file_path: DeepSeek-R1-0528-W8A8.yaml
- name: deepseek-r1-w8a8-hbm
os: linux-aarch64-a3-16
config_file_path: DeepSeek-R1-W8A8-HBM.yaml
- name: deepseek-v3-2-w8a8
os: linux-aarch64-a3-16
config_file_path: DeepSeek-V3.2-W8A8.yaml
- name: glm-5-w4a8
os: linux-aarch64-a3-16
config_file_path: GLM-5.yaml
- name: glm-4.7-w8a8
os: linux-aarch64-a3-16
config_file_path: GLM-4.7.yaml
- name: kimi-k2-thinking
os: linux-aarch64-a3-16
config_file_path: Kimi-K2-Thinking.yaml
- name: kimi-k2.5
os: linux-aarch64-a3-16
config_file_path: Kimi-K2.5.yaml
- name: minimax-m2-5
os: linux-aarch64-a3-16
config_file_path: MiniMax-M2.5-A3.yaml
- name: mtpx-deepseek-r1-0528-w8a8
os: linux-aarch64-a3-16
config_file_path: MTPX-DeepSeek-R1-0528-W8A8.yaml
- name: qwen3-235b-a22b-w8a8
os: linux-aarch64-a3-16
config_file_path: Qwen3-235B-A22B-W8A8.yaml
- name: qwen3-30b-a3b-w8a8
os: linux-aarch64-a3-4
config_file_path: Qwen3-30B-A3B-W8A8.yaml
- name: qwen3-next-80b-a3b-instruct
os: linux-aarch64-a3-4
config_file_path: Qwen3-Next-80B-A3B-Instruct.yaml
- name: qwen3-next-80b-a3b-instruct-w8a8
os: linux-aarch64-a3-4
config_file_path: Qwen3-Next-80B-A3B-Instruct-W8A8.yaml
- name: qwq-32b
os: linux-aarch64-a3-4
config_file_path: QwQ-32B.yaml
- name: qwen3-32b-int8
os: linux-aarch64-a3-4
config_file_path: Qwen3-32B-Int8.yaml
- name: qwen2-5-vl-7b
os: linux-aarch64-a3-4
config_file_path: Qwen2.5-VL-7B-Instruct.yaml
- name: qwen2-5-vl-7b-epd
os: linux-aarch64-a3-4
config_file_path: Qwen2.5-VL-7B-Instruct-EPD.yaml
- name: qwen2-5-vl-32b
os: linux-aarch64-a3-4
config_file_path: Qwen2.5-VL-32B-Instruct.yaml
- name: qwen3-32b-int8-a3-feature-stack3
os: linux-aarch64-a3-4
config_file_path: Qwen3-32B-Int8-A3-Feature-Stack3.yaml
- name: qwen3-32b-int8-prefix-cache
os: linux-aarch64-a3-4
config_file_path: Prefix-Cache-Qwen3-32B-Int8.yaml
- name: deepseek-r1-0528-w8a8-prefix-cache
os: linux-aarch64-a3-16
config_file_path: Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml
uses: ./.github/workflows/_e2e_nightly_single_node.yaml
with:
runner: ${{ matrix.test_config.os }}
image: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-a3'
tests: ${{ matrix.test_config.tests }}
config_file_path: ${{ matrix.test_config.config_file_path }}
name: ${{ matrix.test_config.name }}
should_run: >-
${{
needs.parse-trigger.outputs.run == 'true' && (
needs.parse-trigger.outputs.filter == 'all' ||
contains(needs.parse-trigger.outputs.filter, format(',{0},', matrix.test_config.name))
)
}}

View File

@@ -1,459 +0,0 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
name: Release Code and Wheel
on:
schedule:
# UTC+8: 10:00 and 16:00
- cron: '0 2,8 * * *'
push:
tags:
- 'v*'
workflow_dispatch:
inputs:
tag:
description: 'Docker tag for build results'
default: main
required: true
type: choice
options:
- main
- v0.17.0rc1
- v0.16.0rc1
- v0.15.0rc1
- v0.14.0rc1
- v0.13.0rc3
jobs:
build_and_release_code:
name: release code
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.11"]
steps:
- name: checkout vllm-ascend
if: ${{ github.event_name != 'workflow_dispatch' }}
uses: actions/checkout@v6
- name: checkout vllm-ascend ${{ inputs.tag }}
if: ${{ github.event_name == 'workflow_dispatch' }}
uses: actions/checkout@v6
with:
ref: ${{ inputs.tag }}
- name: Print CPU info
run: |
lscpu
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python3 -m pip install twine setuptools_scm
- name: Generate tar.gz
env:
SOC_VERSION: ascend910b1
run: |
python3 setup.py sdist
ls dist
- name: Archive tar.gz
uses: actions/upload-artifact@v7
with:
name: vllm-ascend-src
path: dist/*
- name: Release
if: ${{ github.event_name == 'push' }}
run: |
python3 -m twine upload dist/* -u __token__ -p ${{ secrets.PYPI_TOKEN }}
build_and_release_wheel:
name: build and release wheel
strategy:
matrix:
os: [ubuntu-24.04, ubuntu-24.04-arm]
python-version: ["3.10", "3.11"]
runs-on: ${{ matrix.os }}
steps:
- name: checkout vllm-ascend
if: ${{ github.event_name != 'workflow_dispatch' }}
uses: actions/checkout@v6
- name: checkout vllm-ascend ${{ inputs.tag }}
if: ${{ github.event_name == 'workflow_dispatch' }}
uses: actions/checkout@v6
with:
ref: ${{ inputs.tag }}
- name: Free up disk space
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
docker-images: false
- name: Build wheel
run: |
ls
docker build -f ./.github/workflows/dockerfiles/Dockerfile.buildwheel.a2 \
--build-arg PY_VERSION=${{ matrix.python-version }} \
-t wheel:v1 .
docker run --rm \
-u "$(id -u):$(id -g)" \
-v "$(pwd):/outpwd" \
wheel:v1 \
bash -c "cp -r /workspace/vllm-ascend/dist /outpwd"
ls dist
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: ${{ matrix.python-version }}
- name: Repair wheels with auditwheel
run: |
python3 -m pip install auditwheel
python3 -m pip install patchelf
mkdir -p dist/repaired
for whl in dist/*.whl; do
auditwheel repair "$whl" -w dist/repaired/ \
--exclude libplatform.so \
--exclude libregister.so \
--exclude libge_common_base.so \
--exclude libc10.so \
--exclude libc_sec.so \
--exclude libnnopbase.so \
--exclude libprofapi.so \
--exclude libgraph_base.so \
--exclude libgraph.so \
--exclude libexe_graph.so \
--exclude "libascend*.so" \
--exclude "libtorch*.so" \
--exclude "libopapi.so" \
--exclude "liberror_manager.so" \
--exclude "libruntime.so" \
--exclude "libmmpa.so"
done
rm -f dist/*.whl
mv dist/repaired/*.whl dist/
rmdir dist/repaired
ls dist
- name: Verify automatic platform tags
run: |
cd dist
for wheel in *.whl; do
echo "verification file: $wheel"
auditwheel show "$wheel"
done
- name: Generate variant wheels
env:
WHEEL_FILE: dist
PROJECT_TOML: .github/workflows/scripts/wheel/pyproject.toml
OUTPUT_DIR: dist/variants
run: |
pip install build git+https://github.com/wheelnext/variantlib.git --quiet
mkdir -p dist/variants
python3 .github/workflows/scripts/wheel/make_variant.py \
-c .github/workflows/scripts/wheel/config.json \
-l a2
echo "Generated variant wheels:"
ls dist/variants/
- name: Archive wheel
uses: actions/upload-artifact@v7
with:
name: vllm-ascend-${{ matrix.os }}-py${{ matrix.python-version }}-wheel
path: dist/
- name: Release
if: ${{ github.event_name == 'push' || github.event_name == 'workflow_dispatch' }}
run: |
python3 -m pip install twine
python3 -m twine upload --verbose dist/*.whl -u __token__ -p ${{ secrets.PYPI_TOKEN }}
build_and_release_wheel_a3:
name: build and release wheel (A3)
strategy:
matrix:
os: [ubuntu-24.04, ubuntu-24.04-arm]
python-version: ["3.10", "3.11"]
runs-on: ${{ matrix.os }}
steps:
- name: checkout vllm-ascend
if: ${{ github.event_name != 'workflow_dispatch' }}
uses: actions/checkout@v6
- name: checkout vllm-ascend ${{ inputs.tag }}
if: ${{ github.event_name == 'workflow_dispatch' }}
uses: actions/checkout@v6
with:
ref: ${{ inputs.tag }}
- name: Free up disk space
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
docker-images: false
- name: Build wheel
run: |
ls
docker build -f ./.github/workflows/dockerfiles/Dockerfile.buildwheel.a3 \
--build-arg PY_VERSION=${{ matrix.python-version }} \
-t wheel-a3:v1 .
docker run --rm \
-u "$(id -u):$(id -g)" \
-v "$(pwd):/outpwd" \
wheel-a3:v1 \
bash -c "cp -r /workspace/vllm-ascend/dist /outpwd"
ls dist
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: ${{ matrix.python-version }}
- name: Repair wheels with auditwheel
run: |
python3 -m pip install auditwheel
python3 -m pip install patchelf
mkdir -p dist/repaired
for whl in dist/*.whl; do
auditwheel repair "$whl" -w dist/repaired/ \
--exclude libplatform.so \
--exclude libregister.so \
--exclude libge_common_base.so \
--exclude libc10.so \
--exclude libc_sec.so \
--exclude libnnopbase.so \
--exclude libprofapi.so \
--exclude libgraph_base.so \
--exclude libgraph.so \
--exclude libexe_graph.so \
--exclude "libascend*.so" \
--exclude "libtorch*.so" \
--exclude "libopapi.so" \
--exclude "liberror_manager.so" \
--exclude "libruntime.so" \
--exclude "libmmpa.so"
done
rm -f dist/*.whl
mv dist/repaired/*.whl dist/
rmdir dist/repaired
ls dist
- name: Verify automatic platform tags
run: |
cd dist
for wheel in *.whl; do
echo "verification file: $wheel"
auditwheel show "$wheel"
done
- name: Generate variant wheels
env:
WHEEL_FILE: dist
PROJECT_TOML: .github/workflows/scripts/wheel/pyproject.toml
OUTPUT_DIR: dist/variants
run: |
pip install build git+https://github.com/wheelnext/variantlib.git --quiet
mkdir -p dist/variants
python3 .github/workflows/scripts/wheel/make_variant.py \
-c .github/workflows/scripts/wheel/config.json \
-l a3
echo "Generated variant wheels:"
ls dist/variants/
- name: Archive wheel
uses: actions/upload-artifact@v7
with:
name: vllm-ascend-a3-${{ matrix.os }}-py${{ matrix.python-version }}-wheel
path: dist/
build_and_release_wheel_310p:
name: build and release wheel (310P)
strategy:
matrix:
os: [ubuntu-24.04, ubuntu-24.04-arm]
python-version: ["3.10", "3.11"]
runs-on: ${{ matrix.os }}
steps:
- name: checkout vllm-ascend
if: ${{ github.event_name != 'workflow_dispatch' }}
uses: actions/checkout@v6
- name: checkout vllm-ascend ${{ inputs.tag }}
if: ${{ github.event_name == 'workflow_dispatch' }}
uses: actions/checkout@v6
with:
ref: ${{ inputs.tag }}
- name: Free up disk space
uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
with:
tool-cache: true
docker-images: false
- name: Build wheel
run: |
ls
docker build -f ./.github/workflows/dockerfiles/Dockerfile.buildwheel.310p \
--build-arg PY_VERSION=${{ matrix.python-version }} \
-t wheel-310p:v1 .
docker run --rm \
-u "$(id -u):$(id -g)" \
-v "$(pwd):/outpwd" \
wheel-310p:v1 \
bash -c "cp -r /workspace/vllm-ascend/dist /outpwd"
ls dist
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: ${{ matrix.python-version }}
- name: Repair wheels with auditwheel
run: |
python3 -m pip install auditwheel
python3 -m pip install patchelf
mkdir -p dist/repaired
for whl in dist/*.whl; do
auditwheel repair "$whl" -w dist/repaired/ \
--exclude libplatform.so \
--exclude libregister.so \
--exclude libge_common_base.so \
--exclude libc10.so \
--exclude libc_sec.so \
--exclude libnnopbase.so \
--exclude libprofapi.so \
--exclude libgraph_base.so \
--exclude libgraph.so \
--exclude libexe_graph.so \
--exclude "libascend*.so" \
--exclude "libtorch*.so" \
--exclude "libopapi.so" \
--exclude "liberror_manager.so" \
--exclude "libruntime.so" \
--exclude "libmmpa.so"
done
rm -f dist/*.whl
mv dist/repaired/*.whl dist/
rmdir dist/repaired
ls dist
- name: Verify automatic platform tags
run: |
cd dist
for wheel in *.whl; do
echo "verification file: $wheel"
auditwheel show "$wheel"
done
- name: Generate variant wheels
env:
WHEEL_FILE: dist
PROJECT_TOML: .github/workflows/scripts/wheel/pyproject.toml
OUTPUT_DIR: dist/variants
run: |
pip install build git+https://github.com/wheelnext/variantlib.git --quiet
mkdir -p dist/variants
python3 .github/workflows/scripts/wheel/make_variant.py \
-c .github/workflows/scripts/wheel/config.json \
-l 310p
echo "Generated variant wheels:"
ls dist/variants/
- name: Archive wheel
uses: actions/upload-artifact@v7
with:
name: vllm-ascend-310p-${{ matrix.os }}-py${{ matrix.python-version }}-wheel
path: dist/
generate_and_upload_variant_index:
name: generate and upload variant index
needs: [build_and_release_wheel, build_and_release_wheel_a3, build_and_release_wheel_310p]
if: ${{ github.event_name == 'push' || github.event_name == 'workflow_dispatch' }}
runs-on: ubuntu-24.04
steps:
- name: Download all variant wheels
uses: actions/download-artifact@v4
with:
pattern: '*-wheel'
path: all-wheels/
merge-multiple: true
- name: Collect variant wheels
run: |
mkdir -p combined-variants
find all-wheels/ -path '*/variants/*.whl' -exec cp {} combined-variants/ \;
echo "Combined variant wheels:"
ls combined-variants/
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: "3.11"
- name: Generate combined variant index
run: |
pip install git+https://github.com/wheelnext/variantlib.git --quiet
variantlib generate-index-json -d combined-variants/
echo "Generated index files:"
ls combined-variants/
- name: Upload wheels and variant index to OBS
env:
OBS_ACCESS_KEY: ${{ secrets.OBS_ACCESS_KEY_ID }}
OBS_SECRET_KEY: ${{ secrets.OBS_SECRET_ACCESS_KEY }}
run: |
pip install esdk-obs-python --quiet
python3 - <<'EOF'
import os, glob
from obs import ObsClient
OBS_BUCKET = 'ascend-artifcat-packages'
OBS_PATH = 'pypi/packages/ascend/repos/pypi/variant/vllm-ascend'
client = ObsClient(
access_key_id=os.environ['OBS_ACCESS_KEY'],
secret_access_key=os.environ['OBS_SECRET_KEY'],
server='https://obs.cn-north-4.myhuaweicloud.com'
)
files = glob.glob('combined-variants/*.whl') + glob.glob('combined-variants/*.json')
for file in files:
filename = os.path.basename(file)
resp = client.putFile(OBS_BUCKET, f'{OBS_PATH}/{filename}', file)
if resp.status < 300:
print(f'Uploaded: {filename}')
else:
raise Exception(f'Failed to upload {filename}: {resp.errorMessage}')
EOF


@@ -1,66 +0,0 @@
name: "Close stale resolved/awaiting-feedback issues"
on:
schedule:
- cron: '0 2 * * *'
jobs:
stale:
runs-on: ubuntu-latest
permissions:
actions: write
issues: write
steps:
- uses: actions/stale@v10
with:
# Process issues with the 'resolved' label
any-of-labels: 'resolved'
# Mark as stale after a period of inactivity
days-before-stale: 7
stale-issue-label: 'stale'
stale-issue-message: |
This issue has been marked as `resolved` but has not received any feedback for some time, so it is now labeled as `stale`.
If you feel this was a mistake, please leave a comment to have the `stale` label removed.
`Stale` issues will automatically be closed after 14 days of inactivity.
# Close stale issues after a period of inactivity
days-before-close: 14
close-issue-message: |
This issue is being closed due to a lack of recent activity.
If you have any further questions or requirements, please feel free to reopen this issue or create a new one.
# Automatically remove the 'stale' label when the issue is updated (default is true)
remove-stale-when-updated: true
# Also remove the 'resolved' label
labels-to-remove-when-unstale: 'resolved'
# Avoid accidental PR processing (PRs can be handled if needed; this is issue-only)
days-before-pr-stale: -1
days-before-pr-close: -1
- uses: actions/stale@v10
with:
# Process issues with the 'awaiting-feedback' label
any-of-labels: 'awaiting-feedback'
# Mark as stale after a period of inactivity
days-before-stale: 7
stale-issue-label: 'stale'
stale-issue-message: |
This issue has been marked as `awaiting-feedback` but has not received any feedback for some time, so it is now labeled as `stale`.
To help us locate and resolve the issue more accurately, please provide the relevant information mentioned above.
`Stale` issues will automatically be closed after 14 days of inactivity.
# Close stale issues after a period of inactivity
days-before-close: 14
close-issue-message: |
This issue is being closed due to a lack of recent activity.
If you have any further questions or requirements, please feel free to reopen this issue or create a new one.
# Automatically remove the 'stale' label when the issue is updated (default is true)
remove-stale-when-updated: true
# Also remove the 'awaiting-feedback' label
labels-to-remove-when-unstale: 'awaiting-feedback'
# Avoid accidental PR processing (PRs can be handled if needed; this is issue-only)
days-before-pr-stale: -1
days-before-pr-close: -1


@@ -1,120 +0,0 @@
name: Update estimated test times
on:
schedule:
- cron: '0 2 * * 1' # Every Monday at 02:00 UTC
workflow_dispatch:
permissions:
contents: write
pull-requests: write
env:
UPSTREAM_REPO: vllm-project/vllm-ascend
FORK_OWNER: vllm-ascend-ci
BRANCH_NAME: auto/update-estimated-times-${{ github.run_id }}
concurrency:
group: update-estimated-times-${{ github.ref }}
cancel-in-progress: true
jobs:
e2e-test:
name: e2e-test
strategy:
matrix:
vllm_version: [v0.18.0]
type: [full, light]
uses: ./.github/workflows/_e2e_test.yaml
with:
vllm: ${{ matrix.vllm_version }}
image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:main
contains_310: false
type: ${{ matrix.type }}
continue_on_error: true # Continue even if some tests fail; we want to collect as much timing data as possible
update-estimated-times:
name: Update estimated_time in config.yaml
needs: [e2e-test]
runs-on: ubuntu-latest
steps:
- name: Checkout fork repo
uses: actions/checkout@v6
with:
repository: ${{ env.FORK_OWNER }}/vllm-ascend
token: ${{ secrets.PAT_TOKEN }}
- name: Download all timing artifacts
uses: actions/download-artifact@v4
with:
pattern: timing-data-*
path: timing-artifacts/
merge-multiple: false
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: '3.11'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install pyyaml
- name: Config git
run: |
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
git remote add upstream https://github.com/${{ env.UPSTREAM_REPO }}.git
git fetch upstream main && git checkout -b ${{ env.BRANCH_NAME }} upstream/main
- name: Update config.yaml from timing data
run: |
python3 .github/workflows/scripts/update_estimated_time.py \
--timing-dir timing-artifacts/ \
--config .github/workflows/scripts/config.yaml
- name: Check for changes
id: check_changes
run: |
if git diff --quiet .github/workflows/scripts/config.yaml; then
echo "changed=false" >> "$GITHUB_OUTPUT"
echo "No changes to config.yaml."
else
echo "changed=true" >> "$GITHUB_OUTPUT"
echo "config.yaml has been updated:"
git diff .github/workflows/scripts/config.yaml
fi
- name: Create pull request
env:
GITHUB_TOKEN: ${{ secrets.PAT_TOKEN }}
run: |
git add .github/workflows/scripts/config.yaml
git commit -sm "[CI] Auto-update estimated test times in config.yaml. Computed from timing-data artifacts on workflow run-${{ github.run_id }}"
git remote -v
git push -f origin ${{ env.BRANCH_NAME }}:${{ env.BRANCH_NAME }}
gh pr create \
--repo ${{ env.UPSTREAM_REPO }} \
--base main \
--head ${{ env.FORK_OWNER }}:${{ env.BRANCH_NAME }} \
--title "[CI]: Auto-update estimated test times in config.yaml" \
--body "## Summary
This PR was auto-generated by the **Update estimated test times** [workflow](https://github.com/${{ env.UPSTREAM_REPO }}/actions/runs/${{ github.run_id }}).
It updates the \`estimated_time\` values in \`.github/workflows/scripts/config.yaml\` based on actual elapsed times collected from CI workflow runs.
### Methodology
- Each e2e test job uploads its elapsed time as a \`timing-data-*\` artifact upon completion.
- The workflow aggregates all collected timing artifacts across jobs.
- For each test, the **median** elapsed time is computed to reduce outlier impact.
- A **10% safety buffer** is applied and the result is rounded to the nearest 10 seconds.
### Review Checklist
- [ ] Verify that updated \`estimated_time\` values are within a reasonable range.
- [ ] Confirm no test entries are missing or unexpectedly removed.
> If the new values look reasonable, feel free to merge. Otherwise, leave a comment describing the anomaly."
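For context, the aggregation described in the PR body above (median across runs, a 10% safety buffer, rounding to the nearest 10 seconds) reduces to a few lines. The sketch below is illustrative only: the actual update path goes through update_estimated_time.py shown later in this diff, and the function name here is hypothetical.

from statistics import median

def estimate_time(elapsed_samples: list[float]) -> int:
    # Median elapsed time, +10% safety buffer, rounded to the nearest 10 seconds.
    return int(round(median(elapsed_samples) * 1.10 / 10) * 10)

print(estimate_time([171.0, 183.0, 178.0]))  # median 178 * 1.1 = 195.8 -> 200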

File diff suppressed because it is too large

@@ -1,103 +0,0 @@
import subprocess
import time
from dataclasses import dataclass
class _Color:
HEADER = "\033[95m"
GREEN = "\033[92m"
RED = "\033[91m"
RESET = "\033[0m"
@dataclass
class TestFile:
name: str
estimated_time: float = 60
is_skipped: bool = False
@dataclass
class TestRecord:
name: str
passed: bool
elapsed: float
estimated: float
def to_dict(self) -> dict:
return {
"name": self.name,
"passed": self.passed,
"elapsed": self.elapsed,
"estimated": self.estimated,
}
def run_tests(
files: list[TestFile],
continue_on_error: bool = False,
) -> tuple[int, list[TestRecord]]:
"""
Run each TestFile with pytest and collect timing results.
NOTE:
The emitted START / PASSED / FAILED log lines are parsed by
ci_log_summary.py to recover per-test invocation boundaries.
Keep this output format stable, or update the corresponding
regexes in those CI log summarizers together.
Args:
files: Tests to run (skipped entries should already be filtered out).
continue_on_error: If True, keep running after a failure.
Returns:
(exit_code, records) — exit_code is 0 on full success, -1 otherwise.
"""
records: list[TestRecord] = []
all_passed = True
total_start = time.perf_counter()
for i, test in enumerate(files):
print(f"\n{'.' * 60}", flush=True)
# NOTE: ci_log_summary.py depends on this
# START line format when splitting suite-level logs into test runs.
print(
f"{_Color.HEADER}[{i + 1}/{len(files)}] START {test.name}{_Color.RESET}",
flush=True,
)
start = time.perf_counter()
result = subprocess.run(["pytest", "-sv", "--durations=0", "--color=yes", test.name])
elapsed = time.perf_counter() - start
passed = result.returncode == 0
records.append(TestRecord(name=test.name, passed=passed, elapsed=elapsed, estimated=test.estimated_time))
color = _Color.GREEN if passed else _Color.RED
status = "PASSED" if passed else f"FAILED (exit code {result.returncode})"
# NOTE: ci_log_summary.py depends on this
# PASSED / FAILED (exit code X) line format for suite end detection.
print(
f"{color}[{i + 1}/{len(files)}] {status} {test.name} ({elapsed:.0f}s){_Color.RESET}",
flush=True,
)
if not passed:
all_passed = False
if not continue_on_error:
break
total_elapsed = time.perf_counter() - total_start
passed_count = sum(1 for r in records if r.passed)
print(f"\n{'=' * 60}")
color = _Color.GREEN if all_passed else _Color.RED
print(f"{color}Summary: {passed_count}/{len(files)} passed ({total_elapsed:.2f}s total){_Color.RESET}")
print("=" * 60)
for r in records:
icon = f"{_Color.GREEN}{_Color.RESET}" if r.passed else f"{_Color.RED}{_Color.RESET}"
print(f" {icon} {r.name} ({r.elapsed:.0f}s)")
print(flush=True)
return (0 if all_passed else -1), records
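ci_log_summary.py itself is not part of this diff; the standalone sketch below only illustrates how the stable START / PASSED / FAILED lines emitted by run_tests() could be recovered from a suite log. The regexes are assumptions for illustration, not the summarizer's actual patterns.

import re

# Hypothetical patterns matching the log format produced by run_tests() above.
START_RE = re.compile(r"\[(\d+)/(\d+)\] START (\S+)")
END_RE = re.compile(r"\[(\d+)/(\d+)\] (PASSED|FAILED \(exit code \d+\)) (\S+) \((\d+)s\)")

line = "[3/12] PASSED tests/e2e/singlecard/test_sampler.py (258s)"
match = END_RE.search(line)  # search() also tolerates the ANSI color prefix
if match:
    print(match.group(3), match.group(4), match.group(5))  # PASSED <file> 258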


@@ -1,169 +0,0 @@
e2e-singlecard:
- name: tests/e2e/singlecard/compile/test_graphex_norm_quant_fusion.py
estimated_time: 83
- name: tests/e2e/singlecard/compile/test_graphex_qknorm_rope_fusion.py
estimated_time: 69
- name: tests/e2e/singlecard/test_auto_fit_max_mode_len.py
estimated_time: 70
- name: tests/e2e/singlecard/test_eager_mode_acc.py
estimated_time: 255
- name: tests/e2e/singlecard/test_aclgraph_accuracy.py
estimated_time: 839
- name: tests/e2e/singlecard/test_aclgraph_batch_invariant.py
estimated_time: 515
- name: tests/e2e/singlecard/test_aclgraph_mem.py
estimated_time: 187
- name: tests/e2e/singlecard/test_async_scheduling.py
estimated_time: 252
- name: tests/e2e/singlecard/test_batch_invariant.py
estimated_time: 506
- name: tests/e2e/singlecard/test_camem.py
estimated_time: 149
- name: tests/e2e/singlecard/test_completion_with_prompt_embeds.py
estimated_time: 136
- name: tests/e2e/singlecard/test_cpu_offloading.py
estimated_time: 166
- name: tests/e2e/singlecard/test_guided_decoding.py
estimated_time: 407
- name: tests/e2e/singlecard/test_ilama_lora.py
estimated_time: 112
- name: tests/e2e/singlecard/test_llama32_lora.py
estimated_time: 239
- name: tests/e2e/singlecard/test_qwen3_multi_loras.py
estimated_time: 140
- name: tests/e2e/singlecard/test_models.py
estimated_time: 320
- name: tests/e2e/singlecard/test_multistream_overlap_shared_expert.py
estimated_time: 292
- name: tests/e2e/singlecard/test_quantization.py
estimated_time: 284
- name: tests/e2e/singlecard/test_sampler.py
estimated_time: 258
- name: tests/e2e/singlecard/test_vlm.py
estimated_time: 495
- name: tests/e2e/singlecard/test_multi_instance.py
estimated_time: 120
- name: tests/e2e/singlecard/test_xlite.py
estimated_time: 135
- name: tests/e2e/singlecard/compile/test_norm_quant_fusion.py
estimated_time: 106
- name: tests/e2e/singlecard/pooling/test_classification.py
estimated_time: 148
- name: tests/e2e/singlecard/pooling/test_embedding.py
estimated_time: 324
- name: tests/e2e/singlecard/pooling/test_scoring.py
estimated_time: 553
- name: tests/e2e/singlecard/pooling/test_qwen3_reranker_lora.py
estimated_time: 280
- name: tests/e2e/singlecard/spec_decode/test_mtp_eagle_correctness.py
estimated_time: 6141
- name: tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py
estimated_time: 600
- name: tests/e2e/singlecard/model_runner_v2/test_basic.py
estimated_time: 80
is_skipped: true
e2e-singlecard-light:
- name: tests/e2e/singlecard/test_aclgraph_accuracy.py::test_piecewise_res_consistency
estimated_time: 229
- name: tests/e2e/singlecard/test_quantization.py::test_qwen3_w8a8_quant
estimated_time: 183
e2e-2card-light:
- name: tests/e2e/multicard/2-cards/test_qwen3_moe.py::test_qwen3_moe_distributed_mp_tp2_ep
estimated_time: 164
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_deepseek3_2_w8a8_pruning_mtp_tp2_ep
estimated_time: 90
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_deepseek3_2_w8a8c8_pruning_mtp_tp2_ep
estimated_time: 180
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_gpt_oss_distributed_tp2
estimated_time: 352
e2e-multicard-2-cards:
- name: tests/e2e/multicard/2-cards/test_aclgraph_capture_replay.py
estimated_time: 0
is_skipped: true
- name: tests/e2e/multicard/2-cards/spec_decode/test_spec_decode.py
estimated_time: 0
is_skipped: true
- name: tests/e2e/multicard/2-cards/test_offline_weight_load.py
estimated_time: 0
is_skipped: true
- name: tests/e2e/multicard/2-cards/test_shared_expert_dp.py
estimated_time: 0
is_skipped: true
- name: tests/e2e/multicard/2-cards/test_qwen3_performance.py
estimated_time: 194
- name: tests/e2e/multicard/2-cards/test_data_parallel.py
estimated_time: 454
- name: tests/e2e/multicard/2-cards/test_expert_parallel.py
estimated_time: 220
- name: tests/e2e/multicard/2-cards/test_external_launcher.py
estimated_time: 550
- name: tests/e2e/multicard/2-cards/test_full_graph_mode.py
estimated_time: 805
- name: tests/e2e/multicard/2-cards/test_ilama_lora_tp2.py
estimated_time: 113
- name: tests/e2e/multicard/2-cards/test_llama32_lora_tp2.py
estimated_time: 410
- name: tests/e2e/multicard/2-cards/spec_decode/test_quarot_eagle.py
estimated_time: 859
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_deepseek_multistream_moe_tp2
estimated_time: 112
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_w4a8_dynamic_tp2
estimated_time: 104
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_moe_sp_tp2
estimated_time: 176
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_deepseek_w4a8_accuracy_tp2
estimated_time: 125
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_moe_fc2_tp2
estimated_time: 173
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_deepseek_v2_lite_fc1_tp2
estimated_time: 124
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_dense_fc1_tp2
estimated_time: 99
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_dense_prefetch_mlp_weight_tp2
estimated_time: 110
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_deepseek3_2_w8a8_pruning_mtp_tp2_ep
estimated_time: 111
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_deepseek3_2_w8a8c8_pruning_mtp_tp2_ep
estimated_time: 180
- name: tests/e2e/multicard/2-cards/test_offline_inference_distributed.py::test_qwen3_w4a4_distributed_tp2
estimated_time: 202
- name: tests/e2e/multicard/2-cards/test_prefix_caching.py
estimated_time: 470
- name: tests/e2e/multicard/2-cards/test_quantization.py
estimated_time: 511
- name: tests/e2e/multicard/2-cards/test_qwen3_moe.py
estimated_time: 986
- name: tests/e2e/multicard/2-cards/test_qwen3_moe_routing_replay.py
estimated_time: 210
- name: tests/e2e/multicard/2-cards/test_single_request_aclgraph.py
estimated_time: 290
- name: tests/e2e/multicard/2-cards/test_disaggregated_encoder.py
estimated_time: 164
- name: tests/e2e/multicard/2-cards/test_sp_pass.py
estimated_time: 198
- name: tests/e2e/multicard/2-cards/test_sequence_parallelism_moe.py
estimated_time: 120
e2e-multicard-4-cards:
- name: tests/e2e/multicard/4-cards/test_qwen3_next.py
estimated_time: 1868
- name: tests/e2e/multicard/4-cards/test_qwen3_5.py
estimated_time: 1030
- name: tests/e2e/multicard/4-cards/test_data_parallel_tp2.py
estimated_time: 306
- name: tests/e2e/multicard/4-cards/test_kimi_k2.py
estimated_time: 19
- name: tests/e2e/multicard/4-cards/long_sequence/test_accuracy.py
estimated_time: 1445
- name: tests/e2e/multicard/4-cards/long_sequence/test_basic.py
estimated_time: 2186
- name: tests/e2e/multicard/4-cards/long_sequence/test_chunked_prefill_cp.py
estimated_time: 1191
- name: tests/e2e/multicard/4-cards/long_sequence/test_prefix_caching_cp.py
estimated_time: 883
- name: tests/e2e/multicard/4-cards/long_sequence/test_mtp.py
estimated_time: 60
is_skipped: true
- name: tests/e2e/multicard/4-cards/spec_decode/test_mtp_qwen3_next.py
estimated_time: 1340
- name: tests/e2e/multicard/4-cards/test_pipeline_parallel.py
estimated_time: 357


@@ -1,240 +0,0 @@
import argparse
import json
import os
import sys
from datetime import datetime, timezone
from pathlib import Path
import tabulate
import yaml
from ci_utils import TestFile, TestRecord, run_tests
_CONFIG_PATH = Path(__file__).parent / "config.yaml"
def load_suites(config_path: Path = _CONFIG_PATH) -> dict[str, list[TestFile]]:
"""Load all test suites from config.yaml."""
data = yaml.safe_load(config_path.read_text())
return {
suite_name: [
TestFile(
name=entry["name"],
estimated_time=entry.get("estimated_time", 60),
is_skipped=entry.get("is_skipped", False),
)
for entry in entries
]
for suite_name, entries in data.items()
}
def partition(files: list[TestFile], rank: int, size: int) -> list[TestFile]:
"""
Split non-skipped files into `size` groups of approximately equal estimated
time using a greedy algorithm, and return the group at index `rank`.
Files within the returned group are sorted ascending by estimated_time.
"""
active = [f for f in files if not f.is_skipped]
if not active or size <= 0 or size > len(active):
return []
# Sort descending by weight; use original index as tiebreaker to be stable
indexed = sorted(enumerate(active), key=lambda x: (-x[1].estimated_time, x[0]))
buckets: list[list[int]] = [[] for _ in range(size)]
sums = [0.0] * size
for idx, test in indexed:
lightest = sums.index(min(sums))
buckets[lightest].append(idx)
sums[lightest] += test.estimated_time
return sorted([active[i] for i in buckets[rank]], key=lambda f: f.estimated_time)
def _find_project_root() -> Path:
root = Path.cwd()
if (root / "tests").exists():
return root
# Fall back: assume script lives at .github/workflows/scripts/
return Path(__file__).parents[3]
def _minimal_covered_dirs(file_paths: set[str], root: Path) -> set[Path]:
"""Return the minimal set of directories that covers all file_paths."""
dirs: set[Path] = set()
for fp in file_paths:
candidate = (root / fp).parent
if not candidate.exists():
continue
try:
rel = candidate.relative_to(root)
except ValueError:
continue
# Drop any existing entries that are subdirectories of rel
dirs = {d for d in dirs if rel not in d.parents}
# Only add rel if no ancestor already covers it
if not any(d == rel or d in rel.parents for d in dirs):
dirs.add(rel)
return dirs
def sanity_check(suites: dict[str, list[TestFile]]) -> None:
"""
Verify that:
1. Every test file in any suite exists on disk.
2. No test_*.py files exist on disk (in covered dirs) that are absent from all suites.
Raises SystemExit with a descriptive message on failure.
"""
suite_files = {f.name.split("::")[0] for tests in suites.values() for f in tests}
root = _find_project_root()
covered = _minimal_covered_dirs(suite_files, root)
disk_files = {str(p.relative_to(root)) for d in covered for p in (root / d).rglob("test_*.py")}
missing_from_suite = sorted(disk_files - suite_files)
if missing_from_suite:
entries = "\n".join(f' TestFile("{f}"),' for f in missing_from_suite)
raise SystemExit(f"Test files on disk are not in any suite (add them or mark is_skipped=True):\n{entries}")
missing_from_disk = sorted(suite_files - disk_files)
if missing_from_disk:
entries = "\n".join(f' TestFile("{f}"),' for f in missing_from_disk)
raise SystemExit(f"Test files listed in suite do not exist on disk:\n{entries}")
def _print_plan(
suite: str,
files: list[TestFile],
skipped: list[TestFile],
partition_info: str,
) -> None:
print(tabulate.tabulate([[suite, partition_info]], headers=["Suite", "Partition"], tablefmt="psql"))
total_est = sum(f.estimated_time for f in files)
print(f"✅ Enabled {len(files)} test(s) (est. total {total_est:.1f}s):")
for f in files:
print(f" - {f.name} (est={f.estimated_time}s)")
if skipped:
print(f"\n❌ Skipped {len(skipped)} test(s) (consider recovering):")
for f in skipped:
print(f" - {f.name}")
print(flush=True)
def _print_results(
suite: str,
records: list[TestRecord],
skipped: list[TestFile],
partition_info: str,
) -> None:
print(tabulate.tabulate([[suite, partition_info]], headers=["Suite", "Partition"], tablefmt="psql"))
total_elapsed = sum(r.elapsed for r in records)
passed_count = sum(1 for r in records if r.passed)
print(f"Results: {passed_count}/{len(records)} passed (actual total {total_elapsed:.1f}s):")
for r in records:
status = "✅ PASSED" if r.passed else "❌ FAILED"
print(f" {status} {r.name} (actual={r.elapsed:.0f}s est={r.estimated:.0f}s)")
if skipped:
print(f"\n❌ Skipped {len(skipped)} test(s) (consider recovering):")
for f in skipped:
print(f" - {f.name}")
print(flush=True)
def _save_timing_json(
records: list[TestRecord],
suite: str,
partition_id: int | None,
partition_size: int | None,
output_path: Path,
) -> None:
passed_suites = [r.to_dict() for r in records if r.passed]
payload = {
"suite": suite,
"partition_id": partition_id,
"partition_size": partition_size,
"commit_sha": os.environ.get("GITHUB_SHA", ""),
"github_run_id": os.environ.get("GITHUB_RUN_ID", ""),
"timestamp": datetime.now(timezone.utc).isoformat(),
"tests": passed_suites,
}
output_path.write_text(json.dumps(payload, indent=2))
print(
f"Timing data written to {output_path} ({len(passed_suites)}/{len(records)} passed)",
flush=True,
)
def main() -> None:
suites = load_suites()
parser = argparse.ArgumentParser(description="Run a named e2e test suite")
parser.add_argument(
"--suite",
required=True,
choices=list(suites.keys()),
help="Name of the test suite to run",
)
parser.add_argument(
"--auto-partition-id",
type=int,
default=None,
metavar="ID",
help="Zero-based partition index (requires --auto-partition-size)",
)
parser.add_argument(
"--auto-partition-size",
type=int,
default=None,
metavar="N",
help="Total number of partitions",
)
parser.add_argument(
"--auto-upgrade-estimated-times",
action="store_true",
help="Automatically update estimated times in config.yaml based on actual timings (default: False) \
If enabled, the script always exit with 0, even if some tests fail, since the primary purpose is to gather \
timing data to improve estimates.",
)
parser.add_argument(
"--continue-on-error",
action="store_true",
help="Continue running after a test failure (default: True)",
)
parser.add_argument(
"--timing-report-json",
type=Path,
default=Path("test_timing_data.json"),
help="Path to write the JSON timing data for CI aggregation",
)
args = parser.parse_args()
sanity_check(suites)
all_files = suites[args.suite]
skipped = [f for f in all_files if f.is_skipped]
if args.auto_partition_size is not None and args.auto_partition_id is not None:
files = partition(all_files, args.auto_partition_id, args.auto_partition_size)
partition_info = f"{args.auto_partition_id + 1}/{args.auto_partition_size}"
else:
files = [f for f in all_files if not f.is_skipped]
partition_info = "full"
_print_plan(args.suite, files, skipped, partition_info)
exit_code, records = run_tests(
files,
continue_on_error=args.continue_on_error,
)
_save_timing_json(records, args.suite, args.auto_partition_id, args.auto_partition_size, args.timing_report_json)
_print_results(args.suite, records, skipped, partition_info)
if args.auto_upgrade_estimated_times:
sys.exit(0)
sys.exit(exit_code)
if __name__ == "__main__":
main()
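A standalone sketch of the greedy split performed by partition() above: heaviest file first, always placed into the currently lightest bucket. The numbers are borrowed from the e2e-singlecard estimates; run_suite.py additionally filters out skipped files and returns only the requested rank, sorted ascending.

def greedy_partition(times: list[float], size: int) -> list[list[float]]:
    buckets: list[list[float]] = [[] for _ in range(size)]
    sums = [0.0] * size
    for t in sorted(times, reverse=True):
        lightest = sums.index(min(sums))  # always fill the lightest bucket
        buckets[lightest].append(t)
        sums[lightest] += t
    return buckets

print(greedy_partition([839, 506, 320, 255, 112], 2))
# [[839, 112], [506, 320, 255]] -> totals 951s vs 1081s, roughly balanced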


@@ -1,102 +0,0 @@
#!/usr/bin/env python3
"""
Update estimated_time in config.yaml from CI timing data.
Usage:
python3 update_estimated_time.py \
--timing-dir ./timing-artifacts \
--config .github/workflows/scripts/config.yaml
"""
import argparse
import json
from pathlib import Path
import yaml
def collect_timings(timing_dir: Path) -> dict[str, int]:
"""
Recursively scan timing_dir for JSON files produced by run_suite.py.
Returns {test_name: elapsed_seconds} for all passed tests.
Warns if the same test name appears in multiple files.
"""
json_files = list(timing_dir.rglob("*.json"))
print(f"Found {len(json_files)} timing file(s) in {timing_dir}")
timings: dict[str, int] = {}
for path in json_files:
try:
data = json.loads(path.read_text())
except (json.JSONDecodeError, OSError) as e:
print(f" Warning: skipping {path}: {e}")
continue
for test in data.get("tests", []):
if not test.get("passed", False):
continue
name: str = test.get("name", "")
elapsed: float = test.get("elapsed", 0.0)
if not name or elapsed <= 0:
continue
if name in timings:
print(f" Warning: duplicate entry for '{name}', overwriting {timings[name]}s with {int(elapsed)}s")
timings[name] = int(elapsed)
return timings
def update_config(config_path: Path, timings: dict[str, int]) -> int:
"""
Load config.yaml, update estimated_time for each test found in timings,
and write the result back. Returns the number of changed entries.
"""
configs: dict = yaml.safe_load(config_path.read_text())
changed = 0
for suite_tests in configs.values():
for test in suite_tests:
name: str = test.get("name", "")
if name not in timings:
continue
old_time: int = test.get("estimated_time", 0)
new_time: int = timings[name]
if old_time == new_time:
continue
test["estimated_time"] = new_time
print(f" {name}: {old_time}s -> {new_time}s")
changed += 1
config_path.write_text(yaml.dump(configs, default_flow_style=False, allow_unicode=True, sort_keys=False))
return changed
def main() -> None:
parser = argparse.ArgumentParser(description="Update estimated_time in config.yaml from CI timing data")
parser.add_argument(
"--timing-dir",
required=True,
type=Path,
help="Directory containing timing JSON files (searched recursively)",
)
parser.add_argument(
"--config",
default=".github/workflows/scripts/config.yaml",
type=Path,
help="Path to config.yaml (default: .github/workflows/scripts/config.yaml)",
)
args = parser.parse_args()
timings = collect_timings(args.timing_dir)
if not timings:
print("No timing data collected. Exiting without changes.")
return
print(f"\nCollected timing data for {len(timings)} test(s).")
print(f"Updating {args.config}...")
changed = update_config(args.config, timings)
print(f"\nDone. {changed} estimated_time value(s) changed.")
if __name__ == "__main__":
main()


@@ -1,33 +0,0 @@
{
"variables": {
"wheel_env": "WHEEL_FILE",
"pyproject_toml_env": "PROJECT_TOML",
"output_dir_env": "OUTPUT_DIR"
},
"jobs": [
{
"variant_label": "310p",
"properties": [
"ascend :: npu_type :: 310p",
"ascend :: cann_version :: 8.5.1"
],
"skip_plugin_validation": true
},
{
"variant_label": "a2",
"properties": [
"ascend :: npu_type :: a2",
"ascend :: cann_version :: 8.5.1"
],
"skip_plugin_validation": true
},
{
"variant_label": "a3",
"properties": [
"ascend :: npu_type :: a3",
"ascend :: cann_version :: 8.5.1"
],
"skip_plugin_validation": true
}
]
}
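make_variant.py is not included in this diff; the snippet below only sketches how the -l label passed by the release workflows above could select one job entry from this config file. The function and control flow are assumptions for illustration, not the script's actual implementation.

import json

def select_job(config_path: str, label: str) -> dict:
    with open(config_path) as f:
        config = json.load(f)
    for job in config["jobs"]:
        if job["variant_label"] == label:
            return job
    raise SystemExit(f"no variant job labelled {label!r} in {config_path}")

job = select_job(".github/workflows/scripts/wheel/config.json", "a3")
print(job["properties"])  # ['ascend :: npu_type :: a3', 'ascend :: cann_version :: 8.5.1']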

Some files were not shown because too many files have changed in this diff.