### What this PR does / why we need it

This PR introduces the **first AI-assisted model-adaptation skill package** for `vllm-ascend`. The goal is to make model-adaptation work (especially for recurring feature-request issues) **repeatable, auditable, and easier to hand off**.

### Scope in this PR

This PR adds only skill/workflow assets under:

- `.agents/skills/vllm-ascend-model-adapter/SKILL.md`
- `.agents/skills/vllm-ascend-model-adapter/references/workflow-checklist.md`
- `.agents/skills/vllm-ascend-model-adapter/references/troubleshooting.md`
- `.agents/skills/vllm-ascend-model-adapter/references/multimodal-ep-aclgraph-lessons.md`
- `.agents/skills/vllm-ascend-model-adapter/references/fp8-on-npu-lessons.md`
- `.agents/skills/vllm-ascend-model-adapter/references/deliverables.md`

### Workflow improvements

The skill standardizes:

1. **Environment assumptions** used in our Docker setup
   - implementation roots: `/vllm-workspace/vllm` and `/vllm-workspace/vllm-ascend`
   - serving root: `/workspace`
   - model path convention: `/models/<model-name>`
2. **Validation strategy**
   - Stage A: fast `--load-format dummy` gate
   - Stage B: mandatory real-weight gate before sign-off
   - avoid false-ready by requiring request-level checks (not startup logs only)
3. **Feature-first verification checklist**
   - ACLGraph / EP / flashcomm1 / MTP / multimodal
   - explicit `supported / unsupported / not-applicable / checkpoint-missing` outcomes
4. **Delivery contract**
   - minimal, scoped code changes
   - required artifacts (Chinese report + runbook, e2e config YAML, tutorial doc)
   - one signed commit in the delivery repo

### What this PR does NOT do

- No runtime/kernel/model patch is included in this PR.
- No direct model-support claim is made by this PR alone.
- Model-specific adaptation/fix work should be submitted in follow-up PRs using this skill as the workflow baseline.
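The two-stage validation strategy can be sketched as a small launch helper. This is a hedged illustration, not code from the PR: the model path and port follow the conventions listed above, and vLLM's `--load-format dummy` flag initializes random weights so Stage A can catch graph and shape errors without a real checkpoint. The helper prints the command rather than executing it, so the gate being run is inspectable in logs.

```shell
# STAGE=dummy  -> Stage A fast gate (random weights, no checkpoint needed)
# STAGE=real   -> Stage B mandatory real-weight gate before sign-off
STAGE="${STAGE:-dummy}"
MODEL='/models/<model-name>'          # placeholder path convention from above
CMD="vllm serve $MODEL --port 8000"
if [ "$STAGE" = dummy ]; then
  CMD="$CMD --load-format dummy"      # Stage A only; drop for Stage B
fi
echo "$CMD"                           # print, don't execute: auditable gate
```

Running with `STAGE=real` emits the same command without the dummy flag, which is the form that must pass before final sign-off.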
### Why this matters for maintainers

This gives the repo a shared, explicit AI-assistance protocol, so future model-adaptation PRs are easier to review, compare, and reproduce.

---------

Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
## Deliverables

### Required outputs in current repo
- One final signed commit (`git commit -sm ...`) containing the adaptation changes.
- Chinese analysis report (concise but complete), covering:
  - model architecture summary
  - incompatibility root causes
  - code changes and rationale
  - startup and inference verification evidence
  - feature status matrix (supported / unsupported / checkpoint-missing / not-applicable)
  - max model len: config theoretical vs. runtime practical
  - dummy-vs-real validation matrix (what dummy proved / what only real proved)
  - false-ready cases and final resolution path (if any)
  - fallback ladder evidence (which fallback was tried, what changed)
- Chinese compact runbook, covering:
  - how to start the server in `/workspace` (direct command, default `:8000`)
  - how to run OpenAI-compatible validation
  - optional eager fallback command
  - optional `TORCHDYNAMO_DISABLE=1` fallback command (if relevant)
- Test config YAML at `tests/e2e/models/configs/<ModelName>.yaml`, which must include `model_name`, `hardware`, `tasks` with accuracy metrics (name + value), and `num_fewshot`. Use accuracy results from the evaluation to populate metric values. Follow the schema of existing configs (e.g. `Qwen3-8B.yaml`).
- Tutorial doc at `docs/source/tutorials/models/<ModelName>.md`, which must follow the standard template sections: Introduction, Supported Features, Environment Preparation (with docker tabs for A2/A3), Deployment (with serve script), Functional Verification (with curl example), Accuracy Evaluation, Performance. Fill in model-specific details (HF path, hardware requirements, TP size, max-model-len, served-model-name, sample curl, accuracy table).
- Post the SKILL.md content or an AI-assisted workflow summary as a comment on the originating GitHub issue.
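A hypothetical sketch of the e2e config described above. The field nesting here is an assumption from the stated requirements (`model_name`, `hardware`, `tasks` with metric name + value, `num_fewshot`); the existing configs such as `Qwen3-8B.yaml` remain the authoritative schema, and the task name and values below are placeholders to be replaced with real evaluation output.

```yaml
# tests/e2e/models/configs/<ModelName>.yaml -- illustrative sketch only
model_name: <ModelName>
hardware: Atlas A2          # placeholder; match the hardware actually tested
tasks:
  - name: gsm8k             # example task, not prescribed by this doc
    metrics:
      - name: exact_match
        value: 0.0          # fill in from the real accuracy evaluation
num_fewshot: 5              # placeholder; copy from the evaluation setup
```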
### Commit discipline
- Keep one signed commit for code changes in the current working repo.
- If implementation occurred in `/vllm-workspace/*`, backport the minimal final diff to the current repo before committing.
- Keep the diff scoped to the target model adaptation.
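The backport-then-sign flow above can be demonstrated end to end. This is a self-contained sketch using two throwaway repos as stand-ins for `/vllm-workspace/vllm-ascend` and the delivery repo; all paths, file names, identities, and the commit message are examples, not values from this PR.

```shell
set -e
work=$(mktemp -d); deliver=$(mktemp -d)      # stand-in build tree / delivery repo
for r in "$work" "$deliver"; do
  git -C "$r" init -q
  echo base > "$r/model.py"                  # shared starting state
  git -C "$r" add model.py
  git -C "$r" -c user.email=ci@example.com -c user.name=ci commit -qm base
done
echo "npu fix" >> "$work/model.py"           # implementation happened in the build tree
git -C "$work" diff > /tmp/adapt.patch       # capture the minimal final diff
git -C "$deliver" apply /tmp/adapt.patch     # backport into the delivery repo
git -C "$deliver" add model.py
git -C "$deliver" -c user.email=ci@example.com -c user.name=ci \
  commit -qsm "Adapt <ModelName> for vllm-ascend"   # the one signed commit (-s)
git -C "$deliver" log -1 --pretty=%B         # message ends with a Signed-off-by trailer
```

The `-s` flag is what produces the required `Signed-off-by` trailer; keeping the patch to exactly the files touched by the adaptation is what keeps the diff reviewable.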
### Validation discipline
- Always provide log file paths for key claims.
- Keep docs synchronized with latest successful test mode (do not leave stale command variants as default).
- Final report must include pass/fail reason for each key feature attempt: ACLGraph / EP / flashcomm1 / MTP / multimodal.
- EP and flashcomm1 are MoE-only checks; for non-MoE models, mark them as not-applicable with evidence.
- Final report should include the baseline capacity result (`128k + bs16`) or an explicit reason if it is not feasible.
- Dummy-first can be used to speed up iteration, but the real-weight gate is mandatory before final sign-off.
- Startup-only evidence is insufficient; include first-request smoke results.
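The first-request smoke check above can be sketched as follows. This is a hedged example: the served model name and prompt are placeholders, and the endpoint is the standard OpenAI-compatible `/v1/chat/completions` route that a vLLM server exposes. The live `curl` is shown as a comment (it needs a running server); the executable part only validates that the payload is well-formed.

```shell
# Request body for the first-request smoke check (placeholder model name):
BODY='{"model":"<served-model-name>","messages":[{"role":"user","content":"ping"}],"max_tokens":8}'
# Against the live server started in /workspace (default :8000), run:
#   curl -s http://localhost:8000/v1/chat/completions \
#        -H 'Content-Type: application/json' -d "$BODY"
# and attach the response (not just startup logs) as evidence.
# Offline sanity check that the payload parses as JSON:
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload ok"
```

A 200 response with generated tokens is the minimum bar; startup banners alone do not count as evidence.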
### Suggested final response structure
- What changed
- What went well / what went wrong
- Validation performed
- Commit hash and changed files
- Optional next step