### What this PR does / why we need it

This PR introduces the **first AI-assisted model-adaptation skill package** for `vllm-ascend`. The goal is to make model-adaptation work (especially for recurring feature-request issues) **repeatable, auditable, and easier to hand off**.

### Scope in this PR

This PR adds only skill/workflow assets under:

- `.agents/skills/vllm-ascend-model-adapter/SKILL.md`
- `.agents/skills/vllm-ascend-model-adapter/references/workflow-checklist.md`
- `.agents/skills/vllm-ascend-model-adapter/references/troubleshooting.md`
- `.agents/skills/vllm-ascend-model-adapter/references/multimodal-ep-aclgraph-lessons.md`
- `.agents/skills/vllm-ascend-model-adapter/references/fp8-on-npu-lessons.md`
- `.agents/skills/vllm-ascend-model-adapter/references/deliverables.md`

### Workflow improvements

The skill standardizes:

1. **Environment assumptions** used in our Docker setup
   - implementation roots: `/vllm-workspace/vllm` and `/vllm-workspace/vllm-ascend`
   - serving root: `/workspace`
   - model path convention: `/models/<model-name>`
2. **Validation strategy**
   - Stage A: fast `--load-format dummy` gate
   - Stage B: mandatory real-weight gate before sign-off
   - avoid false "ready" claims by requiring request-level checks (not startup logs alone)
3. **Feature-first verification checklist**
   - ACLGraph / EP / flashcomm1 / MTP / multimodal
   - explicit `supported / unsupported / not-applicable / checkpoint-missing` outcomes
4. **Delivery contract**
   - minimal, scoped code changes
   - required artifacts (Chinese report + runbook, e2e config YAML, tutorial doc)
   - one signed commit in the delivery repo

### What this PR does NOT do

- No runtime/kernel/model patch is included in this PR.
- No direct model-support claim is made by this PR alone.
- Model-specific adaptation/fix work should be submitted in follow-up PRs using this skill as the workflow baseline.
### Why this matters for maintainers

This gives the repo a shared, explicit AI-assistance protocol, so future model-adaptation PRs are easier to review, compare, and reproduce.

---------

Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
# Multimodal + EP + ACLGraph Lessons
This note captures practical patterns that repeatedly matter for VL checkpoints on Ascend.
## 1) Out-of-box feature expectation
Validate the key features by default whenever possible:
- ACLGraph
- MTP
- multimodal (if model supports VL)
- EP (MoE models only)
- flashcomm1 (MoE models only)
If any feature fails, keep logs and explain the reason in the final report. For non-MoE models, EP/flashcomm1 should be marked not-applicable.
## 2) Validate in this order
- Single text request success (`/v1/models` + `/v1/chat/completions`).
- Single text+image request success.
- Graph evidence (`Replaying aclgraph`) when graph mode is expected.
- Capacity baseline: `128k + bs16`.
- Concurrency expansion if needed (`32`/`64` suggested).
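The first two steps can be scripted against the OpenAI-compatible endpoints that the server exposes. A minimal sketch of the request bodies (the base URL, model path, and image URL below are placeholder assumptions, not values from this skill):

```python
import json

BASE_URL = "http://127.0.0.1:8000"   # hypothetical local serving endpoint
MODEL = "/models/<model-name>"       # path convention from this skill; fill in per model

def text_payload(prompt: str) -> dict:
    """Step 1: plain-text chat-completion request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
    }

def text_image_payload(prompt: str, image_url: str) -> dict:
    """Step 2: text+image request body (OpenAI-style multimodal content list)."""
    return {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "max_tokens": 32,
    }

if __name__ == "__main__":
    # In practice: GET f"{BASE_URL}/v1/models" should list MODEL, then POST
    # these bodies to f"{BASE_URL}/v1/chat/completions" and check for 200 + text.
    print(json.dumps(text_payload("hello"), indent=2))
```

A request only counts as a pass when the response body contains generated text, matching the request-level (not startup-log) gating above.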
## 3) EP + graph startup expectations
- Startup latency is much higher than in eager mode because of:
- compile warmup
- graph capture rounds
- multimodal encoder profiling
- Do not treat slow startup as failure unless logs show hard errors.
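Because slow startup alone is not a failure, it helps to gate on log content rather than wall-clock time. A sketch of that rule (the error patterns are illustrative assumptions, not an exhaustive list from this project):

```python
# Markers that indicate a hard failure; warmup/capture/profiling chatter is expected
# during EP + graph startup and must not trip the gate.
HARD_ERROR_PATTERNS = (
    "Traceback (most recent call last)",
    "RuntimeError",
    "CANN error",  # hypothetical Ascend runtime marker; adjust to observed logs
)

def startup_verdict(log_text: str) -> str:
    """Return 'hard-error' only when logs show real failures; otherwise keep waiting."""
    for pattern in HARD_ERROR_PATTERNS:
        if pattern in log_text:
            return "hard-error"
    return "keep-waiting"
```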
## 4) Always distinguish two max lengths
- Theoretical max: from the model config (`max_position_embeddings`).
- Practical max: the largest value that actually starts and serves on the current hardware + TP/EP settings.
Report both values explicitly.
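Pulling both numbers into the report can be partially automated; a sketch assuming a standard HF-style `config.json` in the checkpoint directory (the practical max still has to come from a measured serve attempt, never from the config):

```python
import json
from pathlib import Path

def theoretical_max(model_dir: str) -> int:
    """Theoretical max context, read from the checkpoint's config.json."""
    cfg = json.loads((Path(model_dir) / "config.json").read_text())
    return cfg["max_position_embeddings"]

def report_max_lengths(model_dir: str, practical_max: int) -> dict:
    """Both values go in the final report; practical_max is measured, not derived."""
    return {
        "theoretical_max": theoretical_max(model_dir),
        "practical_max": practical_max,
    }
```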
## 5) Multimodal testing with temporary layer reduction
- Reducing `num_hidden_layers` can speed up smoke tests.
- This does not remove the ViT structure itself.
- Still require one full-layer validation before final sign-off.
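One non-destructive way to apply the reduction is to write a patched config copy into a scratch directory and point the smoke run (e.g. the dummy-weight gate) at it, leaving the original checkpoint untouched for the full-layer sign-off run. A sketch assuming a flat HF-style config; VL checkpoints that nest the text backbone (for example under a `text_config` key) would need the matching nested key instead:

```python
import json
from pathlib import Path

def make_smoke_config(model_dir: str, out_dir: str, layers: int = 2) -> str:
    """Write a reduced-layer config copy for smoke tests.

    Only the text backbone's num_hidden_layers is reduced; any vision-tower
    config in the file is left as-is, so the ViT structure is preserved.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cfg = json.loads((Path(model_dir) / "config.json").read_text())
    cfg["num_hidden_layers"] = layers
    (out / "config.json").write_text(json.dumps(cfg, indent=2))
    return str(out)
```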
## 6) Feature-status semantics
Use four categories:
- ✅ supported and verified
- ❌ framework-level unsupported
- ⚠️ checkpoint missing (weights/config do not provide feature)
- N/A not-applicable (for example EP/flashcomm1 on non-MoE models)
Typical examples:
- flashcomm1 on non-MoE VL models is often N/A or ❌ depending on framework gate.
- MTP may be ⚠️ checkpoint missing even if framework has code paths.
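The four categories and the MoE gating from section 1 can be encoded so every report uses the same vocabulary. A small sketch (the feature names mirror this note; the helper itself is a hypothetical convenience, not part of the skill package):

```python
# Canonical status strings for the final report.
STATUSES = {
    "supported": "✅ supported and verified",
    "unsupported": "❌ framework-level unsupported",
    "checkpoint-missing": "⚠️ checkpoint missing",
    "not-applicable": "N/A not-applicable",
}

# Features that only make sense for MoE models.
MOE_ONLY = {"EP", "flashcomm1"}

def feature_status(feature: str, is_moe: bool, outcome: str) -> str:
    """Force N/A for MoE-only features on dense models; otherwise use the verified outcome."""
    if feature in MOE_ONLY and not is_moe:
        return STATUSES["not-applicable"]
    return STATUSES[outcome]
```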
## 7) Keep docs and defaults aligned with the latest success path
- If EP+graph is validated and requested/expected, it should be the default runbook path.
- Eager mode should be documented as fallback/troubleshooting only.