Files
xc-llm-ascend/.agents/skills/vllm-ascend-model-adapter/references/multimodal-ep-aclgraph-lessons.md
jack 29e3cdde20 [Doc][Skill] Introduce AI-assisted model-adaptation workflow for vllm-ascend (#6731)
### What this PR does / why we need it

This PR introduces the **first AI-assisted model-adaptation skill
package** for `vllm-ascend`.

The goal is to make model adaptation work (especially for recurring
feature-request issues) **repeatable, auditable, and easier to hand
off**.

### Scope in this PR

This PR adds only skill/workflow assets under:

- `.agents/skills/vllm-ascend-model-adapter/SKILL.md`
-
`.agents/skills/vllm-ascend-model-adapter/references/workflow-checklist.md`
-
`.agents/skills/vllm-ascend-model-adapter/references/troubleshooting.md`
-
`.agents/skills/vllm-ascend-model-adapter/references/multimodal-ep-aclgraph-lessons.md`
-
`.agents/skills/vllm-ascend-model-adapter/references/fp8-on-npu-lessons.md`
- `.agents/skills/vllm-ascend-model-adapter/references/deliverables.md`

### Workflow improvements

The skill standardizes:

1. **Environment assumptions** used in our Docker setup
- implementation roots: `/vllm-workspace/vllm` and
`/vllm-workspace/vllm-ascend`
- serving root: `/workspace`
- model path convention: `/models/<model-name>`

2. **Validation strategy**
- Stage A: fast `--load-format dummy` gate
- Stage B: mandatory real-weight gate before sign-off
- avoid false-ready by requiring request-level checks (not startup log
only)

3. **Feature-first verification checklist**
- ACLGraph / EP / flashcomm1 / MTP / multimodal
- explicit `supported / unsupported / not-applicable /
checkpoint-missing` outcomes

4. **Delivery contract**
- minimal scoped code changes
- required artifacts (Chinese report + runbook, e2e config YAML,
tutorial doc)
- one signed commit in delivery repo

### What this PR does NOT do

- No runtime/kernel/model patch is included in this PR.
- No direct model support claim is made by this PR alone.
- Model-specific adaptation/fix work should be submitted in follow-up
PRs using this skill as the workflow baseline.

### Why this matters for maintainers

This gives the repo a shared, explicit AI-assistance protocol, so future
model-adaptation PRs are easier to review, compare, and reproduce.

---------

Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
2026-02-26 08:48:15 +08:00

2.1 KiB

Multimodal + EP + ACLGraph Lessons

This note captures practical patterns that repeatedly matter for VL checkpoints on Ascend.

1) Out-of-box feature expectation

Try best to validate key features by default:

  • ACLGraph
  • MTP
  • multimodal (if model supports VL)
  • EP (MoE models only)
  • flashcomm1 (MoE models only)

If any feature fails, keep logs and explain the reason in the final report. For non-MoE models, EP/flashcomm1 should be marked not-applicable.

2) Validate in this order

  1. Single text request success (/v1/models + /v1/chat/completions).
  2. Single text+image request success.
  3. Graph evidence (Replaying aclgraph) when graph mode is expected.
  4. Capacity baseline: 128k + bs16.
  5. Concurrency expansion if needed (32/64 suggested).

3) EP + graph startup expectations

  • Startup latency is much higher than eager due to:
    • compile warmup
    • graph capture rounds
    • multimodal encoder profiling
  • Do not treat slow startup as failure unless logs show hard errors.

4) Always distinguish two max lengths

  • Theoretical max: from model config (max_position_embeddings).
  • Practical max: largest value that actually starts and serves on current hardware + TP/EP settings.

Report both values explicitly.

5) Multimodal testing with temporary layer reduction

  • Reducing num_hidden_layers can speed smoke tests.
  • This does not remove ViT structure itself.
  • Still require one full-layer validation before final sign-off.

6) Feature-status semantics

Use four categories:

  • supported and verified
  • framework-level unsupported
  • ⚠️ checkpoint missing (weights/config do not provide feature)
  • N/A not-applicable (for example EP/flashcomm1 on non-MoE models)

Typical examples:

  • flashcomm1 on non-MoE VL models is often N/A or depending on framework gate.
  • MTP may be ⚠️ checkpoint missing even if framework has code paths.

7) Keep docs and defaults aligned with latest success path

  • If EP+graph is validated and requested/expected, it should be the default runbook path.
  • Eager mode should be documented as fallback/troubleshooting only.