xc-llm-ascend/.agents/skills/vllm-ascend-model-adapter/references/deliverables.md
jack 29e3cdde20 [Doc][Skill] Introduce AI-assisted model-adaptation workflow for vllm-ascend (#6731)
### What this PR does / why we need it

This PR introduces the **first AI-assisted model-adaptation skill
package** for `vllm-ascend`.

The goal is to make model adaptation work (especially for recurring
feature-request issues) **repeatable, auditable, and easier to hand
off**.

### Scope in this PR

This PR adds only skill/workflow assets under:

- `.agents/skills/vllm-ascend-model-adapter/SKILL.md`
- `.agents/skills/vllm-ascend-model-adapter/references/workflow-checklist.md`
- `.agents/skills/vllm-ascend-model-adapter/references/troubleshooting.md`
- `.agents/skills/vllm-ascend-model-adapter/references/multimodal-ep-aclgraph-lessons.md`
- `.agents/skills/vllm-ascend-model-adapter/references/fp8-on-npu-lessons.md`
- `.agents/skills/vllm-ascend-model-adapter/references/deliverables.md`

### Workflow improvements

The skill standardizes:

1. **Environment assumptions** used in our Docker setup
- implementation roots: `/vllm-workspace/vllm` and `/vllm-workspace/vllm-ascend`
- serving root: `/workspace`
- model path convention: `/models/<model-name>`

2. **Validation strategy**
- Stage A: fast `--load-format dummy` gate
- Stage B: mandatory real-weight gate before sign-off
- avoid false-ready outcomes by requiring request-level checks (not startup logs only)

3. **Feature-first verification checklist**
- ACLGraph / EP / flashcomm1 / MTP / multimodal
- explicit `supported / unsupported / not-applicable / checkpoint-missing` outcomes

4. **Delivery contract**
- minimal scoped code changes
- required artifacts (Chinese report + runbook, e2e config YAML, tutorial doc)
- one signed commit in delivery repo
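The Stage A / Stage B validation strategy above can be sketched as shell commands. This is a sketch only: it assumes the `vllm serve` entrypoint with an OpenAI-compatible server on port 8000 and the `/models/<model-name>` path convention from the environment section; the commands are printed, not executed.

```shell
#!/bin/sh
# Sketch of the Stage A / Stage B gate (commands printed, not executed here).
MODEL="/models/<model-name>"

# Stage A: fast gate -- dummy (random) weights exercise only load/startup code paths
STAGE_A="vllm serve $MODEL --load-format dummy --port 8000"

# Stage B: mandatory real-weight gate before sign-off
STAGE_B="vllm serve $MODEL --port 8000"

echo "Stage A: $STAGE_A"
echo "Stage B: $STAGE_B"
# Either stage counts as passed only after a request-level check,
# never on startup logs alone.
```

Stage A catches shape/operator/startup breakage in seconds without downloading weights; only Stage B can prove numerically sensible output.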

### What this PR does NOT do

- No runtime/kernel/model patch is included in this PR.
- No direct model support claim is made by this PR alone.
- Model-specific adaptation/fix work should be submitted in follow-up
PRs using this skill as the workflow baseline.

### Why this matters for maintainers

This gives the repo a shared, explicit AI-assistance protocol, so future
model-adaptation PRs are easier to review, compare, and reproduce.

---------

Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
2026-02-26 08:48:15 +08:00


Deliverables

Required outputs in current repo

  1. One final signed commit (git commit -sm ...) containing the adaptation changes.
  2. Chinese analysis report (concise but complete):
    • model architecture summary
    • incompatibility root causes
    • code changes and rationale
    • startup and inference verification evidence
    • feature status matrix: supported / unsupported / checkpoint-missing / not-applicable
    • max model len: theoretical (from config) vs. practical (at runtime)
    • dummy-vs-real validation matrix: what dummy-load proved / what only real weights proved
    • false-ready cases and final resolution path (if any)
    • fallback ladder evidence: which fallback was tried and what changed
  3. Compact Chinese runbook:
    • how to start the server in /workspace (direct command, default port :8000)
    • how to run OpenAI-compatible validation
    • optional eager fallback command
    • optional TORCHDYNAMO_DISABLE=1 fallback command (if relevant)
  4. Test config YAML at tests/e2e/models/configs/<ModelName>.yaml — must include model_name, hardware, tasks with accuracy metrics (name + value), and num_fewshot. Use accuracy results from evaluation to populate metric values. Follow the schema of existing configs (e.g. Qwen3-8B.yaml).
  5. Tutorial doc at docs/source/tutorials/models/<ModelName>.md — must follow the standard template: Introduction, Supported Features, Environment Preparation (with docker tabs for A2/A3), Deployment (with serve script), Functional Verification (with curl example), Accuracy Evaluation, Performance. Fill in model-specific details (HF path, hardware requirements, TP size, max-model-len, served-model-name, sample curl, accuracy table).
  6. Post SKILL.md content or AI-assisted workflow summary as a comment on the originating GitHub issue.
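A hypothetical shape for the test config in item 4, using only the field names listed above. The authoritative schema is the existing configs such as Qwen3-8B.yaml; every value below is a placeholder, and the exact nesting may differ:

```yaml
# Placeholder values only; copy the structure of an existing config such as Qwen3-8B.yaml.
model_name: <ModelName>
hardware: <hardware-label>
tasks:
  - name: <task-name>            # evaluation task id used by the e2e suite
    metrics:
      - name: <metric-name>      # accuracy metric reported by the evaluation
        value: <measured-accuracy>
    num_fewshot: <n>
```

Metric values must come from an actual evaluation run, never be estimated.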

Commit discipline

  • Keep one signed commit for code changes in the current working repo.
  • If implementation occurred in /vllm-workspace/*, backport the minimal final diff to the current repo before committing.
  • Keep the diff scoped to the target model adaptation.
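The backport-then-commit step can be sketched as follows. The command sequence is printed rather than executed, since the delivery checkout is environment-specific; the commit message is hypothetical.

```shell
#!/bin/sh
# Sketch only: intended command sequence is printed, not executed here,
# because the implementation and delivery roots are environment-specific.
IMPL_ROOT="/vllm-workspace/vllm-ascend"                 # implementation root from the skill
COMMIT_MSG="[Model] Adapt <ModelName> for vllm-ascend"  # hypothetical commit message

cat <<EOF
git -C $IMPL_ROOT diff > /tmp/adaptation.patch   # capture the minimal final diff
git apply /tmp/adaptation.patch                  # run inside the delivery repo
git add -A
git commit -sm "$COMMIT_MSG"                     # exactly one signed commit
EOF
```

`-s` adds the required Signed-off-by trailer; squash any intermediate commits so the delivery repo receives a single one.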

Validation discipline

  • Always provide log file paths for key claims.
  • Keep docs synchronized with latest successful test mode (do not leave stale command variants as default).
  • Final report must include pass/fail reason for each key feature attempt: ACLGraph / EP / flashcomm1 / MTP / multimodal.
  • EP and flashcomm1 are MoE-only checks; for non-MoE models mark as not-applicable with evidence.
  • Final report should include the baseline capacity result (128k context + batch size 16) or an explicit reason if it is not feasible.
  • Dummy-first can be used to speed up iterations, but real-weight gate is mandatory before final sign-off.
  • Startup-only evidence is insufficient; include first-request smoke results.
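A minimal first-request smoke check of the kind required above might look like this. The endpoint path follows the OpenAI-compatible API; the served model name, payload, and log path are assumptions, and the curl command is printed rather than executed:

```shell
#!/bin/sh
# Sketch of a first-request smoke check (OpenAI-compatible endpoint assumed on :8000).
# Startup-only evidence is insufficient: this exercises an actual completion request.
ENDPOINT="http://localhost:8000/v1/chat/completions"
PAYLOAD='{"model":"<served-model-name>","messages":[{"role":"user","content":"Say OK."}],"max_tokens":8}'
LOG="/tmp/first-request-smoke.json"   # cite this path in the final report

# The command to run once the server is up (printed here, not executed):
SMOKE_CMD="curl -s $ENDPOINT -H 'Content-Type: application/json' -d '$PAYLOAD' -o $LOG"
echo "$SMOKE_CMD"
# Pass criterion: the logged response contains a non-empty choices[0].message.content.
```

Saving the raw response gives the report a concrete log file path to cite for the key claim.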

Suggested final response structure

  • What changed
  • What went well / what went wrong
  • Validation performed
  • Commit hash and changed files
  • Optional next step