# vLLM Ascend Model Adapter Skill Adapt and debug models for vLLM on Ascend NPU — covering both already-supported architectures and new models not yet registered in vLLM. ## What it does This skill guides an AI agent through a deterministic workflow to: 1. Triage a model checkpoint (architecture, quant type, multimodal capability). 2. Implement minimal code changes in `/vllm-workspace/vllm` and `/vllm-workspace/vllm-ascend`. 3. Validate via a two-stage gate (dummy fast gate + real-weight mandatory gate). 4. Deliver one signed commit with code, test config, and tutorial doc. ## File layout | File | Purpose | | ---- | ------- | | `SKILL.md` | Skill definition, constraints, and execution playbook | | `references/workflow-checklist.md` | Step-by-step commands and templates | | `references/troubleshooting.md` | Symptom-action pairs for common failures | | `references/fp8-on-npu-lessons.md` | FP8 checkpoint handling on Ascend | | `references/multimodal-ep-aclgraph-lessons.md` | VL, EP, and ACLGraph patterns | | `references/deliverables.md` | Required outputs and commit discipline | ## Quick start 1. Open a conversation with the AI agent inside the vllm-ascend dev container. 2. Invoke the skill (e.g. `/vllm-ascend-model-adapter`). 3. Provide the model path (default `/models/`) and the originating issue number. 4. The agent follows the playbook in `SKILL.md` and produces a ready-to-merge commit. ## Key constraints - Never upgrade `transformers`. - Start `vllm serve` from `/workspace` (direct command, port 8000). - Dummy-only evidence is not sufficient — real-weight validation is mandatory. - Final delivery is exactly one signed commit in the current repo. ## Two-stage validation - **Stage A (dummy)**: fast architecture / operator / API path check with `--load-format dummy`. - **Stage B (real)**: real-weight loading, fp8/quant path, KV sharding, runtime stability. Both stages require request-level verification (`/v1/models` + at least one chat request), not just startup success.