xc-llm-ascend

Shanshan Shen d2f87ed9cc [Patch] Remove spec_decode.metrics patch (#1016)

### What this PR does / why we need it?
Remove the `spec_decode.metrics` patch, as the underlying issue has been
resolved in https://github.com/vllm-project/vllm/pull/16983 (included in vllm
`v0.9.0`).

**Before:** Returns a CUDA event recording when the copy is complete.

**After:** Returns a device event (NPU Event for vllm-ascend) recording
when the copy is complete.

Signed-off-by: shen-shanshan <467638484@qq.com>

2025-06-09 15:05:11 +08:00

__init__.py

[Patch] Remove spec_decode.metrics patch (#1016)

2025-06-09 15:05:11 +08:00

patch_distributed.py

[BugFix] add all2all when dp_size > 1 && downgrade npu_dequant_swiglu_quant (#819)

2025-05-15 09:19:55 +08:00

patch_eagle.py

Spec decode support for V1 Engine (#874)

2025-05-23 14:25:46 +08:00

patch_minicpm.py

[Model][MiniCPM] support MiniCPM (#645)