xc-llm-ascend

Files

curryliu ca8007f584 [Feature] Enable inference support for Deepseekr1-w8a8-MTP (#1994)

Support the inference of the Deepseekr1-w8a8-mtp model with
statically-quantized shared_head in MTP layers.

- vLLM version: v0.9.2
- vLLM main: 6eca337ce0

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>

2025-07-29 18:51:57 +08:00

attention

[Perf] Avoid performing index selection of sin/cos cache every layer (#1890)

2025-07-29 18:06:45 +08:00

compilation

[CI] Upgrade vllm to 0.9.1 (#1165)

2025-06-11 16:33:11 +08:00

core

Disaggregate prefill for kv cache register style (#950)

2025-07-26 17:15:47 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

distributed

[Misc]Remove PD v0 code (#2047)

2025-07-28 19:09:22 +08:00

lora

[Bugfix] fix import error (#600)

2025-04-22 08:57:25 +08:00

models

[Feature] Enable inference support for Deepseekr1-w8a8-MTP (#1994)

2025-07-29 18:51:57 +08:00

multistream

[Misc] Fix logger bug (#2024)

2025-07-28 15:59:09 +08:00

ops

[main][refactor] Refactoring forward_context and model_runner_v1 (#1979)

2025-07-28 14:06:20 +08:00

patch

[Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926)

2025-07-28 15:13:37 +08:00

quantization

[Feature] Enable inference support for Deepseekr1-w8a8-MTP (#1994)

2025-07-29 18:51:57 +08:00

sample

[Misc] Fix logger bug (#2024)

2025-07-28 15:59:09 +08:00

torchair

[main][refactor] Refactoring forward_context and model_runner_v1 (#1979)

2025-07-28 14:06:20 +08:00

worker

[Perf] Avoid performing index selection of sin/cos cache every layer (#1890)

2025-07-29 18:06:45 +08:00

__init__.py

[CI] Patch torch.library.infer_schema for fused moe ops to fix CI (#854)

2025-05-14 19:49:09 +08:00

ascend_config.py

[Dist][EP] Remove ETP/EP maintained in vllm-ascend (#1681)

2025-07-21 09:08:04 +08:00

ascend_forward_context.py

[main][refactor] Refactoring forward_context and model_runner_v1 (#1979)

2025-07-28 14:06:20 +08:00

envs.py

[Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926)

2025-07-28 15:13:37 +08:00

platform.py

[1/4][Refactor] Refactor torchair worker (#1885)

2025-07-21 11:50:46 +08:00

soc_info.py

Disaggregate prefill for kv cache register style (#950)

2025-07-26 17:15:47 +08:00

utils.py

[main][refactor] Refactoring forward_context and model_runner_v1 (#1979)

2025-07-28 14:06:20 +08:00