xc-llm-ascend/vllm_ascend at 18eefc23c3fd7275c6d2a9f540de80f2bf5e7f5f

EngineX/xc-llm-ascend

Slightwind 18eefc23c3 [feature] Support W8A8 PD-Mix Quantization (#4235 )

In PD-separated deployment scenarios:

* MoE layers use dynamic quantization exclusively.
* For the Attention module, Prefill (P) nodes use **dynamic**
quantization, while Decode (D) nodes use **static** quantization.

In PD-mixed deployment scenarios:

* **All components fall back to dynamic quantization**, as it is
difficult to distinguish between Prefill and Decode tokens.
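
The selection rules above can be summarized as a small decision function. This is a minimal sketch for illustration only: the enum names and `select_quant_mode` are hypothetical and are not taken from the vllm_ascend code base; only the rules themselves come from the description above.

```python
from enum import Enum
from typing import Optional


# All names below are hypothetical, chosen only to illustrate the rules.
class Deployment(Enum):
    PD_SEPARATED = "pd_separated"
    PD_MIXED = "pd_mixed"


class NodeRole(Enum):
    PREFILL = "prefill"
    DECODE = "decode"


class LayerKind(Enum):
    MOE = "moe"
    ATTENTION = "attention"


class QuantMode(Enum):
    DYNAMIC = "dynamic"
    STATIC = "static"


def select_quant_mode(
    deployment: Deployment,
    layer: LayerKind,
    role: Optional[NodeRole] = None,
) -> QuantMode:
    """Return the W8A8 quantization mode implied by the rules above."""
    # PD-mixed: Prefill and Decode tokens are hard to tell apart,
    # so every component falls back to dynamic quantization.
    if deployment is Deployment.PD_MIXED:
        return QuantMode.DYNAMIC

    # PD-separated: MoE layers use dynamic quantization exclusively.
    if layer is LayerKind.MOE:
        return QuantMode.DYNAMIC

    # PD-separated Attention: the mode depends on the node role.
    if role is None:
        raise ValueError("node role is required for Attention in PD-separated mode")
    return QuantMode.STATIC if role is NodeRole.DECODE else QuantMode.DYNAMIC
```

For example, `select_quant_mode(Deployment.PD_SEPARATED, LayerKind.ATTENTION, NodeRole.DECODE)` yields `QuantMode.STATIC`, while the same call with `Deployment.PD_MIXED` yields `QuantMode.DYNAMIC`.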
___

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Signed-off-by: Slightwind <slightwindsec@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

2025-11-30 11:57:26 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 )

2025-11-28 18:06:39 +08:00

[Bugfix] Fix model run _npu_flash_attention hang issue (#4410 )

2025-11-29 09:20:22 +08:00

upgrade to vllm 0.11.2 (#4400 )

2025-11-26 11:48:58 +08:00

Revert "drop ascend scheduler" (#4580 )

2025-11-29 22:20:48 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049 )

2025-07-28 16:01:59 +08:00

[Bugfix] Fix kvpool precision synchronization (#4574 )

2025-11-30 09:39:07 +08:00

[bugfix] dep ineffective (#4417 )

2025-11-29 15:18:29 +08:00

Drop 0.11.0 support (#4377 )

2025-11-24 17:08:20 +08:00

[refact] unified soc_version code (#4359 )

2025-11-26 14:28:55 +08:00

Drop 0.11.0 support (#4377 )

2025-11-24 17:08:20 +08:00

remove qwen3-next model file (#4573 )

2025-11-29 18:37:26 +08:00

[feature] Support W8A8 PD-Mix Quantization (#4235 )

2025-11-30 11:57:26 +08:00

[Bugfix] fix dp parallel + tp > 1 offline inference port conflict (#4539 )

2025-11-29 18:37:11 +08:00

[feature] Support W8A8 PD-Mix Quantization (#4235 )

2025-11-30 11:57:26 +08:00

[refact] unified soc_version code (#4359 )

2025-11-26 14:28:55 +08:00

remove qwen3-next model file (#4573 )

2025-11-29 18:37:26 +08:00

Revert "drop ascend scheduler" (#4580 )

2025-11-29 22:20:48 +08:00

[Bugfix] Fix kvpool precision synchronization (#4574 )

2025-11-30 09:39:07 +08:00

__init__.py

[Misc][Doc] Add service profiling feature with user guide (#3756 )

2025-11-12 09:07:14 +08:00

ascend_config.py

Revert "drop ascend scheduler" (#4580 )

2025-11-29 22:20:48 +08:00

ascend_forward_context.py

[Refactor] remove moe type of multicast. (#4224 )

2025-11-24 17:32:37 +08:00

cpu_binding.py

[main] support cpu binding (#3546 )

2025-10-21 09:17:03 +08:00

envs.py

[refact] unified soc_version code (#4359 )

2025-11-26 14:28:55 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786 )

2025-09-13 11:58:52 +08:00

platform.py

Revert "drop ascend scheduler" (#4580 )

2025-11-29 22:20:48 +08:00

profiling_config.py

Revert "drop ascend scheduler" (#4580 )

2025-11-29 22:20:48 +08:00

utils.py

Move mla to ops module (#4575 )

2025-11-29 18:36:55 +08:00