xc-llm-ascend

Files

Feng Liu 1858f3d36e [Bugfix] Fix Qwen P/D Disaggregation accuracy issue (#5340)

### What this PR does / why we need it?
Fix Qwen P/D Disaggregation accuracy issue

- vLLM version: release/v0.13.0
- vLLM main: bc0a5a0c08

Signed-off-by: F.Liu <liufeng248@huawei.com>
Co-authored-by: F.Liu <liufeng248@huawei.com>

2025-12-25 22:46:08 +08:00

__init__.py

[Core] Make V1 work and enable V1 engine test (#389)

2025-03-28 19:34:23 +08:00

attention_cp.py

[Bugfix] Fix Qwen P/D Disaggregation accuracy issue (#5340)

2025-12-25 22:46:08 +08:00

attention_mask.py

[Model] Support pooling models (#3122)

2025-12-10 11:37:57 +08:00

attention_v1.py

Revert [KV-Sharing] Support KV-Sharing feature in CLA models (#4138) (#5317)

2025-12-24 22:24:17 +08:00

common_cp.py

[Refactor]5/N Extract common code of mla_v1.py & extract mla_cp (#5097)

2025-12-24 10:25:19 +08:00

mla_cp.py

[Refactor]5/N Extract common code of mla_v1.py & extract mla_cp (#5097)

2025-12-24 10:25:19 +08:00

mla_v1.py

[Refactor]5/N Extract common code of mla_v1.py & extract mla_cp (#5097)

2025-12-24 10:25:19 +08:00

sfa_v1.py

[main][Refactor] Remove with_prefill parameter from set_ascend_forward_context (#5094)

2025-12-23 14:30:50 +08:00

utils.py

[Refactor]5/N Extract common code of mla_v1.py & extract mla_cp (#5097)

2025-12-24 10:25:19 +08:00