xc-llm-ascend

wemaster 0ae9ee0f8a [BUGFIX] main-sd-bugfix && [UT] add mtp UT (#593)

### What this PR does / why we need it?
This PR fixes several bugs in spec decode / MTP.
This PR also adds an MTP e2e UT, `test_mtp_correctness.py`.

**vllm_ascend/attention/attention.py**
1. Support the case where `self.attn_mask_cache` has only 1 element, to cover
the scenario in which both spec decode and chunked prefill are enabled.

**vllm_ascend/distributed/parallel_state.py**
1. Remove 2 asserts, because the spec decode worker calls `init_worker`
twice.
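
As a rough illustration of why an "already initialized" assert conflicts with a second `init_worker` call, here is a minimal sketch. The names `_GROUP`, `init_group_strict`, and `init_group` are hypothetical and not taken from `parallel_state.py`; reusing the existing group on the second call is an assumption about one reasonable behavior, not a description of the actual patch.

```python
# Hypothetical stand-in for a module-level parallel group.
# None of these names come from vllm_ascend/distributed/parallel_state.py.
_GROUP = None


def init_group_strict(world_size: int) -> dict:
    """Assert-guarded init: a second call fails."""
    global _GROUP
    assert _GROUP is None, "group is already initialized"
    _GROUP = {"world_size": world_size}
    return _GROUP


def init_group(world_size: int) -> dict:
    """Init without the assert: a second call reuses the group (assumed behavior)."""
    global _GROUP
    if _GROUP is None:
        _GROUP = {"world_size": world_size}
    return _GROUP


first = init_group(2)
second = init_group(2)  # e.g. a second init_worker call from the spec decode worker
print(first is second)
```

With the assert in place, the second initialization would raise `AssertionError`; without it, repeated initialization is tolerated.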

**vllm_ascend/models/deepseek_mtp.py**
1. Remove unused params.
2. Add w8a8 support in `CustomDeepSeekMTP`.

**vllm_ascend/quantization/quant_config.py**
1. Use `AscendUnquantizedFusedMoEMethod` instead of
`UnquantizedFusedMoEMethod`.

**other**
1. Replace `from vllm.logger import init_logger` with
`from vllm.logger import logger` throughout the vllm-ascend project.



### Does this PR introduce _any_ user-facing change?


### How was this patch tested?

Signed-off-by: mengwei805 <mengwei25@huawei.com>

2025-04-21 19:25:51 +08:00

device_communicators

Add pyhccl (#503)

2025-04-17 14:57:52 +08:00

__init__.py

[Feature] Add PD separation feature (#432)

2025-04-15 15:11:35 +08:00

communicator.py

[CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) (#460)

2025-04-17 14:59:56 +08:00

llmdatadist_connector.py

port deepseekv2 and mtp to main branch (#429)

2025-04-19 17:38:18 +08:00

parallel_state.py

[BUGFIX] main-sd-bugfix && [UT] add mtp UT (#593)

2025-04-21 19:25:51 +08:00