[BUGFIX] main-sd-bugfix && [UT] add mtp UT (#593)
### What this PR does / why we need it? The pr will fix some bug about spec decode / MTP The pr add a mtp e2e UT `test_mtp_correctness.py` **vllm_ascend/attention/attention.py** 1. add support `self.attn_mask_cache` only has 1 element to cover scene in which both spec docode and chunked prefill are enabled. **vllm_ascend/distributed/parallel_state.py** 1. remove 2 assert because spec decode worker would use init_worker twice **vllm_ascend/models/deepseek_mtp.py** 1. remove unused params; 2. add support w8a8 in `CustomDeepSeekMTP` **vllm_ascend/quantization/quant_config.py** 1. use `AscendUnquantizedFusedMoEMethod` instead of `UnquantizedFusedMoEMethod` **other** 1. replace `from vllm.logger import init_logger` to `from vllm.logger import logger` all of the vllm-ascend project ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Signed-off-by: mengwei805 <mengwei25@huawei.com>
This commit is contained in:
3
.github/workflows/vllm_ascend_test.yaml
vendored
3
.github/workflows/vllm_ascend_test.yaml
vendored
@@ -161,7 +161,8 @@ jobs:
|
||||
if: steps.filter_spec_decode.outputs.speculative_tests_changed == 'true'
|
||||
run: |
|
||||
if [[ "${{ matrix.os }}" == "linux-arm64-npu-1" ]]; then
|
||||
pytest -sv tests/singlecard/spec_decode
|
||||
pytest -sv tests/singlecard/spec_decode/e2e/test_mtp_correctness.py # it needs a clean process
|
||||
pytest -sv tests/singlecard/spec_decode --ignore=tests/singlecard/spec_decode/e2e/test_mtp_correctness.py
|
||||
fi
|
||||
|
||||
- name: Run vllm-project/vllm test for V0 Engine
|
||||
|
||||
Reference in New Issue
Block a user