Commit Graph

5 Commits

Author SHA1 Message Date
sdmyzlp
53c2d58ae1 Handle with_prefill_across_dp for multistream mla (#1322)
### What this PR does / why we need it?
After #1094, decode might be executed with non-compiled mode, despite of
`torchair_graph_config.enabled`, causing multistream mla to fail, which
assumes torchair compiled mode for decode when
`torchair_graph_config.enabled == True`.
Augment that assumption to fix this.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested both offline, and by graph mode mla e2e testcase.

---------

Signed-off-by: sdmyzlp <lrwei2@petalmail.com>
2025-06-26 09:32:07 +08:00
sharonyunyun
941269a6c5 adjusting the communication method in graph mode (#1194)
### What this PR does / why we need it?
Communication performance optimization: replace allreduce with
reduce_scatter+all_gather in MLA layer's TP group,to remove
stridedsliced and all_gather in MOE layer.
when tp > 1, It is enabled during the decode phase of the graph mode
when enable_multistream_moe、MLA, use_v1, and MC2 are used.
According to the end-to-end RL inference test results, this PR can bring
3% gain in the decode stage.

**Before Improvement**
Profiling kernel_details

![image](https://github.com/user-attachments/assets/1bb5dfa1-809b-410a-90c9-c5fd23cff003)
Evaluation

![image](https://github.com/user-attachments/assets/0b8ea0c7-88e7-410f-9ef4-f0cfe910cdc7)

![image](https://github.com/user-attachments/assets/94fde910-c125-4c2e-8de4-88fc3fafc057)

**After Improvement**
Profiling kernel_details

![image](https://github.com/user-attachments/assets/55fac0e0-11f2-4654-8fd4-287949e0b29e)
Evaluation

![image](https://github.com/user-attachments/assets/e923f74b-29c4-4171-9382-40a00cf05df0)

![image](https://github.com/user-attachments/assets/5dba7967-07ea-4926-a8be-804bfd34e3e4)

### Does this PR introduce _any_ user-facing change?
Users need to configure enable_multistream_moe=True

### How was this patch tested?
Add e2e test cases to cover code logic

Signed-off-by: sharonyunyun <zhangying134@huawei.com>
2025-06-25 19:56:49 +08:00
Mengqing Cao
52317f92cb [DP] Tiny fix of dp and update example (#1273)
### What this PR does / why we need it?
Add `max_num_tokens_across_dp` to AscendMetadata to fix dp

This pr fixes the bug introduced by
https://github.com/vllm-project/vllm-ascend/pull/1229, which add an arg
`max_num_tokens_across_dp` when dp_size > 1.

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-06-25 11:03:04 +08:00
zxdukki
f04c6763d8 [Bugfix] fix env variable in dbo (#1284)
### What this PR does / why we need it?
Fix env variable in dbo to enable dbo in DeepSeek-V3 model. Besides, we
have fixed an known issue in deepseek-dbo.


### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
This patch can be tested with newly added e2e tests:
[tests/multicard/test_offline_inference_distributed.py](https://github.com/vllm-project/vllm-ascend/pull/1285/files#diff-7cd2e6b1bda6b8ad1bedb3276971fe7064aeae4dc0efd41c301c4ede2158c57e).
It can be verified with pytest.

---------

Signed-off-by: zhuohuan <zxdu1997@gmail.com>
2025-06-23 09:07:57 +08:00
wangxiyuan
69b817ed65 [CI] Add unit test framework (#1201)
This PR added the unit test framework to enable ut for vLLM Ascend. Unit
test runs on CPU machines. It'll be ran once lint check is passed the
same as e2e test.

For unit test, this PR created a new folder called `ut` under `tests`
module. All the test file in `ut` should keep the same with the code in
`vllm-ascend`. The file name should be start with `test_` prefix. For
example, in this PR. the `test_ascend_config.py` is added for
`ascend_config.py` test.

A new fille `worker/test_worker_v1.py` is also added as the placeholder.
This file should be the unit test for `vllm-ascend/worker/worker_v1.py`.

Additional, a new `fake_weight` folder is added, it contains the
config.json from `facebook/opt-125m`, so that the test will not always
visit huggingface.

TODO:
We should add all the unit test file one by one in the future.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-06-16 18:32:28 +08:00