### What this PR does / why we need it?
convert the format of gmm to nz
### Does this PR introduce _any_ user-facing change?
not involved
### How was this patch tested?
ut: test_fused_ops.py and e2e: test_fused_moe.py
**performance**:
(qwen3 30B, 2k->20k)
base:
Total Token throughput (tok/s): 719.93
gmm nz:
Total Token throughput (tok/s): 728.52
- vLLM version: v0.10.1.1
- vLLM main:
bfc1edc9f5
Signed-off-by: huangxialu <huangxialu1@huawei.com>
### What this PR does / why we need it?
Integrate the arange operator to reduce the time spent and improve
performance
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main:
56dcf4e7e9
---------
Signed-off-by: s30076806 <songjiayang2@h-partners.com>
### What this PR does / why we need it?
Fix some ci issue and refactor modelrunner
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed with existing test.
- vLLM version: v0.10.0
- vLLM main:
4d9c61993a
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
Co-authored-by: weiguihua2 <weiguihua2@huawei.com>
### What this PR does / why we need it?
this pr refactor select_experts of moe module
i merge implementations of quantitative and non-quantitative method in a
new class
use such as vllm like ExpertsSelector.select_experts
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
test in qwen3-moe and all ut.
- vLLM version: v0.10.0
- vLLM main:
e18859298d
Signed-off-by: yangcheng <yangcheng104@huawei.com>
Co-authored-by: yangcheng (AJ) <y00806874@china.huawei.com>
### What this PR does / why we need it?
Fix AscendFusedMoE init error. Use `super().__init__()` instead of
`super(FusedMoE, self).__init__()` to ensure the member variables in
base class could be called by the children class
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed with new existing test.
- vLLM version: v0.10.0
- vLLM main:
766bc8162c
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
torch_npu.npu_grouped_matmul:
https://www.hiascend.com/document/detail/zh/Pytorch/710/apiref/torchnpuCustomsapi/context/torch_npu-npu_grouped_matmul.md
According to the document, when `split_item` is 2 or 3,
`torch_npu.npu_grouped_matmul` will return a list which has one element.
Therefore, the `torch.cat` after `torch_npu.npu_grouped_matmul` is
unnecessary.
### Does this PR introduce _any_ user-facing change?
not involved
### How was this patch tested?
ut and e2e covered: `tests/ut/ops/test_fused_ops.py`,
`tests/e2e/singlecard/ops/test_fused_moe.py`
**performance**:
(qwen3 30B, 2k->20k)
base:
Total Token throughput (tok/s): 667.76
remove cat:
Total Token throughput (tok/s): 680.82
- vLLM version: v0.10.0
- vLLM main:
fa00c5d75b
Signed-off-by: huangxialu <huangxialu1@huawei.com>
we recently added disaggregated_prefill and ascend_forward_context
feature by
ba3dfbd59e
and
df0ec55162.
This PR fix some nit introduced by them to make the code clear.
1. drop `current_platform` usage. It'll lead unknown circular import
error in some case
2. update `set_ascend_forward_context` function to make the logic clear.
for example, remove V0 support in this function.
3. Remove useless `self.local_rank_across_dp` in worker
4. Remove `soc_info.py` to use `get_ascend_soc_version` instead.
- vLLM version: v0.10.0
- vLLM main:
02f82fe438
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
A refactoring of forward_context and model_runner_v1, add some context
which is necessary in model inference into forward_context, and refactor
dummy_run logic, make it more reasonable.
Some details for this PR:
Add `ascend_forward_context`;
Update mc2_v2 op, and support `active_mask` param;
Update scripts in examples dir;
refactor `dummy_run` logic;
Add soc_version for A2 and A3;
### Does this PR introduce _any_ user-facing change?
No change at user-facing.
### How was this patch tested?
- vLLM version: v0.10.0
- vLLM main:
57c22e57f9
Signed-off-by: zzzzwwjj <1183291235@qq.com>
What this PR does / why we need it?
Add uts for deepseek_v2
Does this PR introduce any user-facing change?
No
How was this patch tested?
- vLLM version: v0.9.2
- vLLM main:
f3137cdd81
---------
Signed-off-by: 张帮政 <zhangbangzheng@huawei.com>