Shanshan Shen
8326f15ecf
[CustomOp] Register AscendSharedFusedMoE custom op (#2980)
### What this PR does / why we need it?
Register `AscendSharedFusedMoE` custom op.
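As a rough illustration of what registering a custom op can look like, here is a minimal sketch. It assumes a name-to-class registry in which an out-of-tree plugin overrides an in-tree layer class. The helpers `register_oot` and `resolve`, the `backend` attribute, and the stand-in `SharedFusedMoE` base class are hypothetical and defined only for this example. They are not the actual vLLM or vllm-ascend API.

```python
from typing import Callable, Dict, Type

# Hypothetical registry: in-tree layer name -> out-of-tree replacement class.
_OOT_REGISTRY: Dict[str, Type] = {}


def register_oot(name: str) -> Callable[[Type], Type]:
    """Class decorator that records `cls` as the override for layer `name`."""
    def decorator(cls: Type) -> Type:
        _OOT_REGISTRY[name] = cls
        return cls
    return decorator


def resolve(name: str, default: Type) -> Type:
    """Return the registered override for `name`, or `default` if there is none."""
    return _OOT_REGISTRY.get(name, default)


class SharedFusedMoE:
    """Stand-in for the in-tree layer (assumed name, no real compute)."""
    backend = "default"


@register_oot("SharedFusedMoE")
class AscendSharedFusedMoE(SharedFusedMoE):
    """Stand-in for the Ascend-specific implementation."""
    backend = "ascend"


# Model code asks for the in-tree layer and receives the registered override.
layer_cls = resolve("SharedFusedMoE", SharedFusedMoE)
print(layer_cls.__name__, layer_cls.backend)  # AscendSharedFusedMoE ascend
```

With this pattern, model code keeps referring to the in-tree layer name and the platform plugin decides which class is actually instantiated.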
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
`DeepSeek-V2-Lite` is a MoE model with shared experts.
Test (the prompt asks, in Chinese, for an introduction to China Unicom):
```bash
# Terminal 1: start the server.
vllm serve /root/.cache/modelscope/hub/models/deepseek-ai/DeepSeek-V2-Lite \
--trust-remote-code \
--enforce-eager \
--no-enable-prefix-caching \
--gpu-memory-utilization 0.95

# Terminal 2: once the server is ready, send a chat completion request.
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "/root/.cache/modelscope/hub/models/deepseek-ai/DeepSeek-V2-Lite",
"messages": [
{"role": "user", "content": "介绍一下联通公司?"}
],
"stream": false,
"max_tokens": 100
}'
```
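Output (in Chinese; the model describes China Unicom, and the reply stops mid-sentence, presumably at the 100-token limit):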
```text
中国联合网络通信集团有限公司(简称“中国联通”)于2009年1月6日在原中国网通和原中国联通的基础上合并组建而成,在国内31个省(自治区、直辖市)和境外多个国家和地区设有分支机构,是中国唯一一家在纽约、香港、上海三地同时上市的电信运营企业,连续多年入选“世界500强企业”。\n\n中国联通主要经营固定通信业务,移动通信业务,国内
```
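The same request can also be sent from Python, which is convenient for scripting the check. The sketch below uses only the standard library and copies the URL, model path, and parameters from the curl command above. The helper names `build_request` and `chat` are made up for this example, and the response layout (`choices[0].message.content`) is assumed to follow the OpenAI-compatible chat completions format.

```python
import json
import urllib.request

# Values copied from the curl command in the test above.
BASE_URL = "http://localhost:8000"
MODEL = "/root/.cache/modelscope/hub/models/deepseek-ai/DeepSeek-V2-Lite"


def build_request(prompt: str, max_tokens: int = 100) -> urllib.request.Request:
    """Build the POST request for /v1/chat/completions without sending it."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    # Assumed response shape: OpenAI-compatible chat completion.
    return body["choices"][0]["message"]["content"]
```

Calling `chat("介绍一下联通公司?")` while the server from the test is running should return a reply like the one shown above.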
- vLLM version: v0.10.2
- vLLM main: 486c5599e3
---------
Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-09-19 19:05:01 +08:00