xc-llm-ascend

Files

yiz-liu 0db6670bfa [Feature] Implement EP-compatible fused_moe (#121)

### What this PR does / why we need it?

Enable expert parallelism (EP) on Ascend devices.
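
For readers new to the term, the idea behind expert parallelism can be sketched in a few lines: the experts of an MoE layer are split across ranks, and each token is sent to the rank that owns its selected expert. The sketch below is illustrative only and is not the `fused_moe` implementation from this PR. The contiguous expert-to-rank layout, top-1 routing, and the helper names `expert_to_rank` and `dispatch_tokens` are all assumptions made for the example.

```python
from collections import defaultdict


def expert_to_rank(expert_id: int, num_experts: int, ep_size: int) -> int:
    """Return the rank owning `expert_id`, assuming contiguous blocks of experts per rank."""
    if num_experts % ep_size != 0:
        raise ValueError("this sketch assumes num_experts is divisible by ep_size")
    experts_per_rank = num_experts // ep_size
    return expert_id // experts_per_rank


def dispatch_tokens(token_expert_ids, num_experts: int, ep_size: int) -> dict:
    """Group token indices by the rank that holds each token's (top-1) expert."""
    per_rank = defaultdict(list)
    for token_idx, expert_id in enumerate(token_expert_ids):
        rank = expert_to_rank(expert_id, num_experts, ep_size)
        per_rank[rank].append(token_idx)
    return dict(per_rank)


# Hypothetical routing decisions: token i is routed to expert routing[i].
routing = [0, 5, 2, 7, 5]
print(dispatch_tokens(routing, num_experts=8, ep_size=4))
```

With 8 experts on 4 ranks, each rank holds 2 experts, so tokens routed to expert 5 both land on rank 2. A real implementation also has to move activations between devices and combine the expert outputs, which this sketch leaves out.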

### Does this PR introduce _any_ user-facing change?

Yes. To enable EP, add `enable_expert_parallel=True` to your offline inference script, for example:
```python
from vllm import LLM

llm = LLM(
    model="/path/to/model",
    trust_remote_code=True,
    tensor_parallel_size=4,
    max_model_len=4096,
    enforce_eager=True,
    distributed_executor_backend="mp",
    enable_expert_parallel=True,
)
```
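
A fuller offline inference script might look like the sketch below. It reuses the arguments from the snippet above; the prompt, the sampling values, and the helper `build_llm_kwargs` are placeholders introduced for illustration and are not part of this PR. The vLLM import is kept inside `main()` so the file can still be imported on a machine without vLLM installed.

```python
def build_llm_kwargs(model_path: str, tp_size: int = 4, enable_ep: bool = True) -> dict:
    """Collect the LLM constructor arguments from the PR description (hypothetical helper)."""
    return {
        "model": model_path,
        "trust_remote_code": True,
        "tensor_parallel_size": tp_size,
        "max_model_len": 4096,
        "enforce_eager": True,
        "distributed_executor_backend": "mp",
        "enable_expert_parallel": enable_ep,  # the switch this PR adds support for
    }


def main() -> None:
    try:
        from vllm import LLM, SamplingParams
    except ImportError:
        print("vLLM is not installed; skipping inference.")
        return

    llm = LLM(**build_llm_kwargs("/path/to/model"))
    # Placeholder sampling settings, chosen only for the example.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)
    outputs = llm.generate(["Hello, my name is"], sampling_params)
    for output in outputs:
        print(output.prompt, "->", output.outputs[0].text)


if __name__ == "__main__":
    main()
```

Setting `enable_ep=False` in `build_llm_kwargs` gives the same configuration without EP, which may be handy for comparing outputs between the two modes.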

### How was this patch tested?

Please use the `main` branch of vLLM.

---------

Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
Co-authored-by: Yizhou Liu <liuyizhou5@h-partners.com>

2025-03-11 21:08:02 +08:00

| Name | Last commit | Date |
| --- | --- | --- |
| ops | [Feature] Implement EP-compatible fused_moe (#121) | 2025-03-11 21:08:02 +08:00 |
| conftest.py | [Core] Init vllm-ascend (#3) | 2025-02-05 10:53:12 +08:00 |
| model_utils.py | [Core] Init vllm-ascend (#3) | 2025-02-05 10:53:12 +08:00 |
| test_offline_inference.py | [CI] Add dispatch job to leverage dynamic devices (#251) | 2025-03-07 09:47:13 +08:00 |