Commit Graph

5 Commits

Author SHA1 Message Date
hfadzxy
9935d45728 [CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) (#460)
### What this PR does / why we need it?
Add model basic accuracy test(Qwen2.5-0.5B-Instruct)

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-04-17 14:59:56 +08:00
Pleaplusone
ce8259975e [core] Support custom ascendc kernels in vllm-ascend (#233)
This PR add custom ascendc kernel rotary_embedding support in
vllm-ascend, related CMakeLists and setuptools is also added in this PR.

Related: https://github.com/vllm-project/vllm-ascend/issues/156

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-04-03 14:52:34 +08:00
wangxiyuan
31f29b9f30 [Core] Make V1 work and enable V1 engine test (#389)
1. Make sure the version is string before parse in collect_env
2. Add basic V1 engine test

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-28 19:34:23 +08:00
wangxiyuan
663dca7578 [CI] fix race condition problem (#353)
fix race condition problem

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-19 17:04:36 +08:00
yiz-liu
0db6670bfa [Feature] Implement EP-compatible fused_moe (#121)
### What this PR does / why we need it?

Enable Expert-Parallel for ascend devices.

### Does this PR introduce _any_ user-facing change?

Enable EP
add `enable_expert_parallel=True` in your offline inference scripts,
like this:
```python
llm = LLM(
    model="/path/to/model",
    trust_remote_code=True,
    tensor_parallel_size=4,
    max_model_len=4096,
    enforce_eager=True,
    distributed_executor_backend="mp",
    enable_expert_parallel=True,
)
```

### How was this patch tested?

Please use the `main` branch of vLLM.

---------

Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
Co-authored-by: Yizhou Liu <liuyizhou5@h-partners.com>
2025-03-11 21:08:02 +08:00