### What this PR does / why we need it?
Adapt to the Qwen3-VL-8B-Instruct-W8A8 model type.
- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
---------
Signed-off-by: betta18 <jiangmengyu1@huawei.com>
Co-authored-by: betta18 <jiangmengyu1@huawei.com>
### What this PR does / why we need it?
1. Add a nightly test for MiniMax-M2.5 with its deployment method on A3.
2. Add a MiniMax-M2.5 deployment introduction to the vllm-ascend docs (a minimal sketch follows).
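A minimal offline sketch of the kind of deployment the new doc introduces, assuming a locally available checkpoint; the model path and `tensor_parallel_size` are placeholder assumptions, and the authoritative values live in the added doc.

```python
from vllm import LLM, SamplingParams

# Placeholder path and parallelism; see the new deployment doc for real values.
llm = LLM(model="path/to/MiniMax-M2.5", tensor_parallel_size=8)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```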
- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
---------
Signed-off-by: limuyuan <limuyuan3@huawei.com>
Signed-off-by: SparrowMu <52023119+SparrowMu@users.noreply.github.com>
Co-authored-by: limuyuan <limuyuan3@huawei.com>
### What this PR does / why we need it?
Add Kimi-K2.5 weights download.
- vLLM version: v0.17.0
- vLLM main:
4497431df6
Signed-off-by: LoganJane <loganJane73@hotmail.com>
### What this PR does / why we need it?
Add an e2e test for the QuaRot model with eagle3 that runs both the QuaRot
model and the float model and then compares their acceptance rates (a
sketch of the comparison follows). The QuaRot model was adapted to eagle3
in PRs #6914 and #7038.
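A minimal sketch of the comparison logic, assuming a hypothetical helper `run_eagle3_acceptance_rate()` that stands in for the real harness (launching vLLM with eagle3 speculative decoding and reporting the observed acceptance rate); the model paths and the tolerance are placeholders.

```python
import pytest

QUAROT_MODEL = "path/to/quarot-model"  # placeholder paths
FLOAT_MODEL = "path/to/float-model"
EAGLE3_DRAFT = "path/to/eagle3-draft"


def run_eagle3_acceptance_rate(target: str, draft: str) -> float:
    """Hypothetical helper: run eagle3 speculative decoding on `target`
    with the `draft` model and return the mean acceptance rate."""
    raise NotImplementedError  # stands in for the real e2e harness


def test_quarot_matches_float_acceptance_rate():
    quarot_rate = run_eagle3_acceptance_rate(QUAROT_MODEL, EAGLE3_DRAFT)
    float_rate = run_eagle3_acceptance_rate(FLOAT_MODEL, EAGLE3_DRAFT)
    # The quantized model should accept draft tokens about as often as
    # the float baseline (the 0.05 tolerance is a placeholder).
    assert quarot_rate == pytest.approx(float_rate, abs=0.05)
```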
- vLLM version: v0.16.0
- vLLM main:
4034c3d32e
Signed-off-by: zhaomingyu <zhaomingyu13@h-partners.com>
### What this PR does / why we need it?
This PR adds new models such as Qwen3.5-35B-A3B and Qwen3.5-27B to the
model list for testing.
- vLLM version: v0.16.0
- vLLM main:
4034c3d32e
Signed-off-by: pppeng <60355449+ppppeng@users.noreply.github.com>
### What this PR does / why we need it?
Since the PVCs for the Guiyang and Hong Kong regions are not shared, a
model download must be triggered in both regions simultaneously so that
the cached models stay synchronized across all regions (a sketch follows).
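A minimal sketch of the triggering logic, assuming a hypothetical `trigger_download()` that starts one region's download job; the region names follow the description above, everything else is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

REGIONS = ("guiyang", "hongkong")  # PVCs are per-region, not shared


def trigger_download(region: str, model: str) -> None:
    """Hypothetical: kick off the CI download job for `model` in `region`."""
    ...


def sync_model_across_regions(model: str) -> None:
    # Fire both regions at once so neither PVC cache falls behind.
    with ThreadPoolExecutor(max_workers=len(REGIONS)) as pool:
        for region in REGIONS:
            pool.submit(trigger_download, region, model)
```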
- vLLM version: v0.16.0
- vLLM main:
4034c3d32e
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
This PR adds a test case for qwen3-30b w8a8 with the mooncake mempool,
which we need to run regularly.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running the test.
- vLLM version: v0.14.1
- vLLM main:
d68209402d
Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
### What this PR does / why we need it?
When quantized weights are generated with the vLLM community's LLM
Compressor quantization tool, the vLLM Ascend engine needs to be adapted
to support the compressed-tensors quantization format.
1. Support MoE model W4A8 dynamic weights (see the usage sketch below).
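A minimal usage sketch, assuming a local MoE checkpoint already exported by LLM Compressor in the compressed-tensors format (the path is a placeholder); vLLM picks up the quantization config from the checkpoint itself, so no extra flag is needed.

```python
from vllm import LLM, SamplingParams

# Placeholder path to a W4A8 dynamic checkpoint in compressed-tensors format.
llm = LLM(model="path/to/moe-w4a8-compressed-tensors")
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```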
- vLLM version: v0.13.0
- vLLM main:
bde38c11df
---------
Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: menogrey <1299267905@qq.com>
Co-authored-by: menogrey <1299267905@qq.com>
### What this PR does / why we need it?
When quantized weights are generated with the vLLM community's LLM
Compressor quantization tool, the vLLM Ascend engine needs to be adapted
to support the compressed-tensors quantization format.
1. Support MoE model W8A8 Int8 dynamic weights.
2. Specify the W4A16 quantization configuration (see the sketch below).
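A minimal sketch of producing such a W4A16 checkpoint with LLM Compressor; the import paths and arguments follow recent llmcompressor examples and may differ across versions, and the model and output paths are placeholders.

```python
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Weight-only 4-bit scheme; keep lm_head in full precision.
recipe = QuantizationModifier(targets="Linear", scheme="W4A16",
                              ignore=["lm_head"])
oneshot(model="path/to/base-model", recipe=recipe,
        output_dir="path/to/w4a16-output")
```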
### Does this PR introduce _any_ user-facing change?
No
- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef
---------
Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: menogrey <1299267905@qq.com>
Signed-off-by: Wang Kunpeng <1289706727@qq.com>
Co-authored-by: menogrey <1299267905@qq.com>
Co-authored-by: Wang Kunpeng <1289706727@qq.com>
### What this PR does / why we need it?
1. Remove some unused but very large models from the shared volume.
2. Add a new step that shows the current storage usage (a sketch follows).
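A minimal sketch of what the usage step could report, assuming the shared volume is mounted at /shared (a placeholder mount point).

```python
import shutil

total, used, free = shutil.disk_usage("/shared")  # placeholder mount point
GiB = 1 << 30
print(f"shared volume: {used / GiB:.1f} GiB used of {total / GiB:.1f} GiB, "
      f"{free / GiB:.1f} GiB free")
```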
- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
Add the following models to CI for subsequent nightly test cases:
- moonshotai/Kimi-K2-Thinking
- vllm-ascend/Kimi-K2-Instruct-W8A8
- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
Add a new workflow that lets developers submit a pull request to download
new models into the CI cache (a sketch of the download step follows).
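A minimal sketch of the download step such a workflow might run, assuming models are pulled from the ModelScope hub into a placeholder cache path (`huggingface_hub` exposes the same `snapshot_download` API for HF-hosted weights).

```python
from modelscope import snapshot_download

# Pull the requested model into the shared CI cache (paths are placeholders).
snapshot_download("vllm-ascend/Kimi-K2-Instruct-W8A8",
                  cache_dir="/shared/cache/modelscope")
```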
- vLLM version: release/v0.13.0
- vLLM main:
254f6b9867
Signed-off-by: wangli <wangli858794774@gmail.com>