xc-llm-ascend

Files

NeverRaR df84cceca8 perf: use multicast to avoid padding decode request to prefill size (#1555)

### What this PR does / why we need it?
perf: use multicast to avoid padding decode requests to the prefill size

### How was this patch tested?

- vLLM version: v0.9.1
- vLLM main: 1fd471e957

Signed-off-by: boying <897013703@qq.com>

2025-07-07 22:36:03 +08:00

__init__.py

[Core] Cherry pick from 0.7.1 to keep the main code newest (#127)

2025-02-21 17:07:37 +08:00

func_wrapper.py

[quantization] Support w8a8 quantization (#580)

2025-04-20 18:14:05 +08:00

quant_config.py

support pangumoe w8a8c8 and docs (#1477)

2025-06-28 18:51:07 +08:00

quantizer.py

Fix W8A8 fused moe bug (#1529)

2025-07-02 16:40:51 +08:00

w8a8_dynamic.py

perf: use multicast to avoid padding decode request to prefill size (#1555)

2025-07-07 22:36:03 +08:00

w8a8.py

[CORE] initial support for torchair with non-mla backend (#1506)

2025-07-03 22:21:42 +08:00