xc-llm-ascend

Author SHA1 Message Date

Author	SHA1	Message	Date
wangxiyuan	f811a24bf0	Remove VLLM_USE_V1 (#4086 ) Drop VLLM_USE_V1 usage. This env has been removed from vLLM already. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-11 15:43:39 +08:00
zhangxinyuehfad	789ba4c5c2	[Doc] Update doc (#3836 ) ### What this PR does / why we need it? Update doc ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.1 Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-10-29 11:03:39 +08:00
wangxiaoteng888	ca05f7d632	[Bugfix] TP size larger than KV cache head causes accuracy issues (#3366 ) ### What this PR does / why we need it? Resolve the issue where, in the case of unequal TP (Tensor Parallelism), the TP size is larger than the number of model attention kvcache heads, causing the KV cache to generate duplicates, which leads to transmission errors in the original code. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By ci - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com> Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com>	2025-10-11 11:22:23 +08:00

wangxiyuan

f811a24bf0

Remove VLLM_USE_V1 (#4086 )

Drop VLLM_USE_V1 usage.  This env has been removed from vLLM already.

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

2025-11-11 15:43:39 +08:00

zhangxinyuehfad

789ba4c5c2

[Doc] Update doc (#3836 )

### What this PR does / why we need it?

Update doc

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

2025-10-29 11:03:39 +08:00

wangxiaoteng888

ca05f7d632

[Bugfix] TP size larger than KV cache head causes accuracy issues (#3366 )

### What this PR does / why we need it?
Resolve the issue where, in the case of unequal TP (Tensor Parallelism),
the TP size is larger than the number of model attention kvcache heads,
causing the KV cache to generate duplicates, which leads to transmission
errors in the original code.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: nwpu-zxr <zhouxuerong2@huawei.com>

2025-10-11 11:22:23 +08:00

3 Commits