xc-llm-ascend/vllm_ascend at 3b7eb5179f2b5d52e9bd693095d51de604ba5ece

EngineX/xc-llm-ascend

wangx700 3b7eb5179f [Bugfix] fix the incorrect use of python's sum on tensors. (#4655 )

### What this PR does / why we need it?
Fix the incorrect use of Python's built-in sum() on PyTorch tensors.
1. Calling Python's built-in sum() on the tensor self.num_pcp_pads took about 6 ms, because the built-in iterates over the tensor element by element.
Optimization: replacing it with torch.sum() reduced the execution time to 474 µs.
2. scheduler_output.scheduled_spec_decode_tokens was processed in repeated loops even when speculative decoding was not in use.
Optimization: added conditional logic that skips these loops when speculative decoding is disabled, eliminating the unnecessary computational overhead.
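
The second optimization can be illustrated with a minimal sketch. It is not the code from this PR: the `SchedulerOutput` stand-in, the `count_spec_tokens` helper, and the `use_spec_decode` flag are hypothetical names, and the sketch assumes that `scheduled_spec_decode_tokens` maps a request id to its list of draft token ids.

```python
from dataclasses import dataclass, field


@dataclass
class SchedulerOutput:
    """Hypothetical stand-in for the real scheduler output object."""
    # Assumed shape: request id -> draft token ids scheduled for that request.
    scheduled_spec_decode_tokens: dict[str, list[int]] = field(default_factory=dict)


def count_spec_tokens(scheduler_output, req_ids, use_spec_decode):
    """Return the number of scheduled draft tokens for each request in req_ids."""
    counts = [0] * len(req_ids)

    # Fast path: when speculative decoding is disabled (or nothing was
    # scheduled), skip the per-request loop entirely.
    if not use_spec_decode or not scheduler_output.scheduled_spec_decode_tokens:
        return counts

    # Slow path: only taken when there are draft tokens to account for.
    index_of = {req_id: i for i, req_id in enumerate(req_ids)}
    for req_id, draft_ids in scheduler_output.scheduled_spec_decode_tokens.items():
        counts[index_of[req_id]] = len(draft_ids)
    return counts
```

The guard costs one boolean check per step, so the common case without speculative decoding does no per-request work at all.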


- vLLM version: 86e178f7c4d8c3b0eaf3c8e3f810a83f63b90e24
- vLLM main: 86e178f7c4

Signed-off-by: wangx700 <wangxin700@huawei.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>

2025-12-15 19:22:40 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 )

2025-11-28 18:06:39 +08:00

[Refactor]3/N Refactor mla_v1.py & extract mla_cp (#4933 )

2025-12-15 12:59:18 +08:00

[Graph][Fusion] Add AddRMSNorm(with bias) and Quant Fusion Pattern (#5011 )

2025-12-15 18:37:56 +08:00

[bugfix][refactor] fix recompute_scheduler break with vllm 0.12.0 & support async scheduling & refactor recompute_scheduler.py (#4895 )

2025-12-11 22:24:49 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049 )

2025-07-28 16:01:59 +08:00

[Bugfix] Fix the bug in initializing the shared_weight communication domain in sfa-cp, and fix the mtp weight load in pp>1 situation (#4913 )

2025-12-15 16:21:49 +08:00

BugFix: Resolve PolicyFlashlb warm up function attribute error (#4741 )

2025-12-12 14:55:26 +08:00

upgrade vLLM to main (#4608 )

2025-12-02 22:10:52 +08:00

[refact] unified soc_version code (#4359 )

2025-11-26 14:28:55 +08:00

[CI] speed up ut (#4901 )

2025-12-11 18:45:43 +08:00

[Perf]enable prefill flashcommon3 (#4065 )

2025-12-14 09:34:13 +08:00

[main][BugFix] Fixed an accuracy bug of Qwen3-next-MTP when batched inferring (#4932 )

2025-12-15 13:22:30 +08:00

[Bugfix] qwen3-vl-235b-w8a8 load weight ERROR when start service (#4292 )

2025-12-15 16:39:58 +08:00

[Performance] Pre-issued exponential distribution operator. (#4908 )

2025-12-11 23:02:51 +08:00

[Bugfix] Fix the bug in initializing the shared_weight communication domain in sfa-cp, and fix the mtp weight load in pp>1 situation (#4913 )

2025-12-15 16:21:49 +08:00

[Bugfix] fix the incorrect use of python's sum on tensors. (#4655 )

2025-12-15 19:22:40 +08:00

[Feat] Add Euler xlite graph wrapper support (#4526 )

2025-12-08 08:27:46 +08:00

__init__.py

clean up model module (#4611 )

2025-12-02 17:35:47 +08:00

ascend_config.py

[Perf]enable prefill flashcommon3 (#4065 )

2025-12-14 09:34:13 +08:00

ascend_forward_context.py

[Feature] model_runner refactor (#4764 )

2025-12-12 17:27:09 +08:00

cpu_binding.py

[main] support cpu binding (#3546 )

2025-10-21 09:17:03 +08:00

envs.py

[Feat] Add custom Embedding tensor model parallel (#2616 )

2025-12-12 14:41:20 +08:00

flash_common3_context.py

[Perf]enable prefill flashcommon3 (#4065 )

2025-12-14 09:34:13 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786 )

2025-09-13 11:58:52 +08:00

platform.py

[bugfix][refactor] fix recompute_scheduler break with vllm 0.12.0 & support async scheduling & refactor recompute_scheduler.py (#4895 )

2025-12-11 22:24:49 +08:00

profiling_config.py

Drop ascend scheduler (#4623 )

2025-12-05 09:03:45 +08:00

utils.py

[Bugfix] Add support for PP intermediate value types in graph mode (#4902 )

2025-12-15 16:27:17 +08:00