xc-llm-ascend/vllm_ascend at 2b3bfe432e886b4773ef5cfa33a0e69b2c7d5b6d

EngineX/xc-llm-ascend

weijinqian0 2b3bfe432e [bugfix] Repair the problem of moe model accuracy caused by version upgrade. (#4562)

Fix the MoE model accuracy regression caused by the vLLM version upgrade.

Reason:
The new vLLM version adds a "reduce_output" operation after "forward_impl".

We therefore now fully take over the implementation of the FusedMoE module.
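The commit message does not spell out the mechanism. One plausible reading, assumed here purely for illustration, is that a subclass overriding only "forward_impl" already reduces its output, so a "reduce_output" step newly added by the base class reduces it a second time. The toy sketch below models that with plain Python lists; the class names `BaseMoE`, `PartialOverrideMoE`, and `FullOverrideMoE` and the helper `all_reduce_sum` are hypothetical and are not vLLM or vllm-ascend code.

```python
# Toy model only: plain Python lists stand in for per-rank tensors.
# None of these classes are vLLM or vllm-ascend code (all names are assumptions).


def all_reduce_sum(per_rank):
    """Simulate an all-reduce: every rank ends up with the element-wise sum."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


class BaseMoE:
    """Hypothetical base layer whose forward() reduces after forward_impl()."""

    def forward_impl(self, per_rank):
        return per_rank  # partial expert outputs, not yet reduced

    def reduce_output(self, per_rank):
        return all_reduce_sum(per_rank)

    def forward(self, per_rank):
        return self.reduce_output(self.forward_impl(per_rank))


class PartialOverrideMoE(BaseMoE):
    """Overrides only forward_impl() and already reduces inside it."""

    def forward_impl(self, per_rank):
        return all_reduce_sum(per_rank)  # the inherited forward() reduces again


class FullOverrideMoE(BaseMoE):
    """Overrides forward() as well, so exactly one reduction happens."""

    def forward_impl(self, per_rank):
        return all_reduce_sum(per_rank)

    def forward(self, per_rank):
        return self.forward_impl(per_rank)


partials = [[1.0, 2.0], [3.0, 4.0]]  # two ranks, two values each
double = PartialOverrideMoE().forward(partials)[0]  # reduced twice
single = FullOverrideMoE().forward(partials)[0]     # reduced once
print("partial override:", double)
print("full override:   ", single)
```

In this toy setup the partial override sums the per-rank values twice, while owning the whole forward path keeps a single reduction. Whether this matches the actual regression fixed in #4562 is an assumption, not something the commit message confirms.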


- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>

2025-11-30 06:12:39 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[Bugfix] Fix model run _npu_flash_attention hang issue (#4410)

2025-11-29 09:20:22 +08:00

upgrade to vllm 0.11.2 (#4400)

2025-11-26 11:48:58 +08:00

Revert "drop ascend scheduler" (#4580)

2025-11-29 22:20:48 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

[feature]Pooling Features and PCP Adaptation (#4143)

2025-11-29 22:07:45 +08:00

[bugfix] dep ineffective (#4417)

2025-11-29 15:18:29 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

[refact] unified soc_version code (#4359)

2025-11-26 14:28:55 +08:00

Drop 0.11.0 support (#4377)

2025-11-24 17:08:20 +08:00

remove qwen3-next model file (#4573)

2025-11-29 18:37:26 +08:00

[bugfix] Repair the problem of moe model accuracy caused by version upgrade. (#4562)

2025-11-30 06:12:39 +08:00

[Bugfix] fix dp parallel + tp > 1 offline inference port conflict (#4539)

2025-11-29 18:37:11 +08:00

[Quantization] Support compressed tensors w8a8 static and w8a8 dynamic weight (#4036)

2025-11-28 14:09:39 +08:00

[refact] unified soc_version code (#4359)

2025-11-26 14:28:55 +08:00

remove qwen3-next model file (#4573)

2025-11-29 18:37:26 +08:00

Revert "drop ascend scheduler" (#4580)

2025-11-29 22:20:48 +08:00

Revert "drop ascend scheduler" (#4580)

2025-11-29 22:20:48 +08:00

__init__.py

[Misc][Doc] Add service profiling feature with user guide (#3756)

2025-11-12 09:07:14 +08:00

ascend_config.py

Revert "drop ascend scheduler" (#4580)

2025-11-29 22:20:48 +08:00

ascend_forward_context.py

[Refactor] remove moe type of multicast. (#4224)

2025-11-24 17:32:37 +08:00

cpu_binding.py

[main] support cpu binding (#3546)

2025-10-21 09:17:03 +08:00

envs.py

[refact] unified soc_version code (#4359)

2025-11-26 14:28:55 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

Revert "drop ascend scheduler" (#4580)

2025-11-29 22:20:48 +08:00

profiling_config.py

Revert "drop ascend scheduler" (#4580)

2025-11-29 22:20:48 +08:00

utils.py

Move mla to ops module (#4575)

2025-11-29 18:36:55 +08:00