xc-llm-ascend/vllm_ascend at 07014e2101ce5bd2d9d3198f2e5fab9f2717975a

EngineX/xc-llm-ascend

panchao-hub 8069442b41 enable npugraph_ex (#5120 )

### What this PR does / why we need it?
This PR exposes the enable_npugraph_ex switch so that follow-up
optimization work can build on it.

### Does this PR introduce _any_ user-facing change?
Yes. Previously, setting the enable_npugraph_ex switch raised an error;
this PR removes that error so the switch can be turned on.
Basic functionality is available in the Q3 releases of CANN and
torch_npu; advanced optimizations depend on the Q4 release.

### How was this patch tested?
from vllm import LLM

llm = LLM(
    model=model,
    enforce_eager=False,
    additional_config={
        "enable_npugraph_ex": True,
    },
    compilation_config={
        "cudagraph_mode": "FULL_DECODE_ONLY",
        "cudagraph_capture_sizes": [16],
    },
)
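
The test configuration above can be wrapped in a small helper so the switch is easy to toggle between runs. This is only a minimal sketch: `build_llm_kwargs` is a hypothetical helper, not part of vllm-ascend, and it uses only the keys that appear in the test snippet.

```python
from typing import Any


def build_llm_kwargs(model: str,
                     enable_npugraph_ex: bool = True,
                     capture_sizes: tuple[int, ...] = (16,)) -> dict[str, Any]:
    """Assemble keyword arguments mirroring the test snippet in this PR.

    Hypothetical helper for illustration; only keys shown in the PR
    description are used.
    """
    return {
        "model": model,
        # enforce_eager=False leaves graph capture enabled, as in the PR's test.
        "enforce_eager": False,
        "additional_config": {"enable_npugraph_ex": enable_npugraph_ex},
        "compilation_config": {
            "cudagraph_mode": "FULL_DECODE_ONLY",
            "cudagraph_capture_sizes": list(capture_sizes),
        },
    }


kwargs = build_llm_kwargs("path/to/model")  # placeholder model path
# With vLLM and vllm-ascend installed, the dict would be passed as:
#   from vllm import LLM
#   llm = LLM(**kwargs)
```

Keeping the arguments in a plain dict makes it straightforward to compare a run with the switch on against one with it off by changing a single parameter.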


- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---------

Signed-off-by: p00465316 <panchao13@huawei.com>
Co-authored-by: p00465316 <panchao13@huawei.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>

2025-12-18 09:08:40 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 )

2025-11-28 18:06:39 +08:00

[Refactor] 4/N Distinguish the branches based on the applicable scenarios of PA and FIA Ops. (#5081 )

2025-12-17 23:14:02 +08:00

enable npugraph_ex (#5120 )

2025-12-18 09:08:40 +08:00

[bugfix][refactor] fix recompute_scheduler break with vllm 0.12.0 & support async scheduling & refactor recompute_scheduler.py (#4895 )

2025-12-11 22:24:49 +08:00

device_allocator

[Misc]Clean up useless import from vllm (#2049 )

2025-07-28 16:01:59 +08:00

[BugFix] Fix mooncake bug in PCP scenario (#5055 )

2025-12-17 16:32:16 +08:00

[Misc] Upgrade vllm hash to 12_14 (#5000 )

2025-12-15 19:54:23 +08:00

upgrade vLLM to main (#4608 )

2025-12-02 22:10:52 +08:00

[refact] unified soc_version code (#4359 )

2025-11-26 14:28:55 +08:00

[CI] speed up ut (#4901 )

2025-12-11 18:45:43 +08:00

[Pangu][MoE] Remove PanguProMoEV1 related code (#5088 )

2025-12-17 16:14:42 +08:00

[Refactor] Remove the process patches of Qwen2.5-VL and Qwen2.5-Omni (#5035 )

2025-12-16 11:43:52 +08:00

[model] Support PanguUltraMoE (#4615 )

2025-12-17 16:15:29 +08:00

[Feat] Refactor rejection sampler (#4975 )

2025-12-16 11:32:26 +08:00

[Fix] Synchronize the host query_start_loc with device values to prevent shape mismatches (#5134 )

2025-12-17 23:50:12 +08:00

fix profile run for vl model (#5136 )

2025-12-17 23:51:31 +08:00

[Feat] Add Euler xlite graph wrapper support (#4526 )

2025-12-08 08:27:46 +08:00

__init__.py

clean up model module (#4611 )

2025-12-02 17:35:47 +08:00

ascend_config.py

enable npugraph_ex (#5120 )

2025-12-18 09:08:40 +08:00

ascend_forward_context.py

[Bugfix][MoE] Remove All2All in w4a8_dynamic (#4977 )

2025-12-17 17:39:57 +08:00

cpu_binding.py

[main] support cpu binding (#3546 )

2025-10-21 09:17:03 +08:00

envs.py

[Feat] Add custom Embedding tensor model parallel (#2616 )

2025-12-12 14:41:20 +08:00

flash_common3_context.py

[Perf]enable prefill flashcommon3 (#4065 )

2025-12-14 09:34:13 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786 )

2025-09-13 11:58:52 +08:00

platform.py

Upgrade vllm commit hash to 1216 (#5053 )

2025-12-17 08:48:36 +08:00

profiling_config.py

Drop ascend scheduler (#4623 )

2025-12-05 09:03:45 +08:00

utils.py

[main] rename device type (#5099 )

2025-12-17 14:08:19 +08:00