xc-llm-ascend/ut at 2ee4f23f28e44ce0195c7aef6e7916ee0b8ef635

EngineX/xc-llm-ascend

Nengjun Ma 78fad4e348 [Refactor] MLP weight prefetch to consistency with MoE Model's prefetching in terms of code and usage (#6442)

### What this PR does / why we need it?
Refactor MLP weight prefetching to be consistent with the MoE model's prefetching, in terms of both code and usage.
The environment variables VLLM_ASCEND_ENABLE_PREFETCH_MLP,
VLLM_ASCEND_MLP_DOWN_PREFETCH_SIZE and
VLLM_ASCEND_MLP_GATE_UP_PREFETCH_SIZE are removed. Use the following instead:

--additional-config '{"weight_prefetch_config": { "enabled": true,
"prefetch_ratio": {"mlp": { "gate_up": 1.0, "down": 1.0} }}}'
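The `--additional-config` value above is a JSON object passed as a single shell argument. The sketch below builds that argument in Python, assuming only the key layout shown in the example (`weight_prefetch_config.enabled` and `weight_prefetch_config.prefetch_ratio.mlp.gate_up` / `down`). `build_prefetch_config` is a hypothetical helper, not part of vllm-ascend, and the meaning of the ratio values is defined by vllm-ascend rather than by this code.

```python
import json


def build_prefetch_config(enabled: bool = True, gate_up: float = 1.0, down: float = 1.0) -> str:
    """Hypothetical helper: serialize the weight_prefetch_config layout shown above."""
    config = {
        "weight_prefetch_config": {
            "enabled": enabled,
            # Per-projection prefetch ratios for the MLP block, as in the example.
            "prefetch_ratio": {"mlp": {"gate_up": gate_up, "down": down}},
        }
    }
    return json.dumps(config)


payload = build_prefetch_config()
# Wrapping in single quotes keeps the JSON intact as one POSIX shell argument.
print(f"--additional-config '{payload}'")
```

Generating the string with `json.dumps` avoids hand-quoting mistakes (for example, Python's `True` versus JSON's `true`) when the ratios are scripted rather than typed.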

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>

2026-02-04 09:08:18 +08:00

[Main2Main][Deps][Misc] Upgrade vLLM to v0.15.0 (#6470)

2026-02-02 15:57:55 +08:00

Reapply "[Refactor] Unify full-graph parameter update logic (#6041)" (#6227) (#6231)

2026-01-26 09:04:54 +08:00

[MM][Bugfix] Update hf_config to hf_text_config (#5319)

2026-01-06 16:41:39 +08:00

device_allocator

[Refactor] Modify the binding logic to allocate CPU cores for each NPU card (#5555)

2026-01-13 09:21:28 +08:00

[Refactor]Refactor of vllm_ascend/distributed module (#5719)

2026-01-15 08:57:40 +08:00

[Main2Main][Deps][Misc] Upgrade vLLM to v0.15.0 (#6470)

2026-02-02 15:57:55 +08:00

[CI] Add unit test framework (#1201)

2025-06-16 18:32:28 +08:00

[P/D] Using the cache load operator to replace the index select operator. (#6295)

2026-01-30 14:27:53 +08:00

model_loader/netloader

Revert "moe_gating_top_k" (#5512)

2025-12-30 15:05:47 +08:00

[Refactor] MLP weight prefetch to consistency with MoE Model's prefetching in terms of code and usage (#6442)

2026-02-04 09:08:18 +08:00

patch/worker/patch_common

[Refactor] refactor patch module (#3555)

2025-10-21 20:19:46 +08:00

[Refactor] Quantization Module Refactor (#5738)

2026-01-23 14:13:47 +08:00

[Refactor] Import global var form vllm instead of overwirte it (#5469)

2026-01-07 18:41:45 +08:00

[Main2Main] Upgrade vllm commit to 0123 (#6169)

2026-01-27 08:44:36 +08:00

[bugfix](CP) Fix and unify the PD request discrimination logic. (#5939)

2026-01-31 10:26:02 +08:00

__init__.py

[2/4][Refactor] Refactor torchair utils (#1892)

2025-07-21 19:43:30 +08:00

base.py

[Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926)

2025-07-28 15:13:37 +08:00

conftest.py

[Main2Main] Upgrade vllm commit to 0105 (#5595)

2026-01-06 08:44:29 +08:00

test_ascend_config.py

[Feature]refactor the npugraph_ex config, support online-infer with static kernel (#5775)

2026-01-20 21:31:38 +08:00

test_envs.py

[Misc] Remove redundant imported envs, using envs_ascend instead (#2193)

2025-08-14 09:33:39 +08:00

test_platform.py

[Misc] Removes unnecessary graph size re-initialization (#6280)

2026-01-27 14:38:07 +08:00

test_utils.py

[Core][Misc] Clean up ProfileExecuteDuration (#6461)

2026-02-01 20:06:01 +08:00