xc-llm-ascend

Files

whx d7e19ed57a [BugFix] fix length of sin/cos cache in rope (#1266)

This PR fixes a bug where the sin/cos cache was constructed shorter than
the model's max positional embedding.

Closes: https://github.com/vllm-project/vllm-ascend/issues/1038

Signed-off-by: whx-sjtu <2952154980@qq.com>

2025-06-17 23:14:25 +08:00
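
To illustrate the kind of length bug the commit message describes, here is a minimal sketch of a rotary sin/cos cache sized to the model's maximum position count. It is not the code from rotary_embedding.py: the function name `build_sin_cos_cache`, its parameters, and the default `base` of 10000 are assumptions made for illustration only.

```python
import math


def build_sin_cos_cache(rotary_dim, max_position_embeddings, base=10000.0):
    """Hypothetical helper: build cos/sin tables for rotary embeddings.

    Returns two lists with exactly `max_position_embeddings` rows, each row
    holding `rotary_dim // 2` values, so that every valid position id
    0 .. max_position_embeddings - 1 has an entry.
    """
    # One inverse frequency per pair of rotary dimensions.
    inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(rotary_dim // 2)]

    cos_cache, sin_cache = [], []
    # Iterate over the full position range; a cache built over a shorter
    # range would leave the highest positions without an entry.
    for pos in range(max_position_embeddings):
        angles = [pos * f for f in inv_freq]
        cos_cache.append([math.cos(a) for a in angles])
        sin_cache.append([math.sin(a) for a in angles])
    return cos_cache, sin_cache


cos_cache, sin_cache = build_sin_cos_cache(rotary_dim=8, max_position_embeddings=32)
```

A lookup at position `max_position_embeddings - 1` succeeds only when the cache covers the full range, which is the property this sketch is meant to show.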

__init__.py

[aclgraph] implement NPUPiecewiseBackend to enable aclgraph (#836)

2025-05-29 11:58:26 +08:00

activation.py

Optimize qwen2_vl and qwen2_5_vl (#701)

2025-04-30 14:22:38 +08:00

attention.py

[1/N][UT][v1 MTP] add basic v1 mtp features (#890)

2025-05-30 08:59:58 +08:00

cache.py

port deepseekv2 and mtp to main branch (#429)

2025-04-19 17:38:18 +08:00

common_fused_moe.py

[Attention][Kernel] moe support for llama4 and mllama4 (#740)

2025-05-13 19:12:40 +08:00

expert_load_balancer.py

Add static EPLB (#1116)

2025-06-09 19:28:11 +08:00

fused_moe.py

[refactor] Refactoring AscendFusedMoE (#1229)

2025-06-17 17:49:03 +08:00

layernorm.py

[CI] Add model basic accuracy test (Qwen2.5-0.5B-Instruct) (#460)

2025-04-17 14:59:56 +08:00

rotary_embedding.py

[BugFix] fix length of sin/cos cache in rope (#1266)

2025-06-17 23:14:25 +08:00

vocab_parallel_embedding.py

port deepseekv2 and mtp to main branch (#429)

2025-04-19 17:38:18 +08:00