[v0.11.0-dev][Bugfix][cherry-pick]bugfix for weight load of kimi-k2 (#4190)

### What this PR does / why we need it?

This is a cherry-pick of #3798. It fixes the kimi-k2 startup bug (weight load error): https://github.com/vllm-project/vllm-ascend/issues/3785

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: c9461e05a4

---------

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Signed-off-by: menogrey <1299267905@qq.com>
Co-authored-by: Levi <54832289+Levi-JQ@users.noreply.github.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: zhaozx-cn <zhaozx2116@163.com>
2025-11-14 15:43:22 +08:00
parent 51e5806d76
commit a7eb42cf0a
2 changed files with 12 additions and 1 deletions
--- a/.github/workflows/release_whl.yml
+++ b/.github/workflows/release_whl.yml
@@ -57,7 +57,13 @@ jobs:
    - name: Print
      run: |
        lscpu
-        
+
+    - name: Free up disk space
+      uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
+      with:
+        tool-cache: true
+        docker-images: false
+
    - name: Build wheel
      run: |
        ls
--- a/vllm_ascend/quantization/quant_config.py
+++ b/vllm_ascend/quantization/quant_config.py
@@ -193,6 +193,11 @@ packed_modules_model_mapping = {
        ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
        "fused_qkv_a_proj": ["q_a_proj", "kv_a_proj_with_mqa"]
    },
+    "kimi_k2": {
+        "gate_up_proj": ["gate_proj", "up_proj"],
+        "experts":
+        ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"]
+    },
    "deepseek_v32": {
        "gate_up_proj": ["gate_proj", "up_proj"],
        "experts":
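
Note on the `quant_config.py` change: the new `"kimi_k2"` entry appears to tell the quantization config which checkpoint sub-modules make up each fused module for that model type. The following is a minimal sketch of how such a mapping could be consulted, not the vllm-ascend implementation. The dictionary name `PACKED_MODULES_MODEL_MAPPING`, the helper `expand_packed_prefix`, and the example prefixes are hypothetical and exist only for illustration; only the `"kimi_k2"` entry itself is taken from the diff.

```python
from typing import Dict, List

# Illustrative copy of the entry added in this commit; the surrounding
# dictionary name is an assumption made for this sketch.
PACKED_MODULES_MODEL_MAPPING: Dict[str, Dict[str, List[str]]] = {
    "kimi_k2": {
        "gate_up_proj": ["gate_proj", "up_proj"],
        "experts":
        ["experts.0.gate_proj", "experts.0.up_proj", "experts.0.down_proj"],
    },
}


def expand_packed_prefix(model_type: str, prefix: str) -> List[str]:
    """Hypothetical helper: expand a fused module prefix into shard prefixes.

    Raises KeyError when the model type has no mapping registered, which is
    one way a missing entry could surface as a load-time failure.
    """
    mapping = PACKED_MODULES_MODEL_MAPPING.get(model_type)
    if mapping is None:
        raise KeyError(
            f"no packed-module mapping registered for model type {model_type!r}")

    # Split "a.b.gate_up_proj" into parent "a.b" and leaf "gate_up_proj".
    parent, _, leaf = prefix.rpartition(".")
    shards = mapping.get(leaf)
    if shards is None:
        # Not a fused module: the prefix maps to itself.
        return [prefix]
    return [f"{parent}.{shard}" if parent else shard for shard in shards]
```

Under these assumptions, `expand_packed_prefix("kimi_k2", "model.layers.0.mlp.gate_up_proj")` yields the `gate_proj` and `up_proj` prefixes under `model.layers.0.mlp`, while a model type without an entry raises `KeyError`.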