Icey 14b39d3c70 [1/N][Refactor][Qwen3-Next] remove redundant Qwen3NextSparseMoeBlock and Qwen3NextAttention (#3019)
### What this PR does / why we need it?
Remove the redundant `Qwen3NextSparseMoeBlock` and `Qwen3NextAttention` classes.
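
For context, a minimal sketch of what this kind of de-duplication looks like, assuming the Ascend plugin can import the upstream vLLM Qwen3-Next blocks instead of carrying its own copies (the module path and wiring below are illustrative, not the exact diff):

```python
# Sketch only: instead of redefining Qwen3NextSparseMoeBlock / Qwen3NextAttention
# in vllm_ascend/models, reuse the upstream vLLM implementations directly.
# The import path below is an assumption about where upstream defines them.
from vllm.model_executor.models.qwen3_next import (
    Qwen3NextAttention,
    Qwen3NextSparseMoeBlock,
)

# Ascend-specific model code can then reference the upstream classes,
# keeping only the layers that genuinely need NPU-specific overrides.
```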

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
```python
from vllm import LLM, SamplingParams


def main():
    prompts = [
        "The future of AI is",
    ]

    sampling_params = SamplingParams(max_tokens=100, temperature=0.6, top_k=40, top_p=0.95)

    # Create an LLM.
    llm = LLM(
        model="Qwen/Qwen3-Next-80B-A3B-Instruct",
        tensor_parallel_size=4,
        enforce_eager=True,
        trust_remote_code=True,
        max_model_len=256,
        gpu_memory_utilization=0.7,
        block_size=64,
    )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
```

- vLLM version: v0.10.2
- vLLM main: 9d1c50a5ac

---------

Signed-off-by: Icey <1790571317@qq.com>
2025-09-22 11:24:08 +08:00