### What this PR does / why we need it?
Fix the long-context sequence accuracy problem for `GLM4.5`.
With `max_tokens=1000`, the model previously fell into cyclic output such as:
```bash
00 00 00 00 00 00 00 00 00 00 00 00 00 00
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
```python
import os

os.environ["VLLM_USE_MODELSCOPE"] = "True"
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from vllm import LLM, SamplingParams


def main():
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    # Create a sampling params object (greedy decoding, long generation).
    sampling_params = SamplingParams(max_tokens=1000, temperature=0.0)
    # Create an LLM.
    llm = LLM(model="/root/.cache/modelscope/hub/models/ZhipuAI/GLM-4___5",
              tensor_parallel_size=8,
              enforce_eager=True,
              trust_remote_code=True,
              max_model_len=1024)
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
```
- vLLM version: v0.10.1.1
- vLLM main: 0235103cbb
---------
Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
Signed-off-by: shen-shanshan <467638484@qq.com>