xc-llm-ascend

NeverRaR c7f1c59911 feat: support compile multiple batch graph (#1085)

### What this PR does / why we need it?

Support compiling graphs for multiple batch sizes, each with a different
code object, to avoid cache invalidation.
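The idea can be illustrated with a toy sketch in plain Python. It assumes a compile cache keyed by code object identity; `forward`, `clone_with_new_code`, `toy_compile`, and `compile_cache` are hypothetical names used only for illustration and are not taken from this patch.

```python
import types


def forward(x):
    # Stand-in for a model forward pass.
    return x * 2


def clone_with_new_code(fn, suffix):
    """Return a copy of fn that is backed by its own, distinct code object."""
    new_code = fn.__code__.replace(co_name=f"{fn.__name__}_{suffix}")
    return types.FunctionType(
        new_code, fn.__globals__, new_code.co_name, fn.__defaults__, fn.__closure__
    )


# Hypothetical cache with one slot per code object (an assumption for this sketch).
compile_cache = {}


def toy_compile(fn, batch_size):
    # A real backend would store a compiled graph; here we only record the batch size.
    compile_cache[id(fn.__code__)] = batch_size
    return fn


graph_batch_sizes = [8, 16, 24]
per_batch_fns = {
    bs: toy_compile(clone_with_new_code(forward, f"bs{bs}"), bs)
    for bs in graph_batch_sizes
}
```

Because each batch size gets its own code object, compiling for one batch size does not overwrite the entry recorded for another. If all three shared a single code object, this toy cache would keep only the last one.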

### How was this patch tested?

```
export VLLM_ENABLE_MC2=0
export VLLM_USE_V1=1
export TASK_QUEUE_ENABLE=1

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh

nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
    --quantization ascend \
    --served-model-name auto \
    --trust-remote-code \
    --distributed-executor-backend=mp \
    --port 8006 \
    -tp=8 \
    -dp=2 \
    --no-enforce-eager \
    --max-num-seqs 24 \
    --max-model-len 32768 \
    --max-num-batched-tokens 32768 \
    --block-size 128 \
    --no-enable-prefix-caching \
    --additional-config '{"torchair_graph_config": {"enabled": true,"use_cached_graph": true,"graph_batch_sizes": [8,16,24]},"ascend_scheduler_config": {"enabled":true,"chunked_prefill_enabled":false},"expert_tensor_parallel_size":16}' \
    --gpu-memory-utilization 0.95 &> run.log &
disown
```
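The `--additional-config` value in the command above is a JSON string, so it can be checked before launching the server. The sketch below only parses the string as given and compares `graph_batch_sizes` with the `--max-num-seqs` value from the same command. Treating the largest graph batch size as equal to `--max-num-seqs` is an observation about this particular test setup, not a documented requirement.

```python
import json

# Copied verbatim from the --additional-config argument in the test command.
additional_config = (
    '{"torchair_graph_config": {"enabled": true,"use_cached_graph": true,'
    '"graph_batch_sizes": [8,16,24]},'
    '"ascend_scheduler_config": {"enabled":true,"chunked_prefill_enabled":false},'
    '"expert_tensor_parallel_size":16}'
)

config = json.loads(additional_config)
graph_cfg = config["torchair_graph_config"]
batch_sizes = graph_cfg["graph_batch_sizes"]

max_num_seqs = 24  # value passed via --max-num-seqs in the same command

# Sanity checks for this test setup (assumptions, not enforced rules).
sizes_sorted = batch_sizes == sorted(batch_sizes)
covers_max_seqs = max(batch_sizes) == max_num_seqs
print(f"graph batch sizes: {batch_sizes}, sorted: {sizes_sorted}, "
      f"covers max-num-seqs: {covers_max_seqs}")
```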

Signed-off-by: boying <897013703@qq.com>

2025-06-06 20:17:51 +08:00

attention

[Bugfix][DP] Add with_prefill_across_dp to AscendMetadata to fix dp (#1094)

2025-06-06 19:20:33 +08:00

compilation

[aclgraph] implement NPUPiecewiseBackend to enable aclgraph (#836)

2025-05-29 11:58:26 +08:00

core

[Misc] fix initialize_kv_cache (#1102)

2025-06-06 16:46:23 +08:00

device_allocator

[bugfix] Improve log level and info for custom ops build (#937)