xc-llm-ascend

Files

Shanshan Shen 93860574bb [ModelRunner][MultiModal] Remove legacy input mapper/processor from V0 (#951 )

### What this PR does / why we need it?
Remove legacy input mapper/processor from V0.

Find more details at
https://github.com/vllm-project/vllm-ascend/issues/673 and
https://github.com/vllm-project/vllm/pull/15686.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
Launch online service:

```bash
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
--dtype bfloat16 \
--max_model_len 32768 \
--max-num-batched-tokens 32768
```

Query the server:

```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
        {"type": "text", "text": "What is the text in the illustrate?"}
    ]}
    ]
    }'
```

Result:

```bash
{"id":"chatcmpl-619e70733ed148b3be3a0b6524ee0ef3","object":"chat.completion","created":1748226332,"model":"/home/sss/.cache/modelscope/hub/models/Qwen/Qwen2___5-VL-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"The text in the illustration reads \"TONGYI Qwen.\"","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"pro
```

Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>

2025-06-03 11:32:03 +08:00

__init__.py

port deepseekv2 and mtp to main branch (#429 )

2025-04-19 17:38:18 +08:00

cache_engine.py

support deepseek quant & mix-parallel with graphmode (#585 )

2025-04-23 16:23:25 +08:00

draft_model_runner.py

[CI] upgrade vllm to 0.8.5 (#715 )

2025-04-30 09:15:50 +08:00

model_runner_v1.py

feat: support compile torchair graph while warming up (#839 )