### What this PR does / why we need it?
This PR fixes a bug in Xlite
backend(https://atomgit.com/openeuler/GVirt/issues/3).
This PR adds support for mrope (Mixture-of-RoPE) and deepstack features
in the xlite backend. These features are necessary for running certain
multimodal models that utilize them.
The main changes include:
- Updating `_build_model_config` to parse mrope and deepstack
configurations from the model's `hf_config`.
- Modifying `XliteWrapper.__call__` to handle `deepstack_input_embeds`
and mrope positions during the model forward pass.
- Replacing `ModelAttnMeta` with the newer `AttnMeta` to accommodate the
new metadata fields required by these features.
### Does this PR introduce _any_ user-facing change?
NO
### How was this patch tested?
online server config:
```
python -m vllm.entrypoints.openai.api_server \
--model /mnt/nvme0n1/models/checkpoint-8200 \
--additional-config='{"xlite_graph_config": {"enabled": true}}' \
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9 \
--max-num-batched-tokens 8192 \
--max-num-seqs=20 \
--block-size 128 \
--max-model-len 8192 \
--trust-remote-code \
--served-model-name Qwen3-VL-8B \
--host localhost \
--generation-config vllm \
--port 6777
```
test_config:
```
vllm bench serve \
--max-concurrency ${maxconcurrency} \
--num-prompts ${num_prompts} \
--host ${HOST} \
--port ${PORT} \
--model ${MODEL_NAME} \
--dataset-name random \
--backend openai-chat \
--random-input-len 512 \
--random-output-len 512 \
--random-range-ratio 0.2 \
--temperature 0.6 \
--metric-percentiles "50,90,99" \
--tokenizer ${TOKENIZER_PATH} \
--endpoint /v1/chat/completions \
--ignore-eos
```
- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
Signed-off-by: LVYANGGUO <lvyangguo@huawei.com>
Co-authored-by: LVYANGGUO <lvyangguo@huawei.com>