### What this PR does / why we need it?

This patch adds support for the xlite graph wrapper to vllm_ascend. Xlite provides operator implementations of the transformer network on Ascend hardware. For details about xlite, please refer to the following link: https://gitee.com/openeuler/GVirt/blob/master/xlite/README.md

The latest performance comparison data between xlite and the default aclgraph mode is as follows:

## Qwen3 32B TPS 910B3(A2) Online Inference Performance Comparison

- aclgraph: main(c4a71fc6)
- xlite-full: main(c4a71fc6) + xlite-full
- xlite-decode-only: main(c4a71fc6) + xlite-decode-only
- diff1: performance of xlite-full relative to aclgraph
- diff2: performance of xlite-decode-only relative to aclgraph

### Does this PR introduce _any_ user-facing change?

Yes. The xlite graph mode can be enabled by setting `xlite_graph_config`:

- `--additional-config='{"xlite_graph_config": {"enabled": true}}'` — enabled for decode only
- `--additional-config='{"xlite_graph_config": {"enabled": true, "full_mode": true}}'` — enabled for prefill and decode

- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

---

Signed-off-by: lulina <lina.lulina@huawei.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
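The flags above plug into a normal `vllm serve` invocation. A minimal sketch of launching an online server with xlite enabled, assuming a Qwen3 32B checkpoint and default port (the model name and port are placeholders; the `xlite_graph_config` keys follow this PR's description):

```shell
# Decode-only xlite graph mode (placeholder model name and port):
vllm serve Qwen/Qwen3-32B \
    --port 8000 \
    --additional-config '{"xlite_graph_config": {"enabled": true}}'

# Full mode: xlite graph for both prefill and decode:
vllm serve Qwen/Qwen3-32B \
    --port 8000 \
    --additional-config '{"xlite_graph_config": {"enabled": true, "full_mode": true}}'
```

Note that `--additional-config` takes a JSON string, so the single quotes around the value matter in most shells.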
-r requirements-lint.txt
-r requirements.txt
modelscope
openai
pytest >= 6.0,<9.0.0
pytest-asyncio
pytest-mock
lm-eval[api] @ git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d
types-jsonschema
xgrammar
zmq
types-psutil
pytest-cov
regex
sentence_transformers
ray>=2.47.1,<=2.48.0
protobuf>3.20.0
librosa
soundfile
msserviceprofiler>=1.2.2
mindstudio-probe>=8.3.0
arctic-inference==0.1.1
xlite