### What this PR does / why we need it?
Support the KV-sharing feature in CLA (cross-layer attention) models, which share the KV cache across some layers.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
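For context, CLA lets a later layer attend over an earlier layer's cached K/V, so the sharing layers need no KV-cache allocation of their own. Below is a minimal NumPy sketch of that idea; all names here are illustrative and are not vLLM's (or this PR's) actual implementation.

```python
# Illustrative sketch of cross-layer attention (CLA) KV sharing.
# Hypothetical names; not vLLM's real API.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

rng = np.random.default_rng(0)
seq, d = 4, 8
x = rng.standard_normal((seq, d))

# "Producer" layer projects K/V once and writes them to the KV cache.
wq0, wk0, wv0 = (rng.standard_normal((d, d)) for _ in range(3))
k_cache = x @ wk0
v_cache = x @ wv0
out0 = attention(x @ wq0, k_cache, v_cache)

# "Consumer" CLA layer: it has its own Q projection but reuses the
# producer's cached K/V instead of allocating a new KV-cache slot.
wq1 = rng.standard_normal((d, d))
out1 = attention(out0 @ wq1, k_cache, v_cache)

print(out1.shape)  # (4, 8)
```

The point of the sketch is the consumer layer: it computes no K/V projections at all, which is exactly why a KV-sharing-aware runtime can skip KV-cache allocation for such layers.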
```
-r requirements-lint.txt
-r requirements.txt
modelscope
openai
pytest >= 6.0,<9.0.0
pytest-asyncio
pytest-mock
lm-eval[api] @ git+https://github.com/EleutherAI/lm-evaluation-harness.git@206b7722158f58c35b7ffcd53b035fdbdda5126d
types-jsonschema
xgrammar
zmq
types-psutil
pytest-cov
regex
sentence_transformers
ray>=2.47.1,<=2.48.0
protobuf>3.20.0
librosa
soundfile
pytest_mock
msserviceprofiler>=1.2.2
mindstudio-probe>=8.3.0
arctic-inference==0.1.1
xlite
uc-manager
timm
```