xc-llm-ascend/requirements.txt
LHXuuu bdc66972db [Quantization] Support compressed tensors w8a8 static and w8a8 dynamic weight (#4036)
### What this PR does / why we need it?

When the LLM Compressor quantization tool from the vLLM community is
used to generate quantized weights, the vLLM Ascend engine must be
adapted to support the compressed-tensors quantization format.

1. Add AscendCompressedTensorsConfig to replace CompressedTensorsConfig
in vllm.
2. Support CompressedTensorsW8A8 static weight.
- weight: per-channel, int8, symmetric; activation: per-tensor, int8,
symmetric.
3. Support CompressedTensorsW8A8Dynamic weight.
- weight: per-channel, int8, symmetric; activation: per-token, int8,
symmetric, dynamic.
4. Modify the override_quantization_method in AscendQuantConfig.
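
To make the two W8A8 schemes listed above concrete, here is a minimal NumPy sketch of symmetric int8 quantization with per-channel weight scales and either a static per-tensor or a dynamic per-token activation scale. It is an illustration only, not the code added by this PR: the helper names (`symmetric_scale`, `quantize`, `w8a8_linear`) are hypothetical, the static scale is assumed to come from a calibration batch, and the real kernels run on the NPU rather than in NumPy.

```python
import numpy as np

INT8_MAX = 127  # symmetric range [-127, 127], zero point fixed at 0


def symmetric_scale(x, axis=None):
    """Symmetric int8 scale of x, reduced over `axis` (None = whole tensor)."""
    max_abs = np.max(np.abs(x), axis=axis, keepdims=axis is not None)
    return np.maximum(np.asarray(max_abs, dtype=np.float32), 1e-8) / INT8_MAX


def quantize(x, scale):
    """Quantize x to int8 with the given scale (broadcast against x)."""
    return np.clip(np.round(x / scale), -INT8_MAX, INT8_MAX).astype(np.int8)


def w8a8_linear(x, q_w, w_scale, x_scale=None):
    """Approximate x @ w.T using int8 operands (hypothetical helper).

    x_scale given  -> static: one per-tensor scale, fixed ahead of time.
    x_scale absent -> dynamic: one scale per token (row), computed at run time.
    """
    if x_scale is None:
        x_scale = symmetric_scale(x, axis=1)  # shape (tokens, 1)
    q_x = quantize(x, x_scale)
    acc = q_x.astype(np.int32) @ q_w.astype(np.int32).T  # int32 accumulation
    return acc.astype(np.float32) * x_scale * w_scale.T  # dequantize


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)  # (out_features, in_features)
x = rng.standard_normal((4, 16)).astype(np.float32)  # (tokens, in_features)

w_scale = symmetric_scale(w, axis=1)  # per-channel: one scale per output row
q_w = quantize(w, w_scale)

y_ref = x @ w.T
# Static: the activation scale is assumed known up front (here taken from x itself).
y_static = w8a8_linear(x, q_w, w_scale, x_scale=symmetric_scale(x))
# Dynamic: the activation scale is recomputed per token on every call.
y_dynamic = w8a8_linear(x, q_w, w_scale)
```

In both cases the weights are quantized once, offline. The only difference is where the activation scale comes from, which is why the dynamic variant needs no calibration data but does extra work per token at run time.
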

Co-authored-by: taoqun110 <taoqun@huawei.com>
Co-authored-by: chenxi-hh <chen464822955@163.com>

- vLLM version: v0.11.2

---------

Signed-off-by: LHXuuu <scut_xlh@163.com>
Signed-off-by: chenxi-hh <chen464822955@163.com>
Signed-off-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com>
Co-authored-by: chenxi-hh <chen464822955@163.com>
Co-authored-by: chenxi-hh <32731611+chenxi-hh@users.noreply.github.com>
2025-11-28 14:09:39 +08:00


# Should be mirrored in pyproject.toml
cmake>=3.26
decorator
einops
numpy<2.0.0
packaging
pip
pybind11
pyyaml
scipy
pandas
setuptools>=64
setuptools-scm>=8
torch==2.7.1
torchvision
wheel
pandas-stubs
opencv-python-headless<=4.11.0.86 # Required to avoid numpy version conflict with vllm
compressed_tensors>=0.11.0
# requirements for disaggregated prefill
msgpack
quart
# Required for N-gram speculative decoding
numba
# Install torch_npu
#--pre
#--extra-index-url https://mirrors.huaweicloud.com/ascend/repos/pypi
torch-npu==2.7.1
transformers<=4.57.1