
compressed_tensors quantization module

To support quantized models in the compressed_tensors format, we adapted vLLM's implementation (https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/layers/quantization/compressed_tensors) into SGLang.

For practical purposes, we have so far only implemented the w8a8_fp8 compressed_tensors scheme. If you need other formats, please submit an issue.
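The w8a8_fp8 scheme quantizes both weights and activations to 8-bit floating point (e4m3). As a minimal, illustrative sketch of the per-tensor scaling idea behind it (the names here are hypothetical, not the SGLang or compressed-tensors API; the real kernels cast the scaled values to float8 on device, which numpy cannot represent):

```python
import numpy as np

# Largest finite value representable in the float8 e4m3 format.
FP8_E4M3_MAX = 448.0

def fp8_scale(tensor: np.ndarray) -> float:
    """Per-tensor scale mapping the tensor's max magnitude into the fp8 range."""
    amax = float(np.abs(tensor).max())
    return amax / FP8_E4M3_MAX

def quant_dequant(tensor: np.ndarray) -> np.ndarray:
    """Scale into the e4m3 range and back. A real w8a8_fp8 kernel would cast
    the scaled values to float8 here; without that cast this round-trip only
    demonstrates the dynamic scaling, not the precision loss."""
    scale = fp8_scale(tensor)
    scaled = np.clip(tensor / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return scaled * scale

w = np.random.randn(16, 16).astype(np.float32)
w_dq = quant_dequant(w)
```

The same per-tensor (or per-channel) scale is stored alongside the quantized weights in the checkpoint's `quantization_config`, and the matmul is performed in fp8 with the scales applied to the output.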