sglang/quantization at c66b2c9cf18dcb1aab436c29d0244a2e87f89eee - sglang

EngineX-Hygon/sglang

Zhiyu c66b2c9cf1 Add support for nvidia modelopt fp8 kv cache (#3223)

2025-02-22 07:04:58 +08:00

..

[ROCm] Add additional block quant GEMM tuning configs for AMD GPUs. (#3616)

2025-02-17 15:54:18 -08:00

__init__.py

Fix deepseek awq v3 (#3450)

2025-02-12 22:09:52 +08:00

base_config.py

fix black in pre-commit (#1940)

2024-11-08 07:42:47 +08:00

fp8_kernel.py

Revert "[ROCm] Use tl.range() in block GEMM kernels with `num_stage… (#3632)

2025-02-17 18:01:21 +08:00

fp8_utils.py

add control for cutlass fp8 blockwise gemm (#3727)

2025-02-20 16:10:35 +08:00

fp8.py

AMD/ROCm: update AITER repo to ROCm/aiter (#3747)

2025-02-21 00:18:08 -08:00

int8_kernel.py

Fix quant kernel accuracy issue (#2865)

2025-01-13 20:32:17 +08:00

modelopt_quant.py

Add support for nvidia modelopt fp8 kv cache (#3223)

2025-02-22 07:04:58 +08:00

w8a8_int8.py

feat: patch linear base (#2915)

2025-01-16 18:00:03 +08:00