enginex-mthreads-vllm/csrc/quantization/w8a8/per_token_group_quant_8bit.h
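xiezhongtao 2bd9bd4cc2 refactor: unify hardware-specific header includes
Consolidate the CUDA/HIP/MUSA hardware-specific header includes that were scattered across individual files into the corresponding headers under the vendors directory, improving maintainability. Remove duplicate header includes and streamline the build configuration.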
2026-01-20 10:14:31 +08:00


#pragma once
#include "../../vendors/functions.h"
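// 8-bit per-token-group quantization helper shared by the FP8 and INT8 paths.
// Writes the quantized values to output_q and the per-group scales to output_s.
// min_8bit/max_8bit bound the representable range of the target 8-bit type.
void per_token_group_quant_8bit(const torch::Tensor& input,
                                torch::Tensor& output_q,
                                torch::Tensor& output_s, int64_t group_size,
                                double eps, double min_8bit, double max_8bit,
                                bool scale_ue8m0 = false);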
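
The declaration above only fixes the interface, so the following is a minimal CPU sketch of what per-token-group quantization to INT8 typically looks like. It is not the kernel from this repository. The names `GroupQuantResult` and `per_token_group_quant_int8_ref` are hypothetical, plain `std::vector` stands in for `torch::Tensor`, and three behaviors are assumptions rather than facts taken from the source: the scale is `max(absmax, eps) / max_8bit`, values are rounded to nearest before the cast, and `scale_ue8m0` rounds each scale up to a power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: one int8 value per input element and
// one float scale per group of `group_size` contiguous elements.
struct GroupQuantResult {
  std::vector<int8_t> q;
  std::vector<float> scales;
};

// Hypothetical CPU reference (assumed semantics, see the lead-in).
inline GroupQuantResult per_token_group_quant_int8_ref(
    const std::vector<float>& input, std::size_t group_size, float eps,
    float min_8bit, float max_8bit, bool scale_ue8m0 = false) {
  assert(group_size > 0 && input.size() % group_size == 0);

  GroupQuantResult out;
  out.q.resize(input.size());
  out.scales.resize(input.size() / group_size);

  for (std::size_t g = 0; g < out.scales.size(); ++g) {
    const std::size_t base = g * group_size;

    // Largest magnitude in the group, floored at eps to avoid a zero scale.
    float absmax = eps;
    for (std::size_t i = 0; i < group_size; ++i) {
      absmax = std::max(absmax, std::fabs(input[base + i]));
    }

    float scale = absmax / max_8bit;
    if (scale_ue8m0) {
      // Assumed behavior: round the scale up to the next power of two.
      scale = std::exp2(std::ceil(std::log2(scale)));
    }
    out.scales[g] = scale;

    // Scale, clamp to the 8-bit range, then round to nearest.
    for (std::size_t i = 0; i < group_size; ++i) {
      float v = input[base + i] / scale;
      v = std::min(std::max(v, min_8bit), max_8bit);
      out.q[base + i] = static_cast<int8_t>(std::nearbyint(v));
    }
  }
  return out;
}
```

Calling the sketch with `group_size = 2` on four values yields two scales, one per group, and dequantizing is simply `q[i] * scales[i / group_size]`. Keeping one scale per small group rather than per tensor limits the damage a single outlier can do to the precision of its neighbors.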