### What this PR does / why we need it?
This PR adds W8A8C8 support for dsv3.2/glm5 using the
lightning_indexer_quant ops, mainly in the pd-mix scenario.
Because the code for the PD-disaggregated scenario is still being
refactored and cleaned up, this PR prioritizes C8 functionality in the
pd-mix scenario.
The next steps are planned in three parts:
① Once the optimized scatter operator is available, we will use it to
replace the original operator and speed up storing k_scale.
② Once the code for the PD-disaggregated scenario is stable, we will
validate C8 there more thoroughly and adapt it as needed.
③ Enabling C8 currently introduces several new operators whose
performance still needs improvement, so performance may regress in some
scenarios. Only once all of these operators are fully ready can we
ensure that this feature causes no performance degradation. At that
point, we will enable the feature by default and remove the switch in
`additional_config`.
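
To make the k_scale storage step in ① more concrete, here is a minimal
pure-Python sketch of writing int8 ("C8") keys into a cache. It assumes
symmetric per-token quantization and a slot-indexed scatter. The names
`quantize_int8`, `scatter_store`, and `dequantize`, the cache layout, and
the quantization scheme are illustrative assumptions, not the operators
added in this PR.

```python
from typing import List, Tuple


def quantize_int8(vec: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-token int8 quantization (assumed scheme, for illustration)."""
    amax = max(abs(v) for v in vec)
    # Fall back to a unit scale for an all-zero vector to avoid dividing by zero.
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in vec]
    return q, scale


def scatter_store(k_cache, k_scale_cache, slots, keys) -> None:
    """Write each quantized key and its scale to the cache row named by its slot."""
    for slot, key in zip(slots, keys):
        q, scale = quantize_int8(key)
        k_cache[slot] = q
        k_scale_cache[slot] = scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from int8 codes and their scale."""
    return [x * scale for x in q]


# Hypothetical cache with 4 slots and a head dimension of 3.
k_cache = [[0, 0, 0] for _ in range(4)]
k_scale_cache = [1.0] * 4
scatter_store(
    k_cache,
    k_scale_cache,
    slots=[2, 0],
    keys=[[0.5, -1.0, 0.25], [2.0, 0.0, -4.0]],
)
```

In this sketch each key costs one extra scale write per slot, which is
the kind of per-token scatter that a dedicated operator would batch.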
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed with newly added and existing tests.
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
---------
Signed-off-by: rjg-lyh <1318825571@qq.com>
42 lines
1.6 KiB
CMake
# This program is free software, you can redistribute it and/or modify it.
# Copyright (c) 2025 Huawei Technologies Co., Ltd.
# This file is a part of the CANN Open Software.
# Licensed under CANN Open Software License Agreement Version 2.0 (the "License").
# Please refer to the License for details. You may not use this file except in compliance with the License.
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
# See LICENSE in the root of the software repository for the full text of the License.
# ======================================================================================================================

add_ops_compile_options(
    OP_NAME LightningIndexerQuant
    OPTIONS --cce-auto-sync=off
            -Wno-deprecated-declarations
            -Werror
            -mllvm -cce-aicore-hoist-movemask=false
            --op_relocatable_kernel_binary=true
)

set(lightning_indexer_quant_depends transformer/attention/lightning_indexer_quant PARENT_SCOPE)

target_sources(op_host_aclnn PRIVATE
    lightning_indexer_quant_def.cpp
)

target_sources(optiling PRIVATE
    lightning_indexer_quant_tiling.cpp
)

if (NOT BUILD_OPEN_PROJECT)
    target_sources(opmaster_ct PRIVATE
        lightning_indexer_quant_tiling.cpp
    )
endif ()

target_include_directories(optiling PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/op_host
)

target_sources(opsproto PRIVATE
    lightning_indexer_quant_proto.cpp
)