[2/N][Refactor][Quantization] clean quantization patch (#2785)

### What this PR does / why we need it?
The quantization patch is unused code, so this PR removes it.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested by CI.

- vLLM version: v0.10.1.1
- vLLM main: f4962a6d55

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
2025-09-08 17:31:53 +08:00
parent cd88f89267
commit d51694a77b
4 changed files with 2 additions and 456 deletions
--- a/vllm_ascend/ops/vocab_parallel_embedding.py
+++ b/vllm_ascend/ops/vocab_parallel_embedding.py
@@ -97,6 +97,7 @@ class AscendVocabParallelEmbedding(VocabParallelEmbedding):

        if params_dtype is None:
            params_dtype = torch.get_default_dtype()
+        self.params_dtype = params_dtype
        # Divide the weight matrix along the vocabulary dimension.
        self.num_added_embeddings = self.num_embeddings - self.org_vocab_size
        self.num_embeddings_per_partition = divide(self.num_embeddings_padded,