xc-llm-ascend

Author	SHA1	Message	Date
Li Wang	83a4065b4b	[CI] Add pre-commit check for patch logger (#7446) ### What this PR does / why we need it? See https://github.com/vllm-project/vllm-ascend/pull/7402. The pre-commit hook forbids `init_logger(__name__)` in vllm_ascend patch modules. - vLLM version: v0.17.0 - vLLM main: `8a680463fa` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-19 16:53:20 +08:00
Cao Yi	5ec610e832	[Feature][Quant] Reapply auto-detect quantization format and support remote model ID (#7111) ### What this PR does / why we need it? Reapply the auto-detect quantization format feature (originally in #6645, reverted in #6873) and extend it to support remote model identifiers (e.g., `org/model-name`). Changes: - Reapply auto-detection of quantization method from model files (`quant_model_description.json` for ModelSlim, `config.json` for compressed-tensors) - Add `get_model_file()` utility to handle file retrieval from both local paths and remote repos (HuggingFace Hub / ModelScope) - Update `detect_quantization_method()` to accept remote repo IDs with optional `revision` parameter - Update `maybe_update_config()` to work with remote model identifiers - Add platform-level `auto_detect_quantization` support - Add unit tests and e2e tests for both local and remote model ID scenarios Closes #6836 ### Does this PR introduce _any_ user-facing change? Yes. When `--quantization` is not explicitly specified, vllm-ascend will now automatically detect the quantization format from the model files for both local directories and remote model IDs. - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: SlightwindSec <slightwindsec@gmail.com>	2026-03-13 22:53:25 +08:00
Li Wang	33234aa0c5	Revert "[Feature][Quant] Auto-detect quantization format from model f… (#6873) This reverts commit `3953dcf784` to keep the basic functions available. --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-03-10 11:27:32 +08:00
Cao Yi	3953dcf784	[Feature][Quant] Auto-detect quantization format from model files (#6645) ## Summary - Add automatic quantization format detection, eliminating the need to manually specify `--quantization` when serving quantized models. - The detection inspects only lightweight JSON files (`quant_model_description.json` and `config.json`) at engine initialization time, with no `.safetensors` reads. - User-explicit `--quantization` flags are always respected; auto-detection only applies when the flag is omitted. ## Details Detection priority: 1. `quant_model_description.json` exists → `quantization="ascend"` (ModelSlim) 2. `config.json` contains `"quant_method": "compressed-tensors"` → `quantization="compressed-tensors"` (LLM-Compressor) 3. Neither → default float behavior Technical approach: Hooked into `NPUPlatform.check_and_update_config()` to run detection after `VllmConfig.__post_init__`. Since `quant_config` is already `None` at that point, we explicitly recreate it via `VllmConfig._get_quantization_config()` to trigger the full quantization initialization pipeline. ## Files Changed - `vllm_ascend/quantization/utils.py`: Added `detect_quantization_method()` and `maybe_auto_detect_quantization()` - `vllm_ascend/platform.py`: Integrated auto-detection in `check_and_update_config()` - `vllm_ascend/quantization/modelslim_config.py`: Improved error handling for weight loading - vLLM version: v0.15.0 - vLLM main: `d7e17aaacd` --------- Signed-off-by: SlightwindSec <slightwindsec@gmail.com>	2026-02-26 10:59:25 +08:00
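
The detection priority described in commit 3953dcf784 can be sketched roughly as follows. This is a minimal illustration for a local model directory only, not the code in `vllm_ascend/quantization/utils.py`: the function name `sketch_detect_quantization`, its parameters, and the lookup of `quant_method` under a `quantization_config` key (with a top-level fallback) are assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Optional


def sketch_detect_quantization(
    model_dir: str, explicit: Optional[str] = None
) -> Optional[str]:
    """Hypothetical sketch of the detection priority from the commit message.

    Returns a quantization method name, or None for default float behavior.
    Only small JSON files are inspected; no weight files are opened.
    """
    # A user-supplied --quantization value always wins over auto-detection.
    if explicit is not None:
        return explicit

    root = Path(model_dir)

    # Priority 1: a ModelSlim description file is present.
    if (root / "quant_model_description.json").is_file():
        return "ascend"

    # Priority 2: config.json declares compressed-tensors.
    config_path = root / "config.json"
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as f:
            config = json.load(f)
        # Assumption: the key may sit under "quantization_config" or at top level.
        quant_cfg = config.get("quantization_config")
        if not isinstance(quant_cfg, dict):
            quant_cfg = {}
        method = quant_cfg.get("quant_method", config.get("quant_method"))
        if method == "compressed-tensors":
            return "compressed-tensors"

    # Priority 3: nothing detected, keep the default float behavior.
    return None
```

Checking the ModelSlim file first mirrors the stated order, so a directory that contains both files resolves to `"ascend"`. Remote model IDs, added later in commit 5ec610e832, are out of scope for this sketch.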

4 Commits