xc-llm-ascend

Author	SHA1	Message	Date
bazingazhou233-hub	c69291eefc	[Doc] Add USE_MODELSCOPE_HUB=0 to lm-eval guide (#7279 ) ## Summary - Add `USE_MODELSCOPE_HUB=0` to both Online and Offline lm-eval sections - Add explanatory notes about Docker containers launching with `VLLM_USE_MODELSCOPE=True` The Docker containers set `VLLM_USE_MODELSCOPE=True`, which causes lm-eval to download datasets from ModelScope instead of HuggingFace, resulting in "Repo not exists" errors. Setting `USE_MODELSCOPE_HUB=0` disables this behavior. Fixes #607 - vLLM version: v0.17.0 - vLLM main: `4034c3d32e` Signed-off-by: bazingazhou233-hub <bazingazhou233-hub@users.noreply.github.com> Co-authored-by: bazingazhou233-hub <bazingazhou233-hub@users.noreply.github.com>	2026-03-14 22:41:02 +08:00
herizhen	e5024d0264	[doc] Add Ascend PyTorch Profiler section (#7117 ) ### What this PR does / why we need it? add Ascend PyTorch Profiler section ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Documentation Format Checks Technical Content Validation Build Verification Version Compatibility - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: herizhen <1270637059@qq.com>	2026-03-12 15:51:00 +08:00
Frank Chen	14c71b19e1	[Doc][CPU binding] Add user/developer guide for CPU binding (#7045 ) ### What this PR does / why we need it? This PR adds comprehensive documentation for the CPU binding feature on Ascend NPUs. It includes: - A detailed developer guide (`docs/source/developer_guide/feature_guide/cpu_binding.md`) covering the design, internal logic, allocation examples, and troubleshooting for the CPU binding mechanism. - A concise user guide (`docs/source/user_guide/feature_guide/cpu_binding.md`) explaining the core concepts, usage, and common issues for end-users. - An update to `additional_config.md` to use consistent terminology for binding strategies (`global-slicing` and `topo-affinity`). This documentation is needed to help both developers and users understand, use, and debug the CPU binding feature, which is critical for performance on ARM+Ascend platforms. ### Does this PR introduce _any_ user-facing change? No. This is a documentation-only update. ### How was this patch tested? The documentation has been reviewed for clarity and technical accuracy. The examples and descriptions align with the implementation in `vllm_ascend/cpu_binding.py`. - vLLM version: v0.16.0 - vLLM main: `4034c3d32e` --------- Signed-off-by: chenchuw886 <chenchuw@huawei.com> Signed-off-by: c00818886 <chenchuwei@huawei.com> Co-authored-by: chenchuw886 <chenchuw@huawei.com>	2026-03-10 15:59:31 +08:00
NJX	c7fd7a25f7	[Doc][Misc] Fix msprobe_guide.md documentation issues (#6965 ) ## What this PR does / why we need it? Fixes several documentation issues in the msprobe debugging guide as reported in #6065: 1. Remove unnecessary `cat` heredoc wrapper: The example configuration section used a `cat <<'JSON'` bash wrapper around the JSON config. Simplified to a plain JSON code block. 2. Fix duplicate chapter numbering: Two sections were both numbered '2'. Renumbered sections sequentially (0-6). 3. Fix msprobe command: Changed `msprobe graph_visualize` to `msprobe -f pytorch graph` in section 5.2 Visualization. 4. Remove backward-related content: Since vllm is inference-only (no training), removed all backward pass references including backward tensor examples, parameter gradient examples, and backward descriptions from dump.json explanations. ## Does this PR introduce _any_ user-facing change? Documentation improvement only. No code changes. ## How was this patch tested? Manual review of the markdown file to verify all 4 issues from #6065 are addressed. Closes #6065 - vLLM version: v0.16.0 - vLLM main: `15d76f74e2` Signed-off-by: NJX-njx <3771829673@qq.com>	2026-03-04 10:28:31 +08:00
wangxiyuan	a95c0b8b82	[Doc] fix the nit in docs (#6826 ) Refresh the doc, fix the nit in the docs - vLLM version: v0.15.0 - vLLM main: `83b47f67b1` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-02-27 11:50:27 +08:00
Cao Yi	6de207de88	[main][Docs] Fix typos across documentation (#6728 ) ## Summary Fix typos and improve grammar consistency across 50 documentation files. ### Changes include: - Spelling corrections (e.g., "Facotory" → "Factory", "certainty" → "determinism") - Grammar improvements (e.g., "multi-thread" → "multi-threaded", "re-routed" → "re-run") - Punctuation fixes (semicolon consistency in filter parameters) - Code style fixes (correct flag name `--num-prompts` instead of `--num-prompt`) - Capitalization consistency (e.g., "python" → "Python", "ascend" → "Ascend") - vLLM version: v0.15.0 - vLLM main: `9562912cea` --------- Signed-off-by: SlightwindSec <slightwindsec@gmail.com>	2026-02-13 15:50:05 +08:00
Cao Yi	1c7d1163f5	[main][Docs] Fix spelling errors across documentation (#6649 ) Fix various spelling mistakes in the project documentation to improve clarity and correctness. - vLLM version: v0.15.0 - vLLM main: `d7e17aaacd` --------- Signed-off-by: SlightwindSec <slightwindsec@gmail.com>	2026-02-10 11:14:57 +08:00
wangxiyuan	b4aafd4293	[Core][Misc] Clean up ProfileExecuteDuration (#6461 ) ### What this PR does / why we need it? This PR removes the custom `ProfileExecuteDuration` utility and its usages across the codebase. This utility was used for profiling execution duration of different stages in the inference process. It is replaced by the standard `vllm.v1.utils.record_function_or_nullcontext`, which integrates with PyTorch's profiler. This change simplifies the code by removing a custom implementation in favor of an upstream utility, improving maintainability. Associated documentation and tests for `ProfileExecuteDuration` are also removed. ### Does this PR introduce _any_ user-facing change? `VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE` env is removed now. ### How was this patch tested? CI passed. The changes are a cleanup and replacement with a standard utility. Existing tests cover the functionality. The removed feature had its own tests which are also removed. Related RFC: #5304 - vLLM version: v0.14.1 - vLLM main: `dc917cceb8` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-02-01 20:06:01 +08:00
ChenCangtao	46cee945b3	[doc][npugraph_ex]add npugraph_ex introduction doc (#6306 ) ### What this PR does / why we need it? As part of the preparation work for the [RFC](https://github.com/vllm-project/vllm-ascend/issues/6214) We have added a documentation about npugraph_ex, which mainly explains and introduces its usage and FX graph optimization. The introduction to FX graph optimization also includes specific explanations of the default passes, the implementation methods for custom fusion passes, and how to capture the FX graph during the optimization process through environment variable configuration. --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>	2026-01-30 11:21:37 +08:00
Li Wang	c26ad78f86	[CI][lint] Add rule `codespell` back (#6236 ) ### What this PR does / why we need it? After removing codepsell a while, we discovered that typo had a problem correctly recognizing certain misspelled words, so I suggested adding it back. - vLLM version: v0.14.1 - vLLM main: `d68209402d` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-01-26 14:12:33 +08:00
Shanshan Shen	e3eefdecbd	[Doc] Update `max_tokens` to `max_completion_tokens` in all docs (#6248 ) ### What this PR does / why we need it? Fix: ``` DeprecationWarning: max_tokens is deprecated in favor of the max_completion_tokens field. ``` - vLLM version: v0.14.1 - vLLM main: `d68209402d` Signed-off-by: shen-shanshan <467638484@qq.com>	2026-01-26 11:57:40 +08:00
Cao Yi	a69ef10c3a	[Refactor] Quantization Module Refactor (#5738 ) ### Summary This PR refactors the `vllm_ascend/quantization` module to improve code organization, maintainability, and extensibility. The refactoring introduces a clear separation of concerns with a registry-based scheme discovery pattern, abstract base classes for quantization schemes, and dedicated wrapper classes. ### Key Changes #### 1. Modular Directory Structure \| Before \| After \| \|--------\|-------\| \| Flat file structure with mixed responsibilities \| Organized into `methods/` subpackage for schemes \| \| Single `quant_config.py` (600+ lines) \| Separate config files: `modelslim_config.py`, `compressed_tensors_config.py` \| \| `utils.py` with scheme lookup logic \| `methods/registry.py` with decorator-based registration \| #### 2. Registry-Based Scheme Discovery Replaced hardcoded `ASCEND_QUANTIZATION_METHOD_MAP` dictionary with a decorator-based registry pattern: ```python # Before: Manual dictionary mapping ASCEND_QUANTIZATION_METHOD_MAP = { "W8A8_DYNAMIC": {"linear": AscendW8A8DynamicLinearMethod, ...}, ... } # After: Decorator-based registration @register_scheme("W8A8_DYNAMIC", "linear") class AscendW8A8DynamicLinearMethod(AscendLinearScheme): ... ``` #### 3. Abstract Base Classes Introduced three abstract base classes in `methods/base.py`: - `AscendLinearScheme` - Base for linear layer quantization - `AscendMoEScheme` - Base for MoE layer quantization - `AscendAttentionScheme` - Base for attention layer quantization #### 4. Separated Config and Wrapper Classes - Config classes (`AscendModelSlimConfig`, `AscendCompressedTensorsConfig`): Handle config parsing and scheme selection - Wrapper classes (`AscendLinearMethod`, `AscendFusedMoEMethod`, etc.): Implement vLLM interfaces and delegate to schemes #### 5. Cleaner Public API ```python # New clean module interface from vllm_ascend.quantization import ( AscendModelSlimConfig, AscendCompressedTensorsConfig, ) from vllm_ascend.quantization.methods import get_scheme_class ``` ### Architecture Diagram ```mermaid classDiagram direction TB class QuantizationConfig { <<vLLM Interface>> +get_quant_method() } class AscendModelSlimConfig { +quant_description +get_quant_method() -create_scheme_for_layer() } class AscendCompressedTensorsConfig { +target_scheme_map +get_quant_method() -_get_scheme_from_parts() } class AscendLinearMethod { <<Wrapper>> +quant_method: AscendLinearScheme +create_weights() +apply() } class AscendFusedMoEMethod { <<Wrapper>> +quant_method: AscendMoEScheme +create_weights() +apply() } class AscendLinearScheme { <<Abstract>> +get_weight()* +apply()* +get_pertensor_param() +get_perchannel_param() } class AscendMoEScheme { <<Abstract>> +get_weight()* +get_dynamic_quant_param()* +apply()* } class W8A8DynamicLinear { +get_weight() +apply() } class W8A8DynamicMoE { +get_weight() +apply() } QuantizationConfig <\|-- AscendModelSlimConfig QuantizationConfig <\|-- AscendCompressedTensorsConfig AscendModelSlimConfig ..> AscendLinearMethod : creates AscendModelSlimConfig ..> AscendFusedMoEMethod : creates AscendCompressedTensorsConfig ..> AscendLinearMethod : creates AscendCompressedTensorsConfig ..> AscendFusedMoEMethod : creates AscendLinearMethod o-- AscendLinearScheme : delegates to AscendFusedMoEMethod o-- AscendMoEScheme : delegates to AscendLinearScheme <\|-- W8A8DynamicLinear AscendMoEScheme <\|-- W8A8DynamicMoE ``` ### Scheme Registration Flow ```mermaid sequenceDiagram participant Module as Scheme Module participant Registry as _SCHEME_REGISTRY participant Config as QuantConfig participant Wrapper as Wrapper Class Note over Module: At import time Module->>Registry: @register_scheme("W8A8_DYNAMIC", "linear") Registry->>Registry: Store (quant_type, layer_type) -> Class Note over Config: At runtime Config->>Config: Determine quant_type from description Config->>Registry: get_scheme_class(quant_type, layer_type) Registry-->>Config: Return scheme class Config->>Config: scheme = scheme_cls() Config->>Wrapper: Create wrapper with scheme Wrapper-->>Config: Return wrapper instance ``` ### File Changes Summary \| Original Files \| Refactored Files \| \|----------------\|------------------\| \| `__init__.py` (empty) \| `__init__.py` (exports public API) \| \| `quant_config.py` \| `modelslim_config.py` + `wrappers.py` \| \| `compressed_tensors/` \| `compressed_tensors_config.py` \| \| `utils.py` \| `methods/registry.py` \| \| `w8a8_dynamic.py` \| `methods/w8a8_dynamic.py` \| \| `w8a8.py` \| `methods/w8a8_static.py` \| \| `w4a4_flatquant_dynamic.py` \| `methods/w4a4_flatquant.py` \| \| ... \| `methods/base.py` (new) \| ### Benefits 1. Extensibility: Adding new quantization schemes only requires implementing the base class and adding `@register_scheme` decorator 2. Maintainability: Clear separation between config parsing, wrapper logic, and scheme implementation 3. Testability: Abstract base classes enable easier unit testing and mocking 4. Discoverability: Registry pattern makes it easy to list all supported schemes 5. Reduced Coupling: Config classes no longer need to know about all scheme implementations ___ - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` --------- Signed-off-by: SlightwindSec <slightwindsec@gmail.com>	2026-01-23 14:13:47 +08:00
Qiu	38cfcd572a	[doc](cp) correct the prefill of GQA and adjust desc of block table. (#5697 ) ### What this PR does / why we need it? correct the seq length of KV for prefill of GQA and clarify the desc of block table distribution in developer guide. - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` --------- Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>	2026-01-19 18:53:48 +08:00
LI SHENGYONG	83de5385b4	[EPLB][Bugfix] policy_swift_balancer bugfix and renaming (#5897 ) ### What this PR does / why we need it? 1. Rename dynamic_ep to default_eplb. 2. Rename dynamic_ep_v2 to swift_balancer 3. Discard func compose_expert_update_info_bipartite. - vLLM version: v0.13.0 - vLLM main: `bde38c11df` Signed-off-by: shenchuxiaofugui <1311027364@qq.com>	2026-01-19 05:47:40 +00:00
Li Wang	c4fde5c064	[Doc] Upgrade outdated ut doc (#5937 ) ### What this PR does / why we need it? For cpu env, we should set `SOC_VERSION` to mock different NPU chips for different compilation paths ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `11b6af5280` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2026-01-19 09:12:46 +08:00
LI SHENGYONG	da958ee386	[EPLB]Eplb Config Renaming (#5533 ) ### What this PR does / why we need it? 1. Rename num_iterations_eplb_update to expert_heat_collection_interval. 2. Rename num_wait_worker_iterations to algorithm_execution_interval. 3. Rename init_redundancy_expert to num_redundant_experts because the variable with the same meaning in vLLM is named this way. 4. Delete gate_eplb because we don't need this feature. 5. Move eplb config into a dict in additional config. 6. Depend on pr5817 ### Does this PR introduce _any_ user-facing change? before this pr： `--additional-config '{"dynamic_eplb":true, "num_iterations_eplb_update": 4000, "num_wait_worker_iterations": 150, "init_redundancy_expert": 16, "expert_map_path": "xxx.json"}'` after this pr: `--additional-config '{"eplb_config":{"dynamic_eplb":true,"expert_heat_collection_interval":4000, "algorithm_execution_interval":150,"num_redundant_experts": 16, "expert_map_path": "xxx.json"}}'` ### How was this patch tested? #### test qwen3-235b eplb num_redundant_experts=16 without pr5817 \| dataset \| version \| metric \| mode \| vllm-api-general-chat \| \|----- \| ----- \| ----- \| ----- \| -----\| \| aime2024 \| 604a78 \| accuracy \| gen \| 83.33 \| with pr5817 \| dataset \| version \| metric \| mode \| vllm-api-general-chat \| \|----- \| ----- \| ----- \| ----- \| -----\| \| aime2024 \| 604a78 \| accuracy \| gen \| 86.67 \| - vLLM version: v0.13.0 - vLLM main: `45c1ca1ca1` Signed-off-by: shenchuxiaofugui <1311027364@qq.com>	2026-01-15 10:26:44 +08:00
SILONG ZENG	4811ba62e0	[Lint]Style: reformat markdown files via markdownlint (#5884 ) ### What this PR does / why we need it? reformat markdown files via markdownlint - vLLM version: v0.13.0 - vLLM main: `bde38c11df` --------- Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain> Signed-off-by: MrZ20 <2609716663@qq.com> Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>	2026-01-15 09:06:01 +08:00
lty	295018ec0f	[Refactor]Refactor of vllm_ascend/distributed module (#5719 ) ### What this PR does / why we need it? Based on the RFC:https://github.com/vllm-project/vllm-ascend/issues/5604 This PR is a refactoring of vllm_ascend/distributed, moving all kv_transfer realtaed codes into a dedicated folder, which has already been done in vLLM ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` --------- Signed-off-by: lty <linhebiwen@gmail.com>	2026-01-15 08:57:40 +08:00
wangxiyuan	354ee3b330	[Doc] Update doc url link (#5781 ) Drop `dev` suffix for doc url. Rename url to `https://docs.vllm.ai/projects/ascend` - vLLM version: v0.13.0 - vLLM main: `2f4e6548ef` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-12 11:21:31 +08:00
wangxiyuan	6f7a81cd9f	[CI] cleanup single/multi-card test (#5623 ) 1. speed up e2e light test. 2. create `2-cards` and `4-cards` folder in multicard 3. move ops to nightly 4. run test in Alphabetical Order - vLLM version: v0.13.0 - vLLM main: `8be6432bda` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-07 14:13:34 +08:00
lilinsiman	52863c4165	[Refactor][EAGLE] 2/N: load model and generate token (#5437 ) ### What this PR does / why we need it? 1. Refactor eagle and mtp function: load_model and generate_token_ids 2. Remove redundant code in mtp and eagle file 3. Refactor the UT of file 2/N of Refactor and merge mtp and eagle Relational RFC: https://github.com/vllm-project/vllm-ascend/issues/5467 ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut and tests - vLLM version: release/v0.13.0 - vLLM main: `81786c8774` --------- Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2026-01-05 14:07:54 +08:00
L4	c23cf30709	[Doc] eval-type not support service but server (#2920 ) ### What this PR does / why we need it? fix wrong eval-type in accuracy doc - vLLM version: v0.10.2 - vLLM main: `fec347dee1` Signed-off-by: root <root@liaolile-laptop.localdomain> Co-authored-by: root <root@liaolile-laptop.localdomain>	2026-01-05 11:17:39 +08:00
InSec	7cf65d0581	[Doc]modify the quantization user guide and add a quantization adaptation developer guide (#5554 ) ### What this PR does / why we need it? This PR makes the following modifications: 1.delete the `user_guide/feature_guide/quantization-llm-compressor.md` and merge it into `user_guide/feature_guide/quantization.md`. 2.update the content of `user_guide/feature_guide/quantization.md`. 3.add guidance `developer_guide/feature_guide/quantization.md' on the adaptation of quantization algorithms and quantized models. ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: `7157596103` --------- Signed-off-by: IncSec <1790766300@qq.com> Signed-off-by: InSec <1790766300@qq.com>	2026-01-05 09:12:11 +08:00
Li Wang	2ee17e50a1	[2/N] Upgrade nightly doc (#5534 ) ### What this PR does / why we need it? Follow up https://github.com/vllm-project/vllm-ascend/pull/5479, upgrade the corresponding doc for developers - vLLM version: v0.13.0 - vLLM main: `45c1ca1ca1` Signed-off-by: wangli <wangli858794774@gmail.com>	2025-12-31 09:11:42 +08:00
zhangsicheng5	8ed87dfa84	[doc] Add context parallel user guide (#5358 ) 1. Add context parallel user guide 2. Add context parallel related message in supported features/models - vLLM version: release/v0.13.0 - vLLM main: `bc0a5a0c08` Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>	2025-12-26 17:03:47 +08:00
Qiu	da0b113cf5	[doc]<PCP&DCP> add developer guide for PCP&DCP (#5372 ) ### What this PR does / why we need it? add developer guide for PCP&DCP - vLLM version: release/v0.13.0 - vLLM main: `bc0a5a0c08` Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-12-26 16:17:38 +08:00
wangxiyuan	29d2fe653d	cleanup ascend config (#5296 ) 1. refresh additional config doc 2. move kv config logic to platform. 3. improve `dump_config` init logic and rename it to `dump_config_path` this change is user impacted. dump_config is changed from dict to string. 4. correct `enable_async_exponential` type 5. remove useless `chunked_prefill_for_mla` - vLLM version: release/v0.13.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-26 14:07:37 +08:00
Li Wang	5ab6d124e5	[Doc] Add a perf tune section (#5127 ) ### What this PR does / why we need it? This patch purpose to 1. add a section on os point of perf tune doc 2. Set some default env in the image for performance - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-12-19 14:52:52 +08:00
Li Wang	7d32371b7e	[Doc] Refact benchmark doc (#5173 ) ### What this PR does / why we need it? Refactor some outdated doc - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangli <wangli858794774@gmail.com>	2025-12-18 22:26:13 +08:00
Ronald	b69b04d3a9	implement model runner v2 basic framework (#5051 ) ### What this PR does / why we need it? This PR aim to implement model runner v2 basic framework in vllm-ascend, the e2e function is not guaranteed by this pr. ### Does this PR introduce _any_ user-facing change? use envs.VLLM_USE_V2_MODEL_RUNNER to decide if choose model_runenr_v2. ### How was this patch tested? - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: Ronald1995 <ronaldautomobile@163.com>	2025-12-18 15:51:54 +08:00
wangxiyuan	8090914d69	[CI] CI refactor (#4928 ) 1. rename workflow to better name 2. fix lint error 3. remove accuracy report doc and test - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-14 11:09:56 +08:00
wangxiyuan	e538fa6f9c	[Doc] Update tutorial index (#4920 ) Update tutorial index and remove useless doc - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-11 20:53:13 +08:00
zhangyiming	c95c271538	[E2E] Optimize nightly testcase. (#4886 ) ### What this PR does / why we need it? Optimize nightly testcase. Changes: - tests/e2e/nightly/multi_node/config/models/Qwen3-235B-A3B.yaml: Add accuracy and performance benchmark - tests/e2e/models/configs/Qwen3-8B-Base.yaml: Delete - tests/e2e/models/configs/internlm-7b.yaml: Change to internlm3-8b-instruct - tests/e2e/nightly/models/test_deepseek_r1_w8a8_eplb.py: Change to DeepSeek-R1-0528-W8A8 model - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: menogrey <1299267905@qq.com>	2025-12-11 10:15:39 +08:00
zhangyiming	66b0781840	[E2E] Refactor the e2e testcases. (#4789 ) ### What this PR does / why we need it? Refactor the e2e testcases. - tests/e2e/multicard/test_weight_loader.py: Remove the unused code. - tests/e2e/singlecard/multi-modal/test_internvl.py: Move to accuracy test. - tests/e2e/singlecard/test_aclgraph.py: Rename the file. - tests/e2e/singlecard/test_embedding_aclgraph.py : Combine with tests/e2e/singlecard/test_bge_model.py - tests/e2e/singlecard/test_completion_with_prompt_embeds.py: Delete eager mode and modify model to Qwen3-0.6B - tests/e2e/singlecard/test_quantization.py: Modify model to Qwen3-0.6B-W8A8 - tests/e2e/singlecard/test_vlm.py: Modify model to Qwen3-VL-8B - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: menogrey <1299267905@qq.com>	2025-12-11 10:15:00 +08:00
Nengjun Ma	0eefbe75b6	[Doc] Add local running multi-node nightly test case guide (#4884 ) ### What this PR does / why we need it? Add local running multi-node nightly test case guide, help running locally at developer env. ### Does this PR introduce _any_ user-facing change? NA ### How was this patch tested? Test with local running multi-node test. Using this document can successfully start multi-node night e2e in locall - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-12-11 08:56:27 +08:00
wangxiyuan	835b4c8f1d	Drop torchair (#4814 ) aclgraph is stable and fast now. Let's drop torchair graph mode now. TODO: some logic to adapt torchair should be cleaned up as well. We'll do it in the following PR. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-12-10 09:20:40 +08:00
herizhen	bb1610dc25	add hyperlink (#4588 ) ### What this PR does / why we need it? add hyperlink ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.2 --------- Signed-off-by: herizhen <you@example.com> Co-authored-by: herizhen <you@example.com>	2025-12-02 14:09:03 +08:00
Chenxi Qian	554f16ae1f	[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804 ) ### What this PR does / why we need it? This PR introduces support for adding custom CANN `aclnn` ops to `vllm-ascend`, allowing users to define and use their own custom operators. Key changes include: - Building and installing custom ops into the `vllm-ascend`-specified directory - Binding the `aclnn` op interface to the `torch.ops._C_ascend` module - Enabling invocation of these ops within `vllm-ascend` This PR includes a sample custom op: `aclnnGroupedMatmulSwigluQuantWeightNzTensorList`, which is adapted from the CANN operator [`aclnnGroupedMatmulSwigluQuantWeightNZ`](https://www.hiascend.com/document/detail/zh/canncommercial/83RC1/API/aolapi/context/aclnnGroupedMatmulSwigluQuantWeightNZ.md). Its input parameters `weight` and `weight_scale` now accept `list[torch.Tensor]` (i.e., `at::TensorList`). ### Does this PR introduce _any_ user-facing change? No. - vLLM version: v0.11.2 --------- Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>	2025-11-28 18:06:39 +08:00
herizhen	3199fe8350	[Doc]Delete equals sign (#4537 ) ### What this PR does / why we need it? Delete equals sign in doc ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: herizhen <you@example.com> Co-authored-by: herizhen <you@example.com>	2025-11-28 17:09:26 +08:00
SILONG ZENG	ab37a7d5ae	[main]Upgrade cann to 8.3rc2 (#4350 ) ### What this PR does / why we need it? Upgrade cann to 8.3rc2 ### Does this PR introduce _any_ user-facing change? Yes, docker image will use 8.3.RC2 - vLLM version: v0.11.2 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2 --------- Signed-off-by: MrZ20 <2609716663@qq.com>	2025-11-28 14:06:01 +08:00
herizhen	e945e91933	Document error correction (#4422 ) ### What this PR does / why we need it? The "g" at the beginning of the current sentence is redundant and needs to be deleted "MindIE Turbo" is no longer required to be displayed. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM main: `2918c1b49c` --------- Signed-off-by: herizhen <you@example.com> Co-authored-by: herizhen <you@example.com>	2025-11-25 14:21:13 +08:00
Tjh-UKN	00ea61ec88	[feature] vllm-ascend support msprobe (eager mode dump) (#4241 ) ### What this PR does / why we need it? vllm-ascend need to dump data during model execution to debug some precision problems, here msprobe provide the corresponding abilities, so msprobe will join vllm-ascend to make debug easier ### Does this PR introduce _any_ user-facing change? ``` 'dump_config': '/path/to/config.json' ``` - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: Tjh-UKN <2559659915@qq.com>	2025-11-24 21:58:31 +08:00
Canlin Guo	d5fef22149	[Docs] Improve the AISBench multi-modal testing docs (#4255 ) ### What this PR does / why we need it? Add some of the pitfalls I ran into when using AISBench to test multi-modal models. - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2025-11-19 16:00:39 +08:00
thonean	e38fe92f40	[Misc][Doc] Add service profiling feature with user guide (#3756 ) ### What this PR does / why we need it? To support the data collection capabilities of the msServiceProfiler on vLLM-ascned framework and enable customization of data collection points via configuration file, a default profiling configuration has been added to vllm-ascend, facilitating debugging and optimization for developers and users. ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: minghangc <29514143@qq.com>	2025-11-12 09:07:14 +08:00
wangxiaoteng888	b1a00e0512	[docs] [P/D] add feature guide for disaggregated-prefill (#3950 ) ### What this PR does / why we need it? add feature guide for disaggregated-prefill ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? by ci - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Signed-off-by: liziyu <liziyu16@huawei.com> Signed-off-by: wangxiaoteng888 <56506195+wangxiaoteng888@users.noreply.github.com> Co-authored-by: liziyu <liziyu16@huawei.com>	2025-11-10 09:31:30 +08:00
lilinsiman	a3ff765c65	[Info][main] Corrected the errors in the information (#4055 ) ### What this PR does / why we need it? Corrected the errors in the information ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-11-08 18:48:59 +08:00
offline893	5cff3069f4	[Doc]Add developer guide of eplb. (#3759 ) ### What this PR does / why we need it? Add developer guide of eplb - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: offline0806 <3337230449@qq.com> Co-authored-by: offline0806 <3337230449@qq.com>	2025-11-05 18:35:41 +08:00
pz1116	e0c23cb011	[docs] Add kv pool developer guide (#3752 ) ### What this PR does / why we need it? Add kv pool developer guide ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? vLLM version: v0.11.0rc3 vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: Pz1116 <zpbzpb123123@gmail.com> Signed-off-by: pz1116 <zpbzpb123123@gmail.com>	2025-11-05 18:03:36 +08:00
zouyida2052	1ba158567c	[Doc] add mtp doc (#3770 ) ### What this PR does / why we need it? add mtp develop doc - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: zouyida2052 <zouyida2002@gmail.com>	2025-11-05 16:38:35 +08:00
zzzzwwjj	46d5a77688	[docs] add aclgraph developer guide (#3683 ) ### What this PR does / why we need it? Add aclgraph developer guide. - vLLM version: v0.11.0 - vLLM main: `83f478bb19` Signed-off-by: zzzzwwjj <1183291235@qq.com>	2025-11-05 10:34:28 +08:00

1 2 3

109 Commits