Commit Graph

  • 052cc4e61b [Docs] Fix GLM-5 deploy command (#6711) Canlin Guo 2026-02-12 08:55:48 +08:00
  • a0315f6697 [npugraph_ex]enable npugraph_ex by default (#6664) iiiklw 2026-02-12 08:44:06 +08:00
  • b86ea66b0a [doc]add GLM5.md (#6709) rika 2026-02-12 04:00:40 +08:00
  • ff3a50d011 [Model] GLM5 adaptation (#6642) yydyzr 2026-02-11 22:22:22 +08:00
  • 140fcaffc3 [Bugfix] Update target probs to target logits in rejection sample (#6685) Zetong Li 2026-02-11 21:31:40 +08:00
  • c0c2eb614e [Main][Ops] Make triton rope support index_selecting from cos_sin_cache (#5450) Angazenn 2026-02-11 21:20:53 +08:00
  • 6bc44bf49b [CI]fix nightly multi node test error for wait for pod ready (#6675) SILONG ZENG 2026-02-11 18:11:00 +08:00
  • 88773bb101 [main to main] upgrade main 0210 (#6673) Icey 2026-02-11 18:10:14 +08:00
  • 53b494b1e4 [main][Quant] Remove unused rotation functions and parameters from W4A4 LAOS quantization (#6648) Cao Yi 2026-02-11 16:38:45 +08:00
  • bb73478c00 [Test][BugFix] Fix torch.rand usage in triton penalty test (#6680) whx 2026-02-11 16:31:49 +08:00
  • 0c1cfa2bac Add Worker Interface:check_health (#6681) luomin2005 2026-02-11 15:24:48 +08:00
  • 389030a8f8 add env vars & misc v0.11.0 starkwj 2026-02-11 06:27:58 +00:00
  • 02886e2641 [Feat] 310p support MoE W8A8 quantizaition (#6641) pu-zhe 2026-02-10 17:17:44 +08:00
  • 1eb07986bf [TEST]add a qwen3-30b acc case with mooncake mempool (#6244) jiangyunfan1 2026-02-10 16:26:55 +08:00
  • 7cf285a77a [MOE Refactor] Remove QuantType in prepare_finalize.py (#6534) LI SHENGYONG 2026-02-10 15:59:58 +08:00
  • 34eecacace [EPLB] Avoiding eplb's dependency on a specified model (#6528) LI SHENGYONG 2026-02-10 15:58:44 +08:00
  • 7d4833bce9 [Doc][Misc] Restructure tutorial documentation (#6501) wangxiyuan 2026-02-10 15:03:35 +08:00
  • 77305df398 implement batch invariant with ascendc (#6590) Ronald 2026-02-10 14:15:26 +08:00
  • 66b60c9440 [Refact]Refact MLA/SFA weight prefetch to consist with moe weight prefetch (#6629) Nengjun Ma 2026-02-10 14:14:37 +08:00
  • 2a826b5fad [Misc] upgrade to vllm main (#6646) wangxiyuan 2026-02-10 14:08:59 +08:00
  • 1c7d1163f5 [main][Docs] Fix spelling errors across documentation (#6649) Cao Yi 2026-02-10 11:14:57 +08:00
  • 5b8e47cb68 [bugfix]Fix no attribute 'data' when MLAPO is enable (#6601) meihanc 2026-02-10 09:04:32 +08:00
  • 905f0764e0 [DOC]Add Memcache Usage Guide (#6476) DreamerLeader 2026-02-09 21:55:00 +08:00
  • 9564c6bb5d [main][bugfix] Fix spec acceptance rate problem in vllm_0.15.0 (#6606) lilinsiman 2026-02-09 21:33:58 +08:00
  • 8d44ddacb0 [Test][LoRA] Add e2e test for base model inference (#6624) yupeng 2026-02-09 21:06:49 +08:00
  • 156976b982 [refactor]Optimized the kvcache usage of Deepseek v3.2 (#6610) Wang Kunpeng 2026-02-09 18:53:56 +08:00
  • cb7c419bc0 [Feat](sfa,dcp) support dcp for sfa (#6563) Qiu 2026-02-09 18:52:25 +08:00
  • 80e5812b39 [BugFix] Add support for rotary_dim parameter when using partial rope in rotary_embedding (#6581) GoCHug 2026-02-09 17:17:52 +08:00
  • d060c797ed [fix bug] fix tensor mismatch bug in sigmoid operate test case (#6619) lhp-deep 2026-02-09 16:43:27 +08:00
  • 8325528368 [Kernel]: Optimize DispatchFFNCombine performance (#6468) xulei 2026-02-09 16:30:34 +08:00
  • 9c6d031797 [MISC] Clean up useless env USE_OPTIMIZED_MODEL (#6618) wangxiyuan 2026-02-09 15:38:58 +08:00
  • b7aa511daa [Patch] Remove the patch of MiniCPM (#5975) Canlin Guo 2026-02-09 14:07:44 +08:00
  • e5f0e0eaf7 [P/D] layerwise connector support recompute scheduler (#5900) liziyu 2026-02-07 15:24:42 +08:00
  • d266fd7b47 [CI] Add workflow support for lint image build (#6489) wangxiyuan 2026-02-07 09:32:01 +08:00
  • 4fa7cf6f50 [Bugfix] Fix problematic dummy_run & improper input_batch_size in eagle (#6517) Zetong Li 2026-02-07 09:30:10 +08:00
  • 1cc225711d [Refactor]310p_e2e test case update (#6539) pu-zhe 2026-02-07 09:28:37 +08:00
  • c3db1aca2f [Refactor]refactor p2p connector (#6551) lty 2026-02-07 09:27:15 +08:00
  • 4f33e25046 [Refactor]refactor 310p attention impl and add ut (#6579) pu-zhe 2026-02-07 09:26:26 +08:00
  • 23524f2ca4 [Refactor]refactor 310p ops and add ut (#6591) pu-zhe 2026-02-07 09:25:17 +08:00
  • 6c49f95da2 [Ops][Refactor] Remove custom rotary_embedding operator (#6523) wangxiyuan 2026-02-07 09:24:05 +08:00
  • 06aa6036f6 [Lint]Style: Convert vllm-ascend/ to ruff format(new Batch #8) (#6604) SILONG ZENG 2026-02-07 09:16:07 +08:00
  • c63b7a1188 [Test] Add initial multi modal cases of Qwen2.5-VL-7B-Instruct for disaggregated encoder (#5301) wangyu 2026-02-06 17:30:17 +08:00
  • 06c0aed124 [CI] Fix broken CI (#6599) wangxiyuan 2026-02-06 17:23:58 +08:00
  • 19b5d44ea8 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173) SILONG ZENG 2026-02-06 15:35:06 +08:00
  • 65b7f716e6 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #11) (#6176) SILONG ZENG 2026-02-06 15:28:49 +08:00
  • 4fb3d5e1b2 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #8) (#6129) SILONG ZENG 2026-02-06 15:25:08 +08:00
  • 99aedaff63 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #7) (#6023) SILONG ZENG 2026-02-06 14:56:53 +08:00
  • d0bc16859c [CI][Misc] Some improvement for github action (#6587) wangxiyuan 2026-02-06 14:06:27 +08:00
  • d018aeb5fa [Image] Bump mooncake version to v0.3.8.post1 (#6428) Li Wang 2026-02-06 10:54:03 +08:00
  • 85e33941e8 [Feat.]: 310p support MOE models (#6530) pu-zhe 2026-02-06 10:30:56 +08:00
  • c38166eefa [Doc] backport 0.13.0 release note (#6584) wangxiyuan 2026-02-06 10:29:15 +08:00
  • 11339eb48a [CI] Update UT CANN version to 8.5.0 for main branch (#6564) Nengjun Ma 2026-02-06 10:28:42 +08:00
  • 81f3c09d6d [CI] Change A2 runner (#6557) zhangxinyuehfad 2026-02-05 23:43:57 +08:00
  • 8e66299bf1 [Bugfix] Fix the incorrect use of the output parameter in _forward_fia_slidingwindow (#6469) Ruowei Zheng 2026-02-05 20:58:54 +08:00
  • 922e5c163b [main2main] upgrade vllm main 0202 (#6560) meihanc 2026-02-05 19:31:17 +08:00
  • 2c1608265b [CI][npugraph_ex]Fix npugraph ex e2e test (#6553) ChenCangtao 2026-02-05 14:03:10 +08:00
  • 33b8ca4e96 [Feature]KV pool supports sparse attention (#6339) lty 2026-02-05 10:36:52 +08:00
  • 13c4a9c78b [bugfix]Fix accuracy issue in PCP/DCP with speculative decoding (#6491) Wang Kunpeng 2026-02-05 10:06:14 +08:00
  • 0ead5e8681 perf: adaptive block size selection in linear_persistent kernel (#6537) Zhijun Chen 2026-02-04 21:36:26 +08:00
  • 2ee4f23f28 [ModelRunner][Fix] Pads query_start_loc to satisfy FIA/TND constraint (#6475) Yizhou 2026-02-04 21:11:08 +08:00
  • 2dac18afea [Bugfix]Fix of Pooling Code and Update of Pooling Usage Guide (#6126) DreamerLeader 2026-02-04 16:35:41 +08:00
  • 804a9ec4e6 [Fusion] Add rmsnorm dynamic quant fusion pass (#6274) Zhang-Bryan 2026-02-04 15:53:53 +08:00
  • e7a13beedb [Bugfix] Synchronize only the current stream to avoid device sync (#6432) IWantFight 2026-02-04 10:59:45 +08:00
  • bfcc372f75 [CI] Add long and short prompt tests for DeepSeek-V3.2 (#6499) starmountain1997 2026-02-04 09:10:50 +08:00
  • 78fad4e348 [Refactor] MLP weight prefetch to consistency with MoE Model's prefetching in terms of code and usage (#6442) Nengjun Ma 2026-02-04 09:08:18 +08:00
  • fa56abea9f [bugfix][npugraph_ex]duplicate pattern issue (#6513) ChenCangtao 2026-02-04 08:49:13 +08:00
  • 7b3921c498 [bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass (#6430) ChenCangtao 2026-02-04 08:48:28 +08:00
  • a80e524fbc [Quant] GLM4.7-Flash Support W8A8 (#6492) dsxsteven 2026-02-03 19:49:58 +08:00
  • 4d6444d5fd [Nightly][BugFix] Remove kv_cache nz test case for test_mla_preprocess_nq.py (#6505) whx 2026-02-03 18:26:51 +08:00
  • b804eb12f6 [CI]Nightly test use main (#6502) SILONG ZENG 2026-02-03 15:40:59 +08:00
  • 41d48cb974 [CI] Update doctest from 0.9.1 to 0.13.0, and copy doc test workflow to nightly CI for better monitor. (#6452) zhangyiming 2026-02-03 15:19:03 +08:00
  • 03a18ad6fd [E2E] add E2E for Prefix Caching cp & Chunked Prefill cp (#5149) Feng Liu 2026-02-03 15:04:14 +08:00
  • be5b66de6d [Doc] Contributing a Benchmark Tutorial for Suffix Speculative Decoding (#6323) zhangguinan 2026-02-03 14:52:38 +08:00
  • b1de6cbb31 [Bugfix][CI]Add qwen3Next MTP+Full Decode (#6047) LeeWenquan 2026-02-03 14:26:21 +08:00
  • 39e77fb9e4 [Feat.]: support 310p w8a8 (#6454) Shaoxu Cheng 2026-02-03 14:13:06 +08:00
  • 79803932e2 [Kernel] Add AscendC fused op transpose_kv_cache_by_block to speed up GQA transfer (#6366) lidenghui1110 2026-02-03 14:10:01 +08:00
  • f4a72f0d16 [CI]Disable early exit to complete all tests (#6482) SILONG ZENG 2026-02-03 11:25:51 +08:00
  • dffac6db73 [Refactor] Add expert processed token count output for DispatchFFNCombine/DispatchFFNCombineBF16 (#6402) guanguan0308 2026-02-03 10:41:06 +08:00
  • 26b83f8bde [Bugfix] Improve Triton stability on Ascend for large grids (#6301) zhangxinyuehfad 2026-02-03 10:32:27 +08:00
  • 05cc03d785 [Bugfix] fix hash conflict due to reset incompatible configuations (#6368) zhangxinyuehfad 2026-02-03 10:32:02 +08:00
  • b6256e8bc9 Revert "[CI] fix DS3.2 single node cudagraph_sizes config (#6241)" (#6497) starmountain1997 2026-02-03 08:42:58 +08:00
  • c1618a0427 [Bugfix]Fix the compatibility issue of may_reinitialize_input_batch (#6290) debuger 2026-02-02 19:16:26 +08:00
  • 7932255c06 [Refactor][EAGLE] 6/N route mtp to eagle except pcp/dcp+mtp (#6349) lilinsiman 2026-02-02 19:15:31 +08:00
  • c08364f761 [Bugfix] Fix intermittent kv_port conflict with AscendDirectTransport (#6455) meihanc 2026-02-02 17:31:21 +08:00
  • 45a573cff1 [Quantization][Feature] Support compressed tensors moe w4a8 dynamic weight (#5889) LHXuuu 2026-02-02 16:39:32 +08:00
  • 082aa2e5b7 [Bugfix]The service fails to be started when the memcache pool is enabled (#6229) lty 2026-02-02 16:26:18 +08:00
  • 460ea88276 [Refact.]: Refactor some leftover implementations of 300I DUO in the main branch. (#6425) Shaoxu Cheng 2026-02-02 16:12:04 +08:00
  • eeedf7c503 [Main2Main][Deps][Misc] Upgrade vLLM to v0.15.0 (#6470) wangxiyuan 2026-02-02 15:57:55 +08:00
  • d53510b26d [Misc] Print triton info in collect_env.py (#6298) zhangyiming 2026-02-02 15:53:42 +08:00
  • 8134146ab6 [CI] fix DS3.2 single node cudagraph_sizes config (#6241) starmountain1997 2026-02-02 11:47:32 +08:00
  • d1dcdfc408 [bugfix]fix some bug in dispatch_ffn_combine kernel (#6465) LQLlulu 2026-02-02 08:32:42 +08:00
  • 347eb36a59 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #9) (#6135) SILONG ZENG 2026-02-01 23:20:20 +08:00
  • f7dc7d9b86 [CI] support build wheel and docker image by workflow (#6453) wangxiyuan 2026-02-01 20:06:22 +08:00
  • b4aafd4293 [Core][Misc] Clean up ProfileExecuteDuration (#6461) wangxiyuan 2026-02-01 20:06:01 +08:00
  • 775fbc4cd2 【main】【bugfix】fix: restrict default MLAPO activation to Decode nodes only (#6451) fems14 2026-01-31 22:44:56 +08:00
  • ef02d20086 [CI] update gemini styleguide (#6463) wangxiyuan 2026-01-31 18:02:49 +08:00
  • 5b0a6bcfe9 [ModelRunner] Revert "[Fix] Pads query_start_loc to satisfy FIA/TND constraint (#6459) Li Wang 2026-01-31 16:33:34 +08:00
  • 96cbfebede [CI]Update gemini guide (#6458) wangxiyuan 2026-01-31 15:17:39 +08:00
  • e3a1586fce [CI]Update gemini config (#6447) wangxiyuan 2026-01-31 10:47:38 +08:00
  • 638cae824d [bugfix](CP) Fix and unify the PD request discrimination logic. (#5939) Qiu 2026-01-31 10:26:02 +08:00