Commit Graph

  • b1a853b0f6 Upgrade vllm commit hash to 1216 (#5053) ZixuanWang 2025-12-17 08:48:36 +08:00
  • eb4c08f05d [bugfix] fix mtp accept rate (#5093) zhenwenqi2024 2025-12-17 01:35:26 +08:00
  • 5b1da4e914 [Feat] Support async_scheduler and disable_padded_drafter_batch in eagle (#4893) anon189Ty 2025-12-16 22:06:40 +08:00
  • cee521bad5 [Nightly][BugFix] Install triton for nightly e2e op test. (#5096) whx 2025-12-16 21:31:53 +08:00
  • c6f60e8dd8 [Nightly] Upgrade single node test to latest main (#5101) Li Wang 2025-12-16 21:28:45 +08:00
  • 8d099a5cd7 [Bugfix] EPLB nightly deepseek (#5095) LI SHENGYONG 2025-12-16 20:02:54 +08:00
  • 190ae55e9f Add a Mooncake installation tutorial for kv pool and update Mooncake installation tutorial (#5069) liziyu 2025-12-16 19:53:23 +08:00
  • 4ed2951400 【Feature】refactor npu_modelrunner for profile_run (#4993) zhenwenqi2024 2025-12-16 17:44:04 +08:00
  • af64087732 [bugfix] matmul_allreduce_add_rmsnorm aclnn interface (#5082) Trunrain 2025-12-16 17:36:40 +08:00
  • d11b74a571 Add release note for v0.11.0 (#4918) wangxiyuan 2025-12-16 17:31:45 +08:00
  • 039cc65e58 [Doc] Add user guide of speculative decoding (#5074) zhaomingyu13 2025-12-16 17:01:44 +08:00
  • ff0a1e012a [BugFix]Fix FIA input err in DSv3.1 (#5059) Wang Yixuan 2025-12-16 16:40:35 +08:00
  • 18d2395f5e [Bugfix] fix fastapi version (#5047) zhangxinyuehfad 2025-12-16 15:58:27 +08:00
  • ddd475d5be [ModelRunner] apply_grammer uses vllm function (#4974) zhenwenqi2024 2025-12-16 15:26:01 +08:00
  • 2f1aed98cc [Doc] Update version policy to the latest. (#5071) zhangyiming 2025-12-16 15:24:46 +08:00
  • 8c41770f1f [bugfix] fix fp32 trans nz (#5068) zzzzwwjj 2025-12-16 15:04:31 +08:00
  • 11e6d6c291 [doc] update developer guide (#5060) wangxiyuan 2025-12-16 14:09:52 +08:00
  • e07abfaa75 [Doc] Add new contributors. (#5066) zhangyiming 2025-12-16 12:47:40 +08:00
  • a63ef031af [Doc] Upgrade some outdated doc (#5062) Li Wang 2025-12-16 11:48:19 +08:00
  • bb3a826e08 [Refactor] Remove the process patches of Qwen2.5-VL and Qwen2.5-Omni (#5035) Canlin Guo 2025-12-16 11:43:52 +08:00
  • ca0823f238 [0.11.0][Bugfix] fix fastapi version (#5052) zhangxinyuehfad 2025-12-16 11:34:11 +08:00
  • 9c02fa9867 [bugfix] Fix mooncake kvpool accuracy issue (#4976) Chao Lei 2025-12-16 11:33:16 +08:00
  • 303c08aec9 [Doc] Update structured output doc with upstream link (#5058) Shanshan Shen 2025-12-16 11:32:53 +08:00
  • 9e24bdd44c [Feat] Refactor rejection sampler (#4975) realliujiaxu 2025-12-16 11:32:26 +08:00
  • 5f840696c1 Bump actions/checkout from 4 to 6 (#5015) dependabot[bot] 2025-12-16 11:30:41 +08:00
  • 0918de58d5 [Bugfix] dynamic eplb doesn't use fused_alltoall (#4919) LI SHENGYONG 2025-12-16 10:59:30 +08:00
  • 195eac665b [Core][Worker] Add UCMConnector for KV Cache Offloading (#4411) UnifiedCacheManager 2025-12-16 10:53:30 +08:00
  • 237fad635c [Fix]Revert temporary skip on mtp1/mtp2 correctness tests (aclgraph fix) (#5039) SILONG ZENG 2025-12-16 10:40:00 +08:00
  • 6063853ead [Misc] Upgrade vllm commit hash to 1215 (#5029) Li Wang 2025-12-16 09:23:02 +08:00
  • 5e0ada5395 [Bugfix] Fix the attn_metadata is None (#5038) MengLong Chen 2025-12-16 09:14:05 +08:00
  • 2b5b309133 [Bugfix]Fix precision issues in moe_mlp (vllm-ascend v0.11.0-dev) (#5023) Clorist33 2025-12-16 08:40:03 +08:00
  • d43cabc2b1 [Bugfix] Fix precision issues in moe_mlp (vllm-ascend main) (#5025) Clorist33 2025-12-16 08:39:54 +08:00
  • 87c0cfafa3 [0.11.0][Bugfix] fix fastapi version (#5048) zhangxinyuehfad 2025-12-15 23:51:38 +08:00
  • b662d914a4 [bugfix] [main] Fix KV cache query inconsistency across different TP ranks in the KV Pool (#5030) fems14 2025-12-15 21:56:05 +08:00
  • c064d11fd7 [Cleanup] Remove unused attn_metadata parameter from Proposer classes (#4862) Jade Zheng 2025-12-15 21:21:38 +08:00
  • a9625851ef [Attention] Temporarily add back pa for small batch sizes. (#4765) whx 2025-12-15 20:35:50 +08:00
  • 95e6400128 [KVPool]Fix PP get bug (#5007) baxingpiaochong 2025-12-15 20:27:57 +08:00
  • a5cb8e40f5 [doc]Modify quantization tutorials (#5026) InSec 2025-12-15 20:12:06 +08:00
  • e90e8afc94 [E2E] Collect test run time. (#5018) zhangyiming 2025-12-15 20:06:48 +08:00
  • 019c8e03c2 [CI] Delete deepseek3.2-exp nightly test (#5028) zhangxinyuehfad 2025-12-15 20:01:53 +08:00
  • 8d2998d0e4 [Misc] Upgrade vllm hash to 12_14 (#5000) Li Wang 2025-12-15 19:54:23 +08:00
  • 3b7eb5179f [Bugfix] fix the incorrect use of python's sum on tensors. (#4655) wangx700 2025-12-15 19:22:40 +08:00
  • 6029bea480 [UT]add pcp dcp ut (#4949) zengzengran 2025-12-15 18:41:38 +08:00
  • 5fae65f3a8 [Graph][Fusion] Add AddRMSNorm(with bias) and Quant Fusion Pattern (#5011) Icey 2025-12-15 18:37:56 +08:00
  • 6de4bedd04 update release note for suffix decoding (#5009) fluctlux 2025-12-15 17:22:19 +08:00
  • df7e0fe916 [Bugfix] qwen3-vl-235b-w8a8 load weight ERROR when start service (#4292) Levi 2025-12-15 16:39:58 +08:00
  • e25c57b346 [Bugfix] Add support for PP intermediate value types in graph mode (#4902) knight0528 2025-12-15 16:27:17 +08:00
  • e16444f21f [Bugfix] Fix the bug in initializing the shared_weight communication domain in sfa-cp, and fix the mtp weight load in pp>1 situation (#4913) zzhxxx 2025-12-15 16:21:49 +08:00
  • 70606e0bb9 [Test]update accuracy test of models (#4911) SILONG ZENG 2025-12-15 15:04:20 +08:00
  • b75bfc58f6 [Doc ] Supplement kvpool user guide (#5013) Chao Lei 2025-12-15 14:24:39 +08:00
  • aa02a85e4d [bugfix] Fix dummy-run and multi-node issues in MoE routing and MTP (#4947) Chen Chen 2025-12-15 14:18:23 +08:00
  • cc7b302020 Bump actions/upload-artifact from 5 to 6 (#5014) dependabot[bot] 2025-12-15 14:13:06 +08:00
  • 8fb0ef5ffa [main][BugFix] Fixed an accuracy bug of Qwen3-next-MTP when batched inferring (#4932) drslark 2025-12-15 13:22:30 +08:00
  • 545e856971 [Refactor]3/N Refactor mla_v1.py & extract mla_cp (#4933) wujinyuan1 2025-12-15 12:59:18 +08:00
  • 98b9e2e18e Add Qwen3-Next tutorials (#4607) ming1212 2025-12-15 11:48:22 +08:00
  • 6beb4434e1 [CI][Bugfix] Fix scheduleroutput has no attr get error in prompt logprobs (#4998) Mengqing Cao 2025-12-15 11:10:39 +08:00
  • 2497bbbaf6 [Misc] Update pooling example (#5002) Li Wang 2025-12-15 08:36:19 +08:00
  • bb7b74c14f add ut for model runner (#4991) LookAround0301 2025-12-14 11:16:20 +08:00
  • 8090914d69 [CI] CI refactor (#4928) wangxiyuan 2025-12-14 11:09:56 +08:00
  • ba28d54f35 [Perf]enable prefill flashcommon3 (#4065) AlvisGong 2025-12-14 09:34:13 +08:00
  • 0686b32d82 [Fix] Fixes issues in MTP with async scheduling and ACL graph (#4963) Yizhou 2025-12-14 00:10:11 +08:00
  • 42ceaf08a1 add release note for 0.12.0 (#4995) wangxiyuan 2025-12-13 22:09:59 +08:00
  • 0f92d34a70 [CI] Pull latest vllm-ascend src before tests (#4988) Li Wang 2025-12-13 19:04:14 +08:00
  • fd7c929145 [perf] replace all_reduce for kv_consumer and support different num_tokens among all ranks (#4983) wangxiyuan 2025-12-13 18:59:54 +08:00
  • 5211e991ad Revert "[Bugfix] support mtp kv transfer and pp partition by hand in kv transfer (#4892)" (#4981) wangxiyuan 2025-12-13 18:58:55 +08:00
  • 31c94b7e7b [doc][main] Correct more doc mistakes (#4958) lilinsiman 2025-12-13 18:36:58 +08:00
  • 4721e4f53f [bugfix] asyncscheduler bug fix (#4968) zhenwenqi2024 2025-12-13 17:04:54 +08:00
  • 3581946256 [Bugfix] fix eagle proposer (#4971) realliujiaxu 2025-12-12 22:39:49 +08:00
  • 45889a6185 [Bugfix] Pass vllm_config to kv_connector_no_forward in NPUModelRunner (#4970) Jade Zheng 2025-12-12 22:36:23 +08:00
  • fa367e3b1a [CI] Add mtp_proposer ut (#4397) MengLong Chen 2025-12-12 20:41:31 +08:00
  • fc818f1509 [doc][main] Correct mistakes in doc (#4945) lilinsiman 2025-12-12 19:17:10 +08:00
  • f708d919f8 [Feature] model_runner refactor (#4764) zhenwenqi2024 2025-12-12 17:27:09 +08:00
  • 5b12c068f9 [Nightly] Remove gen_ranktable logic (#4941) Li Wang 2025-12-12 17:20:18 +08:00
  • 0cdf98ac48 [usability]Modify the default value of the protocol to ascend (#4959) lty 2025-12-12 16:56:18 +08:00
  • 0983c5510a vllm-ascend support Ascend950 with Qwen dense model. (#4228) wangyao-i 2025-12-12 15:50:57 +08:00
  • 716c4dacfe update qwen2.5vl readme (#4938) liziyu 2025-12-12 15:40:07 +08:00
  • 4ae7588c52 [Doc] Upgrade outdated doc (#4957) Li Wang 2025-12-12 15:38:29 +08:00
  • 62a9fea7af 【doc】Add model feature matrix (#4950) 1092626063 2025-12-12 15:37:39 +08:00
  • cf801fdbbb [CI] fix light test (#4954) zhangxinyuehfad 2025-12-12 15:24:04 +08:00
  • 84b9d38e28 BugFix: Resolve PolicyFlashlb warm up function attribute error (#4741) Mercykid-bash 2025-12-12 14:55:26 +08:00
  • 01a13a9b77 fix nz for quantization (#4943) wangxiyuan 2025-12-12 14:54:41 +08:00
  • 5932abc446 [Bugfix] Fix the Eagle3 inference failure issue. (#4721) sunchendd 2025-12-12 14:52:29 +08:00
  • 4f0dddc9ee [Bugfix] bugfix for moe_mlp in vllm-ascend/v0.11.0-dev (#4885) Clorist33 2025-12-12 14:51:47 +08:00
  • 4984e8a284 [Bugfix] bugfix for moe_mlp (#4822) Clorist33 2025-12-12 14:51:20 +08:00
  • d65fb194d9 [Feat] Add custom Embedding tensor model parallel (#2616) lidenghui1110 2025-12-12 14:41:20 +08:00
  • 9c0ad46c1a [0.11.0][Bugfix] Remove the ZMQ communication setup on the D node (#4916) Slightwind 2025-12-12 14:37:49 +08:00
  • b8a317caac [main][Bugfix] Remove the ZMQ communication setup on the D node (#4926) Slightwind 2025-12-12 14:37:26 +08:00
  • d54db76dd2 [MoE][TorchAir] Remove FusedMoEState (#4927) weichen 2025-12-12 09:12:24 +08:00
  • bfafe30953 [CI] refect e2e test (#4799) zhangxinyuehfad 2025-12-12 08:42:08 +08:00
  • a6ef3ac4e4 [Performance] Pre-issued exponential distribution operator. (#4908) weijinqian0 2025-12-11 23:02:51 +08:00
  • 0fbe0831ec [bugfix][refactor] fix recompute_scheduler break with vllm 0.12.0 & support async scheduling & refactor recompute_scheduler.py (#4895) linfeng-yuan 2025-12-11 22:24:49 +08:00
  • e538fa6f9c [Doc] Update tutorial index (#4920) wangxiyuan 2025-12-11 20:53:13 +08:00
  • e56dba9b0d [CI]cleanup e2e test (#4800) SILONG ZENG 2025-12-11 20:35:32 +08:00
  • 3349f61769 [CI] Cancel whl build when submitting a new commit (#4925) Li Wang 2025-12-11 19:54:52 +08:00
  • c30b51e764 Refactor CI workflow (#4912) wangxiyuan 2025-12-11 19:34:43 +08:00
  • 551069e53a [Doc] Update structured output doc with upstream link (#4015) Shanshan Shen 2025-12-11 19:14:29 +08:00
  • 06a66939cd Remove mindie_turbo (#4896) wangxiyuan 2025-12-11 18:46:12 +08:00
  • b89763f1ed [CI] speed up ut (#4901) wangxiyuan 2025-12-11 18:45:43 +08:00
  • 3fade30275 [Bugfix] Prevent engine hang during KVCacheSendingThread startup (#4754) Jade Zheng 2025-12-11 18:39:25 +08:00
  • 18221c0e1d [Fusion] normalize fusion naming and enable e2e test (#4693) Icey 2025-12-11 17:53:43 +08:00