Commit Graph

  • ec3ee0289d fix sgl-kernel cu118 build (#4872) Yineng Zhang 2025-03-28 17:23:51 -07:00
  • 92941ce7b5 bump sgl-kernel 0.0.5.post4 (#4768) Yineng Zhang 2025-03-28 14:40:53 -07:00
  • 2bb0e7cf43 fix sampling issue (#4871) Yineng Zhang 2025-03-28 14:07:21 -07:00
  • 72549263c6 update sgl-kernel test ci (#4866) Yineng Zhang 2025-03-28 11:42:41 -07:00
  • 044c315970 Make torch compile configurable for biased_grouped_topk (#4749) Qingquan Song 2025-03-28 10:57:52 -07:00
  • 4db29e82ec [Feat] support deepgemm for cmake (#4864) yinfan98 2025-03-29 01:51:44 +08:00
  • c483377ed7 Fix wrong variable name when stopping memory profile (#4772) Fr4nk1in 2025-03-29 01:35:02 +08:00
  • 74e0ac1dbd Clean up import vllm in quantization/__init__.py (#4834) Lianmin Zheng 2025-03-28 10:34:10 -07:00
  • ef9a378a20 [Feature] add multi-rank support for Lora (#4492) chaobo jia 2025-03-29 00:38:44 +08:00
  • 6dea5c96bf Revert "get the python version from env (#4729)" (#4863) Yineng Zhang 2025-03-28 08:07:48 -07:00
  • 6ffb6bd47a Fix fa3 cuda graph page_size > 1 precision and page_size=1 speed (#4855) Qingquan Song 2025-03-28 01:35:59 -07:00
  • 47e6628aae Fix CI tests (#4853) Lianmin Zheng 2025-03-28 00:28:35 -07:00
  • 7907f9eb20 test: reduce mem_fraction_static for gemma3 vision test (#4840) Juwan Yoo 2025-03-27 23:20:10 -07:00
  • 8c04f0f2e1 Support with_stack and record_shapes in profiler (#4740) fzyzcjy 2025-03-28 14:01:42 +08:00
  • 265e756494 Super tiny remove unused code (#4750) fzyzcjy 2025-03-28 13:32:14 +08:00
  • d3f71f5e19 Fix torch.cuda.MemPool() internal assertion failure (#4687) fzyzcjy 2025-03-28 13:29:36 +08:00
  • 5eae67cb1f get the python version from env (#4729) DavidChan 2025-03-28 13:26:42 +08:00
  • 6dbf99982f Fix missing arguments in SchedulePolicy and RadixCache initialization in tests. (#4712) vikram singh shekhawat 2025-03-28 10:53:51 +05:30
  • e0166f8ab4 Remove empty tool function name (#4704) Kebe 2025-03-28 13:23:30 +08:00
  • 53a2c3b466 Support controlling nsys start and end range programmatically (#4688) fzyzcjy 2025-03-28 13:21:13 +08:00
  • 550586ef42 fix: Inappropriate lack of Optional type on OpenAI ChatCompletionRequest (#4681) BroadbentJim 2025-03-28 05:19:05 +00:00
  • cf29fe9e78 Fix Engine error when enabling DP attention (#4648) fzyzcjy 2025-03-28 13:17:30 +08:00
  • 26c0f13126 Support Page Size > 1 for FA3 (#4832) Stefan He 2025-03-27 22:07:14 -07:00
  • f9970bd1af fix: when use SGLANG_PORT this env,port is str (#4528) rongfu.leng 2025-03-28 12:46:06 +08:00
  • 2e0f94ab79 [Fix] fix output_top_logprobs is not exist (#4597) lambert0312 2025-03-28 12:45:57 +08:00
  • 18317ddc13 ci: add condition for daily docker build (#4487) warjiang 2025-03-28 12:44:37 +08:00
  • e2e2ab70e0 IPv6 support (#3949) Vincent 2025-03-28 00:42:13 -04:00
  • 0d3e3072ee Fix CI of test_patch_torch (#4844) fzyzcjy 2025-03-28 12:22:45 +08:00
  • 62dd95870c Remove retry in nightly tests (#4846) fzyzcjy 2025-03-28 12:18:29 +08:00
  • 72031173e4 fix: fix typo of comments in w8a8_fp8.py (#4843) Jiaqi 2025-03-28 12:06:47 +08:00
  • 9fdc6d6abc Fix the lora adapter when lora path is none (#4799) Qiaolin Yu 2025-03-28 00:03:08 -04:00
  • 42a45df043 [Fix] self.worker assignment in TpModelWorker and refactor references (#4788) XinyuanTong 2025-03-27 20:28:38 -07:00
  • 04eb6062e4 Include context length in /v1/models response. (#4809) Jon Durbin 2025-03-27 23:23:18 -04:00
  • e84f4ba0ab [Misc] Fix issues reported by torchfix (#4837) Brayden Zhong 2025-03-27 23:10:32 -04:00
  • b149b39353 [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder (#3969) Brayden Zhong 2025-03-27 22:45:02 -04:00
  • 31dfff7da7 use default for torch.ops (#4835) Yineng Zhang 2025-03-27 19:09:58 -07:00
  • 10a9ab7b07 Fix error due to CustomAllreduce setup failure (#4815) Kebe 2025-03-28 09:52:10 +08:00
  • bb0fd749a6 [Fix] Add compressed_tensors as deps (#4819) Junrong Lin 2025-03-28 09:08:24 +08:00
  • 7f19e083c1 Support (1 <= dp < tp) in the dp attention in DeepEP (#4770) tarinkk 2025-03-27 20:09:35 -04:00
  • 98a2cfa9b2 Basic Cleanup (#4833) Daniel Holanda 2025-03-27 16:55:48 -07:00
  • 2a882e8f3a Fix the nightly eval by lowering the threshold of neuralmagic/gemma-2-2b-it-FP8 (#4830) Lianmin Zheng 2025-03-27 16:09:49 -07:00
  • e6e4d02245 Update MMMU Benchmark instructions (#4694) Ravi Theja 2025-03-28 03:14:16 +05:30
  • 188105a21b deps: lazy import optional dependencies gguf and torchvision (#4826) Juwan Yoo 2025-03-27 14:35:36 -07:00
  • b39532587b Update doc for DeepSeek-V3-0324 (#4825) Ke Bao 2025-03-28 04:30:40 +08:00
  • 5fa3058f01 fix the release doc dependency issue (#4828) Yineng Zhang 2025-03-27 13:28:12 -07:00
  • bbab97a6a8 add partial_json_parser and einops (#4827) Yineng Zhang 2025-03-27 13:24:54 -07:00
  • 0bc0bf5734 gemma3: impl get_attention_sliding_window_size for attn init (#4823) Juwan Yoo 2025-03-27 10:43:58 -07:00
  • f60f293195 [k8s] Clarified the usage of shared memory. (#4341) Jiří Suchomel 2025-03-27 16:53:19 +01:00
  • 17000d2b3a Remove Unintended Capture Batch Sizes in AMD HIP Graph Runner (#4638) AinL 2025-03-28 00:41:33 +09:00
  • 668ecc6c5b Fix ut mla-test-1-gpu-amd (#4813) strgrb 2025-03-27 23:27:51 +08:00
  • 886fcbdd09 Use apply_rope_with_cos_sin_cache_inplace for DeepSeek (#4764) strgrb 2025-03-27 16:45:37 +08:00
  • 8bf6d7f406 support cmake for sgl-kernel (#4706) Yineng Zhang 2025-03-27 01:42:28 -07:00
  • 1b9175cb23 [FA3 Attn Backend] Remove Unnecessary Device Sync for FA3 (#4745) Stefan He 2025-03-27 00:45:11 -07:00
  • 92bb49a7f9 Patch PyTorch's bug that cross-process tensor transfer will lead to wrong device (#4565) fzyzcjy 2025-03-27 15:22:33 +08:00
  • 6f5cc5eb05 update xgrammar 0.1.17 (#4804) Yineng Zhang 2025-03-27 00:21:59 -07:00
  • c913ed4046 support clip embedding model (#4506) Pan Lyu 2025-03-27 15:18:15 +08:00
  • 1afe3d0798 Align finish reason and stream mode in openai api (#4388) Xihuai Wang 2025-03-27 15:16:52 +08:00
  • 44f47d3ee1 Update supported_models.md: adding open-r1 Olympic Code 32B by HuggingFace (#4628) Didier Durand 2025-03-27 08:16:16 +01:00
  • ae25d36dc6 [3/3] fix dsv3 awq issue (#4719) laixin 2025-03-27 14:13:43 +08:00
  • 1099f6c974 bump v0.4.4.post2 (#4669) Yineng Zhang 2025-03-26 19:58:00 -07:00
  • 04e3ff6975 Support compressed tensors fp8w8a8 (#4743) Xiaoyu Zhang 2025-03-27 04:21:25 +08:00
  • 45fdf1f7f3 Fix shared memory OOM on sm86 GPUs. (#4797) Yi Pan 2025-03-27 01:41:53 +08:00
  • d89c0e4b7e Use metadata to detect version of package (#4782) Kebe 2025-03-26 15:41:43 +08:00
  • fa3c9e0668 Fix popen_launch_server wait for 20 minutes when child process exits (#4777) fzyzcjy 2025-03-26 15:32:19 +08:00
  • 0d658ac3df Support recording experts workload in QWen2-MoE (#4775) Cheng Wan 2025-03-26 03:29:57 -04:00
  • ced35a0649 fix(typo): fix reply to replay in base_attn_backend.py (#4784) Thysrael 2025-03-26 15:19:12 +08:00
  • 26f07294f1 Warn users when release_memory_occupation is called without memory saver enabled (#4566) fzyzcjy 2025-03-26 15:18:14 +08:00
  • 34e07a65f1 [Fix] Fix unexpected idx bug of Phi-3-small (#4728) Baizhou Zhang 2025-03-25 22:33:48 -06:00
  • 15ddd84322 Add retry for flaky tests in CI (#4755) fzyzcjy 2025-03-26 07:53:12 +08:00
  • 52029bd1e3 Fix warmup error when dp=1 (#4753) fzyzcjy 2025-03-25 17:01:21 +08:00
  • eb934bdf3b Fix test_expert_distribution failure (#4752) fzyzcjy 2025-03-25 16:17:03 +08:00
  • e45ae444db Revert "Add DeepEP tests into CI (#4737)" (#4751) fzyzcjy 2025-03-25 15:44:01 +08:00
  • ac3fae8445 [Feature] Support "strict" in function calling (#4310) DarkSharpness 2025-03-25 14:15:25 +09:00
  • 2d1b83e57a add dsv3 int8 test (#4705) HandH1998 2025-03-25 12:57:58 +08:00
  • 199bb01d00 Add endpoints to dump selected expert ids (#4435) yuhsaun-t 2025-03-24 21:34:19 -07:00
  • 6b7038babd Speedup warmup when DP > 1 (#4695) fzyzcjy 2025-03-25 12:08:05 +08:00
  • 57eec0bfbc fix FlashMLA cudagraph config (#4691) lukec 2025-03-25 12:06:58 +08:00
  • f01b092519 Super tiny fix typo (#4738) fzyzcjy 2025-03-25 12:05:45 +08:00
  • 14269198e3 [Benchmark] tilelang vs deepgemm vs w8a8_block_fp8_matmul (#4735) Chunan Zeng 2025-03-24 20:56:31 -07:00
  • 9b7cf9ee6c support cu128 sgl-kernel (#4744) Yineng Zhang 2025-03-24 20:53:23 -07:00
  • 1e86457c90 model: Minicpmo (#3023) Mick 2025-03-25 11:08:40 +08:00
  • 64129fa632 Add DeepEP tests into CI (#4737) fzyzcjy 2025-03-25 10:54:31 +08:00
  • e9f8e42318 Support FP4 gemm (1/2) (#3899) Trevor Morris 2025-03-24 19:50:23 -07:00
  • 22c3702e1e [Model] Support Qwen2ForSequenceClassification (#4609) Ximingwang-09 2025-03-25 10:13:44 +08:00
  • 4c584fc632 Fix circular imports in gptq.py and unblock test explorer (#4736) Stefan He 2025-03-24 18:07:08 -07:00
  • 77cf771ebe Fix EAGLE3 for llama3.3 70b (#4716) Ke Bao 2025-03-25 08:31:19 +08:00
  • 8154de5a32 [PD] Remove invalid parameter (#4721) Xuchun Shang 2025-03-25 04:14:16 +08:00
  • c11cfda07b update pyproject (#4731) Yineng Zhang 2025-03-24 09:50:28 -07:00
  • 64edeb798f Support dynamic version name in sglang's pyproject.toml (#4720) Yuhong Guo 2025-03-24 23:56:31 +08:00
  • 65c24c28f9 [Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396) Chunan Zeng 2025-03-23 23:44:17 -07:00
  • 3980ff1be6 rename benchmark_deepgemm_fp8_group_gemm.py (#4605) Tongbao Zhang 2025-03-24 14:35:20 +08:00
  • 5d7edc8e55 Support FA3 as Attention backend by using --attention-backend fa3 (#4680) Stefan He 2025-03-23 23:28:11 -07:00
  • af6535e7aa [ROCm] Enable MTP (NextN) on AMD GPU (#4631) Alex Sun 2025-03-24 13:58:05 +08:00
  • 93cf7fc5cd Unify variable naming: replace is_in_free_group with is_not_in_free_group (#4698) c1lovez1 2025-03-24 12:51:08 +08:00
  • 2a206b22ed Fix RotaryEmbedding when using Triton backend for EXAONE-3.5-2.4B (#4064) Kyungmin Lee 2025-03-24 09:58:12 +09:00
  • 4d25305700 Move mem_state update into debug mode (#4525) Zhiqiang Xie 2025-03-23 00:52:27 -07:00
  • 11577cedb7 refactor: bug fixes and refactor for vlm (#4661) Mick 2025-03-23 13:48:49 +08:00
  • ca75741e86 Support async in DeepEP (#4610) fzyzcjy 2025-03-23 13:39:56 +08:00
  • c6d549e773 Multiple tiny code cleanups (#4608) fzyzcjy 2025-03-23 13:39:11 +08:00
  • 3c09548d1f close gemma2 in test_verl_engine.py temporarily (#4685) Yi Zhang 2025-03-23 07:36:46 +08:00