Commit Graph

  • 6fc37bd8ee Fix sgl-kernel compile for sm80 (#3046) Ke Bao 2025-01-22 16:49:08 +08:00
  • 3d8f1c9bcf Use int64 as indices for set_kv_buffer (#3039) Lianmin Zheng 2025-01-21 19:46:09 -08:00
  • a42213dbd4 fix pr-test-sgl-kernel (#3036) Yineng Zhang 2025-01-22 00:56:42 +08:00
  • 0ac019f171 Support sm90 Int8 gemm (#3035) Ke Bao 2025-01-21 22:21:54 +08:00
  • 5a0d680a14 feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033) Yineng Zhang 2025-01-21 20:44:49 +08:00
  • a4331cd260 Add accuracy and latency tests of eagle into CI (#3027) Lianmin Zheng 2025-01-21 02:55:14 -08:00
  • ec1c21cdc4 upgrade torch version for sgl-kernel (#3026) Yineng Zhang 2025-01-21 14:32:08 +08:00
  • 6c856b4f3a minor: update Makefile for sgl-kernel (#3025) Yineng Zhang 2025-01-21 13:08:15 +08:00
  • 287d07a669 Misc fixes for eagle (flush_cache, CPU overhead) (#3014) Lianmin Zheng 2025-01-20 20:25:13 -08:00
  • d2571dd5c7 Enable Cohere2 Models (#3018) Hui Liu 2025-01-20 19:21:41 -08:00
  • b730aa6b9e [EAGLE] Fix some boundary situation when retract reqs and req's max token = 1 (#2939) 996_icu 2025-01-21 09:46:43 +08:00
  • 60b2a44a80 Fix flaky tests in test_programs.py (#3022) Lianmin Zheng 2025-01-20 16:50:39 -08:00
  • 949b3fbfce [Doc] Update doc of custom logit processor (#3021) Hongpeng Guo 2025-01-20 16:50:25 -08:00
  • da4e8b3892 enable kv_scale remap (#3017) Hui Liu 2025-01-20 14:40:45 -08:00
  • af6c5357d5 deepseek v3 and r1 chat template (#3015) Enrique Shockwave 2025-01-20 22:40:12 +00:00
  • 3ad4cd4915 bump router to 0.1.3 (#3020) Byron Hsu 2025-01-20 14:38:06 -08:00
  • 3a8428ecaa [router] Expose worker startup interval (#3019) Byron Hsu 2025-01-20 14:36:54 -08:00
  • 0311ce8e1c [router] Expose worker startup secs & Return error instead of panic for router init (#3016) Byron Hsu 2025-01-20 12:45:13 -08:00
  • 5dfcacfcb1 Add compile flags for cutlass 3.x (#3013) Ke Bao 2025-01-21 00:04:12 +08:00
  • 41a0ccd4f1 Add clang-format check to sgl-kernel ci (#3012) Ke Bao 2025-01-20 23:22:19 +08:00
  • e94fb7cb10 chore: bump v0.4.1.post7 (#3009) Yineng Zhang 2025-01-20 21:50:55 +08:00
  • b5caa22dfb [kernel] port rope cuda kernel to sgl-kernel (#2993) Byron Hsu 2025-01-20 04:58:51 -08:00
  • 73401fd016 Sync distributed package from vllm 0.6.4.post1 (#3010) Lianmin Zheng 2025-01-20 04:57:14 -08:00
  • 89cd923581 Roll back to use vllm custom allreduce (#3006) Lianmin Zheng 2025-01-20 04:03:15 -08:00
  • dc1881326f Fix perf regression on small batch sizes (#3008) Lianmin Zheng 2025-01-20 03:39:49 -08:00
  • 10bfce71b3 fix moe align blocks benchmark (#3003) yiakwy-xpu-ml-framework-team 2025-01-20 19:33:29 +08:00
  • 583697cd71 [Enhancement] Custom Logit Processor Improvement (#2998) Hongpeng Guo 2025-01-20 02:00:35 -08:00
  • 2584f6d944 Docs: Add Performance Demonstration for DPA (#3005) Chayenne 2025-01-20 01:00:52 -08:00
  • 51e87f6f21 Skip flaky custom_logit_processor tests (#3004) Lianmin Zheng 2025-01-20 00:28:47 -08:00
  • 09bcbe0123 Update TypeBasedDispatcher and balance CI tests (#3001) Lianmin Zheng 2025-01-19 23:37:27 -08:00
  • 03464890e0 Separate two entry points: Engine and HTTP server (#2996) Lianmin Zheng 2025-01-19 22:09:24 -08:00
  • 44a9669770 keep rotary_embedding only (#2997) Yineng Zhang 2025-01-20 13:21:36 +08:00
  • 1a820e38a2 Remove dependency of pynvml on ROCm (#2995) Chaitanya Sri Krishna Lolla 2025-01-20 10:30:35 +05:30
  • 0ffcfdf474 Docs: Only use X-Grammar in structured output (#2991) Chayenne 2025-01-19 20:22:47 -08:00
  • cd493b5afc Improve metrics, logging, and importing orders (#2992) Lianmin Zheng 2025-01-19 18:36:59 -08:00
  • 61f42b5732 Move sgl.Runtime under sglang/lang (#2990) Lianmin Zheng 2025-01-19 17:10:29 -08:00
  • e403d23757 [Feature] Add sampler custom logits processor (#2396) Hongpeng Guo 2025-01-19 14:46:53 -08:00
  • 3bcf5ecea7 support regex in xgrammar backend (#2983) Enrique Shockwave 2025-01-19 20:34:41 +00:00
  • 2c05f81f15 fix custom op version compatibility (#2988) Yineng Zhang 2025-01-20 04:21:29 +08:00
  • d77caa2b75 [#2812] Make the decode status dict capacity adjustable by a CLI param (#2839) Seungduk Kim 2025-01-20 04:36:53 +09:00
  • 8b6a4486ec fix missing revision arg when loading tokenizer (#2982) giorgiopiatti-dfinity 2025-01-19 20:36:07 +01:00
  • a69cb5cff7 cleanup unused header in sgl_kernel (#2986) Yineng Zhang 2025-01-20 00:44:49 +08:00
  • def5c31873 docs: update supported_models (#2987) Yineng Zhang 2025-01-20 00:44:30 +08:00
  • 3fc2b62589 update docker dev image (#2985) Yineng Zhang 2025-01-19 23:45:39 +08:00
  • 6ada05d0ed feat: check for is_cuda for sgl_kernel import (#2984) Yineng Zhang 2025-01-19 23:33:04 +08:00
  • 24cafe3177 add config to switch from vllm custom allreduce to sgl_kernel custom allreduce (#2981) yizhang2077 2025-01-19 22:30:38 +08:00
  • 5a176c92df fix deepseek v2 with cpu device (#2975) Yineng Zhang 2025-01-19 21:33:27 +08:00
  • 4719c1d04a [router] Fix sgl router path for release (#2980) Byron Hsu 2025-01-19 17:11:06 +08:00
  • ef18b0eda2 [router] Allow empty worker list for sglang.launch_router (#2979) Byron Hsu 2025-01-19 17:05:23 +08:00
  • 53cc91e504 [devcontainer] Fix mount and GPU & Support rust dev (#2978) Byron Hsu 2025-01-19 16:34:01 +08:00
  • d33cbb7e58 remove cub and add cccl (#2976) Yineng Zhang 2025-01-19 15:51:27 +08:00
  • 23196d5254 Simplify logits processor (#2974) Lianmin Zheng 2025-01-18 23:03:49 -08:00
  • 93b77c8e8a Fix the request loggings to make it fully able to be easily replayed (#2973) Lianmin Zheng 2025-01-18 21:45:00 -08:00
  • 7906d1d298 Remove the unused write_with_records (#2972) Lianmin Zheng 2025-01-18 20:20:23 -08:00
  • 81d27c8e31 Refactor to add TypeBasedDispatcher to simplify dispatching (#2958) fzyzcjy 2025-01-19 12:13:27 +08:00
  • 4d4cdb3fe7 Frontend: better error message handling for FINISH_ABORT in scheduler.py (#2956) Chang Su 2025-01-18 19:37:30 -08:00
  • 2bd18e2d76 Memory pool: Minor optimize to avoid to (#2901) Yang Zheng 2025-01-19 11:35:12 +08:00
  • 83452dbb4a fix file name spelling mistake and useless variable in minmax-text-01-lightning_attention (#2971) Xiaoyu Zhang 2025-01-19 10:56:13 +08:00
  • 3d93f84a00 [Feature] Support minicpmv v2.6 (#2785) Mick 2025-01-19 06:14:19 +08:00
  • c2f212d672 optimize MiniMax-Text-01 lightning_attn_decode triton (#2966) Xiaoyu Zhang 2025-01-18 23:41:01 +08:00
  • e2cdc8a5b5 upgrade cutlass v3.7.0 (#2967) Yineng Zhang 2025-01-18 23:37:42 +08:00
  • 2add697d7a feat: remove vllm get_rope (#2964) Yineng Zhang 2025-01-18 19:38:01 +08:00
  • 6f98c586bd fix sgl-kernel setup.py (#2963) lukec 2025-01-18 18:50:37 +08:00
  • 656dcc1a99 Remove fp8 monkey patch (#2960) Ke Bao 2025-01-18 15:00:29 +08:00
  • 8af7048dcf Query remaining memory dynamically for PrefillAdder (#2941) Zhiqiang Xie 2025-01-17 20:20:26 -08:00
  • d3024f4fc8 support e4m3 kvcache in qwen2 & add kv scaling factor json (#2894) bjmsong 2025-01-18 11:43:22 +08:00
  • 13387e6b7a Multi-turn benchmark for hierarchical caching (#2942) Zhiqiang Xie 2025-01-17 16:17:24 -08:00
  • 120c3634ef Fix Llama-3.1-405B References Docs (#2944) Wen Sun 2025-01-18 06:46:38 +08:00
  • 78e5b22f29 feat: use get_rope for gemma2 (#2954) Yineng Zhang 2025-01-18 02:57:18 +08:00
  • 7a15e9ad36 cleanup models unused import 2/n (#2952) Yineng Zhang 2025-01-18 01:09:19 +08:00
  • dc2ac0cbdb Update pr template (#2951) Ke Bao 2025-01-18 00:44:16 +08:00
  • d47c5101f1 Add ut for qwen model (#2947) Ke Bao 2025-01-18 00:03:54 +08:00
  • 033c715b46 cleanup models dependencies 1/n (#2948) Yineng Zhang 2025-01-17 23:46:48 +08:00
  • d06c1ab587 update ci install dependency (#2949) Yineng Zhang 2025-01-17 23:42:23 +08:00
  • c5644cace9 docs: add Cursor for adoption and sponsorship (#2950) Yineng Zhang 2025-01-17 23:41:57 +08:00
  • 53e6552fed Fix qwen accuracy issue (#2945) Ke Bao 2025-01-17 22:35:26 +08:00
  • 5dc54f1a62 feat: remove vllm distributed (#2907) Yineng Zhang 2025-01-17 22:31:51 +08:00
  • f3e9b4894b Fix sgl-kernel ci (#2938) Ke Bao 2025-01-17 17:26:21 +08:00
  • 6a7973add8 Update release-docs.yml (#2937) Lianmin Zheng 2025-01-17 00:36:40 -08:00
  • 63051738a9 Enable CPU device on SGLang (#2806) Chunyuan WU 2025-01-17 13:22:53 +08:00
  • a8ccacc8b8 [Frontend] Fix request length check and add option to disallow auto truncation in scheduler (#2876) Chang Su 2025-01-16 14:51:19 -08:00
  • 0427416b59 Fix zmq binding (#2930) Lianmin Zheng 2025-01-16 14:36:07 -08:00
  • bf3edc2c60 Docs: Update pull_request_template.md (#2928) Chayenne 2025-01-16 13:04:11 -08:00
  • 78e974b2a5 [kernel] MiniMax-Text-01 decode lightning_attn with triton (#2920) Xiaoyu Zhang 2025-01-17 04:51:38 +08:00
  • bc6915e3b9 Improve type annotation and styles (#2926) Lianmin Zheng 2025-01-16 12:51:11 -08:00
  • a883f0790d Update release-docker-amd.yml to run on amd docker runner. (#2927) saienduri 2025-01-16 12:42:29 -08:00
  • 8b6ce52e92 Support multi-node DP attention (#2925) Lianmin Zheng 2025-01-16 11:15:00 -08:00
  • 58f3f2b840 Add CI for sgl-kernel (#2924) Ke Bao 2025-01-17 01:26:51 +08:00
  • 93d690617e Simplify the process launch code in server.py (#2923) Lianmin Zheng 2025-01-16 07:52:17 -08:00
  • e00e5385e0 add profiling to bench_one_batch script (#2821) Yun Dai 2025-01-16 07:24:24 -08:00
  • a2f602b541 fixed lm_head.weight error for quantized qwen (#2910) Rin Intachuen 2025-01-16 21:51:43 +07:00
  • 8f2c522aba Improve benchmark scripts and error message printing (#2922) Lianmin Zheng 2025-01-16 06:24:31 -08:00
  • 7596417732 minor: use bear for compilation database (#2919) Yineng Zhang 2025-01-16 18:39:11 +08:00
  • 2dc957d421 fix setup for sgl kernel (#2917) Yineng Zhang 2025-01-16 18:17:34 +08:00
  • bf8d07a6f9 feat: patch linear base (#2915) Yineng Zhang 2025-01-16 18:00:03 +08:00
  • ab31793661 [kernel] MiniMax-Text-01 prefill lightning_attn with triton (#2911) Xiaoyu Zhang 2025-01-16 14:18:29 +08:00
  • b7f3fec13c minor: rename bench for sgl kernel (#2909) Yineng Zhang 2025-01-16 05:55:43 +08:00
  • 58f42b1dd8 minor: update pr test (#2908) Yineng Zhang 2025-01-16 05:51:49 +08:00
  • 767c9dec03 adapt custom allreduce for tensorrt llm (#2511) yizhang2077 2025-01-16 04:57:35 +08:00
  • a53454c55e fix: sgl-kernel link cuda (#2906) Yineng Zhang 2025-01-16 04:53:23 +08:00