| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Lianmin Zheng | 38c00ed7a1 | Fix multimodal registry and code sync scripts (#10759)<br>Co-authored-by: cctry <shiyang@x.ai> | 2025-09-22 15:36:01 -07:00 |
| Lianmin Zheng | 60e37f8028 | Move parsers under a single folder (#9912) | 2025-09-02 18:25:04 -07:00 |
| Lifu Huang | 6e2151183b | Fix incorrect default get_hidden_dim logic (#8987) | 2025-08-09 00:25:38 -07:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179)<br>Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-11 12:55:00 +08:00 |
| Qubitium-ModelCloud | 56a724eba3 | [QUANT] Add GPTQModel Dynamic Quantization + lm_head Quantization (#3790)<br>Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai><br>Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> | 2025-03-05 01:11:00 -08:00 |
| fzyzcjy | a3339d8cac | Bug: Fix weight loader error when LM head weights are tied (#3766) | 2025-02-21 17:53:12 -08:00 |
| Ke Wen | 862bcff833 | Support loading of larger models with on-the-fly quantization (#3061) | 2025-01-22 21:33:17 -08:00 |
| Yineng Zhang | 2add697d7a | feat: remove vllm get_rope (#2964) | 2025-01-18 19:38:01 +08:00 |
| Yineng Zhang | 5dc54f1a62 | feat: remove vllm distributed (#2907)<br>Co-authored-by: Zhangyi <1109276519@qq.com> | 2025-01-17 22:31:51 +08:00 |
| Jerry Zhang | 9cc733b38c | move apply_torchao_config_ to model_runner (#2342) | 2024-12-04 17:26:42 -08:00 |
| Yineng Zhang | 85e1a6f3aa | Update model_loader deps and qqq quantization deps (#2220) (#2318)<br>Co-authored-by: HandH1998 <1335248067@qq.com> | 2024-12-02 23:22:13 +08:00 |
| Lianmin Zheng | 4936be8acc | Revert "Revert "[FEAT] Support GGUF format"" (#2287) | 2024-11-30 22:14:48 -08:00 |
| Lianmin Zheng | 7e4c6dd8da | Revert "[FEAT] Support GGUF format" (#2285) | 2024-11-30 19:03:26 -08:00 |
| Yang Zheng | 883c955489 | [FEAT] Support GGUF format (#2215)<br>Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com> | 2024-11-30 00:44:48 -08:00 |
| Xuehai Pan | 62a4a339eb | docs: fix module docstrings and copyright headers (#2077) | 2024-11-22 22:16:53 +08:00 |
| Lianmin Zheng | dfec7fca06 | Rename sglang.bench_latency to sglang.bench_one_batch (#2118) | 2024-11-21 20:07:48 -08:00 |
| Jerry Zhang | 7f8fcd39cd | Turn off autotune for scaled mm for fp8 dynamic quant in torchao (#2116) | 2024-11-21 12:19:49 -08:00 |
| Ke Wen | cf2489762b | Add Tensor Parallel to torch_native_llama (#1876) | 2024-11-15 21:26:00 -08:00 |
| Ke Bao | 16eb33ffe2 | Update vocab embedding deps and add TP switch (#1856) | 2024-10-31 20:13:07 -07:00 |
| Byron Hsu | 56503d9bc9 | [1/N] Remove CacheConfig import in all model files (#1658) | 2024-10-14 09:06:34 -07:00 |
| Jerry Zhang | 9b0926ceeb | Add llama implementation with no tensor parallel linears (#1561) | 2024-10-05 11:22:27 -07:00 |