xc-llm-kunlun

Author	SHA1	Message	Date
Li Wei	9cee025f41	Merge pull request #59 from liwei109/aicapx-quant [fix]remove weight_loader_v2 to support cuda graph	2025-12-29 19:56:24 +08:00
baoqian426	45c6b8e927	Merge pull request #52 from liwei109/awq_gptq [dev] support AWQ/GPTQ quantization for dense models	2025-12-24 17:05:26 +08:00
Li Wei	6546323c71	[dev] support AWQ/GPTQ quantization for dense models	2025-12-24 13:46:06 +08:00
Li Wei	383eb5459a	[refactor] remove redundant code in linear	2025-12-24 12:02:09 +08:00
Xinyu Dong	75d0bdae2f	Merge pull request #40 from ldh2020/v0.11.0dev [Kernel] Optimize the performance of Qwen3-Next	2025-12-22 21:50:27 +08:00
hanhaowen	a4b9e92ca1	[Kernel] Replace native torch solve_tril by solve_tril_fwd kernel op	2025-12-22 17:37:19 +08:00
ldh2020	004e164bdb	[Kernel] Optimize the recurrent op	2025-12-21 11:18:00 +08:00
ldh2020	fce97df908	[Kernel] Use l2norm kernel op instead of triton op.	2025-12-16 16:24:47 +08:00
ldh2020	cff4727fbb	[Kernel] Optimize the performance of causal_conv1d.	2025-12-12 17:22:35 +08:00
ldh2020	9bb2ee06a4	[Bugfix] fix the bug of torch_solve_tril	2025-12-12 17:01:50 +08:00
chenyili	7c22d621fb	Commit the vllm0.11.0 development branch	2025-12-10 17:51:24 +08:00
zhaoyingzhuo	b614823125	[chore] Remove obsolete comments	2025-12-10 15:52:23 +08:00
dongxinyu03	c728e52505	Initial commit for vLLM-Kunlun Plugin	2025-12-10 12:05:39 +08:00