17 Commits

Author SHA1 Message Date
Byron Hsu
56503d9bc9 [1/N] Remove CacheConfig import in all model files (#1658) 2024-10-14 09:06:34 -07:00
Lianmin Zheng
36d5acfca5 Rename InputMetadata -> ForwardBatch (#1543) 2024-09-30 02:41:11 -07:00
Yineng Zhang
b4408b0d16 feat: update linear deps 1/N (#1305) 2024-09-19 20:53:11 +08:00
Lianmin Zheng
1acccb364a Fix oom issues with fp8 for llama (#1454) 2024-09-18 03:45:19 -07:00
Liangsheng Yin
70b6802982 Optimize conflicts between CUDA graph and vocab mask tensors (#1392) 2024-09-13 20:27:53 -07:00
Lianmin Zheng
f64eae3a29 [Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308) 2024-09-02 21:44:45 -07:00
Liangsheng Yin
381dd57bd6 Sampler cudagraph (#1253) 2024-08-28 18:58:52 -07:00
Lianmin Zheng
bf53bf5142 [Fix] Fix llava on multi images (#1247) 2024-08-28 06:33:05 -07:00
Yineng Zhang
f25f4dfde5 hotfix: revert sampler CUDA Graph (#1242) 2024-08-28 21:16:47 +10:00
Liangsheng Yin
75ce37f401 Move sampler into CUDA graph (#1201) 2024-08-26 07:02:50 -07:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Liangsheng Yin
87e8c090e9 Organize code (rename, movement) (#953) 2024-08-06 20:50:32 -07:00
Liangsheng Yin
cdcbde5fc3 Code structure refactor (#807) 2024-07-29 23:04:48 -07:00
Yineng Zhang
dd7e8b9421 chore: add copyright for srt (#790) 2024-07-28 23:07:12 +10:00
Lianmin Zheng
30db99b3d9 Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776) 2024-07-27 19:50:34 -07:00
Liangsheng Yin
c9ee3d3559 Fix model forward grad (#628) 2024-07-15 22:09:09 -07:00
Ying Sheng
dc1b8bcfaa Format (#593) 2024-07-05 10:06:17 -07:00
Lianmin Zheng
1fa15099d8 Add LlamaForClassification (#559) 2024-06-22 00:49:31 -07:00