Georgi Gerganov
fd1234cb46
llama : add gpt-oss (#15091)
* oai moe
* compat with new checkpoint
* add attn sink impl
* add rope scaling yarn
* logits match with latest transformers code
* wip chat template
* rm trailing space
* use ggml_scale_bias
* rm redundant is_swa_all
* convert interleaved gate_up
* graph : fix activation function to match reference (#7)
* vocab : handle o200k_harmony special tokens
* ggml : add attention sinks support (#1)
* llama : add attn sinks
* ggml : add attn sinks
* cuda : add attn sinks
* vulkan : add support for sinks in softmax
remove unnecessary return
* ggml : add fused swiglu_oai op (#11)
* ggml : add fused swiglu_oai op
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* update CUDA impl
* cont : metal impl
* add vulkan impl
* test-backend-ops : more test cases, clean up
* llama : remove unfused impl
* remove extra lines
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
* repack mxfp4 upon conversion
* clean up a bit
* enable thinking
* add quick hack to render only some special tokens
* fix bf16 conversion
* remove vocab hack
* webui ok
* support chat parsing for gpt-oss
* fix webui
* direct mapping mxfp4, FINALLY
* force using mxfp4
* properly use lazy tensor
* ggml : add mxfp4
ggml : use e8m0 conversion instead of powf
Co-authored-by: Diego Devesa <slarengh@gmail.com>
change kvalues_mxfp4 table to match e2m1 (#6)
metal : remove quantization for now (not used)
cuda : fix disabled CUDA graphs due to ffn moe bias
vulkan : add support for mxfp4
cont : add cm2 dequant
* ggml : add ggml_add_id (#13)
* ggml : add ggml_add_id
* add cuda impl
* llama : add weight support check for add_id
* perf opt
* add vulkan impl
* rename cuda files
* add metal impl
* allow in-place ggml_add_id
* llama : keep biases on CPU with --cpu-moe
* llama : fix compile error
ggml-ci
* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw
ggml-ci
* cleanup
ggml-ci
* sycl : fix supports_op for MXFP4
ggml-ci
* fix Unknown reasoning format
* ggml-cpu : fix AVX build
ggml-ci
* fix hip build
ggml-ci
* cuda : add mxfp4 dequantization support for cuBLAS
ggml-ci
* ggml-cpu : fix mxfp4 fallback definitions for some architectures
ggml-ci
* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: slaren <slarengh@gmail.com>
2025-08-05 22:10:36 +03:00