model: add support for qwen3vl series (#16780)

EngineX-Ascend/enginex-ascend-910-llama.cpp

* support qwen3vl series.

Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>

* bugfix: fix the arch check for qwen3vl-moe.

* use build_ffn

* optimize deepstack structure

* optimize deepstack feature saving

* Revert "optimize deepstack feature saving" for temporary fix

This reverts commit f321b9fdf13e59527408152e73b1071e19a87e71.

* code clean

* use fused qkv in clip

* clean up / rm is_deepstack_layers for simplification

* add test model

* move test model to "big" section

* fix imrope check

* remove trailing whitespace

* fix rope fail

* metal : add imrope support

* add imrope support for sycl

* vulkan: add imrope w/o check

* fix vulkan

* webgpu: add imrope w/o check

* Update gguf-py/gguf/tensor_mapping.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix tensor mapping

---------

Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Author: JJJYmmm
Date: 2025-10-30 23:19:14 +08:00
Committed by: GitHub
Parent: dcca0d3ab8
Commit: d261223d24

28 changed files with 1125 additions and 97 deletions

src/llama-hparams.h (+3)
```diff
@@ -183,6 +183,9 @@ struct llama_hparams {
     std::array<float, LLAMA_MAX_LAYERS> xielu_beta;
     std::array<float, LLAMA_MAX_LAYERS> xielu_eps;
 
+    // qwen3vl deepstack
+    uint32_t n_deepstack_layers = 0;
+
     // needed by encoder-decoder models (e.g. T5, FLAN-T5)
     // ref: https://github.com/ggerganov/llama.cpp/pull/8141
     llama_token dec_start_token_id = LLAMA_TOKEN_NULL;
```
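This hunk only adds the `n_deepstack_layers` count. The code that consumes it lives elsewhere in the commit and is not shown here. DeepStack, as used by Qwen3-VL, adds visual features taken from several vision-encoder layers to the hidden states of the first few language-model layers, and a per-model count like this one bounds how many layers receive them. The sketch below illustrates that idea in plain C++ under stated assumptions. `toy_hparams`, `split_deepstack`, `apply_deepstack`, and the packed row layout (main embedding followed by `n_deepstack_layers` equally sized slices) are all hypothetical. They are not llama.cpp's API and not necessarily how the commit implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the two fields this sketch needs (not llama_hparams).
struct toy_hparams {
    uint32_t n_embd = 0;             // width of the main token embedding
    uint32_t n_deepstack_layers = 0; // number of extra visual feature slices
};

// Assumed (hypothetical) layout: one image-token row packs the main embedding
// followed by n_deepstack_layers slices, n_embd * (1 + n_deepstack_layers) floats.
// Splits such a row into the main embedding and the per-layer deepstack slices.
void split_deepstack(const toy_hparams & hp, const std::vector<float> & row,
                     std::vector<float> & main_embd,
                     std::vector<std::vector<float>> & deepstack) {
    const std::size_t n = hp.n_embd;
    if (row.size() != n * (1 + static_cast<std::size_t>(hp.n_deepstack_layers))) {
        throw std::invalid_argument("row size does not match the assumed layout");
    }
    main_embd.assign(row.begin(), row.begin() + n);
    deepstack.clear();
    for (std::size_t i = 0; i < hp.n_deepstack_layers; ++i) {
        const auto first = row.begin() + (i + 1) * n; // slice i starts after the main embedding
        deepstack.emplace_back(first, first + n);
    }
}

// After decoder layer `il` produces `hidden`, add the il-th deepstack slice.
// Layers at or beyond n_deepstack_layers are left unchanged.
void apply_deepstack(const toy_hparams & hp, uint32_t il,
                     const std::vector<std::vector<float>> & deepstack,
                     std::vector<float> & hidden) {
    if (il >= hp.n_deepstack_layers) {
        return;
    }
    assert(il < deepstack.size());
    const std::vector<float> & feat = deepstack[il];
    assert(feat.size() == hidden.size());
    for (std::size_t j = 0; j < hidden.size(); ++j) {
        hidden[j] += feat[j];
    }
}
```

With `n_embd = 2` and `n_deepstack_layers = 2`, a row `{1, 2, 10, 20, 100, 200}` splits into the main embedding `{1, 2}` and slices `{10, 20}` and `{100, 200}`. Layer 0 gets the first slice added, and layer 2 and later are untouched. Keeping the count at `0` by default, as the diff does, means models without DeepStack skip this path entirely.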
