Support precomputed_embeddings for Llama 4 (#8156)

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Kevin Xiang Li
2025-07-27 01:14:49 -07:00
committed by GitHub
parent 5c9c275bc8
commit 44d600cd67
6 changed files with 449 additions and 123 deletions

@@ -62,6 +62,7 @@ The core features include:
   backend/quantization.md
   backend/lora.ipynb
   backend/pd_disaggregation.md
   backend/vlm_query.ipynb

.. toctree::
   :maxdepth: 1