Support precomputed_embeddings for Llama 4 (#8156)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
File diff suppressed because one or more lines are too long
@@ -62,6 +62,7 @@ The core features include:
   backend/quantization.md
   backend/lora.ipynb
   backend/pd_disaggregation.md
   backend/vlm_query.ipynb

.. toctree::
   :maxdepth: 1