Add Llama4 support (#5092)

Co-authored-by: Cheng Wan <cwan39@gatech.edu>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: ispobock <ispobaoke@163.com>
Chang Su
2025-04-07 00:29:36 -07:00
committed by GitHub
parent d1bb171180
commit f04c80dc42
27 changed files with 2214 additions and 22 deletions

@@ -128,6 +128,7 @@ class ModelRunner:
             self.model_config.attention_arch == AttentionArch.MLA
             and not server_args.disable_mla
         )
+        self.attention_chunk_size = model_config.attention_chunk_size
         # Model-specific adjustment
         self.model_specific_adjustment()