Add Llama4 support (#5092)
Co-authored-by: Cheng Wan <cwan39@gatech.edu> Co-authored-by: fzyzcjy <ch271828n@outlook.com> Co-authored-by: ispobock <ispobaoke@163.com>
@@ -128,6 +128,7 @@ class ModelRunner:
             self.model_config.attention_arch == AttentionArch.MLA
             and not server_args.disable_mla
         )
+        self.attention_chunk_size = model_config.attention_chunk_size
 
         # Model-specific adjustment
         self.model_specific_adjustment()
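The added line stores `attention_chunk_size` on the runner, the setting that governs Llama 4's chunked local attention. As an illustrative sketch (not sglang code; the function name and mask representation are assumptions), a chunk size restricts each token to causal attention within its own fixed-size chunk of the sequence:

```python
# Illustrative only: the kind of locality an attention_chunk_size
# setting controls. Token i may attend to token j iff j <= i and
# both positions fall in the same fixed-size chunk.

def chunked_causal_mask(seq_len: int, chunk_size: int) -> list[list[bool]]:
    """mask[i][j] is True where attention from i to j is allowed."""
    return [
        [
            j <= i and (j // chunk_size) == (i // chunk_size)
            for j in range(seq_len)
        ]
        for i in range(seq_len)
    ]

mask = chunked_causal_mask(seq_len=6, chunk_size=3)
# Position 3 opens a new chunk, so it cannot attend to positions 0-2,
# while position 2 still attends causally within chunk 0.
```

Layers using this pattern need only a chunk-sized KV window, which is why the runner records the value alongside its MLA decision.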