[Refactor] Multimodal data processing for VLM (#6659)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
@@ -132,7 +132,7 @@
 "\n",
 "mm_item = dict(\n",
 " modality=\"IMAGE\",\n",
-" image_grid_thws=processed_prompt[\"image_grid_thw\"],\n",
+" image_grid_thw=processed_prompt[\"image_grid_thw\"],\n",
 " precomputed_features=precomputed_features,\n",
 ")\n",
 "out = llm.generate(input_ids=input_ids, image_data=[mm_item])\n",
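Callers that still build items with the old `image_grid_thws` key could adapt to the rename in this hunk with a small shim. The following is a minimal sketch and is not part of the commit: `rename_grid_key` is a hypothetical helper, and the sample values are placeholders rather than real processor output.

```python
def rename_grid_key(mm_item: dict) -> dict:
    """Return a copy of mm_item that uses `image_grid_thw` instead of `image_grid_thws`.

    Hypothetical helper for illustration only; not part of the commit or the library.
    """
    item = dict(mm_item)  # shallow copy so the caller's dict is left untouched
    if "image_grid_thws" in item:
        old_value = item.pop("image_grid_thws")
        # If both keys are present, keep the existing new-style value.
        item.setdefault("image_grid_thw", old_value)
    return item


# Placeholder values stand in for the processor output used in the notebook.
old_item = {
    "modality": "IMAGE",
    "image_grid_thws": [[1, 2, 2]],
    "precomputed_features": None,
}
new_item = rename_grid_key(old_item)
```

The helper copies the dict instead of mutating it, so an item built elsewhere with the old key stays valid for any code that has not been updated yet.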