[Docs] Improve the AISBench multi-modal testing docs (#4255)

### What this PR does / why we need it?

Add some of the pitfalls I ran into when using AISBench to test
multi-modal models.

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

---------

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Canlin Guo
2025-11-19 16:00:39 +08:00
committed by GitHub
parent d43022f3ed
commit d5fef22149


@@ -152,6 +152,9 @@ rm gsm8k.zip
Update the file `benchmark/ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py`.
There are several arguments that you should update according to your environment.
- `attr`: Identifier for the inference backend type; set to `service` (serving-based inference) or `local` (local model).
- `type`: Used to select different backend API types.
- `abbr`: Unique identifier for a local task, used to distinguish between multiple tasks.
- `path`: Update to your model weight path.
- `model`: Update to your model name in vLLM.
- `host_ip` and `host_port`: Update to your vLLM server ip and port.
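Taken together, an entry in that config file might look like the following sketch. The field names follow the bullets above, but the surrounding structure (the `models` list, the backend type string, and every example value) is an assumption; check `vllm_api_general_chat.py` itself for the real schema.

```python
# Hypothetical sketch of a model entry in vllm_api_general_chat.py.
# Field names match the bullets above; all values below are examples
# to be replaced with those from your own environment.
models = [
    dict(
        attr="service",                 # serving-based inference
        type="VllmApiGeneralChat",      # backend API type (assumed name)
        abbr="vllm-api-general-chat",   # unique identifier for this local task
        path="/path/to/model/weights",  # your model weight path
        model="my-model-name",          # model name registered in vLLM
        host_ip="127.0.0.1",            # vLLM server IP
        host_port=8000,                 # vLLM server port
    )
]
```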
@@ -242,6 +245,8 @@ After each dataset execution, you can get the result from saved files such as `o
#### Execute Performance Evaluation
Text-only benchmarks:
```shell
# run C-Eval dataset
ais_bench --models vllm_api_general_chat --datasets ceval_gen_0_shot_cot_chat_prompt.py --summarizer default_perf --mode perf
@@ -262,6 +267,13 @@ ais_bench --models vllm_api_general_chat --datasets livecodebench_code_generate_
ais_bench --models vllm_api_general_chat --datasets aime2024_gen_0_shot_chat_prompt.py --summarizer default_perf --mode perf
```
Multi-modal benchmarks (text + images):
```shell
# run textvqa dataset
ais_bench --models vllm_api_stream_chat --datasets textvqa_gen_base64 --summarizer default_perf --mode perf
```
After execution, you can get the result from the saved files. An example is shown below:
```
@@ -281,3 +293,32 @@ After execution, you can get the result from saved files, there is an example as
|-- cevaldataset_plot.html # Final performance results (in html format)
`-- cevaldataset_rps_distribution_plot_with_actual_rps.html # Final performance results (in html format)
```
### 3. Troubleshooting
#### Invalid Image Path Error
If you download the TextVQA dataset following the AISBench documentation:
```bash
cd ais_bench/datasets
git lfs install
git clone https://huggingface.co/datasets/maoxx241/textvqa_subset
mv textvqa_subset/ textvqa/
mkdir textvqa/textvqa_json/
mv textvqa/*.json textvqa/textvqa_json/
mv textvqa/*.jsonl textvqa/textvqa_json/
```
you may encounter the following error:
```bash
AISBench - ERROR - /vllm-workspace/benchmark/ais_bench/benchmark/clients/base_client.py - raise_error - 35 - [AisBenchClientException] Request failed: HTTP status 400. Server response: {"error":{"message":"1 validation error for ChatCompletionContentPartImageParam\nimage_url\n Input should be a valid dictionary [type=dict_type, input_value='data/textvqa/train_images/b2ae0f96dfbea5d8.jpg', input_type=str]\n For further information visit https://errors.pydantic.dev/2.12/v/dict_type None","type":"BadRequestError","param":null,"code":400}}
```
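The 400 error comes from the OpenAI-compatible chat schema: an `image_url` content part must be an object with a `url` field, not a bare path string. A minimal sketch of a well-formed multi-modal content part (field names follow the OpenAI chat completions format; the helper name and the image path are placeholders):

```python
import base64

def image_content_part(path: str) -> dict:
    """Build an image content part the way AISBench's base64 datasets do.

    `path` is a placeholder for a local image file.
    """
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "type": "image_url",
        # `image_url` must be a dict with a `url` key. Passing the file
        # path string directly is what triggers the pydantic
        # "Input should be a valid dictionary" error above.
        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
    }
```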
You need to manually rewrite the relative image paths in the dataset to absolute ones: replace the prefix `data/textvqa/train_images/` with the absolute directory where the images are stored. In the command below, substitute `/path/to/benchmark/ais_bench/datasets/textvqa/train_images/` with your actual path:
```bash
cd ais_bench/datasets/textvqa/textvqa_json
sed -i 's#data/textvqa/train_images/#/path/to/benchmark/ais_bench/datasets/textvqa/train_images/#g' textvqa_val.json
```