"""You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think></think> tags. The final answer MUST BE put in \\boxed{}."""
We recommend setting `temperature=0` to reproduce the reported performance. Note that performance may vary depending on the version of vLLM being used.
## Inference Guide
### Installation
Install the required dependencies:
```bash
pip install vllm==0.8.1
```
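Since performance may shift across vLLM releases (see the note above), it can be worth confirming the installed version before serving. A minimal sketch of such a check:
```python
# Confirm the installed vLLM version matches the pinned release above.
import vllm

assert vllm.__version__ == "0.8.1", f"expected vLLM 0.8.1, found {vllm.__version__}"
print(f"vLLM {vllm.__version__} installed.")
```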
### Starting the Server
Start the vLLM server with the following command:
```bash
CUDA_VISIBLE_DEVICES=0 vllm serve /PATH/TO/FAST \
--max-model-len 12800 \
--dtype auto \
--gpu-memory-utilization 0.75 \
--trust-remote-code \
--max-num-seqs 12 \
--mm-processor-kwargs '{"max_pixels":1002112}'
```
Replace `/PATH/TO/FAST` with the actual path to your model.
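Before running the demo, you can verify that the server is up and serving the model. A minimal check against the OpenAI-compatible endpoint that vLLM exposes, assuming the default host and port (`localhost:8000`):
```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible API; the API key is unused but required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# The listed model ID is the path that was passed to `vllm serve`.
for model in client.models.list():
    print(model.id)
```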
### Simple Demo
```python
import base64
# Define the system prompt
SYSTEM_PROMPT = """You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think></think> tags. The final answer MUST BE put in \\boxed{}."""