---
frameworks:
- ""
tasks: []
---

# FAST-3B Model Documentation

## Overview

This repository provides access to the **FAST-3B** model, which is built on the **Qwen/Qwen2.5-VL-3B-Instruct** base model.

## System Prompt

```
"""You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within tags. The final answer MUST BE put in \\boxed{}."""
```

## Decoding Parameters

We recommend setting `temperature=0` to reproduce the reported performance. Note that performance may vary depending on the version of vLLM being used.

## Inference Guide

### Installation

Install the required dependencies:

```bash
pip install vllm==0.8.1
```

### Starting the Server

Start the vLLM server with the following command:

```bash
CUDA_VISIBLE_DEVICES=0 vllm serve /PATH/TO/FAST \
    --max-model-len 12800 \
    --dtype auto \
    --gpu_memory_utilization 0.75 \
    --trust-remote-code \
    --max-num-seqs 12 \
    --mm-processor-kwargs '{"max_pixels":1002112}'
```

Replace `/PATH/TO/FAST` with the actual path to your model.

### Simple Demo

```python
import base64

from openai import OpenAI

# Define the system prompt
SYSTEM_PROMPT = """You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within tags. The final answer MUST BE put in \\boxed{}."""


def simple_inference(image_path, query, max_tokens=2048, temperature=0):
    """
    Perform a simple inference with an image and a text query.

    Args:
        image_path (str): Path to the input image file.
        query (str): Text query for the model.
        max_tokens (int): Maximum number of tokens in the response.
        temperature (float): Sampling temperature for the model.

    Returns:
        str: The model's response, or None if the request fails.
    """
    # Load the image and encode it as a base64 data URL
    with open(image_path, 'rb') as file:
        image_base64 = "data:image/jpeg;base64," + base64.b64encode(file.read()).decode('utf-8')

    # Prepare the chat request
    request = {
        "model": "/PATH/TO/FAST",  # Replace with your model path
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},  # Add the system prompt
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": query},
                    {"type": "image_url", "image_url": {"url": image_base64}},
                ],
            },
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

    # Call the OpenAI-compatible chat API exposed by the vLLM server
    try:
        client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
        chat_response = client.chat.completions.create(**request)
        return chat_response.choices[0].message.content
    except Exception as e:
        print(f"Error during inference: {e}")
        return None
```
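
### Example Usage

As a quick sanity check once the server is running, the sketch below shows one way to call `simple_inference` and pull the final answer out of the `\boxed{}` wrapper. The image path, the query, and the extraction logic are illustrative assumptions, not part of the original demo.

```python
# Minimal usage sketch: the file name and query are placeholders.
# Assumes the vLLM server from "Starting the Server" is running on localhost:8000
# and that simple_inference() from the demo above is in scope.
response = simple_inference("example.jpg", "How many objects are visible in the image?")

if response is not None:
    print("Full response:")
    print(response)

    # Per the system prompt, the final answer should appear inside \boxed{...}.
    # This extraction is deliberately naive and only meant as an illustration.
    if "\\boxed{" in response:
        answer = response.split("\\boxed{", 1)[1].split("}", 1)[0]
        print("Extracted answer:", answer)
```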