
ybyang b5e14b2b78 [1/2][feature] support openai like classification api (#11618 )

2025-10-18 19:32:48 -07:00

4.7 KiB


Classification API

This document describes the /v1/classify API endpoint implementation in SGLang, which is compatible with vLLM's classification API format.

Overview

The classification API lets you classify text inputs using classification models. It follows the same request and response format as the classification API in vLLM 0.7.0.

API Endpoint

POST /v1/classify

Request Format

{
  "model": "model_name",
  "input": "text to classify"
}

Parameters

model (string, required): The name of the classification model to use
input (string, required): The text to classify
user (string, optional): User identifier for tracking
rid (string, optional): Request ID for tracking
priority (integer, optional): Request priority
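
As a minimal sketch of how these parameters combine into a request body, the helper below includes the two required fields and adds the optional ones only when they are set. The name `build_classify_request` is hypothetical and not part of SGLang, and leaving unset optional fields out of the body (rather than sending null) is an assumption.

```python
from typing import Any, Dict, Optional


def build_classify_request(
    model: str,
    text: str,
    user: Optional[str] = None,
    rid: Optional[str] = None,
    priority: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a /v1/classify JSON body (hypothetical helper, not an SGLang API)."""
    if not model or not text:
        raise ValueError("'model' and 'input' are required")
    body: Dict[str, Any] = {"model": model, "input": text}
    # Assumption: optional fields are omitted entirely when unset.
    optional = {"user": user, "rid": rid, "priority": priority}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


payload = build_classify_request(
    "jason9693/Qwen2.5-1.5B-apeach", "text to classify", priority=1
)
print(payload)
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`.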

Response Format

{
  "id": "classify-9bf17f2847b046c7b2d5495f4b4f9682",
  "object": "list",
  "created": 1745383213,
  "model": "jason9693/Qwen2.5-1.5B-apeach",
  "data": [
    {
      "index": 0,
      "label": "Default",
      "probs": [0.565970778465271, 0.4340292513370514],
      "num_classes": 2
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10,
    "completion_tokens": 0,
    "prompt_tokens_details": null
  }
}

Response Fields

id: Unique identifier for the classification request
object: Always "list"
created: Unix timestamp when the request was created
model: The model used for classification
data: Array of classification results
- index: Index of the result
- label: Predicted class label
- probs: Array of probabilities for each class
- num_classes: Total number of classes
usage: Token usage information
- prompt_tokens: Number of input tokens
- total_tokens: Total number of tokens
- completion_tokens: Number of completion tokens (always 0 for classification)
- prompt_tokens_details: Additional token details (optional)
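
The fields above can be read on the client side roughly as follows. This is a sketch only: `summarize_results` is a hypothetical helper, and it derives the most likely class from `probs` itself instead of assuming that `label` always names the highest-probability class.

```python
from typing import Any, Dict, List


def summarize_results(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Summarize each entry of 'data' (hypothetical helper, not an SGLang API)."""
    summary = []
    for item in response["data"]:
        probs = item["probs"]
        # Sanity check: expect one probability per class.
        if len(probs) != item["num_classes"]:
            raise ValueError("length of 'probs' does not match 'num_classes'")
        # Position of the largest probability in the list.
        top_class_id = max(range(len(probs)), key=probs.__getitem__)
        summary.append(
            {
                "index": item["index"],
                "label": item["label"],
                "top_class_id": top_class_id,
                "top_prob": probs[top_class_id],
            }
        )
    return summary


# Data taken from the example response in this document.
sample = {
    "data": [
        {
            "index": 0,
            "label": "Default",
            "probs": [0.565970778465271, 0.4340292513370514],
            "num_classes": 2,
        }
    ]
}
summary = summarize_results(sample)
print(summary)
```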

Example Usage

Using curl

curl -v "http://127.0.0.1:8000/v1/classify" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jason9693/Qwen2.5-1.5B-apeach",
    "input": "Loved the new café—coffee was great."
  }'

Using Python

import requests
import json

# Make classification request
response = requests.post(
    "http://127.0.0.1:8000/v1/classify",
    headers={"Content-Type": "application/json"},
    json={
        "model": "jason9693/Qwen2.5-1.5B-apeach",
        "input": "Loved the new café—coffee was great."
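    },
    timeout=30,  # avoid hanging indefinitely if the server is unreachable
)

# Raise on 4xx/5xx responses, then parse and pretty-print the JSON body
response.raise_for_status()
result = response.json()
print(json.dumps(result, indent=2))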

Supported Models

The classification API works with any classification model supported by SGLang, including:

Classification Models (Multi-class)

LlamaForSequenceClassification - Multi-class classification
Qwen2ForSequenceClassification - Multi-class classification
Qwen3ForSequenceClassification - Multi-class classification
BertForSequenceClassification - Multi-class classification
Gemma2ForSequenceClassification - Multi-class classification

Label Mapping: The API automatically uses the id2label mapping from the model's config.json file to provide meaningful label names instead of generic class names. If id2label is not available, it falls back to LABEL_0, LABEL_1, etc., or Class_0, Class_1 as a last resort.
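
The fallback order described above might look roughly like the sketch below. It is not the actual SGLang code: `resolve_label` is a hypothetical name, and because this document does not say when Class_N is chosen over LABEL_N, the sketch collapses both fallbacks into LABEL_N. It also accepts string keys, since JSON object keys in config.json are strings.

```python
from typing import Any, Mapping, Optional, Sequence


def resolve_label(
    probs: Sequence[float],
    id2label: Optional[Mapping[Any, str]] = None,
) -> str:
    """Map the most likely class to a label name (illustrative sketch only)."""
    if not probs:
        raise ValueError("'probs' must not be empty")
    class_id = max(range(len(probs)), key=probs.__getitem__)
    if id2label:
        # Try the integer key first, then the string key used in JSON files.
        label = id2label.get(class_id, id2label.get(str(class_id)))
        if label is not None:
            return label
    # Assumption: a single generic fallback stands in for LABEL_N / Class_N.
    return f"LABEL_{class_id}"


print(resolve_label([0.1, 0.9], {"0": "negative", "1": "positive"}))
print(resolve_label([0.7, 0.3]))
```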

Reward Models (Single score)

InternLM2ForRewardModel - Single reward score
Qwen2ForRewardModel - Single reward score
LlamaForSequenceClassificationWithNormal_Weights - Special reward model

Note: The /classify endpoint in SGLang was originally designed for reward models but now supports all non-generative models. The /v1/classify endpoint described here provides a standardized, vLLM-compatible interface for classification tasks.

Error Handling

The API returns appropriate HTTP status codes and error messages:

400 Bad Request: Invalid request format or missing required fields
500 Internal Server Error: Server-side processing error

Error response format:

{
  "error": "Error message",
  "type": "error_type",
  "code": 400
}
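
A client could turn this error body into an exception along the following lines. This is a sketch under assumptions: `ClassifyError` and `check_classify_response` are hypothetical names, and the code treats any status other than 200 as an error and tolerates missing fields in the body.

```python
from typing import Any, Dict


class ClassifyError(Exception):
    """Hypothetical client-side exception for /v1/classify errors."""

    def __init__(self, code: int, error_type: str, message: str) -> None:
        super().__init__(f"[{code}] {error_type}: {message}")
        self.code = code
        self.error_type = error_type
        self.message = message


def check_classify_response(status_code: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Return the body on success, otherwise raise ClassifyError."""
    if status_code == 200:
        return body
    # Fall back to the HTTP status and generic text if fields are missing.
    raise ClassifyError(
        code=body.get("code", status_code),
        error_type=body.get("type", "unknown"),
        message=body.get("error", "no error message provided"),
    )


try:
    check_classify_response(
        400, {"error": "Error message", "type": "error_type", "code": 400}
    )
except ClassifyError as exc:
    print(exc)
```

With `requests`, the two arguments would be `response.status_code` and `response.json()`.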

Implementation Details

The classification API is implemented using:

Rust Router: Handles routing and request/response models in sgl-router/src/protocols/spec.rs
Python HTTP Server: Implements the actual endpoint in python/sglang/srt/entrypoints/http_server.py
Classification Service: Handles the classification logic in python/sglang/srt/entrypoints/openai/serving_classify.py

Testing

Use the provided test script to verify the implementation:

python test_classify_api.py

Compatibility

This implementation is compatible with vLLM's classification API format, allowing seamless migration from vLLM to SGLang for classification tasks.
