Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186)

Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Chayenne
2024-08-26 01:29:12 +08:00
committed by GitHub
parent 66e7dcaf70
commit 30b4f771b0
15 changed files with 167 additions and 55 deletions

@@ -187,6 +187,13 @@ response = client.chat.completions.create(
max_tokens=64,
)
print(response)
# Text embedding
response = client.embeddings.create(
model="default",
input="How are you today",
)
print(response)
```
It supports streaming, vision, and most features of the Chat/Completions/Models/Batch endpoints specified by the [OpenAI API Reference](https://platform.openai.com/docs/api-reference/).
@@ -223,6 +230,8 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
### Supported Models
**Generative Models**
- Llama / Llama 2 / Llama 3 / Llama 3.1
- Mistral / Mixtral / Mistral NeMo
- Gemma / Gemma 2
@@ -243,6 +252,12 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
- ChatGLM
- InternLM 2
**Embedding Models**
- e5-mistral
- gte-Qwen2
  - `python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct --is-embedding`
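
Once an embedding model is served with `--is-embedding`, the vectors can be fetched over the OpenAI-compatible embeddings route and compared directly. The following is a minimal sketch using only the standard library, not part of this commit. The address `http://127.0.0.1:30000/v1` and the helper names `embed` and `cosine_similarity` are assumptions made for illustration; the `model="default"` value and the single-string `input` mirror the client example in the README.

```python
import json
import math
import urllib.request

# Assumption: the server started with --is-embedding listens here.
# Change host and port to match your own launch command.
BASE_URL = "http://127.0.0.1:30000/v1"


def embed(text, base_url=BASE_URL, model="default"):
    """Hypothetical helper: request one embedding vector for a single string."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The OpenAI embeddings schema stores each vector under data[i].embedding.
    return body["data"][0]["embedding"]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

With the server running, `cosine_similarity(embed("How are you today"), embed("How is your day going"))` would give a similarity score for the two sentences. Nothing in the block contacts the server until `embed` is called.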
Instructions for supporting a new model are [here](https://github.com/sgl-project/sglang/blob/main/docs/en/model_support.md).
#### Use Models From ModelScope