# Iluvatar CoreX Tiangai 100 Text Generation Engine (vLLM-based, optimized and adapted for Qwen3.6-35B-A3B)
```bash
# Build the image locally
docker build -t enginex-iluvatar-vllm:bi100-qwen3.6 -f Dockerfile .
```
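Start the container

Download the Qwen3.6-35B-A3B model first. Before launching, change the `architectures` field in the model's `config.json` to the value shown here, then start the container with the `docker run` command that follows: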
```json
"architectures": [
  "Qwen3_5MoeForCausalLM"
]
```
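The `config.json` change can also be scripted instead of edited by hand. The following is a minimal sketch, assuming the file is a standard JSON document; the helper name `set_architectures` is hypothetical and not part of this repository. Run it on the host, because the `docker run` command mounts the model directory read-only.

```python
import json
from pathlib import Path


def set_architectures(config_path, architectures=("Qwen3_5MoeForCausalLM",)):
    """Overwrite the `architectures` field of a config.json in place.

    Hypothetical helper: returns the previous value so it can be restored.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("architectures")
    config["architectures"] = list(architectures)
    # All other fields are written back unchanged; only formatting may differ.
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return previous
```

With the host path used in the `docker run` command, the call would be `set_architectures("/mnt/disk1/models/Qwen3.6-35B-A3B/config.json")`.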
```bash
docker run -dit --network=host --ipc=host \
  -v /usr/src:/usr/src -v /lib/modules:/lib/modules -v /dev:/dev --privileged \
  -v /mnt/disk1/models/Qwen3.6-35B-A3B:/model:ro --entrypoint=python3 \
  -e CUDA_VISIBLE_DEVICES=4,5,6,7 -e VLLM_ENGINE_ITERATION_TIMEOUT_S=3600 \
  enginex-iluvatar-vllm:bi100-qwen3.6 \
  -m vllm.entrypoints.openai.api_server \
  --model /model --port 1111 --served-model-name llm \
  --max-model-len 100000 --trust-remote-code -tp 4 --gpu-memory-utilization 0.95 \
  --max-num-seqs 1 --disable-log-requests --disable-frontend-multiprocessing \
  --max-num-batched-tokens 4096 --enable-chunked-prefill \
  --max-seq-len-to-capture 32768 --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder --reasoning-parser qwen3
```
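Because the container starts detached (`-dit`), the API is not reachable until the model has finished loading. Below is a minimal readiness sketch, assuming the server answers `GET /v1/models` on port 1111 once it is up (the OpenAI-compatible route; this build has not been verified here). The function names, the 30-minute deadline, and the 5-second interval are illustrative assumptions.

```python
import time
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, timeouts and refused connections
        # are all subclasses of OSError.
        return False


def wait_until_ready(url="http://localhost:1111/v1/models",
                     deadline_s=1800.0, interval_s=5.0,
                     probe=http_ok, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or roughly `deadline_s` seconds pass."""
    waited = 0.0
    while True:
        if probe(url):
            return True
        if waited >= deadline_s:
            return False
        sleep(interval_s)
        # Only sleep time is counted, so the real wait is somewhat longer.
        waited += interval_s
```

`probe` and `sleep` are parameters so the loop can be exercised without a running server.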
Send a request
```bash
curl http://localhost:1111/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llm",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Can you tell me the story of Snow White?"}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
```
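The same request can be sent from Python using only the standard library. This is a sketch rather than an official client: `build_chat_request` and `chat` are hypothetical names, and the defaults mirror the `curl` example (port 1111, served model name `llm`).

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:1111", model="llm",
                       max_tokens=200, temperature=0.7):
    """Build the POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=600.0, **kwargs):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `chat("Can you tell me the story of Snow White?")["choices"][0]["message"]` returns the assistant message. Since the server is started with `--reasoning-parser qwen3`, the reasoning text may arrive in a separate field from `content`; the exact field name depends on the vLLM version, so inspect the full response before relying on it.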