# 天数智芯 (Iluvatar CoreX) 天垓100 (BI100) Text Generation Engine (based on vLLM, optimized and adapted for Qwen3.6-35B-A3B)
```bash
# Build the image locally
docker build -t enginex-iluvatar-vllm:bi100-qwen3.6 -f Dockerfile .
```
Start the container
First, download the Qwen3.6-35B-A3B model. Then change the `architectures` field in the model's `config.json` to:
```json
"architectures": [
  "Qwen3_5MoeForCausalLM"
]
```
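You can also script this edit. The sketch below uses a hypothetical helper, `patch_architectures`, which is not part of this repository. It assumes `python3` is available on the host. The container mounts the model read-only (`:ro`), so run the helper on the host before starting the container. It rewrites only the `architectures` key and keeps a `.bak` copy of the original file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this image): set "architectures" in a
# model's config.json to ["Qwen3_5MoeForCausalLM"], leaving other keys untouched.
# Assumes python3 is installed on the host.
patch_architectures() {
  local cfg="${1:?usage: patch_architectures /path/to/config.json}"
  # Keep a copy of the original file next to it.
  cp -- "$cfg" "$cfg.bak" || return 1
  # Parse and rewrite the file with Python's json module rather than sed,
  # so the edit does not depend on how config.json happens to be formatted.
  python3 - "$cfg" <<'PY'
import json
import sys

path = sys.argv[1]
with open(path, encoding="utf-8") as f:
    config = json.load(f)
config["architectures"] = ["Qwen3_5MoeForCausalLM"]
with open(path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)
    f.write("\n")
PY
}
```

For example, with the host path used in the `docker run` command: `patch_architectures /mnt/disk1/models/Qwen3.6-35B-A3B/config.json`. The helper re-serializes the file with two-space indentation. The keys and values stay the same.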
Then launch the server. The command mounts the model directory read-only at `/model`:
```bash
docker run -dit --network=host --ipc=host \
-v /usr/src:/usr/src -v /lib/modules:/lib/modules -v /dev:/dev --privileged \
-v /mnt/disk1/models/Qwen3.6-35B-A3B:/model:ro --entrypoint=python3 \
-e CUDA_VISIBLE_DEVICES=4,5,6,7 -e VLLM_ENGINE_ITERATION_TIMEOUT_S=3600 \
enginex-iluvatar-vllm:bi100-qwen3.6 \
-m vllm.entrypoints.openai.api_server \
--model /model --port 1111 --served-model-name llm \
--max-model-len 100000 --trust-remote-code -tp 4 --gpu-memory-utilization 0.95 \
--max-num-seqs 1 --disable-log-requests --disable-frontend-multiprocessing \
--max-num-batched-tokens 4096 --enable-chunked-prefill \
--max-seq-len-to-capture 32768 --enable-auto-tool-choice \
--tool-call-parser qwen3_coder --reasoning-parser qwen3
```
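Loading a 35B MoE model across four devices can take some time, so wait until the server is ready before sending requests. Below is a minimal sketch. It assumes that the vLLM OpenAI-compatible server in this image exposes its usual `GET /health` endpoint on port 1111 and that `curl` is installed. `wait_for_server`, its 30-minute default timeout and its 5-second polling interval are hypothetical choices, not part of the image.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the vLLM server until GET /health succeeds,
# or give up after a timeout. Assumes curl is installed.
wait_for_server() {
  local url="${1:-http://localhost:1111/health}"
  local timeout_s="${2:-1800}"   # hypothetical default: 30 minutes
  local interval_s="${3:-5}"     # seconds between attempts
  local waited=0
  # -s: silent; -f: treat HTTP status >= 400 as failure; -m 5: cap each attempt at 5s
  until curl -sf -m 5 -o /dev/null "$url"; do
    if (( waited >= timeout_s )); then
      echo "server not ready after ${waited}s: $url" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$(( waited + interval_s ))
  done
  echo "server ready: $url"
}
```

With no arguments, `wait_for_server` polls `http://localhost:1111/health`. It returns 0 once the endpoint answers and 1 if the timeout expires, so it can gate later commands, e.g. `wait_for_server && echo ready`.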
Send a request
```bash
curl http://localhost:1111/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llm",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Can you tell me the story of Snow White?"}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
```
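The server returns an OpenAI-style chat completion object. To print only the assistant's answer, you can pipe the output of `curl -s` (`-s` hides curl's progress meter) into a small filter such as the hypothetical `extract_reply` below. It assumes `python3` on the client and reads `choices[0].message.content`. The server runs with `--reasoning-parser qwen3`, so the model's thinking is expected to come back in a separate field rather than in `content`. The exact field name depends on the vLLM version, so this sketch does not read it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one OpenAI-style chat completion JSON object from
# stdin and print choices[0].message.content. Assumes python3 on the client.
extract_reply() {
  python3 -c '
import json
import sys

response = json.load(sys.stdin)
message = response["choices"][0]["message"]
# content may be null (for example if generation stopped before any answer
# text was produced), so fall back to an empty string.
print(message.get("content") or "")
'
}
```

Usage: run the same `curl` command as above with `-s` added, then append `| extract_reply`.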