# Iluvatar CoreX (天数智芯) Tiangai 100 (天垓100) Text Generation Engine (vLLM-based, adapted and optimized for Qwen3.6-35B-A3B)

```
# Build the image locally
docker build -t enginex-iluvatar-vllm:bi100-qwen3.6 -f Dockerfile .
```


Start the container

Download the Qwen3.6-35B-A3B model, then change the `architectures` field in the model's `config.json` to:
```json
"architectures": [
        "Qwen3_5MoeForCausalLM"
    ]
```
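
The `config.json` edit can also be scripted. The following is a minimal sketch and not part of this repository: `patch_architectures` and the `config.json.orig` backup name are hypothetical, and it assumes `config.json` is plain JSON with a top-level `architectures` key. Because the `docker run` command mounts the model directory read-only, this should be run on the host before the container is started.

```python
import json
import shutil
from pathlib import Path

# Architecture name required by this README; all other names are illustrative.
TARGET_ARCH = "Qwen3_5MoeForCausalLM"


def patch_architectures(model_dir, arch=TARGET_ARCH):
    """Set "architectures" in <model_dir>/config.json to [arch].

    Returns True if the file was rewritten, False if it already matched.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if config.get("architectures") == [arch]:
        return False  # already patched, leave the file untouched

    # Keep a one-time copy of the original file (hypothetical backup name).
    backup_path = config_path.with_name("config.json.orig")
    if not backup_path.exists():
        shutil.copyfile(config_path, backup_path)

    config["architectures"] = [arch]
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return True
```

For the host path used in this README, the call would be `patch_architectures("/mnt/disk1/models/Qwen3.6-35B-A3B")`. All other keys in `config.json` are preserved, although the file is re-serialized with two-space indentation.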

```bash
docker run -dit --network=host --ipc=host \
  -v /usr/src:/usr/src -v /lib/modules:/lib/modules -v /dev:/dev --privileged \
  -v /mnt/disk1/models/Qwen3.6-35B-A3B:/model:ro --entrypoint=python3 \
  -e CUDA_VISIBLE_DEVICES=4,5,6,7 -e VLLM_ENGINE_ITERATION_TIMEOUT_S=3600 \
  enginex-iluvatar-vllm:bi100-qwen3.6 \
  -m vllm.entrypoints.openai.api_server \
  --model /model --port 1111 --served-model-name llm \
  --max-model-len 100000 --trust-remote-code -tp 4 --gpu-memory-utilization 0.95 \
  --max-num-seqs 1 --disable-log-requests --disable-frontend-multiprocessing \
  --max-num-batched-tokens 4096 --enable-chunked-prefill \
  --max-seq-len-to-capture 32768 --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder --reasoning-parser qwen3
```
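
The container is started detached (`-dit`), so `docker run` returns before the server has finished loading the model. Below is a small sketch for waiting until the API answers. It polls `GET /v1/models` on the OpenAI-compatible server; `wait_until_ready` is a hypothetical helper, and the default timeout and polling interval are assumptions to adjust for your hardware, not measured values.

```python
import time
import urllib.error
import urllib.request


def _http_ok(url):
    """Return True if a GET on url answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # not listening yet, or still loading weights


def wait_until_ready(base_url="http://localhost:1111", timeout_s=1800.0,
                     interval_s=5.0, probe=_http_ok,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll <base_url>/v1/models until it responds or timeout_s elapses.

    probe, sleep, and clock are injectable so the loop can be tested
    without a running server.
    """
    deadline = clock() + timeout_s
    while True:
        if probe(base_url + "/v1/models"):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` returns `True` once the server responds and `False` if the timeout passes first, in which case `docker logs <container>` is the place to look.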

Send a request
```bash
curl http://localhost:1111/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llm",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Can you tell me the story of Snow White?"}
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
```
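
The same request can be sent from Python with only the standard library. This is an illustrative sketch: `build_chat_request` and `chat` are hypothetical helpers, the URL and model name come from the `docker run` example, and the name of the reasoning field mentioned in the comments is an assumption that may differ between vLLM versions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1111"  # --port value from the docker run example


def build_chat_request(user_text, model="llm", max_tokens=200, temperature=0.7,
                       system_text="You are a helpful assistant."):
    """Build the same JSON body as the curl example."""
    return {
        "model": model,  # must match --served-model-name
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(user_text, base_url=BASE_URL, timeout_s=600, **kwargs):
    """POST to /v1/chat/completions and return the first choice's message dict."""
    body = json.dumps(build_chat_request(user_text, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        reply = json.load(resp)
    # With --reasoning-parser enabled, the reasoning text may be returned in a
    # separate field (assumed: "reasoning_content"), and "content" can be empty
    # if max_tokens runs out before the final answer starts.
    return reply["choices"][0]["message"]
```

For example, `chat("Can you tell me the story of Snow White?")["content"]` returns the answer text. If it comes back empty, raising `max_tokens` above 200 is worth trying first.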