[Bugs] Fix Docs Build Problem (#97)

* [Bugs] Docs fixed

* Update contributing.md

* Update index.md

* fix lua to text

* fix title size
This commit is contained in:
Xinyu Dong
2026-01-10 05:55:40 +08:00
committed by GitHub
parent 8c9cabd760
commit 7be26ca617
17 changed files with 721 additions and 151 deletions

View File

@@ -17,9 +17,10 @@ docker run -itd \
-v /usr/local/bin/:/usr/local/bin/ \
-v /lib/x86_64-linux-gnu/libxpunvidia-ml.so.1:/lib/x86_64-linux-gnu/libxpunvidia-ml.so.1 \
iregistry.baidu-int.com/hac_test/aiak-inference-llm:xpu_dev_20251113_221821 bash
docker exec -it glm-vllm-01011 /bin/bash
```
### Offline Inference on Multiple XPUs
Run the following script inside the container:
@@ -30,7 +31,7 @@ import os
from vllm import LLM, SamplingParams
def main():
model_path = "/data/GLM-4.5"
llm_params = {
@@ -50,7 +51,7 @@ def main():
"content": [
{
"type": "text",
"text": "Hello, who are you?"
}
]
}
@@ -68,8 +69,8 @@ def main():
response = outputs[0].outputs[0].text
print("=" * 50)
print("Input content:", messages)
print("Model response:\n", response)
print("=" * 50)
if __name__ == "__main__":
@@ -83,12 +84,10 @@ If you run this script successfully, you can see the info shown below:
```bash
==================================================
Input content: [{'role': 'user', 'content': [{'type': 'text', 'text': 'Hello, who are you?'}]}]
Model response:
<think>
Well, the user asked a rather direct question about identity. This question seems simple, but there could be several underlying intentions—perhaps they are testing my reliability for the first time, or they simply want to confirm the identity of the conversational partner. Given the common positioning of AI assistants, a clear, plain statement of identity works best, while leaving room for follow-up questions.\n\nThe user's phrasing leans towards an informal tone, so the response style can be a bit more relaxed. However, since this is the initial response, it is safer to maintain a moderate level of professionalism. Mentioning
==================================================
```
@@ -114,8 +113,9 @@ python -m vllm.entrypoints.openai.api_server \
--no-enable-chunked-prefill \
--distributed-executor-backend mp \
--served-model-name GLM-4.5 \
--compilation-config '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun", "vllm.unified_attention", "vllm.unified_attention_with_output", "vllm.mamba_mixer2"]}' > log_glm_plugin.txt 2>&1 &
```
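The string passed to `--compilation-config` must parse as valid JSON, and a typo in the op list only surfaces at startup; a quick standard-library sanity check (the variable name is illustrative):

```python
import json

# Exact value passed to --compilation-config in the launch command above.
compilation_config = (
    '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun", '
    '"vllm.unified_attention", "vllm.unified_attention_with_output", '
    '"vllm.mamba_mixer2"]}'
)

cfg = json.loads(compilation_config)  # raises json.JSONDecodeError if malformed
print(cfg["splitting_ops"])
```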
If your service starts successfully, you can see the info shown below:
```bash
@@ -132,7 +132,7 @@ curl http://localhost:8989/v1/chat/completions \
-d '{
"model": "GLM-4.5",
"messages": [
{"role": "user", "content": "Hello, who are you?"}
],
"max_tokens": 100,
"temperature": 0.7
@@ -142,7 +142,7 @@ curl http://localhost:8989/v1/chat/completions \
If you query the server successfully, you can see the info shown below (client):
```bash
{"id":"chatcmpl-6af7318de7394bc4ae569e6324a162fa","object":"chat.completion","created":1763101638,"model":"GLM-4.5","choices":[{"index":0,"message":{"role":"assistant","content":"\n<think>The user asked, \"Hello, who are you?\" This is a question about my identity. First, I need to confirm the user's intent. They might be using this service for the first time or have never interacted with similar AI assistants before, so they want to know my background and capabilities.\n\nNext, I should ensure my answer is clear and friendly, focusing on key points: who I am, who developed me, and what I can do. I should avoid technical jargon and keep the response conversational so it's easy to understand.\n\nAdditionally, the user may have potential needs, such as wanting to know what I am capable of.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":11,"total_tokens":111,"completion_tokens":100,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_tr
```
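The assistant's text is nested under `choices[0].message.content` in that response; a small sketch of pulling it out with the standard library (the sample payload below is trimmed from the response shown above):

```python
import json

# Trimmed version of the chat.completion response shown above.
raw = json.dumps({
    "id": "chatcmpl-6af7318de7394bc4ae569e6324a162fa",
    "object": "chat.completion",
    "model": "GLM-4.5",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "\n<think>The user asked..."},
        "finish_reason": "length",
    }],
    "usage": {"prompt_tokens": 11, "total_tokens": 111, "completion_tokens": 100},
})

resp = json.loads(raw)
content = resp["choices"][0]["message"]["content"]
finish_reason = resp["choices"][0]["finish_reason"]
print(finish_reason)
```

Note that a `finish_reason` of `"length"` means generation stopped at `max_tokens` (100 in the curl example), which is why the `<think>` block above is cut off mid-sentence; raise `max_tokens` to get a complete answer.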
Logs of the vllm server:
@@ -150,4 +150,4 @@ Logs of the vllm server:
```bash
(APIServer pid=54567) INFO: 127.0.0.1:60338 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=54567) INFO 11-13 14:35:48 [loggers.py:123] Engine 000: Avg prompt throughput: 0.5 tokens/s, Avg generation throughput: 0.7 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
```

View File

@@ -16,7 +16,7 @@ if [ $XPU_NUM -gt 0 ]; then
DOCKER_DEVICE_CONFIG="${DOCKER_DEVICE_CONFIG} --device=/dev/xpuctrl:/dev/xpuctrl"
fi
export build_image="xxxxxxxxxxxxxxxxx"
docker run -itd ${DOCKER_DEVICE_CONFIG} \
--net=host \
@@ -58,7 +58,7 @@ def main():
"content": [
{
"type": "text",
"text": "tell a joke"
}
]
}
@@ -76,8 +76,8 @@ def main():
response = outputs[0].outputs[0].text
print("=" * 50)
print("Input content:", messages)
print("Model response:\n", response)
print("=" * 50)
if __name__ == "__main__":
@@ -91,16 +91,18 @@ If you run this script successfully, you can see the info shown below:
```bash
==================================================
Input content: [{'role': 'user', 'content': [{'type': 'text', 'text': 'tell a joke'}]}]
Model response:
<think>
Okay, the user asked me to tell a joke. First, I need to consider the user's needs. They might just want to relax or need some entertainment. Next, I need to choose a suitable joke that is not too complicated, easy to understand, and also interesting.
The user might expect the joke to be in Chinese, so I need to ensure that it fits Chinese language conventions and cultural context. I need to avoid sensitive topics, such as politics, religion, or anything that might cause misunderstanding. Then, I have to consider the structure of the joke, which usually involves a setup and an unexpected ending to create humor.
For example, I could tell a light-hearted story about everyday life, such as animals or common scenarios. For instance, the story of a turtle and a rabbit racing, but with a twist. However, I need to ensure that the joke is of moderate length and not too long, so the user doesn't lose interest. Additionally, I should pay attention to using colloquial language and avoid stiff or complex sentence structures.
I might also need to check if this joke is common to avoid repetition. If the user has heard something similar before, I may need to come up with a different angle.
==================================================
```
@@ -130,6 +132,7 @@ python -m vllm.entrypoints.openai.api_server \
"vllm.unified_attention", "vllm.unified_attention_with_output",
"vllm.mamba_mixer2"]}' \
```
If your service starts successfully, you can see the info shown below:
```bash
@@ -162,4 +165,4 @@ Logs of the vllm server:
```bash
(APIServer pid=54567) INFO: 127.0.0.1:60338 - "POST /v1/completions HTTP/1.1" 200 OK
(APIServer pid=54567) INFO 11-13 14:35:48 [loggers.py:123] Engine 000: Avg prompt throughput: 0.5 tokens/s, Avg generation throughput: 0.7 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
```
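The throughput figures in the engine log line can be scraped programmatically, e.g. for a quick benchmark summary; a sketch assuming the exact log format shown above (the format may differ across vLLM versions):

```python
import re

# The engine log line from the vllm server output above.
log_line = (
    "(APIServer pid=54567) INFO 11-13 14:35:48 [loggers.py:123] Engine 000: "
    "Avg prompt throughput: 0.5 tokens/s, Avg generation throughput: 0.7 tokens/s, "
    "Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, "
    "Prefix cache hit rate: 0.0%"
)

# Capture the two throughput numbers as floats.
pattern = re.compile(
    r"Avg prompt throughput: ([\d.]+) tokens/s, "
    r"Avg generation throughput: ([\d.]+) tokens/s"
)
m = pattern.search(log_line)
if m:
    prompt_tps, gen_tps = map(float, m.groups())
    print(prompt_tps, gen_tps)
```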