[1/2/N] Enable pymarkdown and python __init__ for lint system (#2011)

### What this PR does / why we need it?
1. Enable pymarkdown check
2. Enable python `__init__.py` check for vllm and vllm-ascend
3. Clean up code
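
For readers unfamiliar with item 2, here is a minimal sketch of what an `__init__.py` check can look like. It is not the script this PR enables: the function name `find_missing_init` and the rule it applies (any directory holding `.py` files must also hold an `__init__.py`) are assumptions made for illustration.

```python
import os
from pathlib import Path


def find_missing_init(root: str) -> list[str]:
    """Return directories under root that hold .py files but no __init__.py.

    Hypothetical helper for illustration; the real lint rule may differ.
    """
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories and bytecode caches from the walk.
        dirnames[:] = [
            d for d in dirnames if not d.startswith(".") and d != "__pycache__"
        ]
        has_python = any(name.endswith(".py") for name in filenames)
        if has_python and "__init__.py" not in filenames:
            missing.append(str(Path(dirpath).relative_to(root)))
    return sorted(missing)
```

A lint hook built on this could fail the check whenever the returned list is non-empty.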

### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main: 29c6fbe58c

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Li Wang
2025-07-25 22:16:10 +08:00
committed by GitHub
parent d629f0b2b5
commit bdfb065b5d
31 changed files with 215 additions and 64 deletions


@@ -30,7 +30,7 @@ docker run --rm \
## Install modelslim and convert model
:::{note}
You can either convert the model yourself or use the quantized model we uploaded; see https://www.modelscope.cn/models/vllm-ascend/QwQ-32B-W8A8
:::
@@ -55,6 +55,7 @@ python3 quant_qwen.py --model_path $MODEL_PATH --save_directory $SAVE_PATH --cal
## Verify the quantized model
The converted model files look like:
```bash
.
|-- config.json
@@ -72,11 +73,13 @@ Run the following script to start the vLLM server with quantized model:
:::{note}
The value "ascend" for the "--quantization" argument will be supported after [a specific PR](https://github.com/vllm-project/vllm-ascend/pull/877) is merged and released. For now, you can cherry-pick that commit.
:::
```bash
vllm serve /home/models/QwQ-32B-w8a8 --tensor-parallel-size 4 --served-model-name "qwq-32b-w8a8" --max-model-len 4096 --quantization ascend
```
Once your server is started, you can query the model with input prompts:
```bash
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
@@ -93,7 +96,7 @@ curl http://localhost:8000/v1/completions \
Run the following script to execute offline inference on multi-NPU with quantized model:
:::{note}
To enable quantization on Ascend, the quantization method must be "ascend".
:::
```python
@@ -131,4 +134,4 @@ for output in outputs:
del llm
clean_up()
```