[1/2/N] Enable pymarkdown and python __init__ for lint system (#2011)
### What this PR does / why we need it?
1. Enable pymarkdown check
2. Enable python `__init__.py` check for vllm and vllm-ascend
3. Make clean code
### How was this patch tested?
- vLLM version: v0.9.2
- vLLM main:
29c6fbe58c
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
This commit is contained in:
@@ -11,6 +11,7 @@ To quantize a model, users should install [ModelSlim](https://gitee.com/ascend/m
|
||||
Currently, only the specific tag [modelslim-VLLM-8.1.RC1.b020_001](https://gitee.com/ascend/msit/blob/modelslim-VLLM-8.1.RC1.b020_001/msmodelslim/README.md) of modelslim works with vLLM Ascend. Please do not install other version until modelslim master version is available for vLLM Ascend in the future.
|
||||
|
||||
Install modelslim:
|
||||
|
||||
```bash
|
||||
git clone https://gitee.com/ascend/msit -b modelslim-VLLM-8.1.RC1.b020_001
|
||||
cd msit/msmodelslim
|
||||
@@ -22,7 +23,6 @@ pip install accelerate
|
||||
|
||||
Take [DeepSeek-V2-Lite](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite) as an example, you just need to download the model, and then execute the convert command. The command is shown below. More info can be found in modelslim doc [deepseek w8a8 dynamic quantization docs](https://gitee.com/ascend/msit/blob/modelslim-VLLM-8.1.RC1.b020_001/msmodelslim/example/DeepSeek/README.md#deepseek-v2-w8a8-dynamic%E9%87%8F%E5%8C%96).
|
||||
|
||||
|
||||
```bash
|
||||
cd example/DeepSeek
|
||||
python3 quant_deepseek.py --model_path {original_model_path} --save_directory {quantized_model_save_path} --device_type cpu --act_method 2 --w_bit 8 --a_bit 8 --is_dynamic True
|
||||
@@ -39,6 +39,7 @@ Once convert action is done, there are two important files generated.
|
||||
2. [quant_model_description.json](https://www.modelscope.cn/models/vllm-ascend/DeepSeek-V2-Lite-W8A8/file/view/master/quant_model_description.json?status=1). All the converted weights info are recorded in this file.
|
||||
|
||||
Here is the full converted model files:
|
||||
|
||||
```bash
|
||||
.
|
||||
├── config.json
|
||||
@@ -103,4 +104,4 @@ submit a issue, maybe some new models need to be adapted.
|
||||
|
||||
### 2. How to solve the error "Could not locate the configuration_deepseek.py"?
|
||||
|
||||
Please convert DeepSeek series models using `modelslim-VLLM-8.1.RC1.b020_001` modelslim, this version has fixed the missing configuration_deepseek.py error.
|
||||
Please convert DeepSeek series models using `modelslim-VLLM-8.1.RC1.b020_001` modelslim, this version has fixed the missing configuration_deepseek.py error.
|
||||
|
||||
@@ -6,7 +6,6 @@ Sleep Mode is an API designed to offload model weights and discard KV cache from
|
||||
|
||||
Since the generation and training phases may employ different model parallelism strategies, it becomes crucial to free KV cache and even offload model parameters stored within vLLM during training. This ensures efficient memory utilization and avoids resource contention on the NPU.
|
||||
|
||||
|
||||
## Getting started
|
||||
|
||||
With `enable_sleep_mode=True`, the way we manage memory(malloc, free) in vllm will under a specific memory pool, during loading model and initialize kv_caches, we tag the memory as a map: `{"weight": data, "kv_cache": data}`.
|
||||
|
||||
Reference in New Issue
Block a user