[Lint]Style: reformat markdown files via markdownlint (#5884)

### What this PR does / why we need it?

Reformat markdown files via markdownlint.

- vLLM version: v0.13.0
- vLLM main: bde38c11df

---------

Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>
## Environmental Dependencies

* Software:
  * Python >= 3.10, < 3.12
  * CANN == 8.3.rc2
  * PyTorch == 2.8.0, torch-npu == 2.8.0
  * vLLM: main branch
  * vLLM-Ascend: main branch

### KV Pool Parameter Description

**kv_connector_extra_config**: Additional configurable parameters for pooling.

**lookup_rpc_port**: Port for RPC communication between the pooling scheduler process and the worker processes. Each instance requires a unique port.

**load_async**: Whether to enable asynchronous loading. The default value is false.

**backend**: The storage backend for the KV pool. The default is mooncake.

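To make these parameters concrete, here is a hypothetical `kv_connector_extra_config` fragment. The connector name follows the examples later in this document; the role and port value are purely illustrative assumptions, not defaults from the source.

```python
import json

# Illustrative kv-transfer-config fragment; the port and role are
# hypothetical example values, not defaults taken from this document.
kv_transfer_config = {
    "kv_connector": "AscendStoreConnector",
    "kv_role": "kv_producer",
    "kv_connector_extra_config": {
        "lookup_rpc_port": 5557,  # must be unique per instance
        "load_async": True,       # asynchronous loading (default: false)
        "backend": "mooncake",    # default storage backend
    },
}

# vLLM accepts this structure as a JSON string on the command line.
print(json.dumps(kv_transfer_config))
```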
### Environment Variable Configuration

To guarantee uniform hash generation, synchronize the PYTHONHASHSEED environment variable across all nodes when enabling the KV Pool.

```bash
export PYTHONHASHSEED=0
```

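As a quick illustration of why this matters (a sketch added here, not part of the original document): with `PYTHONHASHSEED` pinned, string hashes agree across separate interpreter processes, which is what uniform hash generation across nodes relies on.

```python
import os
import subprocess
import sys

# Run hash('...') in a fresh interpreter with a pinned PYTHONHASHSEED.
def hash_in_fresh_interpreter(seed: str) -> str:
    env = {**os.environ, "PYTHONHASHSEED": seed}
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('kv-pool-block-key'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

a = hash_in_fresh_interpreter("0")
b = hash_in_fresh_interpreter("0")
print(a == b)  # same seed in two fresh processes -> identical hash
```

Without a pinned seed, each interpreter randomizes string hashing, so hashes would generally differ between nodes.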
## Example of using Mooncake as a KV Pool backend

* Software:
  * Check NPU HCCN Configuration:

* Install Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Installation and Compilation Guide: <https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries>.
First, we need to obtain the Mooncake project. Refer to the following command:

```shell
git clone https://github.com/kvcache-ai/Mooncake.git
# ... (build steps elided; see the guide linked above)
```

**Note:**

* Adjust the Python path according to your specific Python installation
* Ensure `/usr/local/lib` and `/usr/local/lib64` are in your `LD_LIBRARY_PATH`

```shell
export LD_LIBRARY_PATH=/usr/local/lib64/python3.11/site-packages/mooncake:$LD_LIBRARY_PATH
```

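A small helper for the note above (an assumption of this edit, not part of the original setup): one way to sanity-check that the required directories are actually present on `LD_LIBRARY_PATH`.

```python
# Check which of the directories the note requires are missing from a
# given LD_LIBRARY_PATH value; paths mirror the note, adjust as needed.
def missing_lib_dirs(ld_library_path: str,
                     required=("/usr/local/lib", "/usr/local/lib64")):
    entries = [p for p in ld_library_path.split(":") if p]
    return [d for d in required if d not in entries]

print(missing_lib_dirs("/usr/local/lib:/opt/other"))        # ['/usr/local/lib64']
print(missing_lib_dirs("/usr/local/lib:/usr/local/lib64"))  # []
```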
The environment variable **MOONCAKE_CONFIG_PATH** is configured to the full path where mooncake.json is located.

```json
{
    "metadata_server": "P2PHANDSHAKE",
    "protocol": "ascend",
    ...
}
```

Under the mooncake folder:

```shell
mooncake_master --port 50088 --eviction_high_watermark_ratio 0.9 --eviction_ratio 0.1
```

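To clarify what the `--eviction_high_watermark_ratio` and `--eviction_ratio` flags express, here is a minimal sketch of watermark-based eviction. This illustrates the concept only; it is not Mooncake's actual implementation.

```python
from collections import OrderedDict

# Once the pool passes the high watermark (90% of capacity), evict a
# fixed fraction (10%) of the oldest entries in one pass.
def maybe_evict(pool: OrderedDict, capacity: int,
                high_watermark_ratio: float = 0.9,
                eviction_ratio: float = 0.1) -> int:
    if len(pool) < capacity * high_watermark_ratio:
        return 0
    n_evict = max(1, int(capacity * eviction_ratio))
    for _ in range(n_evict):
        pool.popitem(last=False)  # drop the oldest inserted entry
    return n_evict

pool = OrderedDict((f"block-{i}", b"kv") for i in range(95))
evicted = maybe_evict(pool, capacity=100)
print(evicted, len(pool))  # 10 85
```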
Using `MultiConnector` to simultaneously utilize both `MooncakeConnectorV1` and the pool connector:

`prefill` Node:

```shell
bash multi_producer.sh
```

The content of the multi_producer.sh script:

```shell
export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:$LD_LIBRARY_PATH
export PYTHONHASHSEED=0
export PYTHONPATH=$PYTHONPATH:/xxxxx/vllm
# ... (rest of the script, including the vllm api_server launch, elided)
```

`decode` Node:

```shell
bash multi_consumer.sh
```

The content of multi_consumer.sh:

```shell
export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:$LD_LIBRARY_PATH
export PYTHONPATH=$PYTHONPATH:/xxxxx/vllm
export PYTHONHASHSEED=0
# ... (rest of the script, including the vllm api_server launch, elided)
```

Currently, the KV pool in PD disaggregation only stores the KV cache generated by the Prefill node by default. For models using MLA, the Decode node can now also store KV cache for use by the Prefill node; enable this by adding `consumer_is_to_put: true` to the AscendStoreConnector configuration. If the Prefill node enables PP, `prefill_pp_size` or `prefill_pp_layer_partition` must also be set. Example as follows:

```python
{
    "kv_connector": "AscendStoreConnector",
    "kv_role": "kv_consumer",
    ...
}
```

#### 2. Start proxy_server

```shell
python vllm-ascend/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py \
    --host localhost \
    --prefiller-hosts localhost \
    # ... (remaining arguments elided)
```

Configure the localhost, port, and model weight path in the command to your own environment.

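As background on what a load-balancing proxy does, here is a minimal round-robin sketch. It is illustrative only and is not the code of `load_balance_proxy_server_example.py`.

```python
from itertools import cycle

# Spread incoming requests across the configured prefiller hosts in
# round-robin order; the host names here are made-up placeholders.
class RoundRobinBalancer:
    def __init__(self, hosts):
        self._ring = cycle(hosts)

    def pick(self) -> str:
        return next(self._ring)

balancer = RoundRobinBalancer(["prefiller-a:8001", "prefiller-b:8001"])
picked = [balancer.pick() for _ in range(4)]
print(picked)  # alternates between the two hosts
```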
Short question:

```shell
curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{ "model": "/xxxxx/Qwen2.5-7B-Instruct", "prompt": "Hello. I have a question. The president of the United States is", "max_tokens": 200, "temperature":0.0 }'
```

Long question:

```shell
curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{ "model": "/xxxxx/Qwen2.5-7B-Instruct", "prompt": "Given the accelerating impacts of climate change—including rising sea levels, increasing frequency of extreme weather events, loss of biodiversity, and adverse effects on agriculture and human health—there is an urgent need for a robust, globally coordinated response. However, international efforts are complicated by a range of factors: economic disparities between high-income and low-income countries, differing levels of industrialization, varying access to clean energy technologies, and divergent political systems that influence climate policy implementation. In this context, how can global agreements like the Paris Accord be redesigned or strengthened to not only encourage but effectively enforce emission reduction targets? Furthermore, what mechanisms can be introduced to promote fair and transparent technology transfer, provide adequate financial support for climate adaptation in vulnerable regions, and hold nations accountable without exacerbating existing geopolitical tensions or disproportionately burdening those with historically lower emissions?", "max_tokens": 256, "temperature":0.0 }'
```

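For completeness, an equivalent of the curl requests above using only the Python standard library. The endpoint and model path mirror the examples; a running server is required for `completion` to succeed.

```python
import json
import urllib.request

# Build the JSON body the /v1/completions endpoint expects.
def build_payload(prompt: str, model: str, max_tokens: int = 200) -> bytes:
    return json.dumps({"model": model, "prompt": prompt,
                       "max_tokens": max_tokens, "temperature": 0.0}).encode()

# Send the request; only works against a live vLLM server.
def completion(prompt: str, model: str = "/xxxxx/Qwen2.5-7B-Instruct",
               url: str = "http://localhost:8000/v1/completions") -> dict:
    req = urllib.request.Request(
        url, data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(json.loads(build_payload("Hello", "/xxxxx/Qwen2.5-7B-Instruct"))["max_tokens"])  # 200
```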
#### 1. Run Mixed Department Script

```shell
bash mixed_department.sh
```

Content of mixed_department.sh:

```shell
export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/python/site-packages:$LD_LIBRARY_PATH
export PYTHONPATH=$PYTHONPATH:/xxxxx/vllm
export MOONCAKE_CONFIG_PATH="/xxxxxx/mooncake.json"
# ... (rest of the script elided)
```

Configure the localhost, port, and model weight path in the command to your own environment.

Short question:

```shell
curl -s http://localhost:8100/v1/completions -H "Content-Type: application/json" -d '{ "model": "/xxxxx/Qwen2.5-7B-Instruct", "prompt": "Hello. I have a question. The president of the United States is", "max_tokens": 200, "temperature":0.0 }'
```

Long question:

```shell
curl -s http://localhost:8100/v1/completions -H "Content-Type: application/json" -d '{ "model": "/xxxxx/Qwen2.5-7B-Instruct", "prompt": "Given the accelerating impacts of climate change—including rising sea levels, increasing frequency of extreme weather events, loss of biodiversity, and adverse effects on agriculture and human health—there is an urgent need for a robust, globally coordinated response. However, international efforts are complicated by a range of factors: economic disparities between high-income and low-income countries, differing levels of industrialization, varying access to clean energy technologies, and divergent political systems that influence climate policy implementation. In this context, how can global agreements like the Paris Accord be redesigned or strengthened to not only encourage but effectively enforce emission reduction targets? Furthermore, what mechanisms can be introduced to promote fair and transparent technology transfer, provide adequate financial support for climate adaptation in vulnerable regions, and hold nations accountable without exacerbating existing geopolitical tensions or disproportionately burdening those with historically lower emissions?", "max_tokens": 256, "temperature":0.0 }'
```