[Feature] Add PD separation feature (#432)

### What this PR does / why we need it?
Adapts the Disaggregated Prefill feature to Ascend devices.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Example usage is provided along with the PR, in
examples/offline_disaggregated_prefill_npu.py.
To run it:
```
export PROMPT_DEVICE_ID=0,1
export DECODE_DEVICE_ID=2,3
python examples/offline_disaggregated_prefill_npu.py
```
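
The two exports above assign devices 0,1 to the prefill role and 2,3 to the decode role. As a minimal sketch of how such comma-separated values could be turned into device lists, assuming a script that wants integer IDs and wants to reject overlapping roles (the helper names `parse_device_ids` and `split_roles` are hypothetical and are not taken from this PR):

```python
import os
from typing import List, Mapping, Optional, Tuple


def parse_device_ids(raw: Optional[str]) -> List[int]:
    """Turn a string such as "0,1" into [0, 1]; unset or empty gives []."""
    if not raw:
        return []
    # Skip empty fragments so a trailing comma ("0,1,") is tolerated.
    return [int(part) for part in raw.split(",") if part.strip()]


def split_roles(
        environ: Mapping[str, str] = os.environ
) -> Tuple[List[int], List[int]]:
    """Read both role variables and check that no device serves both roles."""
    prompt = parse_device_ids(environ.get("PROMPT_DEVICE_ID"))
    decode = parse_device_ids(environ.get("DECODE_DEVICE_ID"))
    overlap = sorted(set(prompt) & set(decode))
    if overlap:
        raise ValueError(f"devices assigned to both roles: {overlap}")
    return prompt, decode


if __name__ == "__main__":
    # Same values as the shell exports above, passed as a plain mapping.
    example = {"PROMPT_DEVICE_ID": "0,1", "DECODE_DEVICE_ID": "2,3"}
    print(split_roles(example))
```

Whether the example script itself performs an overlap check is not stated in the PR; the check here is only an illustration of a reasonable guard.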

---------

Signed-off-by: ZihuiQian <qianzihui@huawei.com>
Co-authored-by: ZihuiQian <qianzihui@huawei.com>
eeethenQ
2025-04-15 15:11:35 +08:00
committed by GitHub
parent c7f6584d75
commit 44a8301424
8 changed files with 634 additions and 8 deletions

@@ -18,6 +18,17 @@ env_variables: Dict[str, Callable[[], Any]] = {
    lambda: os.getenv("ASCEND_HOME_PATH", None),
    "LD_LIBRARY_PATH":
    lambda: os.getenv("LD_LIBRARY_PATH", None),
    # Used for disaggregated prefilling
    "HCCN_PATH":
    lambda: os.getenv("HCCN_PATH", "/usr/local/Ascend/driver/tools/hccn_tool"),
    "PROMPT_DEVICE_ID":
    lambda: os.getenv("PROMPT_DEVICE_ID", None),
    "DECODE_DEVICE_ID":
    lambda: os.getenv("DECODE_DEVICE_ID", None),
    "LLMDATADIST_COMM_PORT":
    lambda: os.getenv("LLMDATADIST_COMM_PORT", "26000"),
    "LLMDATADIST_SYNC_CACHE_WAIT_TIME":
    lambda: os.getenv("LLMDATADIST_SYNC_CACHE_WAIT_TIME", "5000")
}
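
The hunk above registers each variable as a zero-argument lambda, so the environment is read when the entry is called rather than when the module is imported. `os.getenv` always returns a string (or the default), so numeric settings such as the port still need converting. The sketch below reproduces three of the entries and adds two accessors, `get_env` and `get_int_env`, which are hypothetical names introduced for illustration and are not part of this diff:

```python
import os
from typing import Any, Callable, Dict, Optional

# A subset of the entries from the diff, kept as lazily evaluated lambdas.
env_variables: Dict[str, Callable[[], Any]] = {
    "PROMPT_DEVICE_ID":
    lambda: os.getenv("PROMPT_DEVICE_ID", None),
    "LLMDATADIST_COMM_PORT":
    lambda: os.getenv("LLMDATADIST_COMM_PORT", "26000"),
    "LLMDATADIST_SYNC_CACHE_WAIT_TIME":
    lambda: os.getenv("LLMDATADIST_SYNC_CACHE_WAIT_TIME", "5000"),
}


def get_env(name: str) -> Any:
    """Call the registered lambda, so the value reflects the current environment."""
    return env_variables[name]()


def get_int_env(name: str) -> Optional[int]:
    """Convert a string setting to int; None is passed through for unset values."""
    value = get_env(name)
    return None if value is None else int(value)


if __name__ == "__main__":
    os.environ.pop("LLMDATADIST_COMM_PORT", None)
    print(get_int_env("LLMDATADIST_COMM_PORT"))  # falls back to the default

    # A later change to the environment is picked up on the next call.
    os.environ["LLMDATADIST_COMM_PORT"] = "27000"
    print(get_int_env("LLMDATADIST_COMM_PORT"))
```

The diff does not state the unit of `LLMDATADIST_SYNC_CACHE_WAIT_TIME`, so the sketch treats it only as an integer.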