[Worker] Implement update max_model_len interface for NPUWorker (#6193)
### What this PR does / why we need it?
This patch adds the `update_max_model_len` interface to `NPUWorker`.
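To make the interface concrete, here is a minimal Python sketch. It assumes the worker keeps `max_model_len` in two places: a shared model config and a cached copy on its model runner. `ModelConfig`, `ModelRunner`, and `SketchNPUWorker` are hypothetical stand-ins, not the actual vLLM or vllm-ascend classes. The validation step is also an assumption; this patch does not confirm it.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical stand-in for the engine's model configuration.
    max_model_len: int


class ModelRunner:
    # Hypothetical runner that caches its own copy of max_model_len.
    def __init__(self, model_config: ModelConfig) -> None:
        self.model_config = model_config
        self.max_model_len = model_config.max_model_len


class SketchNPUWorker:
    """Illustrative worker; not the real vllm-ascend NPUWorker."""

    def __init__(self, model_config: ModelConfig) -> None:
        self.model_config = model_config
        self.model_runner = ModelRunner(model_config)

    def update_max_model_len(self, max_model_len: int) -> None:
        # Reject non-positive lengths before mutating any state.
        if max_model_len <= 0:
            raise ValueError(f"max_model_len must be positive, got {max_model_len}")
        # Keep the shared config and the runner's cached copy in sync.
        self.model_config.max_model_len = max_model_len
        self.model_runner.max_model_len = max_model_len


# Example: shrink the context window after the worker has been created.
worker = SketchNPUWorker(ModelConfig(max_model_len=32768))
worker.update_max_model_len(8192)
```

Updating the runner's cached value alongside the config avoids a stale copy. Otherwise, a component that read `max_model_len` at construction time would keep using the old limit.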
- vLLM version: v0.14.0
- vLLM main: d68209402d
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
`.github/workflows/_e2e_test.yaml`

```
@@ -92,6 +92,7 @@ jobs:
# We found that if running aclgraph tests in batch, it will cause AclmdlRICaptureBegin error. So we run
# the test separately.
# basic
pytest -sv --durations=0 tests/e2e/singlecard/test_auto_fit_max_mode_len.py
pytest -sv --durations=0 tests/e2e/singlecard/test_aclgraph_accuracy.py
pytest -sv --durations=0 tests/e2e/singlecard/test_aclgraph_mem.py
pytest -sv --durations=0 tests/e2e/singlecard/test_async_scheduling.py
```
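According to the workflow comment, the aclgraph tests run one file at a time because batching them triggers an `AclmdlRICaptureBegin` error. The Python sketch below reproduces that pattern. It assumes that starting a fresh pytest process for each file is what isolates the failures. The helpers `build_commands` and `run_separately` are hypothetical and do not exist in the repository. The sketch also calls pytest through `python -m pytest`, whereas the workflow uses the bare `pytest` command.

```python
import subprocess
import sys

# Test files taken from the workflow above; each gets its own pytest process.
SINGLECARD_TESTS = [
    "tests/e2e/singlecard/test_auto_fit_max_mode_len.py",
    "tests/e2e/singlecard/test_aclgraph_accuracy.py",
    "tests/e2e/singlecard/test_aclgraph_mem.py",
    "tests/e2e/singlecard/test_async_scheduling.py",
]


def build_commands(test_files):
    # One command per file, mirroring `pytest -sv --durations=0 <file>`.
    return [
        [sys.executable, "-m", "pytest", "-sv", "--durations=0", path]
        for path in test_files
    ]


def run_separately(test_files):
    # Run each file in its own interpreter process; stop at the first failure.
    for cmd in build_commands(test_files):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

`run_separately(SINGLECARD_TESTS)` returns 0 if every file passes. Otherwise, it returns the exit code of the first file that fails.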