xc-llm-ascend


hucong e38fab011d [Doc][PD] Restore the default configuration items in examples/disaggregate_prefill_v1/README.md (#2165)

### What this PR does / why we need it?
- On the D (decode) node, `max-num-batched-tokens` can be set much lower. The D node runs at most `max-num-seqs` sequences concurrently, and each sequence contributes a single token per decode step. Because `profile_run` therefore only has to handle `max-num-seqs` sequences at a time, it is safe to set `max-num-batched-tokens` equal to `max-num-seqs`. This reduces activation memory consumption.
- Restore the default configuration items for prefill/decode (PD) disaggregation.

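The decode-node sizing rule above can be sketched as a small helper that assembles `vllm serve` arguments. This is a minimal illustration under stated assumptions, not the project's launch script: the helper name `decode_node_args`, the model path, and the port are hypothetical, and any other flags a D node needs in a PD deployment (for example `--kv-transfer-config`) are assumed to be passed through `extra` rather than shown.

```python
from typing import List, Optional, Sequence


def decode_node_args(
    model: str,
    max_num_seqs: int,
    extra: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build a `vllm serve` argument list for a decode (D) node (hypothetical helper).

    A D node runs at most `max_num_seqs` sequences, each adding one token per
    decode step, so `--max-num-batched-tokens` is set to the same value to
    shrink the activation memory reserved during profile_run.
    """
    if max_num_seqs <= 0:
        raise ValueError("max_num_seqs must be a positive integer")
    args = [
        "vllm", "serve", model,
        "--max-num-seqs", str(max_num_seqs),
        # Equal to --max-num-seqs: the decode-only worst case is one token per sequence.
        "--max-num-batched-tokens", str(max_num_seqs),
    ]
    # Deployment-specific flags (e.g. --kv-transfer-config) are appended unchanged.
    args.extend(extra or [])
    return args


if __name__ == "__main__":
    import shlex

    # Hypothetical model path and port, for illustration only.
    cmd = decode_node_args("/path/to/model", 32, ["--port", "8200"])
    print(shlex.join(cmd))
```

Deriving both values from a single `max_num_seqs` input keeps them from drifting apart when the decode concurrency is tuned.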
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.0
- vLLM main: 61dcc280fa

Signed-off-by: underfituu <hzhucong@163.com>

2025-08-04 20:30:53 +08:00

| Name | Last commit | Date |
|---|---|---|
| disaggregated_prefill_v1 | [Doc][PD] Restore the default configuration items in examples/disaggregate_prefill_v1/README.md (#2165) | 2025-08-04 20:30:53 +08:00 |
| eplb | [Misc][V0 Deprecation] Add `__main__` guard to all offline examples (#1837) | 2025-07-17 14:13:30 +08:00 |
| offline_data_parallel.py | [Misc] Add extra checking to torchair_graph_config. (#1939) | 2025-08-01 09:24:11 +08:00 |
| offline_disaggregated_prefill_npu.py | [BugFix] update the kv transfer config (#2121) | |