### What this PR does / why we need it?
Replaces the hardcoded value of `mc2_tokens_capacity` with the maximum
graph capture size, so the MC2 buffer is allocated more accurately.
Sizing the capacity from the graph capture configuration removes a
magic number and makes the setup more robust.
This PR fixes two issues:
1. <del>MC2 op restrictions differ between SoCs.</del> @Angazenn This
requires an overhaul, so it has been removed from this PR; please open a
separate PR.
2. The hardcoded value `512` allocates a larger buffer than needed for large models.
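A minimal sketch of the sizing idea, assuming the graph capture configuration exposes a list of capture sizes. `GraphCaptureConfig`, `capture_sizes`, and `mc2_tokens_capacity()` are hypothetical stand-ins, not the vLLM or vllm-ascend API, and the fallback to `512` when no capture sizes are configured is likewise an assumption for illustration only.

```python
from dataclasses import dataclass, field
from typing import List

LEGACY_MC2_TOKENS_CAPACITY = 512  # the previous hardcoded value


@dataclass
class GraphCaptureConfig:
    # Hypothetical stand-in for the real compilation/graph config.
    capture_sizes: List[int] = field(default_factory=list)


def mc2_tokens_capacity(cfg: GraphCaptureConfig) -> int:
    """Size the MC2 token buffer from the largest graph capture size."""
    if not cfg.capture_sizes:
        # Assumption: fall back to the legacy value when graph capture
        # is not configured (max() of an empty list would raise).
        return LEGACY_MC2_TOKENS_CAPACITY
    return max(cfg.capture_sizes)


print(mc2_tokens_capacity(GraphCaptureConfig([1, 2, 4, 8, 16, 32, 64])))
```

With capture sizes up to 64, the buffer holds 64 tokens instead of a fixed 512. This is the over-allocation that item 2 describes.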
### Does this PR introduce _any_ user-facing change?
None.
### How was this patch tested?
Tested in daily checks.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
---------
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>