### What this PR does / why we need it?
Follows https://github.com/vllm-project/vllm/pull/37425 and
https://github.com/vllm-project/vllm-omni/pull/1982.
The description below is copied from those PRs:
Profiling Qwen3 Omni showed that `hasattr(self.model, "flush_pending_metadata")` costs ~6 ms per
decode step.
The original `CUDAGraphWrapper.__getattr__` raises:
```python
raise AttributeError(f"... cudagraph wrapper: {self.runnable}")
```
When `hasattr()` is called for a non-existent attribute, Python internally
calls `__getattr__`, which constructs this AttributeError. The
`{self.runnable}` triggers `__repr__()` on the underlying model (e.g.,
`Qwen3OmniMoeForConditionalGeneration`), which recursively traverses the
entire nn.Module tree to generate an 18,000+ character string. This
takes ~6-7 ms per call.
Since `hasattr(self.model, "flush_pending_metadata")` is called every
decode step in the Talker forward path, this adds ~6 ms of overhead per
step, severely impacting audio inter-chunk latency (ICL).
```python
hasattr(self.model, "flush_pending_metadata")
→ getattr(self.model, "flush_pending_metadata")
→ not found in CUDAGraphWrapper.__dict__
→ not found in the CUDAGraphWrapper class hierarchy
→ triggers CUDAGraphWrapper.__getattr__("flush_pending_metadata")
→ hasattr(self.runnable, "flush_pending_metadata") # runnable also doesn't have it
→ executes raise AttributeError(f"... {self.runnable}")
→ Python needs to construct the exception object
→ the f-string triggers self.runnable.__repr__()
→ Qwen3OmniMoeForConditionalGeneration.__repr__()
→ recursively traverses the entire nn.Module tree
  → generates an 18,000+ character string
→ takes ~6 ms
→ AttributeError object is created
→ hasattr catches the AttributeError and returns False
→ the 18,000-character string is immediately discarded (no one ever sees it)
```
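A minimal sketch of one way to remove this cost, assuming the error message only needs the runnable's class name rather than its full repr (the class below is a simplified stand-in, not the actual vLLM `CUDAGraphWrapper`):
```python
import torch.nn as nn

class CUDAGraphWrapper:
    """Simplified stand-in for the real wrapper, for illustration only."""

    def __init__(self, runnable: nn.Module):
        self.runnable = runnable

    def __getattr__(self, key: str):
        # Delegate attribute lookups to the wrapped runnable.
        if hasattr(self.runnable, key):
            return getattr(self.runnable, key)
        # Report only the type name: constructing this AttributeError is now
        # cheap instead of rendering the full recursive nn.Module repr.
        raise AttributeError(
            f"Attribute {key!r} not found in CUDAGraphWrapper or its wrapped "
            f"runnable of type {type(self.runnable).__name__}"
        )
```
With the repr out of the error message, a `hasattr()` miss on the wrapper costs microseconds instead of milliseconds, removing the per-step overhead in the Talker decode path.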
### Does this PR introduce _any_ user-facing change?
NO.
### How was this patch tested?
See https://github.com/vllm-project/vllm-omni/pull/1982
- vLLM version: v0.17.0
- vLLM main:
4497431df6
---------
Signed-off-by: gcanlin <canlinguosdu@gmail.com>