xc-llm-ascend/examples at 9b30d4e774394fb62895d374e45f125fa410bd98

EngineX/xc-llm-ascend

NJX 9b30d4e774 [Doc][Misc] Add metrics usage documentation and example (#6962)

## What this PR does / why we need it?

This PR addresses issue #5027, in which users find that `output.metrics`
is `None` when using the vLLM offline inference API.

**Root Cause**: vLLM's offline inference API disables log stats by default
(`disable_log_stats=True`), which leaves `output.metrics` set to `None`.

**Changes**:
1. Added a NOTE comment in `examples/offline_inference_npu.py`
explaining how to enable metrics
2. Created a new example `examples/offline_inference_metrics.py`
demonstrating how to access request-level metrics (`first_token_time`,
`finished_time`, etc.) by setting `disable_log_stats=False`
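
For illustration, a minimal sketch of the pattern the new example demonstrates. This is not the contents of `examples/offline_inference_metrics.py`: the helper `summarize_metrics`, the function `run_example`, and the model name are hypothetical, and which fields appear on `output.metrics` may depend on the vLLM version, so the helper reads them defensively.

```python
from types import SimpleNamespace


def summarize_metrics(output, fields=("first_token_time", "finished_time")):
    """Return {field: value} for one request output, or None if metrics are absent."""
    metrics = getattr(output, "metrics", None)
    if metrics is None:
        # Symptom described in issue #5027: log stats were left disabled.
        return None
    # Fields missing in the installed vLLM version are reported as None.
    return {name: getattr(metrics, name, None) for name in fields}


def run_example():
    """Run a short generation with stats enabled (requires vLLM to be installed)."""
    # Imported lazily so summarize_metrics can be used without vLLM.
    from vllm import LLM, SamplingParams

    # The model name is a placeholder assumption; substitute any supported model.
    llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", disable_log_stats=False)
    outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
    for output in outputs:
        print(output.request_id, summarize_metrics(output))


# Stand-in objects (not real vLLM outputs) covering both cases.
without_stats = SimpleNamespace(metrics=None)
with_stats = SimpleNamespace(
    metrics=SimpleNamespace(first_token_time=1.5, finished_time=2.0)
)
print(summarize_metrics(without_stats))
print(summarize_metrics(with_stats))
```

Calling `run_example()` on a machine with vLLM installed should print one summary per prompt. If the summary is `None`, `disable_log_stats=False` was most likely not passed to `LLM`.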

## Does this PR introduce _any_ user-facing change?

Yes - adds documentation and example code to help users understand how
to access output metrics.

## How was this patch tested?

- Documentation/example change only
- Verified example code follows the same patterns as existing examples

Closes #5027
- vLLM version: v0.16.0
- vLLM main: 15d76f74e2

Signed-off-by: NJX-njx <3771829673@qq.com>

2026-03-10 10:09:50 +08:00

[MM][Doc] Update online serving tutorials for Qwen2-Audio (#3606)

2025-10-27 16:58:03 +08:00

disaggregated_encoder

[Test] Add initial multi modal cases of Qwen2.5-VL-7B-Instruct for disaggregated encoder (#5301)

2026-02-06 17:30:17 +08:00

disaggregated_prefill_v1

[CI]Fixed the spell check function in typos.toml (#6753)

2026-02-14 11:57:26 +08:00

[CI]Fixed the spell check function in typos.toml (#6753)

2026-02-14 11:57:26 +08:00

external_online_dp

[CI]Fixed the spell check function in typos.toml (#6753)

2026-02-14 11:57:26 +08:00

quantization/llm-compressor

[Quantization][Feature] Support compressed tensors moe w4a8 dynamic weight (#5889)

2026-02-02 16:39:32 +08:00

offline_data_parallel.py

[Lint]Style: Convert example to ruff format (#5863)

2026-01-13 20:46:50 +08:00

offline_disaggregated_prefill_npu.py

[Refactor]Refactor of vllm_ascend/distributed module (#5719)

2026-01-15 08:57:40 +08:00

offline_embed.py

[Lint]Style: Convert example to ruff format (#5863)

2026-01-13 20:46:50 +08:00

offline_external_launcher.py

[Lint]Style: Convert example to ruff format (#5863)

2026-01-13 20:46:50 +08:00

offline_inference_audio_language.py

[Lint]Style: Convert example to ruff format (#5863)

2026-01-13 20:46:50 +08:00

offline_inference_metrics.py

[Doc][Misc] Add metrics usage documentation and example (#6962)

2026-03-10 10:09:50 +08:00

offline_inference_npu_long_seq.py

[Lint]Style: Convert example to ruff format (#5863)

2026-01-13 20:46:50 +08:00

offline_inference_npu_tp2.py

[Lint]Style: Convert example to ruff format (#5863)

2026-01-13 20:46:50 +08:00

offline_inference_npu.py

[Doc][Misc] Add metrics usage documentation and example (#6962)

2026-03-10 10:09:50 +08:00

offline_inference_sleep_mode_npu.py

[Lint]Style: Convert example to ruff format (#5863)

2026-01-13 20:46:50 +08:00

offline_weight_load.py

[Lint]Style: Convert example to ruff format (#5863)

2026-01-13 20:46:50 +08:00

prompt_embed_inference.py

[Lint]Style: Convert example to ruff format (#5863)

2026-01-13 20:46:50 +08:00

prompt_embedding_inference.py

[Lint]Style: Convert example to ruff format (#5863)

2026-01-13 20:46:50 +08:00

run_dp_server.sh

Drop torchair (#4814)

2025-12-10 09:20:40 +08:00

save_sharded_state_310.py

[Feat][310p] 310P support w8a8s quantization and saving w8a8sc state (#6878)

2026-03-02 20:09:15 +08:00