xc-llm-ascend/examples at 216fc0e8e444a05a765116c940d15520da82378c

EngineX/xc-llm-ascend

Song Zhixin 216fc0e8e4 [feature] Prompt Embeddings Support for v1 Engine (#3026 )

### What this PR does / why we need it?
This PR, based on
[19746](https://github.com/vllm-project/vllm/issues/19746), adds
Prompt Embeddings support for the v1 engine on NPU.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

```bash
python examples/prompt_embed_inference.py
```


- vLLM version: v0.11.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: jesse <szxfml@gmail.com>

2025-10-30 17:15:57 +08:00

[MM][Doc] Update online serving tutorials for Qwen2-Audio (#3606 )

2025-10-27 16:58:03 +08:00

disaggregated_prefill_v1

[Build] Force torch version (#3791 )

2025-10-30 15:53:15 +08:00

[Misc][V0 Deprecation] Add __main__ guard to all offline examples (#1837 )

2025-07-17 14:13:30 +08:00

external_online_dp

[Feature] Support moe multi-stream for aclgraph. (#2946 )

2025-09-19 11:06:45 +08:00

offline_data_parallel.py

Upgrade to new vllm commit (#3719 )

2025-10-25 15:36:32 +08:00

offline_disaggregated_prefill_npu.py

Fix VLLM_ASCEND_LLMDD_RPC_PORT renaming (#3108 )

2025-09-23 10:33:04 +08:00

offline_embed.py

[Misc][V0 Deprecation] Add __main__ guard to all offline examples (#1837 )

2025-07-17 14:13:30 +08:00

offline_external_launcher.py

Upgrade to new vllm commit (#3719 )

2025-10-25 15:36:32 +08:00

offline_inference_audio_language.py

Fix some ci issue and refactor modelrunner (#2445 )

2025-08-20 09:01:04 +08:00

offline_inference_npu_long_seq.py

support cp&dcp (#3260 )

2025-10-24 10:32:01 +08:00

offline_inference_npu_tp2.py

[Misc][V0 Deprecation] Add __main__ guard to all offline examples (#1837 )

2025-07-17 14:13:30 +08:00

offline_inference_npu.py

[Misc][V0 Deprecation] Add __main__ guard to all offline examples (#1837 )

2025-07-17 14:13:30 +08:00

offline_inference_sleep_mode_npu.py

Upgrade to new vllm commit (#3719 )

2025-10-25 15:36:32 +08:00

offline_weight_load.py

Upgrade to new vllm commit (#3719 )

2025-10-25 15:36:32 +08:00

prompt_embed_inference.py

[feature] Prompt Embeddings Support for v1 Engine (#3026 )

2025-10-30 17:15:57 +08:00

prompt_embedding_inference.py

[Misc][V0 Deprecation] Add __main__ guard to all offline examples (#1837 )

2025-07-17 14:13:30 +08:00

run_dp_server.sh

[Feature] Support moe multi-stream for aclgraph. (#2946 )

2025-09-19 11:06:45 +08:00