Run vllm-ascend on Single NPU
### What this PR does / why we need it?
Add a vllm-ascend inference/serving tutorial doc for the Qwen/Qwen2.5-VL-7B-Instruct model.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Not tested.
Signed-off-by: xiemingda <xiemingda1002@gmail.com>
### What this PR does / why we need it?
Fix `ValueError: Unrecognized distributed executor backend tp. Supported
values are 'ray', 'mp' 'uni', 'external_launcher' or custom ExecutorBase
subclass.`
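For context, the error text above lists the accepted backend strings, and `tp` is not one of them. A minimal sketch of that kind of check follows. `check_executor_backend` and `SUPPORTED_BACKENDS` are hypothetical names used only for illustration, not vLLM's actual code, and the custom `ExecutorBase` subclass case is not modeled:

```python
# Hypothetical helper: the accepted names are copied from the error message above.
SUPPORTED_BACKENDS = {"ray", "mp", "uni", "external_launcher"}


def check_executor_backend(name: str) -> str:
    """Return `name` if it is a recognized backend string, else raise ValueError."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unrecognized distributed executor backend {name}. "
            f"Supported values are {sorted(SUPPORTED_BACKENDS)}."
        )
    return name
```

Under this sketch, `check_executor_backend("mp")` passes and `check_executor_backend("tp")` raises. The tensor-parallel degree is presumably meant to be configured separately from the backend name.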
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Test on my local node
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Re-architect the tutorials: move single NPU / multi NPU / multi node to the index.
- Unify the docker run command
- Use a dropdown to hide the build-from-source installation doc
- Re-architect the tutorials to include Qwen/QwQ/DeepSeek
- Make the QwQ doc work
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI test
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Bump torch_npu version to dev20250308.3 to fix a performance regression in the
multi-stream case: e04c580d07.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Update torch-npu version to fix the accuracy of torch npu `exponential_`.
With this update, the precision issue when setting `temperature > 0` is
fixed.
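To illustrate why `exponential_` accuracy can matter when `temperature > 0`: one common way to sample a token index from a probability vector is to divide each probability by independent Exp(1) noise and take the argmax (equivalently, take the argmin of noise divided by probability). The sketch below assumes the sampler follows that scheme, which this commit does not confirm. It is pure Python, with `random.expovariate` standing in for `exponential_`, and both function names are illustrative:

```python
import random


def sample_with_exponential_noise(probs, rng):
    """Draw index i with probability probs[i] via argmin(e_i / probs[i]), e_i ~ Exp(1).

    Returns -1 if every probability is zero.
    """
    best_index, best_time = -1, float("inf")
    for i, p in enumerate(probs):
        if p <= 0.0:
            continue  # zero-probability entries can never be chosen
        arrival = rng.expovariate(1.0) / p  # distributed as Exp(rate=p)
        if arrival < best_time:
            best_index, best_time = i, arrival
    return best_index


def empirical_frequencies(probs, n_draws, seed=0):
    """Sample n_draws times and return the observed frequency of each index."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    for _ in range(n_draws):
        counts[sample_with_exponential_noise(probs, rng)] += 1
    return [c / n_draws for c in counts]
```

With accurate exponential noise, the observed frequencies approach `probs`. If the noise distribution is off, the sampled token distribution drifts accordingly, which is one plausible way such a precision issue would surface.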
---------
Signed-off-by: Mengqing Cao <cmq0113@163.com>
### What this PR does / why we need it?
Add initial FAQs
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Preview
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
This PR added pooling support for vllm-ascend
Tested with `bge-base-en-v1.5` by encode:
```
from vllm import LLM

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create an LLM.
model = LLM(model="./bge-base-en-v1.5", enforce_eager=True)
# Generate embeddings. The output is a list of EmbeddingRequestOutputs.
outputs = model.encode(prompts)
# Print the outputs.
for output in outputs:
    print(output.outputs.embedding)  # list of floats, one embedding per prompt
```
Tested by embedding:
```
from vllm import LLM
llm = LLM(model="./bge-base-en-v1.5", task="embed")
(output,) = llm.embed("Hello, my name is")
embeds = output.outputs.embedding
print(f"Embeddings: {embeds!r} (size={len(embeds)})")
```
Related: https://github.com/vllm-project/vllm-ascend/issues/200
## Known issue
The accuracy is not correct yet because this feature relies on `enc-dec`
support. This will be addressed in a follow-up PR by @MengqingCao
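One way to quantify the accuracy gap noted above is to compare an embedding produced on the NPU with a reference embedding for the same prompt (for example, one produced on another backend) using cosine similarity. A minimal sketch with plain Python lists follows. `cosine_similarity` is an illustrative helper, not part of vLLM:

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

A value close to 1.0 would suggest the two embeddings point in nearly the same direction. What threshold counts as acceptable is left to the tester.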
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Update Feature Support doc.
### Does this PR introduce _any_ user-facing change?
no.
### How was this patch tested?
no.
---------
Signed-off-by: Shanshan Shen <467638484@qq.com>
### What this PR does / why we need it?
Recover vllm-ascend dev image
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
1. Fix hard-coded CUDA usage in the model runner.
2. Fix tutorials doc rendering error.
### Does this PR introduce _any_ user-facing change?
no.
### How was this patch tested?
no.
Signed-off-by: Shanshan Shen <467638484@qq.com>
### What this PR does / why we need it?
Fix vllm and vllm-ascend version
| branch/tag | vllm_version | vllm_ascend_version | pip_vllm_ascend_version | pip_vllm_version |
|----|----|----|----|----|
| main | main | main | v0.7.1rc1 | v0.7.1 |
| v0.7.1-dev | v0.7.1 | v0.7.1rc1 | v0.7.1rc1 | v0.7.1 |
| v0.7.1rc1 | v0.7.1 | v0.7.1rc1 | v0.7.1rc1 | v0.7.1 |
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Refactor installation doc
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI, preview
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Update tutorials.
### Does this PR introduce _any_ user-facing change?
no.
### How was this patch tested?
no.
---------
Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
### What this PR does / why we need it?
1. Add a vllm-ascend serving tutorial doc for the Qwen/Qwen2.5-7B-Instruct
model
2. Fix the format of files in the `docs` dir, e.g. format tables, add underlines
for links, add line feeds...
### Does this PR introduce _any_ user-facing change?
no.
### How was this patch tested?
doc CI passed
---------
Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
Check and update the feature support table.
- Both multi-step and speculative decoding require adaptation of the corresponding workers
- Prompt adapter (a fine-tuning method) requires adaptation in worker.py and model_runner.py
Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
This patch enables the doc build for vllm-ascend
- Add sphinx build for vllm-ascend
- Enable readthedocs for vllm-ascend
- Fix CI:
  - Exclude vllm-empty/tests/mistral_tool_use to skip the `You need to agree
    to share your contact information to access this model` error, which was
    introduced in 314cfade02
  - Install test requirements to fix
    https://github.com/vllm-project/vllm-ascend/actions/runs/13304112758/job/37151690770:
```
vllm-empty/tests/mistral_tool_use/conftest.py:4: in <module>
import pytest_asyncio
E ModuleNotFoundError: No module named 'pytest_asyncio'
```
- exclude docs PR
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
1. test locally:
```bash
# Install dependencies.
pip install -r requirements-docs.txt
# Build the docs and preview
make clean; make html; python -m http.server -d build/html/
```
Launch browser and open http://localhost:8000/.
2. CI passed with preview:
https://vllm-ascend--55.org.readthedocs.build/en/55/
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>