[Doc] Update doc to work with release (#85)
1. Update CANN image name.
2. Add pta (torch_npu) install step.
3. Update vllm-ascend Docker image name to ghcr.
4. Update quick_start to use the vllm-ascend image directly.
5. Fix `note` style.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
## v0.7.1rc1
🎉 Hello, World!
We are excited to announce the first release candidate of v0.7.1 for vllm-ascend.
vLLM Ascend Plugin (vllm-ascend) is a community-maintained hardware plugin for running vLLM on the Ascend NPU. With this release, users can now enjoy the latest features and improvements of vLLM on the Ascend NPU.
Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.1rc1) to start the journey. Note that this is a release candidate and there may be some bugs or issues; we appreciate your feedback and suggestions [here](https://github.com/vllm-project/vllm-ascend/issues/19).
### Highlights
- Initial support for the Ascend NPU on vLLM. [#3](https://github.com/vllm-project/vllm-ascend/pull/3)
- DeepSeek is now supported. [#88](https://github.com/vllm-project/vllm-ascend/pull/88) [#68](https://github.com/vllm-project/vllm-ascend/pull/68)
- Qwen, the Llama series, and other popular models are also supported; you can see more details [here](https://vllm-ascend.readthedocs.io/en/latest/user_guide/supported_models.html). A minimal inference example is sketched right after this list.
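
To illustrate, here is a minimal offline-inference sketch using the standard vLLM Python API. The model name below is an assumption picked for illustration; any model from the supported list can be substituted.

```python
# Minimal offline-inference sketch with vllm-ascend installed.
# The model name below is an illustrative assumption; any model from
# the supported-models list should work the same way.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The future of AI is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # assumed model choice
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")
```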
### Core
- Added the Ascend quantization config option; the implementation is coming soon. [#7](https://github.com/vllm-project/vllm-ascend/pull/7) [#73](https://github.com/vllm-project/vllm-ascend/pull/73)
- Added the silu_and_mul and rope ops, and added mixed ops into the attention layer. [#18](https://github.com/vllm-project/vllm-ascend/pull/18)
### Other
- [CI] Enable Ascend CI to actively monitor and improve quality for vLLM on Ascend. [#3](https://github.com/vllm-project/vllm-ascend/pull/3)
- [Docker] Add vllm-ascend container image. [#64](https://github.com/vllm-project/vllm-ascend/pull/64)
- [Docs] Add a [live doc](https://vllm-ascend.readthedocs.org). [#55](https://github.com/vllm-project/vllm-ascend/pull/55)
### Known issues
- This release relies on an unreleased torch_npu version, which is already installed in the official container image. Please [install](https://vllm-ascend.readthedocs.io/en/v0.7.1rc1/installation.html) it manually if you are using a non-container environment; a quick import check is sketched after this list.
- There are logs like `No platform deteced, vLLM is running on UnspecifiedPlatform` or `Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")` shown when running vllm-ascend. This doesn't affect functionality or performance and can be safely ignored; it has been fixed by this [PR](https://github.com/vllm-project/vllm/pull/12432), which will be included in v0.7.3.
- There are logs like `# CPU blocks: 35064, # CPU blocks: 2730` shown when running vllm-ascend which should read `# NPU blocks:`. This doesn't affect functionality or performance and can be safely ignored; it has been fixed by this [PR](https://github.com/vllm-project/vllm/pull/13378), which will be included in v0.7.3.
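
Because the torch_npu requirement above is easy to miss in a non-container setup, a quick import check like the one below can confirm the Ascend adapter is in place before launching vllm-ascend. This is a minimal sketch for illustration, not a tool shipped with the release.

```python
# Sanity-check sketch: confirm the (unreleased) torch_npu build is
# importable before starting vllm-ascend. Illustrative only.
import torch

try:
    import torch_npu  # noqa: F401  # Ascend PyTorch adapter (pta)
    print(f"torch {torch.__version__}; NPU available: {torch.npu.is_available()}")
except ImportError as exc:
    print(f"torch_npu is not installed: {exc}")
    print("Follow the installation guide linked in the known issues above.")
```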
### Supported models

| Model | Supported | Note |
|-------|-----------|------|
| Qwen 2.5 | ✅ | |
| Mistral | | Need test |
| DeepSeek v2.5 | | Need test |
| DeepSeek v3 | ✅ | |
| DeepSeek Distill (Qwen/Llama) | ✅ | |
| Llama 3.1/3.2 | ✅ | |
| Gemma-2 | | Need test |
| Baichuan | | Need test |