diff --git a/docs/source/installation.md b/docs/source/installation.md index e3bbfdf..b06777e 100644 --- a/docs/source/installation.md +++ b/docs/source/installation.md @@ -171,7 +171,7 @@ vllm-ascend will build custom ops by default. If you don't want to build it, set ```{note} If you are building from v0.7.3-dev and intend to use sleep mode feature, you should set `COMPILE_CUSTOM_KERNELS=1` manually. -To build custom ops, gcc/g++ higher than 8 and c++ 17 or higher is required. If you're using `pip install -e .` and encourage a torch-npu version conflict, please install with `pip install --no-build-isolation -e .` to build on system env. +To build custom ops, gcc/g++ higher than 8 and c++ 17 or higher is required. If you're using `pip install -e .` and encounter a torch-npu version conflict, please install with `pip install --no-build-isolation -e .` to build on system env. If you encounter other problems during compiling, it is probably because unexpected compiler is being used, you may export `CXX_COMPILER` and `C_COMPILER` in env to specify your g++ and gcc locations before compiling. ``` diff --git a/docs/source/user_guide/feature_guide/lora.md b/docs/source/user_guide/feature_guide/lora.md index dfa75cc..59afd08 100644 --- a/docs/source/user_guide/feature_guide/lora.md +++ b/docs/source/user_guide/feature_guide/lora.md @@ -1,8 +1,23 @@ # LoRA Adapters Guide +## Overview Like vLLM, vllm-ascend supports LoRA as well. The usage and more details can be found in [vLLM official document](https://docs.vllm.ai/en/latest/features/lora.html). -You can also refer to [this](https://docs.vllm.ai/en/latest/models/supported_models.html#list-of-text-only-language-models) to find which models support LoRA in vLLM. +You can refer to [Supported Models](https://docs.vllm.ai/en/latest/models/supported_models.html#list-of-text-only-language-models) to find which models support LoRA in vLLM. -## Tips -If you fail to run vllm-ascend with LoRA, you may follow [this instruction](https://vllm-ascend.readthedocs.io/en/latest/user_guide/feature_guide/graph_mode.html#fallback-to-eager-mode) to disable graph mode and try again. +You can run LoRA with ACLGraph mode now. Please refer to [Graph Mode Guide](./graph_mode.md) for a better LoRA performance. + +## Example +We show a simple LoRA example here, which enables the ACLGraph mode as default. + +```shell +vllm serve meta-llama/Llama-2-7b \ + --enable-lora \ + --lora-modules '{"name": "sql-lora", "path": "/path/to/lora", "base_model_name": "meta-llama/Llama-2-7b"}' +``` + +## Custom LoRA Operators + +We have implemented LoRA-related AscendC operators, such as bgmv_shrink, bgmv_expand, sgmv_shrink and sgmv_expand. You can find them under the "csrc/kernels" directory of [vllm-ascend repo](https://github.com/vllm-project/vllm-ascend.git). + +When you install vllm and vllm-ascend, those operators mentioned above will be compiled and installed automatically. If you don't want to use AscendC operators when you run vllm-ascend, you should set `COMPILE_CUSTOM_KERNELS=0` and reinstall vllm-ascend. To require more instructions about installation and compilation, you can refer to [installation guide](../../installation.md).