[Doc] Add sphinx build for vllm-ascend (#55)

### What this PR does / why we need it?

This patch enables the doc build for vllm-ascend

- Add sphinx build for vllm-ascend
- Enable readthedocs for vllm-ascend
- Fix CI:
  - Exclude vllm-empty/tests/mistral_tool_use to skip `You need to agree to share your contact information to access this model`, which was introduced in 314cfade02
  - Install the test requirements to fix https://github.com/vllm-project/vllm-ascend/actions/runs/13304112758/job/37151690770:
      ```
      vllm-empty/tests/mistral_tool_use/conftest.py:4: in <module>
          import pytest_asyncio
      E   ModuleNotFoundError: No module named 'pytest_asyncio'
      ```
  - Exclude docs PRs

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. Tested locally:
    ```bash
    # Install dependencies.
    pip install -r requirements-docs.txt
    
    # Build the docs and preview
    make clean; make html; python -m http.server -d build/html/
    ```
    
    Launch a browser and open http://localhost:8000/.

2. CI passed with preview:
    https://vllm-ascend--55.org.readthedocs.build/en/55/

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Author: Yikun Jiang · committed via GitHub · 2025-02-13 18:44:17 +08:00
parent 63b11ec7e9 · commit 46977f9f06
22 changed files with 255 additions and 36 deletions

docs/source/conf.py Normal file

@@ -0,0 +1,105 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# This file is a part of the vllm-ascend project.
# Adapted from vllm-project/vllm/docs/source/conf.py
# Copyright 2023 The vLLM team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
# -- Project information -----------------------------------------------------
project = 'vllm-ascend'
copyright = '2025, vllm-ascend team'
author = 'the vllm-ascend team'
# The full version, including alpha/beta/rc tags
release = ''
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
# Copy from https://github.com/vllm-project/vllm/blob/main/docs/source/conf.py
extensions = [
    "sphinx.ext.napoleon",
    "sphinx.ext.intersphinx",
    "sphinx_copybutton",
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "myst_parser",
    "sphinxarg.ext",
    "sphinx_design",
    "sphinx_togglebutton",
]
myst_enable_extensions = [
    "colon_fence",
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = 'en'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = [
    '_build',
    'Thumbs.db',
    '.DS_Store',
    '.venv',
    'README.md',
    # TODO(yikun): Remove this after zh supported
    'developer_guide/contributing.zh.md'
]
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_title = project
html_theme = 'sphinx_book_theme'
html_logo = 'logos/vllm-ascend-logo-text-light.png'
html_theme_options = {
    'path_to_docs': 'docs/source',
    'repository_url': 'https://github.com/vllm-project/vllm-ascend',
    'use_repository_button': True,
    'use_edit_page_button': True,
}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = ['_static']
def setup(app):
    # Intentionally a no-op; kept as a hook for future Sphinx customizations.
    pass


@@ -0,0 +1,107 @@
# Contributing
## Building and testing
It's recommended to set up a local development environment to build and test
before you submit a PR.
### Prepare environment and build
In theory, the vllm-ascend build is only supported on Linux, because the
`vllm-ascend` dependency `torch_npu` only supports Linux.
However, you can still set up a dev environment on Linux/Windows/macOS for linting and basic
tests with the following commands:
```bash
# Choose a base dir (~/vllm-project/) and set up venv
cd ~/vllm-project/
python3 -m venv .venv
source ./.venv/bin/activate
# Clone vllm code and install
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements-build.txt
VLLM_TARGET_DEVICE="empty" pip install .
cd ..
# Clone vllm-ascend and install
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -r requirements-dev.txt
# Then you can run lint and mypy test
bash format.sh
# Build:
# - only supported on Linux (torch_npu available)
# pip install -e .
# - build without deps for debugging in other OS
# pip install -e . --no-deps
# Commit changed files using `-s`
git commit -sm "your commit info"
```
### Testing
Although the vllm-ascend CI provides integration tests on [Ascend](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml), you can also run them
locally. The simplest way to run these integration tests locally is through a container:
```bash
# Under Ascend NPU environment
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
IMAGE=vllm-ascend-dev-image
CONTAINER_NAME=vllm-ascend-dev
DEVICE=/dev/davinci1
# The first build will take about 10 mins (10MB/s) to download the base image and packages
docker build -t $IMAGE -f ./Dockerfile .
# You can also specify the mirror repo via setting VLLM_REPO to speedup
# docker build -t $IMAGE -f ./Dockerfile . --build-arg VLLM_REPO=https://gitee.com/mirrors/vllm
docker run --name $CONTAINER_NAME --network host --device $DEVICE \
--device /dev/davinci_manager --device /dev/devmm_svm \
--device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-ti --rm $IMAGE bash
cd vllm-ascend
pip install -r requirements-dev.txt
pytest tests/
```
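If you only want to run part of the suite while iterating, the standard pytest selection flags apply; the keyword below is purely illustrative:
```bash
# Run a subset of the tests by keyword expression (adjust the keyword to your change)
pytest -sv tests/ -k "offline"
```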
## DCO and Signed-off-by
When contributing changes to this project, you must agree to the DCO. Commits must include a `Signed-off-by:` header which certifies agreement with the terms of the DCO.
Using `-s` with `git commit` will automatically add this header.
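For example, a minimal sketch (the commit message and author below are placeholders):
```bash
# `-s` appends the DCO trailer based on your configured user.name and user.email
git commit -s -m "[Doc] Fix a typo in quick_start"
# The resulting commit message then ends with a line such as:
#   Signed-off-by: Your Name <you@example.com>
```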
## PR Title and Classification
Only specific types of PRs will be reviewed. The PR title should be prefixed appropriately to indicate the type of change. Please use one of the following:
- `[Attention]` for new features or optimization in attention.
- `[Communicator]` for new features or optimization in communicators.
- `[ModelRunner]` for new features or optimization in model runner.
- `[Platform]` for new features or optimization in platform.
- `[Worker]` for new features or optimization in worker.
- `[Core]` for new features or optimization in the core vllm-ascend logic (such as platform, attention, communicators, model runner)
- `[Kernel]` changes affecting compute kernels and ops.
- `[Bugfix]` for bug fixes.
- `[Doc]` for documentation fixes and improvements.
- `[Test]` for tests (such as unit tests).
- `[CI]` for build or continuous integration improvements.
- `[Misc]` for PRs that do not fit the above categories. Please use this sparingly.
> [!NOTE]
> If the PR spans more than one category, please include all relevant prefixes.
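For instance, this change itself uses the first title below; the second line is just a hypothetical example of combining prefixes:
```
[Doc] Add sphinx build for vllm-ascend
[CI][Doc] Skip e2e tests for docs-only changes
```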
## Others
You may find more information about contributing to the vLLM Ascend backend plugin on [<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing/overview.html).
If you find any problem while contributing, feel free to submit a PR to improve the docs and help other developers.


@@ -0,0 +1,102 @@
# Contributing Guide
## Building and testing
We recommend setting up a local development environment to build and test your changes before submitting a PR.
### Prepare environment and build
In theory, the vllm-ascend build is only supported on Linux, because the `vllm-ascend` dependency `torch_npu` only supports Linux.
However, you can still set up a dev environment on Linux/Windows/macOS for linting and basic tests with the following commands:
```bash
# Choose a base dir (~/vllm-project/) and create a Python virtual environment
cd ~/vllm-project/
python3 -m venv .venv
source ./.venv/bin/activate
# Clone and install vllm
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements-build.txt
VLLM_TARGET_DEVICE="empty" pip install .
cd ..
# Clone and install vllm-ascend
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -r requirements-dev.txt
# Run the lint and mypy checks with the following script
bash format.sh
# Build:
# - A full build is currently only supported on Linux (torch_npu limitation)
# pip install -e .
# - On other OSes, build and install without dependencies for debugging
# pip install -e . --no-deps
# Commit your changes with `-s`
git commit -sm "your commit info"
```
### Testing
Although the vllm-ascend CI provides integration tests on [Ascend](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml), you can also run them locally. The simplest way to run these integration tests locally is through a container:
```bash
# On an Ascend NPU environment
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
IMAGE=vllm-ascend-dev-image
CONTAINER_NAME=vllm-ascend-dev
DEVICE=/dev/davinci1
# The first build takes about 10 minutes (at 10 MB/s) to download the base image and packages
docker build -t $IMAGE -f ./Dockerfile .
# You can also set VLLM_REPO to a mirror repository to speed things up
# docker build -t $IMAGE -f ./Dockerfile . --build-arg VLLM_REPO=https://gitee.com/mirrors/vllm
docker run --name $CONTAINER_NAME --network host --device $DEVICE \
--device /dev/davinci_manager --device /dev/devmm_svm \
--device /dev/hisi_hdc -v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-ti --rm $IMAGE bash
cd vllm-ascend
pip install -r requirements-dev.txt
pytest tests/
```
## Developer Certificate of Origin (DCO)
When contributing to this project, you must agree to the DCO. Commits must include a `Signed-off-by:` header certifying agreement with the terms of the DCO.
Using `-s` with `git commit` adds this header automatically.
## PR Title and Classification
Only specific types of PRs will be reviewed. The PR title should carry an appropriate prefix indicating the type of change. Please use one of the following:
- `[Attention]` for new features or optimizations in `attention`
- `[Communicator]` for new features or optimizations in `communicators`
- `[ModelRunner]` for new features or optimizations in the `model runner`
- `[Platform]` for new features or optimizations in `platform`
- `[Worker]` for new features or optimizations in `worker`
- `[Core]` for new features or optimizations in the core `vllm-ascend` logic (such as `platform, attention, communicators, model runner`)
- `[Kernel]` for changes affecting compute kernels and ops
- `[Bugfix]` for bug fixes
- `[Doc]` for documentation fixes and improvements
- `[Test]` for tests (such as unit tests)
- `[CI]` for build or continuous integration improvements
- `[Misc]` for PRs that do not fit the above categories; please use this prefix sparingly
> [!NOTE]
> If the PR spans more than one category, please include all relevant prefixes.
## Others
You can find more information about contributing to the vLLM Ascend backend plugin on [<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing/overview.html).
If you find any problem while contributing, feel free to submit a PR to improve the docs and help other developers.


@@ -0,0 +1,24 @@
# Supported Models
| Model | Supported | Note |
|---------|-----------|------|
| Qwen 2.5 | ✅ ||
| Mistral | | Need test |
| DeepSeek v2.5 | | Need test |
| Llama 3.1/3.2 | ✅ ||
| Gemma-2 | | Need test |
| Baichuan | | Need test |
| MiniCPM | | Need test |
| InternLM | ✅ ||
| ChatGLM | ✅ ||
| InternVL 2.5 | ✅ ||
| Qwen2-VL | ✅ ||
| GLM-4v | | Need test |
| Molmo | ✅ ||
| LLaVA 1.5 | ✅ ||
| Mllama | | Need test |
| LLaVA-Next | | Need test |
| LLaVA-Next-Video | | Need test |
| Phi-3-Vision/Phi-3.5-Vision | | Need test |
| Ultravox | | Need test |
| Qwen2-Audio | ✅ ||
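As a quick smoke test for any model marked supported above, you can reuse the serving flow from the Quickstart; the model name below is just one example from this table:
```bash
# Serve a supported model, then list it via the OpenAI-compatible API
vllm serve Qwen/Qwen2.5-0.5B-Instruct &
curl http://localhost:8000/v1/models | python3 -m json.tool
```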


@@ -0,0 +1,19 @@
# Feature Support
| Feature | Supported | Note |
|---------|-----------|------|
| Chunked Prefill | ✗ | Plan in 2025 Q1 |
| Automatic Prefix Caching | ✅ | Improve performance in 2025 Q1 |
| LoRA | ✗ | Plan in 2025 Q1 |
| Prompt adapter | ✅ ||
| Speculative decoding | ✅ | Improve accuracy in 2025 Q1|
| Pooling | ✗ | Plan in 2025 Q1 |
| Enc-dec | ✗ | Plan in 2025 Q1 |
| Multi Modality | ✅ (LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Add more model support in 2025 Q1 |
| LogProbs | ✅ ||
| Prompt logProbs | ✅ ||
| Async output | ✅ ||
| Multi step scheduler | ✅ ||
| Best of | ✅ ||
| Beam search | ✅ ||
| Guided Decoding | ✗ | Plan in 2025 Q1 |
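Features that map to standard vLLM engine arguments are enabled the usual way; the sketch below assumes the upstream `--enable-prefix-caching` flag behaves the same on Ascend:
```bash
# Automatic Prefix Caching (marked supported above) via the standard vLLM flag
vllm serve Qwen/Qwen2.5-0.5B-Instruct --enable-prefix-caching
```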

docs/source/index.md Normal file

@@ -0,0 +1,53 @@
# Welcome to vLLM Ascend Plugin
:::{figure} ./logos/vllm-ascend-logo-text-light.png
:align: center
:alt: vLLM
:class: no-scaled-link
:width: 70%
:::
:::{raw} html
<p style="text-align:center">
<strong>vLLM Ascend Plugin
</strong>
</p>
<p style="text-align:center">
<script async defer src="https://buttons.github.io/buttons.js"></script>
<a class="github-button" href="https://github.com/vllm-project/vllm-ascend" data-show-count="true" data-size="large" aria-label="Star">Star</a>
<a class="github-button" href="https://github.com/vllm-project/vllm-ascend/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch">Watch</a>
<a class="github-button" href="https://github.com/vllm-project/vllm-ascend/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
</p>
:::
vLLM Ascend plugin (vllm-ascend) is a community-maintained hardware plugin for running vLLM on the Ascend NPU.
This plugin is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in the [[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162), providing a hardware-pluggable interface that decouples the integration of the Ascend NPU with vLLM.
With the vLLM Ascend plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, embedding, and multi-modal LLMs, can run seamlessly on the Ascend NPU.
## Documentation
% How to start using vLLM on Ascend NPU?
:::{toctree}
:caption: Getting Started
:maxdepth: 1
quick_start
installation
:::
% What does vLLM Ascend Plugin support?
:::{toctree}
:caption: Features
:maxdepth: 1
features/suppoted_features
features/supported_models
:::
% How to contribute to the vLLM project
:::{toctree}
:caption: Developer Guide
:maxdepth: 1
developer_guide/contributing
:::


@@ -0,0 +1,55 @@
# Installation
## Dependencies
| Requirement | Supported version | Recommended version | Note |
| ------------ | ------- | ----------- | ----------- |
| Python | >= 3.9 | [3.10](https://www.python.org/downloads/) | Required for vllm |
| CANN | >= 8.0.RC2 | [8.0.RC3](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.0.beta1) | Required for vllm-ascend and torch-npu |
| torch-npu | >= 2.4.0 | [2.5.1rc1](https://gitee.com/ascend/pytorch/releases/tag/v6.0.0.alpha001-pytorch2.5.1) | Required for vllm-ascend |
| torch | >= 2.4.0 | [2.5.1](https://github.com/pytorch/pytorch/releases/tag/v2.5.1) | Required by torch-npu and vllm |
## Prepare Ascend NPU environment
Below is a quick guide to installing the recommended software versions:
### Containerized installation
You can use the [container image](https://hub.docker.com/r/ascendai/cann) directly with a one-line command:
```bash
docker run \
--name vllm-ascend-env \
--device /dev/davinci1 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-it quay.io/ascend/cann:8.0.rc3.beta1-910b-ubuntu22.04-py3.10 bash
```
You do not need to install `torch` and `torch_npu` manually; they will be installed automatically as `vllm-ascend` dependencies.
### Manual installation
Alternatively, follow the instructions provided in the [Ascend Installation Guide](https://ascend.github.io/docs/sources/ascend/quick_install.html) to set up the environment.
## Building
### Build Python package from source
```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
```
### Build container image from source
```bash
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image -f ./Dockerfile .
```
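A minimal post-install sanity check, assuming the plugin installs as the `vllm_ascend` Python package:
```bash
# Verify that vLLM and the Ascend plugin can be imported
python3 -c "import vllm; print(vllm.__version__)"
python3 -c "import vllm_ascend"
```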

Binary file added (image, 153 KiB; not shown).

Binary file added (image, 153 KiB; not shown).

docs/source/quick_start.md Normal file

@@ -0,0 +1,185 @@
# Quickstart
## Prerequisites
### Supported Devices
- Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2)
- Atlas 800I A2 Inference series (Atlas 800I A2)
<!-- TODO(yikun): replace "Prepare Environment" and "Installation" with "Running with vllm-ascend container image" -->
### Prepare Environment
You can use the container image directly with a one-line command:
```bash
# Update DEVICE according to your device (/dev/davinci[0-7])
DEVICE=/dev/davinci7
IMAGE=quay.io/ascend/cann:8.0.rc3.beta1-910b-ubuntu22.04-py3.10
docker run \
--name vllm-ascend-env --device $DEVICE \
--device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-it --rm $IMAGE bash
```
You can verify the environment by running the following command in the container shell:
```bash
npu-smi info
```
You will see a message like the following:
```
+-------------------------------------------------------------------------------------------+
| npu-smi 23.0.2 Version: 23.0.2 |
+----------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+======================+===============+====================================================+
| 0 xxx | OK | 0.0 40 0 / 0 |
| 0 | 0000:C1:00.0 | 0 882 / 15169 0 / 32768 |
+======================+===============+====================================================+
```
## Installation
Prepare:
```bash
apt update
apt install git curl vim -y
# Config pypi mirror to speedup
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
```
Create your venv:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
```
You can install vLLM and the vllm-ascend plugin with:
```bash
# Install vLLM main branch (About 5 mins)
git clone --depth 1 https://github.com/vllm-project/vllm.git
cd vllm
VLLM_TARGET_DEVICE=empty pip install .
cd ..
# Install vLLM Ascend Plugin:
git clone --depth 1 https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e .
cd ..
```
## Usage
After installing vLLM and the vLLM Ascend plugin, you can start trying the
[vLLM Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html).
There are two ways to start vLLM on the Ascend NPU:
### Offline Batched Inference with vLLM
With vLLM installed, you can start generating text for a list of input prompts (i.e., offline batch inference).
```bash
# Use Modelscope mirror to speed up download
pip install modelscope
export VLLM_USE_MODELSCOPE=true
```
Run the Python script below directly, or paste it into a `python3` shell, to generate text:
```python
from vllm import LLM, SamplingParams
prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
# The first run will take about 3-5 mins (10 MB/s) to download models
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
### OpenAI Completions API with vLLM
vLLM can also be deployed as a server that implements the OpenAI API protocol. Run
the following command to start the vLLM server with the
[Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model:
```bash
# Use Modelscope mirror to speed up download
pip install modelscope
export VLLM_USE_MODELSCOPE=true
# Deploy vLLM server (The first run will take about 3-5 mins (10 MB/s) to download models)
vllm serve Qwen/Qwen2.5-0.5B-Instruct &
```
If you see logs like the following:
```
INFO: Started server process [3594]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
Congratulations, you have successfully started the vLLM server!
You can query the list of models:
```bash
curl http://localhost:8000/v1/models | python3 -m json.tool
```
You can also query the model with input prompts:
```bash
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-0.5B-Instruct",
"prompt": "Beijing is a",
"max_tokens": 5,
"temperature": 0
}' | python3 -m json.tool
```
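The server also exposes the OpenAI-compatible chat endpoint; here is a sketch with the same model (the prompt text is just an example):
```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "messages": [{"role": "user", "content": "Introduce Beijing in one sentence."}],
        "max_tokens": 32
    }' | python3 -m json.tool
```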
vLLM is running as a background process. You can use `kill -2 $VLLM_PID` to stop it gracefully,
which is equivalent to pressing `Ctrl-C` to stop a foreground vLLM process:
```bash
ps -ef | grep "/.venv/bin/vllm serve" | grep -v grep
VLLM_PID=`ps -ef | grep "/.venv/bin/vllm serve" | grep -v grep | awk '{print $2}'`
kill -2 $VLLM_PID
```
You will see output like the following:
```
INFO 02-12 03:34:10 launcher.py:59] Shutting down FastAPI HTTP server.
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
```
Finally, you can exit the container with `Ctrl-D`.