Sync from v0.13

2026-01-19 10:38:50 +08:00
parent b2ef04d792
commit 5aef6c175a
3714 changed files with 854317 additions and 89342 deletions
--- a/docs/serving/integrations/langchain.md
+++ b/docs/serving/integrations/langchain.md
@@ -0,0 +1,32 @@
+# LangChain
+
+vLLM is also available via [LangChain](https://github.com/langchain-ai/langchain).
+
+To install LangChain and its community integrations, run:
+
+```bash
+pip install langchain langchain_community -q
+```
+
+To run inference on one or more GPUs, use the `VLLM` class from `langchain_community.llms`.
+
+??? code
+
+    ```python
+    from langchain_community.llms import VLLM
+
+    llm = VLLM(
+        model="mosaicml/mpt-7b",
+        trust_remote_code=True,  # needed for models that ship custom code on the Hub
+        max_new_tokens=128,
+        top_k=10,
+        top_p=0.95,
+        temperature=0.8,
+        # for distributed inference
+        # tensor_parallel_size=...,
+    )
+
+    print(llm.invoke("What is the capital of France?"))
+    ```
+
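+The `tensor_parallel_size` argument is left as a placeholder in the example. One constraint vLLM enforces is that the model's total number of attention heads must be divisible by the tensor-parallel size. The sketch below uses a hypothetical helper, `pick_tensor_parallel_size` (not part of vLLM or LangChain), to choose the largest size that fits the available GPUs. It ignores other factors such as per-GPU memory, so treat its result as a starting point rather than a guarantee.
+
+```python
+# Hypothetical helper, not part of vLLM or LangChain.
+# Picks the largest tensor-parallel size that fits on the available GPUs
+# and evenly divides the model's attention-head count.
+def pick_tensor_parallel_size(num_gpus: int, num_attention_heads: int) -> int:
+    if num_gpus < 1 or num_attention_heads < 1:
+        raise ValueError("num_gpus and num_attention_heads must be positive")
+    # Try sizes from largest to smallest; size 1 always divides, so this returns.
+    for size in range(num_gpus, 0, -1):
+        if num_attention_heads % size == 0:
+            return size
+    return 1  # not reached, kept for type checkers
+
+
+# MPT-7B uses 32 attention heads (n_heads in its config).
+# With 6 GPUs, 6 and 5 do not divide 32, so this picks 4.
+print(pick_tensor_parallel_size(num_gpus=6, num_attention_heads=32))
+```
+
+The chosen value would then be passed as `tensor_parallel_size=` to `VLLM`.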
+See the [LangChain vLLM tutorial](https://python.langchain.com/docs/integrations/llms/vllm) for more details.
--- a/docs/serving/integrations/llamaindex.md
+++ b/docs/serving/integrations/llamaindex.md
@@ -0,0 +1,24 @@
+# LlamaIndex
+
+vLLM is also available via [LlamaIndex](https://github.com/run-llama/llama_index).
+
+To install the LlamaIndex vLLM integration, run:
+
+```bash
+pip install llama-index-llms-vllm -q
+```
+
+To run inference on one or more GPUs, use the `Vllm` class from `llama_index.llms.vllm`.
+
+```python
+from llama_index.llms.vllm import Vllm
+
+llm = Vllm(
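+    model="microsoft/Orca-2-7b",
+    tensor_parallel_size=4,  # shard the model across 4 GPUs
+    max_new_tokens=100,
+    # extra keyword arguments passed through to vLLM's `LLM` constructor
+    vllm_kwargs={"swap_space": 1, "gpu_memory_utilization": 0.5},
+)
+
+print(llm.complete("What is the capital of France?"))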
+```
+
+See the [LlamaIndex vLLM tutorial](https://docs.llamaindex.ai/en/latest/examples/llm/vllm/) for more details.