Support radix cache for Lora feature (#7216)
This commit is contained in:
@@ -80,7 +80,6 @@
|
||||
" --enable-lora \\\n",
|
||||
" --lora-paths lora0=algoprog/fact-generation-llama-3.1-8b-instruct-lora \\\n",
|
||||
" --max-loras-per-batch 1 --lora-backend triton \\\n",
|
||||
" --disable-radix-cache\n",
|
||||
"\"\"\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
@@ -140,7 +139,6 @@
|
||||
" --lora-paths lora0=algoprog/fact-generation-llama-3.1-8b-instruct-lora \\\n",
|
||||
" lora1=Nutanix/Meta-Llama-3.1-8B-Instruct_lora_4_alpha_16 \\\n",
|
||||
" --max-loras-per-batch 2 --lora-backend triton \\\n",
|
||||
" --disable-radix-cache\n",
|
||||
"\"\"\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
@@ -215,7 +213,6 @@
|
||||
" --enable-lora \\\n",
|
||||
" --cuda-graph-max-bs 2 \\\n",
|
||||
" --max-loras-per-batch 2 --lora-backend triton \\\n",
|
||||
" --disable-radix-cache\n",
|
||||
" --max-lora-rank 256\n",
|
||||
" --lora-target-modules all\n",
|
||||
" \"\"\"\n",
|
||||
@@ -462,7 +459,7 @@
|
||||
"source": [
|
||||
"## Future Works\n",
|
||||
"\n",
|
||||
"The development roadmap for LoRA-related features can be found in this [issue](https://github.com/sgl-project/sglang/issues/2929). Currently radix attention is incompatible with LoRA and must be manually disabled. Other features, including Unified Paging, Cutlass backend, and dynamic loading/unloadingm, are still under development."
|
||||
"The development roadmap for LoRA-related features can be found in this [issue](https://github.com/sgl-project/sglang/issues/2929). Other features, including Embedding Layer, Unified Paging, Cutlass backend are still under development."
|
||||
]
|
||||
}
|
||||
],
|
||||
|
||||
Reference in New Issue
Block a user