Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com> Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
# Assuming the model is downloaded at /home/ubuntu/model_weights/Llama-2-7b-chat-hf
docker run --name tgi --rm -ti --gpus all --network host \
  -v /home/ubuntu/model_weights/Llama-2-7b-chat-hf:/Llama-2-7b-chat-hf \
  ghcr.io/huggingface/text-generation-inference:1.1.0 \
  --model-id /Llama-2-7b-chat-hf --num-shard 1 --trust-remote-code \
  --max-input-length 2048 --max-total-tokens 4096 \
  --port 24000
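Once the container is up, the server can be queried over HTTP. A minimal sketch using TGI's standard `/generate` endpoint on the port chosen above; the prompt text and `max_new_tokens` value here are illustrative, not from the original:

```shell
# Send a generation request to the TGI server started above
# (assumes it is listening on localhost:24000).
curl http://127.0.0.1:24000/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 64}}'
```

The response is a JSON object whose `generated_text` field contains the model output.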