Fix logit processor bugs (#427)

This commit is contained in:
Lianmin Zheng
2024-05-12 04:54:07 -07:00
committed by GitHub
parent 7023f413c6
commit aee4f523cf
26 changed files with 166 additions and 257 deletions


@@ -297,7 +297,6 @@ curl http://localhost:30000/generate \
Learn more about the argument format [here](docs/sampling_params.md).
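As a sketch of that format, a `/generate` request pairs a prompt with a `sampling_params` object. The parameter names below (`temperature`, `top_p`, `max_new_tokens`) follow docs/sampling_params.md, but the specific values are illustrative:

```python
import json

# Hypothetical request body for the native /generate endpoint.
# The values are placeholders; see docs/sampling_params.md for the
# full list of supported parameters.
payload = {
    "text": "Once upon a time,",
    "sampling_params": {
        "temperature": 0.7,
        "top_p": 0.9,
        "max_new_tokens": 32,
    },
}

# Serialize to the JSON string a client would send in the POST body.
print(json.dumps(payload))
```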
### OpenAI Compatible API
In addition, the server supports an experimental OpenAI-compatible API.
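A minimal sketch of calling that endpoint, using only the Python standard library rather than the `openai` client package. It assumes a server running locally on port 30000; the model name `"default"` and the message content are placeholders:

```python
import json
import urllib.request

# Build a chat completion request against the experimental
# OpenAI-compatible endpoint (assumed to be served at localhost:30000).
url = "http://127.0.0.1:30000/v1/chat/completions"
body = {
    "model": "default",  # placeholder: the server's default loaded model
    "messages": [{"role": "user", "content": "List three US cities."}],
    "temperature": 0,
    "max_tokens": 64,
}
req = urllib.request.Request(
    url,
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)

# With a running server, send it and read the JSON response:
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["message"]["content"])
```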
@@ -386,7 +385,6 @@ python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port
Instructions for supporting a new model are [here](https://github.com/sgl-project/sglang/blob/main/docs/model_support.md).
## Benchmark And Performance
- Llama-7B on NVIDIA A10G, FP16, Tensor Parallelism=1
![llama_7b](assets/llama_7b.jpg)
@@ -410,7 +408,4 @@ https://github.com/sgl-project/sglang/issues/157
}
```
[![Paper page](https://huggingface.co/datasets/huggingface/badges/resolve/main/paper-page-md.svg)](https://huggingface.co/papers/2312.07104)
We learned from the design of, and reused code from, the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), [LMQL](https://github.com/eth-sri/lmql).