Sync from v0.13
docs/configuration/model_resolution.md
# Model Resolution

vLLM loads HuggingFace-compatible models by inspecting the `architectures` field in `config.json` of the model repository
and finding the corresponding implementation that is registered with vLLM.
Model resolution can still fail for the following reasons:

- The `config.json` of the model repository lacks the `architectures` field.
- Unofficial repositories refer to a model using alternative names which are not recorded in vLLM.
- The same architecture name is used for multiple models, creating ambiguity as to which model should be loaded.

To fix this, explicitly specify the model architecture by passing `config.json` overrides to the `hf_overrides` option.
For example:

```python
from vllm import LLM

llm = LLM(
    model="cerebras/Cerebras-GPT-1.3B",
    hf_overrides={"architectures": ["GPT2LMHeadModel"]},  # GPT-2
)
```

Our [list of supported models](../models/supported_models.md) shows the model architectures that are recognized by vLLM.
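
The first two failure reasons above can be checked before loading a model. The following is a minimal sketch, not part of vLLM: `check_architectures` is a hypothetical helper that reads a local `config.json` and compares its `architectures` field against a set of names supplied by the caller. It uses only the Python standard library and cannot detect the third case (an ambiguous architecture name).

```python
import json
from pathlib import Path


def check_architectures(config_path, known_archs):
    """Classify the `architectures` field of a local config.json.

    Hypothetical helper for illustration; `known_archs` is whatever set of
    architecture names the caller considers supported.
    Returns a (status, architectures) tuple where status is one of
    "missing", "unrecognized", or "ok".
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Treat an absent, null, or empty field the same way.
    archs = config.get("architectures") or []
    if not archs:
        return "missing", archs
    # At least one listed name must be known for resolution to succeed.
    if not any(arch in known_archs for arch in archs):
        return "unrecognized", archs
    return "ok", archs
```

If the status is `"missing"` or `"unrecognized"`, passing the intended name through `hf_overrides` is the fix described above. The set of known names could be copied from the supported models list; `ModelRegistry.get_supported_archs()` may also provide it, assuming the installed vLLM version exposes that method.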