# Tokenizer Note

This model was trained with an internal hash-based tokenizer (`vocab_size=50002`).
The tokenizer is not saved in the standard Hugging Face format.

For inference, use a tokenizer with `vocab_size=50002`, or the CodeLlama tokenizer
(`codellama/CodeLlama-7b-hf`), which is the intended base.
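Whichever tokenizer is used, every ID it produces has to index a row of the model's embedding table. Below is a minimal sketch of a pre-generation guard in plain Python. It assumes the embedding table has exactly 50002 rows, matching `vocab_size=50002`. The names `check_ids_in_vocab` and `MODEL_VOCAB_SIZE` are hypothetical and are not part of this model's code.

```python
from typing import Sequence

# Assumption: the model's embedding table has one row per ID in vocab_size=50002.
MODEL_VOCAB_SIZE = 50002


def check_ids_in_vocab(ids: Sequence[int], vocab_size: int = MODEL_VOCAB_SIZE) -> None:
    """Hypothetical helper: raise ValueError if any token ID is outside [0, vocab_size).

    An out-of-range ID would index past the end of the embedding table.
    """
    bad = [(pos, tok) for pos, tok in enumerate(ids) if not 0 <= tok < vocab_size]
    if bad:
        pos, tok = bad[0]
        raise ValueError(
            f"{len(bad)} token ID(s) outside [0, {vocab_size}); "
            f"first: {tok} at position {pos}"
        )


if __name__ == "__main__":
    check_ids_in_vocab([518, 29914, 25580])  # all below 50002, passes silently
    try:
        check_ids_in_vocab([0, 2, 50002])  # 50002 is one past the last valid ID
    except ValueError as err:
        print(err)
```

With the `transformers` library, `ids` could come from `AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")(prompt)["input_ids"]`. If you run the guard once on a few representative prompts, a mismatched tokenizer shows up as a clear error instead of an opaque indexing failure inside the embedding lookup.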

Special token IDs:
- `<pad>`: 0
- `<eos>`: 2
- `[/INST]` sequence: `[518, 29914, 25580, 29162]`
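The special IDs above are what a post-processing step needs to split a token sequence into prompt and response. Below is a minimal sketch in plain Python, assuming token IDs arrive as a list of ints. The helper names are hypothetical. The label-masking helper follows the common fine-tuning convention of marking ignored positions with `-100`, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. This note does not confirm that this convention was used in training.

```python
from typing import List, Sequence

PAD_ID = 0
EOS_ID = 2
INST_END_IDS = [518, 29914, 25580, 29162]  # `[/INST]` sequence as listed in this note
IGNORE_INDEX = -100  # assumption: PyTorch CrossEntropyLoss default ignore_index


def find_last_subsequence(ids: Sequence[int], pattern: Sequence[int]) -> int:
    """Return the start index of the last occurrence of `pattern` in `ids`, or -1."""
    n, m = len(ids), len(pattern)
    for start in range(n - m, -1, -1):
        if list(ids[start:start + m]) == list(pattern):
            return start
    return -1


def extract_response(ids: Sequence[int]) -> List[int]:
    """Hypothetical helper: tokens after the last `[/INST]`, cut at the first
    <eos>, with <pad> tokens dropped. Returns [] if `[/INST]` is absent."""
    start = find_last_subsequence(ids, INST_END_IDS)
    if start == -1:
        return []
    response: List[int] = []
    for tok in ids[start + len(INST_END_IDS):]:
        if tok == EOS_ID:
            break
        if tok != PAD_ID:
            response.append(tok)
    return response


def mask_prompt_labels(ids: Sequence[int]) -> List[int]:
    """Hypothetical helper: copy `ids` as training labels, replacing everything up
    to and including `[/INST]`, plus every <pad>, with IGNORE_INDEX.
    <eos> stays supervised so the model learns where to stop.
    If `[/INST]` is absent, every position is masked."""
    labels = list(ids)
    start = find_last_subsequence(ids, INST_END_IDS)
    cut = start + len(INST_END_IDS) if start != -1 else len(labels)
    for i in range(len(labels)):
        if i < cut or labels[i] == PAD_ID:
            labels[i] = IGNORE_INDEX
    return labels


if __name__ == "__main__":
    seq = [100, 200] + INST_END_IDS + [300, 400, EOS_ID, PAD_ID, PAD_ID]
    print(extract_response(seq))  # [300, 400]
    print(mask_prompt_labels(seq))
```

The search scans from the end so that, in a multi-turn prompt containing several `[/INST]` markers, only the final response is returned or supervised.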