Initialize project, with model provided by the ModelHub XC community
Model: kojima-lab/molcrawl-molecule-nat-lang-mol-instructions-gpt2-small Source: Original Platform
12
TOKENIZER_NOTE.md
Normal file
@@ -0,0 +1,12 @@
# Tokenizer Note

This model was trained with an internal hash-based tokenizer (vocab_size=50002).
The tokenizer is not saved in standard HuggingFace format.

For inference, use a tokenizer with vocab_size=50002 or the CodeLlama tokenizer
(`codellama/CodeLlama-7b-hf`) as the intended base.

Special token IDs:
- `<pad>`: 0
- `<eos>`: 2
- `[/INST]` sequence: [518, 29914, 25580, 29162]
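
The special token IDs listed above could be used to post-process raw model output without a saved tokenizer. The following is a minimal sketch, not the authors' inference code: the constants are copied from this note, while the helper names (`ids_in_vocab`, `find_last`, `extract_response`) and the sample IDs are hypothetical and chosen only for illustration. It assumes the generated sequence contains the prompt followed by the response.

```python
from typing import List, Optional

# Values copied from this note.
VOCAB_SIZE = 50002
PAD_ID = 0
EOS_ID = 2
INST_END_IDS = [518, 29914, 25580, 29162]  # `[/INST]` sequence as listed


def ids_in_vocab(ids: List[int], vocab_size: int = VOCAB_SIZE) -> bool:
    """Return True if every ID is a valid index for the given vocab size."""
    return all(0 <= i < vocab_size for i in ids)


def find_last(ids: List[int], pattern: List[int]) -> Optional[int]:
    """Return the start index of the last occurrence of pattern, or None."""
    n, m = len(ids), len(pattern)
    for start in range(n - m, -1, -1):
        if ids[start:start + m] == pattern:
            return start
    return None


def extract_response(ids: List[int]) -> List[int]:
    """Hypothetical helper: keep tokens after the last `[/INST]` marker,
    stop at the first `<eos>`, and drop `<pad>` tokens."""
    start = find_last(ids, INST_END_IDS)
    tail = ids[start + len(INST_END_IDS):] if start is not None else ids
    response = []
    for tok in tail:
        if tok == EOS_ID:
            break
        if tok != PAD_ID:
            response.append(tok)
    return response


# Arbitrary sample IDs (assumption): padding, a prompt, the marker, a response.
generated = [0, 0, 1, 100, 200] + INST_END_IDS + [300, 400, 2, 0, 0]
print(ids_in_vocab(generated))      # True
print(extract_response(generated))  # [300, 400]
```

Checking IDs against `vocab_size` before feeding them to the model guards against out-of-range embedding lookups when a substitute tokenizer is used.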