molcrawl-molecule-nat-lang-…/TOKENIZER_NOTE.md

# Tokenizer Note

This model was trained with an internal hash-based tokenizer (vocab_size=50002).
The tokenizer is not saved in standard HuggingFace format.

For inference, use either a tokenizer with vocab_size=50002 or the CodeLlama tokenizer
(`codellama/CodeLlama-7b-hf`), which is the intended base.
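
A minimal sketch of how the CodeLlama tokenizer might be loaded and its output checked against the model's vocabulary size. The helper names `load_base_tokenizer` and `ids_fit_vocab` are hypothetical and not part of this repository; the sketch assumes the `transformers` package is installed and that the tokenizer can be downloaded.

```python
VOCAB_SIZE = 50002  # vocabulary size stated in this note


def load_base_tokenizer(name="codellama/CodeLlama-7b-hf"):
    """Load the tokenizer named in this note (needs `transformers` and network access)."""
    # Imported lazily so the range check below works without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(name)


def ids_fit_vocab(token_ids, vocab_size=VOCAB_SIZE):
    """Return True if every ID is a valid index into a vocabulary of `vocab_size` entries."""
    return all(0 <= i < vocab_size for i in token_ids)
```

A possible use is `ids_fit_vocab(load_base_tokenizer()("some prompt")["input_ids"])`, run before the IDs are passed to the model, so that an out-of-range ID is caught early rather than surfacing as an embedding lookup error.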

Special token IDs:
- `<pad>`: 0
- `<eos>`: 2
- `[/INST]` sequence: [518, 29914, 25580, 29162]
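
The IDs above can be applied in plain Python, as in the following sketch. It pads a batch with the `<pad>` ID and extracts the tokens generated after the `[/INST]` marker, stopping at `<eos>`. The function names are hypothetical, and right-side padding is an assumption, since this note does not state a padding side.

```python
PAD_ID = 0  # <pad>
EOS_ID = 2  # <eos>
INST_END_IDS = [518, 29914, 25580, 29162]  # [/INST] sequence, as listed in this note


def find_subsequence(ids, pattern):
    """Return the index where `pattern` first occurs in `ids`, or -1 if it is absent."""
    n = len(pattern)
    for start in range(len(ids) - n + 1):
        if ids[start:start + n] == pattern:
            return start
    return -1


def extract_response(ids):
    """Return the IDs after the first [/INST] marker, cut at the first <eos>."""
    start = find_subsequence(ids, INST_END_IDS)
    # If the marker is missing, fall back to the whole sequence.
    tail = ids[start + len(INST_END_IDS):] if start >= 0 else list(ids)
    if EOS_ID in tail:
        tail = tail[:tail.index(EOS_ID)]
    return tail


def pad_batch(sequences):
    """Right-pad every sequence with PAD_ID to the longest length (padding side assumed)."""
    width = max(len(seq) for seq in sequences)
    return [list(seq) + [PAD_ID] * (width - len(seq)) for seq in sequences]
```

For example, `extract_response(prompt_ids + generated_ids)` would keep only the model's answer when the prompt ends with the `[/INST]` sequence.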

Initial project commit; model provided by the ModelHub XC community. Model: kojima-lab/molcrawl-molecule-nat-lang-mol-instructions-gpt2-small. Source: Original Platform. 2026-05-30 04:10:25 +08:00
