molcrawl-molecule-nat-lang-…/TOKENIZER_NOTE.md
ModelHub XC 3da01878b8 Initialize project; model provided by the ModelHub XC community
Model: kojima-lab/molcrawl-molecule-nat-lang-mol-instructions-gpt2-small
Source: Original Platform
2026-05-30 04:10:25 +08:00

Tokenizer Note

This model was trained with an internal hash-based tokenizer (vocab_size=50002). The tokenizer is not saved in the standard Hugging Face format.

For inference, use a tokenizer with vocab_size=50002, or use the CodeLlama tokenizer (codellama/CodeLlama-7b-hf), which is the intended base tokenizer.

Special token IDs:

  • <pad>: 0
  • <eos>: 2
  • [/INST] sequence: [518, 29914, 25580, 29162]
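The IDs listed above can be applied when post-processing generated token IDs. The following is a minimal sketch, not the authors' pipeline: it assumes the response follows the [/INST] sequence and ends at the first <eos>, and the helper names (`find_subsequence`, `ids_fit_vocab`, `extract_response`) are hypothetical.

```python
from typing import List, Sequence

# Values copied from the note above.
VOCAB_SIZE = 50002
PAD_ID = 0
EOS_ID = 2
INST_END_IDS = [518, 29914, 25580, 29162]


def find_subsequence(ids: Sequence[int], pattern: Sequence[int]) -> int:
    """Return the index of the first occurrence of pattern in ids, or -1."""
    n, m = len(ids), len(pattern)
    for start in range(n - m + 1):
        if list(ids[start:start + m]) == list(pattern):
            return start
    return -1


def ids_fit_vocab(ids: Sequence[int]) -> bool:
    """Check that every ID is a valid index for a vocabulary of VOCAB_SIZE."""
    return all(0 <= token_id < VOCAB_SIZE for token_id in ids)


def extract_response(ids: Sequence[int]) -> List[int]:
    """Assumption: the response is whatever follows [/INST], up to the first <eos>.

    If the [/INST] sequence is absent, the scan starts at the first token.
    Stopping at <eos> also drops any trailing <pad> tokens after it.
    """
    marker = find_subsequence(ids, INST_END_IDS)
    start = marker + len(INST_END_IDS) if marker >= 0 else 0
    response: List[int] = []
    for token_id in ids[start:]:
        if token_id == EOS_ID:
            break
        response.append(token_id)
    return response
```

Running `ids_fit_vocab` on encoded input before calling the model is a cheap way to catch a tokenizer whose IDs exceed the model's embedding table.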