Align tokenizer with mistral-common (#45)

- Align tokenizer with mistral-common (53f216c52ce4534a38a71c21861acd514fa8a904)
- Defend the honour of the Hugging Face tokenizer (684c1751c210aa11e0b187c0eac1b7b2bd4d7967)
- Update to tokenizer v3 with correct special tokens (106a1b0c338ddbd0e3e42dbeb63634bc85d6f71b)
- Re-add chat template (3256c7e7ea279386e0cdd18553202ed78c4d735b)

Co-authored-by: Matthew Carrigan <Rocketknight1@users.noreply.huggingface.co>
ai-modelscope
2024-07-31 08:03:57 +08:00
parent 634e800d57
commit 04ccb687cb
44 changed files with 13150 additions and 48 deletions

@@ -13,11 +13,6 @@ extra_gated_description: If you want to learn more about how we process your per
# Model Card for Codestral-22B-v0.1
###
> [!WARNING]
> 🚫
> The `transformers` tokenizer is not properly configured. Make sure that your encoding and decoding is correct by using `mistral-common` as shown below:
## Encode and Decode with `mistral_common`