Align tokenizer with mistral-common (#45)

- Align tokenizer with mistral-common (53f216c52ce4534a38a71c21861acd514fa8a904)
- Defend the honour of the Hugging Face tokenizer (684c1751c210aa11e0b187c0eac1b7b2bd4d7967)
- Update to tokenizer v3 with correct proper special tokens (106a1b0c338ddbd0e3e42dbeb63634bc85d6f71b)
- Re-add chat template (3256c7e7ea279386e0cdd18553202ed78c4d735b)

Co-authored-by: Matthew Carrigan <Rocketknight1@users.noreply.huggingface.co>
ai-modelscope
2024-07-31 08:03:57 +08:00
parent 634e800d57
commit 04ccb687cb
44 changed files with 13150 additions and 48 deletions

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
8bb723a164dbf9c032306c441e85029f756312f5
1722383887.242612
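
The three-line entries added in these hunks read like download-cache metadata records: a commit hash, a content hash, and a Unix timestamp. That reading is an assumption drawn from the values themselves; the commit does not state it. Under that assumption, a minimal Python sketch for decoding one record could look like this (`CacheRecord` and `parse_cache_record` are hypothetical names, not part of any library):

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CacheRecord:
    commit: str           # first line: commit hash the file was fetched at
    content_hash: str     # second line: hash identifying the file content
    hash_kind: str        # guessed from the hash length (assumption)
    fetched_at: datetime  # third line: Unix timestamp, decoded as UTC


def parse_cache_record(text: str) -> CacheRecord:
    """Parse one three-line record: commit hash, content hash, timestamp."""
    commit, content_hash, stamp = text.split()
    # Assumption: 40 hex characters is a SHA-1 digest, 64 is a SHA-256 digest.
    kinds = {40: "sha1", 64: "sha256"}
    hash_kind = kinds.get(len(content_hash), "unknown")
    fetched_at = datetime.fromtimestamp(float(stamp), tz=timezone.utc)
    return CacheRecord(commit, content_hash, hash_kind, fetched_at)


record = parse_cache_record(
    "1c56091ac7f0ab22f0f4af40655eab46e1be34f7\n"
    "8bb723a164dbf9c032306c441e85029f756312f5\n"
    "1722383887.242612\n"
)
print(record.hash_kind, record.fetched_at.isoformat())
```

For the record above, the timestamp decodes to 2024-07-30 23:58:07 UTC, roughly six minutes before the commit time shown in the header (2024-07-31 08:03:57 +08:00).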

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
c29064486e2a7987586f42f08954868d8ca66f83
1722383887.347987

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
6675b83f2de8ab76c9c19e0b28508a2565598c141899b95671f039f89a945cf4
1722384235.744555

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
b6bea2642bc3fe80f392111d52af91d1563a8de2
1722383887.2291193

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
6c911e66544527032c9e49f602ed0645f748045248eb8fb8ec9982866b899674
1722383993.896287

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
cc1de07197a04eaeeaa6dcb7ed6604f729ed822e92273c25c112f85c366b5696
1722383968.7079918

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
90dc483e3b22d3d21a03edd588a8ffe5743b8dea33fc9f1ffc01eb1e529aedf8
1722383968.4000177

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
76ee31da7cdd8fde0a257030ffdf7d3fb293935a62b8469f6dec1c1a19e14eee
1722383968.709336

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
1d56824727ffaf568f7a1c7770fd5cb531df71ebe143567b1cb3968aca7f98cd
1722383968.6492853

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
9f30bb3fdbcad8d1c00e0b421908bebc6cb5544669cd3c916ae592acb7263ae4
1722383974.951245

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
54eb704485dce4f8c7c245169d25f394ea08dec1562a1ab981715f294ef93314
1722383968.3274906

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
cfbe26e02d475904ecc92cbe54a614607156aabed3503867d8af5023673d6374
1722384025.965309

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
c834720ddae75dc683e52284ffe27ea35f48eb2c5500c71025925fe0dd398a8c
1722384026.840935

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
93ef940253837d1885e343f14d51263bef0c520c
1722383968.9637132

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
b9305994a17001d70bf99885035ca41a4568d5bf
1722383968.9474473

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
451134b2ddc2e78555d1e857518c54b4bdc2e87d
1722383968.9358094

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
387cfa3eb95ef70c2061ea94aaf58d7fdc8f483d
1722383969.2597454

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
9addc8bdce5988448ae81b729336f43a81262160ae8da760674badab9d4c7d33
1722383969.4252708

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
055b6adaaeb7110f2892a575417c4ab74f09d8d7
1722383969.4520195

@@ -0,0 +1,3 @@
1c56091ac7f0ab22f0f4af40655eab46e1be34f7
0b62a2400f493bcd1f586beb95bffe478058f9b3
1722383969.610869

@@ -13,11 +13,6 @@ extra_gated_description: If you want to learn more about how we process your per
# Model Card for Codestral-22B-v0.1
###
-> [!WARNING]
-> 🚫
-> The `transformers` tokenizer is not properly configured. Make sure that your encoding and decoding is correct by using `mistral-common` as shown below:
## Encode and Decode with `mistral_common`

File diff suppressed because it is too large

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
-size 587404
+oid sha256:9addc8bdce5988448ae81b729336f43a81262160ae8da760674badab9d4c7d33
+size 587591
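
The hunk above changes a Git LFS pointer file, which stores `version`, `oid`, and `size` as space-separated key/value lines. As an illustration only, here is a minimal Python sketch that parses the old and new pointers and compares their sizes (`parse_lfs_pointer` is a hypothetical helper, not part of Git LFS or any library):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


old = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89\n"
    "size 587404\n"
)
new = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:9addc8bdce5988448ae81b729336f43a81262160ae8da760674badab9d4c7d33\n"
    "size 587591\n"
)

size_delta = new["size"] - old["size"]
print(f"{new['algo']} digest changed, size grew by {size_delta} bytes")
```

With the values from this hunk, the tracked file grows by 187 bytes. The new digest is the same 64-character value that appears in one of the three-line entries added earlier in this commit.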

File diff suppressed because it is too large