Initialize project; model provided by the ModelHub XC community

Model: bofenghuang/asr-wav2vec2-ctc-french
Source: Original Platform
ModelHub XC
2026-05-21 11:36:18 +08:00
commit ddba1294a7
60 changed files with 519273 additions and 0 deletions

34
.gitattributes vendored Normal file

@@ -0,0 +1,34 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
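Each pattern above sends matching files through Git LFS, so the Git history holds only small pointer files for them. The sketch below shows roughly how a path is checked against such patterns. It assumes an abbreviated pattern list and approximates gitattributes matching with Python's `fnmatch.fnmatchcase`; Git's own rules differ in details such as `**`. `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Abbreviated subset of the LFS patterns above (assumption: not the full list)
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.pth", "*.onnx", "*.zip", "*tfevents*", "saved_model/**/*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check of whether `path` matches an LFS pattern (hypothetical helper)."""
    p = PurePosixPath(path)
    for pattern in patterns:
        if "/" in pattern:
            # Patterns containing a slash are matched against the whole relative path.
            if fnmatchcase(p.as_posix(), pattern):
                return True
        elif fnmatchcase(p.name, pattern):
            # Slash-free patterns are matched against the file name at any depth.
            return True
    return False


for name in ["model.safetensors", "pytorch_model.bin", "config.json", "vocab.json"]:
    print(name, is_lfs_tracked(name))
```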

217
README.md Normal file

@@ -0,0 +1,217 @@
---
license: apache-2.0
language: fr
library_name: transformers
thumbnail: null
tags:
- automatic-speech-recognition
- hf-asr-leaderboard
- robust-speech-event
- CTC
- Wav2vec2
datasets:
- common_voice
- mozilla-foundation/common_voice_11_0
- facebook/multilingual_librispeech
- facebook/voxpopuli
- gigant/african_accented_french
metrics:
- wer
model-index:
- name: Fine-tuned wav2vec2-FR-7K-large model for ASR in French
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 11.44
    - name: Test WER (+LM)
      type: wer
      value: 9.66
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Multilingual LibriSpeech (MLS)
      type: facebook/multilingual_librispeech
      args: french
    metrics:
    - name: Test WER
      type: wer
      value: 5.93
    - name: Test WER (+LM)
      type: wer
      value: 5.13
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: VoxPopuli
      type: facebook/voxpopuli
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 9.33
    - name: Test WER (+LM)
      type: wer
      value: 8.51
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: African Accented French
      type: gigant/african_accented_french
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 16.22
    - name: Test WER (+LM)
      type: wer
      value: 15.39
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Robust Speech Event - Dev Data
      type: speech-recognition-community-v2/dev_data
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 16.56
    - name: Test WER (+LM)
      type: wer
      value: 12.96
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Fleurs
      type: google/fleurs
      args: fr_fr
    metrics:
    - name: Test WER
      type: wer
      value: 10.10
    - name: Test WER (+LM)
      type: wer
      value: 8.84
---
# Fine-tuned wav2vec2-FR-7K-large model for ASR in French
<style>
img {
display: inline;
}
</style>
![Model architecture](https://img.shields.io/badge/Model_Architecture-Wav2Vec2--CTC-lightgrey)
![Model size](https://img.shields.io/badge/Params-315M-lightgrey)
![Language](https://img.shields.io/badge/Language-French-lightgrey)
This model is a fine-tuned version of [LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large). It was trained on a composite dataset of over 2,200 hours of French speech audio, drawn from the train and validation splits of [Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0), [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech), [VoxPopuli](https://github.com/facebookresearch/voxpopuli), [Multilingual TEDx](http://www.openslr.org/100), [MediaSpeech](https://www.openslr.org/108), and [African Accented French](https://huggingface.co/datasets/gigant/african_accented_french). When using the model, make sure that your speech input is also sampled at 16 kHz.
## Usage
1. To transcribe a local audio file with the language model
```python
import torch
import torchaudio
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = AutoModelForCTC.from_pretrained("bhuang/asr-wav2vec2-french").to(device)
processor_with_lm = Wav2Vec2ProcessorWithLM.from_pretrained("bhuang/asr-wav2vec2-french")
model_sample_rate = processor_with_lm.feature_extractor.sampling_rate

wav_path = "example.wav"  # path to your audio file
waveform, sample_rate = torchaudio.load(wav_path)  # shape: (channels, samples)
waveform = waveform.mean(dim=0)  # downmix to mono

# resample
if sample_rate != model_sample_rate:
    resampler = torchaudio.transforms.Resample(sample_rate, model_sample_rate)
    waveform = resampler(waveform)

# normalize
input_dict = processor_with_lm(waveform, sampling_rate=model_sample_rate, return_tensors="pt")

with torch.inference_mode():
    logits = model(input_dict.input_values.to(device)).logits

# decode with beam search and the n-gram language model
predicted_sentence = processor_with_lm.batch_decode(logits.cpu().numpy()).text[0]
```
2. To transcribe a local audio file without the language model
```python
import torch
import torchaudio
from transformers import AutoModelForCTC, Wav2Vec2Processor

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = AutoModelForCTC.from_pretrained("bhuang/asr-wav2vec2-french").to(device)
processor = Wav2Vec2Processor.from_pretrained("bhuang/asr-wav2vec2-french")
model_sample_rate = processor.feature_extractor.sampling_rate

wav_path = "example.wav"  # path to your audio file
waveform, sample_rate = torchaudio.load(wav_path)  # shape: (channels, samples)
waveform = waveform.mean(dim=0)  # downmix to mono

# resample
if sample_rate != model_sample_rate:
    resampler = torchaudio.transforms.Resample(sample_rate, model_sample_rate)
    waveform = resampler(waveform)

# normalize
input_dict = processor(waveform, sampling_rate=model_sample_rate, return_tensors="pt")

with torch.inference_mode():
    logits = model(input_dict.input_values.to(device)).logits

# decode greedily: take the most likely token for each frame
predicted_ids = torch.argmax(logits, dim=-1)
predicted_sentence = processor.batch_decode(predicted_ids)[0]
```
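`processor.batch_decode(predicted_ids)` turns frame-level argmax ids into text. The sketch below is a simplified pure-Python version of the greedy CTC rule it applies. It assumes a hand-copied subset of `vocab.json` and treats `[PAD]` (id 47, the config's `pad_token_id`) as the CTC blank. The real `Wav2Vec2CTCTokenizer` handles more cases, and `ctc_greedy_decode` is a hypothetical helper.

```python
from itertools import groupby
from typing import Sequence

# Subset of vocab.json (token id -> token); "|" is the word delimiter,
# "[PAD]" (id 47) is assumed to act as the CTC blank
ID_TO_TOKEN = {0: "|", 3: "a", 7: "e", 14: "l", 15: "m", 20: "r", 21: "s", 22: "t", 23: "u", 47: "[PAD]"}
BLANK_ID = 47


def ctc_greedy_decode(frame_ids: Sequence[int]) -> str:
    """Turn per-frame argmax ids into text (hypothetical, simplified helper)."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]  # merge consecutive repeats
    tokens = [ID_TO_TOKEN[i] for i in collapsed if i != BLANK_ID]  # drop blanks
    text = "".join(" " if tok == "|" else tok for tok in tokens)  # word delimiter -> space
    return " ".join(text.split())  # squeeze repeated spaces and strip the ends


print(ctc_greedy_decode([21, 21, 47, 3, 14, 14, 47, 23, 22, 22]))  # salut
print(ctc_greedy_decode([7, 14, 14, 47, 14, 7]))  # elle: the blank separates the double "l"
```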
## Evaluation
1. To evaluate on `mozilla-foundation/common_voice_11_0`
```bash
python eval.py \
--model_id "bhuang/asr-wav2vec2-french" \
--dataset "mozilla-foundation/common_voice_11_0" \
--config "fr" \
--split "test" \
--log_outputs
```
2. To evaluate on `speech-recognition-community-v2/dev_data`
```bash
python eval.py \
--model_id "bhuang/asr-wav2vec2-french" \
--dataset "speech-recognition-community-v2/dev_data" \
--config "fr" \
--split "validation" \
--chunk_length_s 30.0 \
--stride_length_s 5.0 \
--log_outputs
```
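With `--chunk_length_s` and `--stride_length_s`, the 🤗 pipeline splits long recordings into overlapping windows and keeps only the non-overlapping centre of each window. The sketch below is a simplified version of that windowing, with all lengths in samples. `chunk_windows` is a hypothetical helper, and the pipeline's exact boundary handling may differ between versions.

```python
from typing import List, Tuple


def chunk_windows(num_samples: int, chunk_len: int, stride_left: int, stride_right: int) -> List[Tuple[int, int, int, int]]:
    """Return (start, end, left_stride, right_stride) for each window (hypothetical helper)."""
    step = chunk_len - stride_left - stride_right
    if step <= 0:
        raise ValueError("chunk_len must be larger than the two strides combined")
    windows = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        is_last = end >= num_samples
        # the first window has no left context and the last one has no right context
        left = 0 if start == 0 else stride_left
        right = 0 if is_last else stride_right
        windows.append((start, end, left, right))
        if is_last:
            break
    return windows


SR = 16_000  # model sampling rate
# 60 s of audio with the dev_data settings: 30 s chunks, 5 s stride on each side
windows = chunk_windows(60 * SR, chunk_len=30 * SR, stride_left=5 * SR, stride_right=5 * SR)
# keep only the centre of each window: [start + left, end - right)
kept_seconds = [((s + l) / SR, (e - r) / SR) for s, e, l, r in windows]
print(kept_seconds)  # [(0.0, 25.0), (25.0, 45.0), (45.0, 60.0)]
```

The kept spans tile the recording without gaps, so every output frame is predicted with at least 5 s of context on each inner boundary.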

4
added_tokens.json Normal file

@@ -0,0 +1,4 @@
{
"</s>": 49,
"<s>": 48
}

1
alphabet.json Normal file

@@ -0,0 +1 @@
{"labels": [" ", "'", "-", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "\u00e0", "\u00e2", "\u00e4", "\u00e7", "\u00e8", "\u00e9", "\u00ea", "\u00eb", "\u00ee", "\u00ef", "\u00f1", "\u00f4", "\u00f6", "\u00f9", "\u00fb", "\u00fc", "\u00ff", "\u2047", "", "<s>", "</s>"], "is_bpe": false}

115
config.json Normal file

@@ -0,0 +1,115 @@
{
"_name_or_path": "LeBenchmark/wav2vec2-FR-7K-large",
"activation_dropout": 0.05,
"adapter_kernel_size": 3,
"adapter_stride": 2,
"add_adapter": false,
"apply_spec_augment": true,
"architectures": [
"Wav2Vec2ForCTC"
],
"attention_dropout": 0.05,
"bos_token_id": 1,
"classifier_proj_size": 256,
"codevector_dim": 256,
"contrastive_logits_temperature": 0.1,
"conv_bias": true,
"conv_dim": [
512,
512,
512,
512,
512,
512,
512
],
"conv_kernel": [
10,
3,
3,
3,
3,
2,
2
],
"conv_stride": [
5,
2,
2,
2,
2,
2,
2
],
"ctc_loss_reduction": "mean",
"ctc_zero_infinity": true,
"diversity_loss_weight": 0.1,
"do_stable_layer_norm": true,
"eos_token_id": 2,
"feat_extract_activation": "gelu",
"feat_extract_dropout": 0.0,
"feat_extract_norm": "layer",
"feat_proj_dropout": 0.05,
"feat_quantizer_dropout": 0.0,
"final_dropout": 0.05,
"hidden_act": "gelu",
"hidden_dropout": 0.05,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"layerdrop": 0.1,
"mask_channel_length": 10,
"mask_channel_min_space": 1,
"mask_channel_other": 0.0,
"mask_channel_prob": 0.0,
"mask_channel_selection": "static",
"mask_feature_length": 10,
"mask_feature_min_masks": 0,
"mask_feature_prob": 0.0,
"mask_time_length": 10,
"mask_time_min_masks": 2,
"mask_time_min_space": 1,
"mask_time_other": 0.0,
"mask_time_prob": 0.05,
"mask_time_selection": "static",
"model_type": "wav2vec2",
"num_adapter_layers": 3,
"num_attention_heads": 16,
"num_codevector_groups": 2,
"num_codevectors_per_group": 320,
"num_conv_pos_embedding_groups": 16,
"num_conv_pos_embeddings": 128,
"num_feat_extract_layers": 7,
"num_hidden_layers": 24,
"num_negatives": 100,
"output_hidden_size": 1024,
"pad_token_id": 47,
"proj_codevector_dim": 256,
"tdnn_dilation": [
1,
2,
3,
1,
1
],
"tdnn_dim": [
512,
512,
512,
512,
1500
],
"tdnn_kernel": [
5,
3,
3,
1,
1
],
"torch_dtype": "float32",
"transformers_version": "4.25.0.dev0",
"use_weighted_layer_sum": false,
"vocab_size": 50,
"xvector_output_dim": 512
}
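The `conv_kernel` and `conv_stride` lists above determine how many CTC frames the model emits per second of audio. The worked sketch below assumes the usual unpadded-convolution length formula `floor((L - k) / s) + 1`, applied layer by layer, together with the 16 kHz rate from `preprocessor_config.json`.

```python
# conv_kernel and conv_stride copied from config.json above
CONV_KERNEL = [10, 3, 3, 3, 3, 2, 2]
CONV_STRIDE = [5, 2, 2, 2, 2, 2, 2]
SAMPLING_RATE = 16_000  # from preprocessor_config.json


def num_output_frames(num_samples: int) -> int:
    """Frames produced by the stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        length = (length - kernel) // stride + 1
    return length


def total_stride() -> int:
    """Samples between consecutive output frames (product of strides)."""
    hop = 1
    for stride in CONV_STRIDE:
        hop *= stride
    return hop


def receptive_field() -> int:
    """Input samples seen by a single output frame."""
    field, jump = 1, 1
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        field += (kernel - 1) * jump
        jump *= stride
    return field


print(num_output_frames(SAMPLING_RATE))  # 49 frames for 1 s of audio
print(1000 * total_stride() / SAMPLING_RATE, "ms hop")  # 20.0 ms hop
print(1000 * receptive_field() / SAMPLING_RATE, "ms window")  # 25.0 ms window
```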

181
eval.py Normal file

@@ -0,0 +1,181 @@
#!/usr/bin/env python
import argparse
import re
from typing import Dict

import torch
from datasets import Audio, Dataset, load_dataset, load_metric
from transformers import (
    AutoConfig,
    AutoFeatureExtractor,
    AutoModelForCTC,
    AutoTokenizer,
    Wav2Vec2Processor,
    Wav2Vec2ProcessorWithLM,
    pipeline,
)


def log_results(result: Dataset, args: Dict[str, str]):
    """DO NOT CHANGE. This function computes and logs the result metrics."""

    log_outputs = args.log_outputs
    dataset_id = "_".join(args.dataset.split("/") + [args.config, args.split])

    # load metric
    wer = load_metric("wer")
    cer = load_metric("cer")

    # compute metrics
    wer_result = wer.compute(references=result["target"], predictions=result["prediction"])
    cer_result = cer.compute(references=result["target"], predictions=result["prediction"])

    # print & log results
    result_str = f"WER: {wer_result}\n" f"CER: {cer_result}"
    print(result_str)

    with open(f"{dataset_id}_eval_results.txt", "w") as f:
        f.write(result_str)

    # log all results in text file. Possibly interesting for analysis
    # (`--log_outputs` is a store_true flag and is never None, so test its truth value)
    if log_outputs:
        pred_file = f"log_{dataset_id}_predictions.txt"
        target_file = f"log_{dataset_id}_targets.txt"

        with open(pred_file, "w") as p, open(target_file, "w") as t:
            # mapping function to write output
            def write_to_file(batch, i):
                p.write(f"{i}" + "\n")
                p.write(batch["prediction"] + "\n")
                t.write(f"{i}" + "\n")
                t.write(batch["target"] + "\n")

            result.map(write_to_file, with_indices=True)


def normalize_text(text: str, invalid_chars_regex: str) -> str:
    """DO ADAPT FOR YOUR USE CASE. This function normalizes the target text."""

    text = text.lower()
    # map apostrophe-like characters to a plain apostrophe
    text = re.sub(r"[’´′ʼ‘ʻ`]", "'", text)
    text = re.sub(invalid_chars_regex, " ", text)
    text = re.sub(r"\s+", " ", text).strip()

    return text


def main(args):
    # load dataset
    dataset = load_dataset(args.dataset, args.config, split=args.split, use_auth_token=True)

    # for testing: only process the first ten examples
    # dataset = dataset.select(range(10))

    # load processor (greedy decoding skips the LM decoder)
    if args.greedy:
        processor = Wav2Vec2Processor.from_pretrained(args.model_id)
        decoder = None
    else:
        processor = Wav2Vec2ProcessorWithLM.from_pretrained(args.model_id)
        decoder = processor.decoder

    feature_extractor = processor.feature_extractor
    tokenizer = processor.tokenizer
    sampling_rate = feature_extractor.sampling_rate

    # resample audio
    dataset = dataset.cast_column("audio", Audio(sampling_rate=sampling_rate))

    # load eval pipeline
    if args.device is None:
        args.device = 0 if torch.cuda.is_available() else -1

    config = AutoConfig.from_pretrained(args.model_id)
    model = AutoModelForCTC.from_pretrained(args.model_id)

    # asr = pipeline("automatic-speech-recognition", model=args.model_id, device=args.device)
    asr = pipeline(
        "automatic-speech-recognition",
        config=config,
        model=model,
        tokenizer=tokenizer,
        feature_extractor=feature_extractor,
        decoder=decoder,
        device=args.device,
    )

    # build normalizer config
    tokenizer = AutoTokenizer.from_pretrained(args.model_id)
    tokens = [x for x in tokenizer.convert_ids_to_tokens(range(0, tokenizer.vocab_size))]
    special_tokens = [
        tokenizer.pad_token,
        tokenizer.word_delimiter_token,
        tokenizer.unk_token,
        tokenizer.bos_token,
        tokenizer.eos_token,
    ]
    non_special_tokens = [x for x in tokens if x not in special_tokens]
    invalid_chars_regex = rf"[^\s{re.escape(''.join(set(non_special_tokens)))}]"

    # normalize_to_lower = False
    # for token in non_special_tokens:
    #     if token.isalpha() and token.islower():
    #         normalize_to_lower = True
    #         break

    # map function to decode audio
    def map_to_pred(batch):
        prediction = asr(batch["audio"]["array"], chunk_length_s=args.chunk_length_s, stride_length_s=args.stride_length_s)

        batch["prediction"] = prediction["text"]
        batch["target"] = normalize_text(batch["sentence"], invalid_chars_regex)
        return batch

    # run inference on all examples
    result = dataset.map(map_to_pred, remove_columns=dataset.column_names)

    # filtering out empty targets
    result = result.filter(lambda example: example["target"] != "")

    # compute and log_results
    # do not change function below
    log_results(result, args)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    parser.add_argument("--model_id", type=str, required=True, help="Model identifier. Should be loadable with 🤗 Transformers")
    parser.add_argument(
        "--dataset",
        type=str,
        required=True,
        help="Dataset name to evaluate the `model_id`. Should be loadable with 🤗 Datasets",
    )
    parser.add_argument("--config", type=str, required=True, help="Config of the dataset. *E.g.* `'en'` for Common Voice")
    parser.add_argument("--split", type=str, required=True, help="Split of the dataset. *E.g.* `'test'`")
    parser.add_argument(
        "--chunk_length_s",
        type=float,
        default=None,
        help="Chunk length in seconds. Defaults to None. For long audio files a good value would be 5.0 seconds.",
    )
    parser.add_argument(
        "--stride_length_s",
        type=float,
        default=None,
        help="Stride of the audio chunks. Defaults to None. For long audio files a good value would be 1.0 seconds.",
    )
    parser.add_argument("--log_outputs", action="store_true", help="If defined, write outputs to log file for analysis.")
    parser.add_argument("--greedy", action="store_true", help="If defined, the LM will be ignored during inference.")
    parser.add_argument(
        "--device",
        type=int,
        default=None,
        help="The device to run the pipeline on. -1 for CPU, 0 for the first GPU and so on. Defaults to the first GPU if available, otherwise CPU.",
    )
    args = parser.parse_args()

    main(args)
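`normalize_text` and `invalid_chars_regex` together reduce each reference transcript to the characters the model can emit. The standalone sketch below reproduces that pipeline. It uses a hand-written `ALLOWED` character set in place of the vocabulary lookup, and its list of apostrophe variants is an assumption.

```python
import re
import string

# Hypothetical stand-in for the non-special tokens of vocab.json
ALLOWED = "'-" + string.ascii_lowercase + "\u00e0\u00e2\u00e4\u00e7\u00e8\u00e9\u00ea\u00eb\u00ee\u00ef\u00f1\u00f4\u00f6\u00f9\u00fb\u00fc\u00ff"
# Anything that is neither whitespace nor an allowed character is invalid
INVALID_CHARS_REGEX = rf"[^\s{re.escape(ALLOWED)}]"
# Apostrophe-like characters folded into "'" (assumed list)
APOSTROPHES = "[\u2019\u00b4\u2032\u02bc\u2018\u02bb`]"


def normalize_text(text: str, invalid_chars_regex: str = INVALID_CHARS_REGEX) -> str:
    text = text.lower()
    text = re.sub(APOSTROPHES, "'", text)  # unify apostrophes
    text = re.sub(invalid_chars_regex, " ", text)  # punctuation, digits, ... -> space
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace


print(normalize_text("L\u2019\u00e9t\u00e9, c'est \u00ab super \u00bb !"))  # l'été c'est super
```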


@@ -0,0 +1 @@
{"alpha": 0.5, "beta": 1.5, "unk_score_offset": -10.0, "score_boundary": true}
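These decoder settings weigh the n-gram LM against the acoustic model during beam search: `alpha` scales the LM score, `beta` rewards each emitted word, and `unk_score_offset` penalizes words missing from `unigrams.txt`. The toy sketch below shows that weighted sum. It assumes natural-log scores and made-up hypothesis numbers, and it is not pyctcdecode's actual scoring code.

```python
from dataclasses import dataclass

# Values from the decoder attrs above
ALPHA, BETA, UNK_SCORE_OFFSET = 0.5, 1.5, -10.0


@dataclass
class Hypothesis:
    text: str
    acoustic_logprob: float  # CTC log-probability (made-up number)
    lm_logprob: float  # n-gram LM log-probability (made-up number)
    num_unk_words: int  # words missing from the unigram list


def fused_score(h: Hypothesis) -> float:
    """Shallow-fusion style score: acoustic + weighted LM + word bonus + unknown-word penalty."""
    num_words = len(h.text.split())
    return h.acoustic_logprob + ALPHA * h.lm_logprob + BETA * num_words + UNK_SCORE_OFFSET * h.num_unk_words


beams = [
    Hypothesis("il fait beau", acoustic_logprob=-4.0, lm_logprob=-6.0, num_unk_words=0),
    Hypothesis("il fébo", acoustic_logprob=-3.5, lm_logprob=-14.0, num_unk_words=1),
]
print(max(beams, key=lambda h: h.acoustic_logprob).text)  # il fébo (acoustics only)
print(max(beams, key=fused_score).text)  # il fait beau (with the LM)
```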


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3d240fcf833130720ab9789729b2e510dafa012227a74800254b314a481f764a
size 999781632
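Large binaries are committed as Git LFS pointer files like the one above. Each pointer has three `key value` lines: the spec version, the SHA-256 of the real content, and its size in bytes. The sketch below parses a pointer and checks a downloaded blob against it; the helper names are hypothetical.

```python
import hashlib

# Text of the Git LFS pointer shown above
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:3d240fcf833130720ab9789729b2e510dafa012227a74800254b314a481f764a
size 999781632
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its key/value lines (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(data: bytes, info: dict) -> bool:
    """True if `data` has the size and SHA-256 digest recorded in the pointer."""
    return len(data) == info["size"] and hashlib.sha256(data).hexdigest() == info["digest"]


info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"], "bytes")
```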

334912
language_model/unigrams.txt Normal file

File diff suppressed because it is too large

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ef044c4666c50ec9169c38a1a5846f85ec2414e64b44b4d1bdc43bb4659756da
size 1262012432

10
preprocessor_config.json Normal file

@@ -0,0 +1,10 @@
{
"do_normalize": true,
"feature_extractor_type": "Wav2Vec2FeatureExtractor",
"feature_size": 1,
"padding_side": "right",
"padding_value": 0.0,
"processor_class": "Wav2Vec2ProcessorWithLM",
"return_attention_mask": true,
"sampling_rate": 16000
}
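With `do_normalize`, `padding_side`, `padding_value` and `return_attention_mask` set as above, each waveform is standardized to zero mean and unit variance over its own length. Batches are then right-padded with zeros and paired with an attention mask. The pure-Python sketch below implements those steps. The `1e-7` variance floor is an assumption, and the real `Wav2Vec2FeatureExtractor` works on NumPy arrays.

```python
import math
from typing import List, Sequence, Tuple

# Settings from preprocessor_config.json above
PADDING_VALUE = 0.0
EPS = 1e-7  # assumed variance floor to avoid division by zero


def zero_mean_unit_var(x: Sequence[float]) -> List[float]:
    """Standardize one waveform over its own (unpadded) samples."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + EPS) for v in x]


def prepare_batch(waveforms: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[List[int]]]:
    """Normalize each waveform, right-pad with PADDING_VALUE, and build attention masks."""
    max_len = max(len(w) for w in waveforms)
    values, masks = [], []
    for w in waveforms:
        pad = max_len - len(w)
        values.append(zero_mean_unit_var(w) + [PADDING_VALUE] * pad)
        masks.append([1] * len(w) + [0] * pad)
    return values, masks


values, masks = prepare_batch([[0.1, 0.3, -0.2, 0.4], [0.5, -0.5]])
print(masks)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```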

3
pytorch_model.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c3f8ef2fa0ee8cdf063590f19f68dd038b326c78df4acb8bb52f6d9df1107b54
size 1262103729


@@ -0,0 +1,2 @@
WER: 0.16223776223776223
CER: 0.030996116879182616
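The `WER:` and `CER:` lines in these result files are corpus-level edit-distance rates. Substitutions, deletions and insertions are summed over all utterances and divided by the number of reference words (for WER) or characters (for CER). The self-contained sketch below approximates that computation; its helper names are hypothetical stand-ins for the `wer`/`cer` metrics loaded in `eval.py`.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,  # deletion
                           cur[j - 1] + 1,  # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def corpus_wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


def corpus_cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    errors = sum(edit_distance(list(r), list(h)) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


refs = ["le chat dort", "il fait beau"]
hyps = ["le chien dort", "il fait beau"]
print(corpus_wer(refs, hyps))  # 1 word error over 6 reference words
print(corpus_cer(refs, hyps))  # 3 character errors over 24 reference characters
```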


@@ -0,0 +1,2 @@
WER: 0.15391976444608024
CER: 0.029825672661177055


@@ -0,0 +1,2 @@
WER: 0.0933944140682725
CER: 0.05192920390245901

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,2 @@
WER: 0.08514007051024931
CER: 0.051649188467461866

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,2 @@
WER: 0.10104321907600596
CER: 0.04789153974821727

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,2 @@
WER: 0.08846497764530552
CER: 0.04616016668133932

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,2 @@
WER: 0.11441598191493416
CER: 0.0338415442043304


@@ -0,0 +1,2 @@
WER: 0.09662302035839927
CER: 0.030784767445014533


@@ -0,0 +1,2 @@
WER: 0.05938023091119791
CER: 0.0251748962097513


@@ -0,0 +1,2 @@
WER: 0.051321945147860426
CER: 0.02437166220530059


@@ -0,0 +1,2 @@
WER: 0.19954995499549955
CER: 0.09941896000804636


@@ -0,0 +1,2 @@
WER: 0.16566156615661567
CER: 0.08239781510394503


@@ -0,0 +1,2 @@
WER: 0.1477047704770477
CER: 0.09565883436104942


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:44ebfbaea972eb37c367e6bb88cca55c38d6647a8df4273f30a581d0e8c0b6db
size 5314


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4aa083158f159b353248a4a673c4ee5d5f00104a31100de215a9249d452020b2
size 233487


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c25541a62eb090857c403318382553b9de67775f65faa0fc6a3c7d664f30629c
size 364

232
special_tokens_map.json Normal file

@@ -0,0 +1,232 @@
{
"additional_special_tokens": [
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "<s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
{
"content": "</s>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
}
],
"bos_token": "<s>",
"eos_token": "</s>",
"pad_token": "[PAD]",
"unk_token": "[UNK]"
}

14
tokenizer_config.json Normal file

@@ -0,0 +1,14 @@
{
"bos_token": "<s>",
"do_lower_case": false,
"eos_token": "</s>",
"model_max_length": 1000000000000000019884624838656,
"name_or_path": "outputs/big/wav2vec2-FR-7K-large-ft",
"pad_token": "[PAD]",
"processor_class": "Wav2Vec2ProcessorWithLM",
"replace_word_delimiter_char": " ",
"special_tokens_map_file": null,
"tokenizer_class": "Wav2Vec2CTCTokenizer",
"unk_token": "[UNK]",
"word_delimiter_token": "|"
}

50
vocab.json Normal file

@@ -0,0 +1,50 @@
{
"'": 1,
"-": 2,
"[PAD]": 47,
"[UNK]": 46,
"a": 3,
"b": 4,
"c": 5,
"d": 6,
"e": 7,
"f": 8,
"g": 9,
"h": 10,
"i": 11,
"j": 12,
"k": 13,
"l": 14,
"m": 15,
"n": 16,
"o": 17,
"p": 18,
"q": 19,
"r": 20,
"s": 21,
"t": 22,
"u": 23,
"v": 24,
"w": 25,
"x": 26,
"y": 27,
"z": 28,
"|": 0,
"à": 29,
"â": 30,
"ä": 31,
"ç": 32,
"è": 33,
"é": 34,
"ê": 35,
"ë": 36,
"î": 37,
"ï": 38,
"ñ": 39,
"ô": 40,
"ö": 41,
"ù": 42,
"û": 43,
"ü": 44,
"ÿ": 45
}
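The token files in this commit fit together:

- `vocab.json` (ids 0 to 47) plus `added_tokens.json` (48 and 49) give the 50 outputs set by `vocab_size` in `config.json`.
- `[PAD]` (47) is the `pad_token_id`.
- `alphabet.json` lists the same symbols in id order, with `|`, `[UNK]` and `[PAD]` renamed for the LM decoder.

The sketch below rebuilds the vocabulary by hand instead of reading the files (an assumption) and checks those relationships. It also includes a simplified character-level encoder that follows `word_delimiter_token`; `build_labels` and `encode` are hypothetical helpers.

```python
import string
from typing import Dict, List

# Hand reconstruction of vocab.json above, listed in id order 0..47
ACCENTED = "\u00e0\u00e2\u00e4\u00e7\u00e8\u00e9\u00ea\u00eb\u00ee\u00ef\u00f1\u00f4\u00f6\u00f9\u00fb\u00fc\u00ff"
TOKENS = ["|", "'", "-"] + list(string.ascii_lowercase) + list(ACCENTED) + ["[UNK]", "[PAD]"]
VOCAB: Dict[str, int] = {tok: i for i, tok in enumerate(TOKENS)}
ADDED_TOKENS = {"<s>": 48, "</s>": 49}  # from added_tokens.json

# Renames that turn tokenizer symbols into decoder labels, as seen in alphabet.json
LABEL_RENAMES = {"|": " ", "[UNK]": "\u2047", "[PAD]": ""}


def build_labels(vocab: Dict[str, int], added: Dict[str, int]) -> List[str]:
    """Decoder labels in logit order (hypothetical helper)."""
    id_to_token = {i: tok for tok, i in {**vocab, **added}.items()}
    # ids must be contiguous so that label index == logit index
    assert sorted(id_to_token) == list(range(len(id_to_token)))
    return [LABEL_RENAMES.get(id_to_token[i], id_to_token[i]) for i in range(len(id_to_token))]


def encode(text: str) -> List[int]:
    """Character-level encoding: spaces become "|", unknown characters map to [UNK]."""
    return [VOCAB.get(ch, VOCAB["[UNK]"]) for ch in text.replace(" ", "|")]


labels = build_labels(VOCAB, ADDED_TOKENS)
print(len(labels))  # 50, matching vocab_size in config.json
print(encode("la mer"))  # [14, 3, 0, 15, 7, 20]
```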