Initialize project; model provided by the ModelHub XC community
Model: Cornebidouil/moonshine-tiny-fr — Source: Original Platform
.gitattributes (vendored, new file, 35 lines)
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md (new file, 257 lines)
---
license: mit
language:
- fr
metrics:
- wer
- cer
base_model:
- UsefulSensors/moonshine-tiny
pipeline_tag: automatic-speech-recognition
library_name: transformers
arxiv: https://arxiv.org/abs/2410.15608
datasets:
- facebook/multilingual_librispeech
tags:
- audio
- automatic-speech-recognition
- speech-to-text
- speech
- french
- moonshine
- asr
---
# Moonshine-Tiny-FR: French Speech Recognition Model

**Fine-tuned Moonshine ASR model for French**

This is a fine-tuned version of [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny), optimized specifically for French speech recognition. The model achieves state-of-the-art performance for its size (27M parameters) on French ASR tasks.

**Links:**
- [Original Moonshine Blog](https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/)
- [Original Paper](https://arxiv.org/abs/2410.15608)
- [Fine-Tuning Guide](https://github.com/pierre-cheneau/finetune-moonshine-asr)
## Usage

### Installation

```bash
pip install --upgrade pip
pip install --upgrade transformers datasets[audio]
```
### Basic Usage

```python
from transformers import MoonshineForConditionalGeneration, AutoProcessor
import torch
import torchaudio

# Load model and processor
model = MoonshineForConditionalGeneration.from_pretrained('Cornebidouil/moonshine-tiny-fr')
processor = AutoProcessor.from_pretrained('Cornebidouil/moonshine-tiny-fr')

# Load and resample audio to 16kHz
audio, sr = torchaudio.load("french_audio.wav")
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)
audio = audio[0].numpy()  # keep the first channel (mono)

# Prepare inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate transcription
# Size max_new_tokens to avoid truncation (5 tokens per second works well for French)
audio_duration = len(audio) / 16000
max_new_tokens = int(audio_duration * 5)

generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
transcription = processor.decode(generated_ids[0], skip_special_tokens=True)
print(transcription)
```
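The token-budget heuristic above can be wrapped in a small helper. A minimal sketch: the 5 tokens-per-second ratio comes from the comment in the snippet above, while the minimum floor of 8 tokens is an assumption added here to cover very short clips.

```python
def max_new_tokens_for(duration_s: float, tokens_per_second: float = 5.0, floor: int = 8) -> int:
    """Token budget for a clip of `duration_s` seconds.

    tokens_per_second=5 follows the French heuristic above; `floor`
    guards very short clips (an assumption, not from the model card).
    """
    return max(floor, int(duration_s * tokens_per_second))

# 10 s of audio -> budget of 50 tokens; a 0.2 s clip still gets the floor
budget = max_new_tokens_for(10.0)   # 50
tiny = max_new_tokens_for(0.2)      # 8
```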
### Advanced Usage

For production deployments with:
- **Live transcription** with Voice Activity Detection
- **ONNX optimization** (20-30% faster)
- **Batch processing** scripts
- **Complete inference pipeline**

see the [`inference.py`](https://github.com/pierre-cheneau/finetune-moonshine-asr/blob/main/scripts/inference.py) script included in the [fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).
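To give a flavor of what voice-activity gating involves, here is a toy energy-based VAD sketch. This is an illustration only: the linked `inference.py` may use a different, more robust method, and the frame length and threshold below are assumptions.

```python
import numpy as np

def speech_frames(audio: np.ndarray, sr: int = 16000,
                  frame_ms: int = 30, threshold: float = 0.01) -> list:
    """Return (start, end) sample ranges of frames whose RMS energy
    exceeds `threshold`. A toy energy gate, not a production VAD."""
    frame = int(sr * frame_ms / 1000)
    spans = []
    for start in range(0, len(audio) - frame + 1, frame):
        chunk = audio[start:start + frame]
        if np.sqrt(np.mean(chunk ** 2)) > threshold:
            spans.append((start, start + frame))
    return spans

# Example: 0.5 s of silence followed by 0.5 s of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.zeros(sr // 2), 0.1 * np.sin(2 * np.pi * 440 * t)])
spans = speech_frames(signal, sr)
# Only frames overlapping the tone (second half) are flagged
```

In a live pipeline, only the flagged spans would be concatenated and handed to the model, keeping latency and compute low during silence.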
## Model Details

### Model Description

- **Base Model:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Language:** French (fr)
- **Model Size:** 27M parameters
- **Fine-tuned on:** the Multilingual LibriSpeech (MLS) French dataset, re-segmented to fit the Moonshine model's input requirements
- **Training Duration:** 8,000 steps
- **Optimizer:** Schedule-free AdamW
- **License:** MIT

### Model Architecture

Moonshine is a compact sequence-to-sequence ASR model designed for efficient on-device inference:
- **Encoder:** Convolutional feature extraction + Transformer blocks
- **Decoder:** Autoregressive Transformer decoder
- **Parameters:** 27M (tiny variant)
- **Input:** 16kHz mono audio
- **Output:** French text transcription
## Performance

### Evaluation Metrics

Evaluated on the Multilingual LibriSpeech (MLS) French test set:

| Metric | Score |
|--------|-------|
| **Word Error Rate (WER)** | 21.8% |
| **Character Error Rate (CER)** | ~10% |
| **Real-Time Factor (RTF)** | 0.11x (CPU) |

**Inference Speed:** ~9x faster than real time on CPU, enabling live transcription.
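The relationship between RTF and real-time speedup can be checked with a two-line sketch; the 1.1 s / 10 s figures below are illustrative stand-ins, not reported measurements.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1 means faster than real time; the speedup is its inverse."""
    return processing_seconds / audio_seconds

# Illustrative numbers: 10 s of audio transcribed in 1.1 s of wall time
rtf = real_time_factor(1.1, 10.0)  # 0.11
speedup = 1 / rtf                  # ~9x faster than real time
```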
### Comparison

| Model | Size | Language | WER (MLS-FR) |
|-------|------|----------|--------------|
| Whisper-tiny | 39M | Multilingual | ~25% |
| **Moonshine-tiny-fr** | 27M | French | **21.8%** |
| Whisper-base | 74M | Multilingual | ~18% |

*Moonshine-tiny-fr achieves competitive performance with 30% fewer parameters than Whisper-tiny. It remains a proof of concept; more work is needed to build a properly robust dataset.*
## Training Details / Fine-tuning

Please refer to my [GitHub repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) for the full training procedure.
## Use Cases

### Primary Applications

✅ **French Speech Recognition**
- Real-time transcription
- Audio file transcription
- Voice commands
- Accessibility tools

✅ **Resource-Constrained Environments**
- On-device transcription (mobile, edge devices)
- Low-latency applications
- Offline transcription

✅ **Hogwarts Legacy SpellCaster**
- Ultra-lightweight, low-latency spoken-spell recognition
- [pierre-cheneau/HogwartsLegacy-SpellCaster](https://github.com/pierre-cheneau/HogwartsLegacy-SpellCaster)
## Limitations and Biases

### Known Limitations of This Tiny Model

- **Hallucination:** Like all seq2seq models, it may generate text not present in the audio
- **Repetition:** May repeat phrases, especially with greedy decoding (use beam search)
- **Short Segments:** Performance may degrade on very short audio clips (<0.5s)
- **Domain Specificity:** Trained primarily on audiobooks (read speech)
- **Accents:** Best performance on metropolitan French; regional accents may show higher WER
- **Background Noise:** Performance degrades with significant background noise
## Model Card Author

**Pierre Chéneau (Cornebidouil)**

Geologist; developer and maintainer of this fine-tuned French model.

**Links:**
- 🌐 [Personal Website](https://pcheneau.fr)
- 💼 [GitHub](https://github.com/pierre-cheneau)
- 📚 [Fine-tuning Guide](https://github.com/pierre-cheneau/finetune-moonshine-asr)
## Citations

### This Model

```bibtex
@misc{cheneau2026moonshine-tiny-fr,
  author = {Pierre Chéneau (Cornebidouil)},
  title = {Moonshine-Tiny-FR: Fine-tuned French Speech Recognition},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Cornebidouil/moonshine-tiny-fr}
}
```

### Fine-tuning Guide

```bibtex
@misc{cheneau2026moonshine-finetune,
  author = {Pierre Chéneau (Cornebidouil)},
  title = {Moonshine ASR Fine-Tuning Guide},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/pierre-cheneau/finetune-moonshine-asr}
}
```

### Original Moonshine Model

```bibtex
@misc{jeffries2024moonshinespeechrecognitionlive,
  title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
  author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
  year={2024},
  eprint={2410.15608},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2410.15608}
}
```

### Multilingual LibriSpeech Dataset

```bibtex
@inproceedings{pratap2020mls,
  title={MLS: A Large-Scale Multilingual Dataset for Speech Research},
  author={Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
  booktitle={Interspeech},
  year={2020}
}
```
## Additional Resources

- **Fine-Tuning Guide:** [Complete tutorial](https://github.com/pierre-cheneau/finetune-moonshine-asr)
- **Original Moonshine:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Dataset:** [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech)
- **Issues/Support:** [GitHub Issues](https://github.com/pierre-cheneau/finetune-moonshine-asr/issues)
## License

This model is released under the MIT License, consistent with the base Moonshine model.

```
MIT License

Copyright (c) 2026 Pierre Chéneau (Cornebidouil)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...
```
## Acknowledgments

- **Useful Sensors** for the original Moonshine architecture and pre-trained model
- **Meta AI** for the Multilingual LibriSpeech dataset
- **HuggingFace** for the transformers library and model hosting
- **Schedule-Free Learning** for the optimizer implementation

---

**Questions?** Open an issue on the [fine-tuning guide repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) or check the documentation.

**Want to fine-tune for your language?** See the [complete fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).
config.json (new file, 33 lines)
{
  "architectures": [
    "MoonshineForConditionalGeneration"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "decoder_hidden_act": "silu",
  "decoder_num_attention_heads": 8,
  "decoder_num_hidden_layers": 6,
  "decoder_num_key_value_heads": 8,
  "decoder_start_token_id": 1,
  "dtype": "float32",
  "encoder_hidden_act": "gelu",
  "encoder_num_attention_heads": 8,
  "encoder_num_hidden_layers": 6,
  "encoder_num_key_value_heads": 8,
  "eos_token_id": 2,
  "hidden_size": 288,
  "initializer_range": 0.02,
  "intermediate_size": 1152,
  "is_encoder_decoder": true,
  "max_position_embeddings": 194,
  "model_type": "moonshine",
  "pad_head_dim_to_multiple_of": 8,
  "pad_token_id": 2,
  "partial_rotary_factor": 0.9,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "transformers_version": "4.57.6",
  "use_cache": false,
  "vocab_size": 32768
}
generation_config.json (new file, 16 lines)
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "decoder_start_token_id": 1,
  "early_stopping": true,
  "eos_token_id": [
    2
  ],
  "length_penalty": 1.2,
  "max_length": 194,
  "no_repeat_ngram_size": 2,
  "num_beams": 5,
  "pad_token_id": 2,
  "repetition_penalty": 1.2,
  "transformers_version": "4.57.6"
}
model.safetensors (new file, LFS pointer, 3 lines)
version https://git-lfs.github.com/spec/v1
oid sha256:f980d9f1b34ec113b6abe8cb79e4c30f5caf42aaa5af2714b41f14d298101b59
size 108389192
preprocessor_config.json (new file, 10 lines)
{
  "do_normalize": false,
  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
  "feature_size": 1,
  "padding_side": "right",
  "padding_value": 0.0,
  "processor_class": "Wav2Vec2Processor",
  "return_attention_mask": true,
  "sampling_rate": 16000
}
special_tokens_map.json (new file, 23 lines)
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json (new file, 284051 lines): diff suppressed because the file is too large
tokenizer_config.json (new file, 6180 lines): diff suppressed because the file is too large