---
license: mit
language:
- fr
metrics:
- wer
- cer
base_model:
- UsefulSensors/moonshine-tiny
pipeline_tag: automatic-speech-recognition
library_name: transformers
arxiv: https://arxiv.org/abs/2410.15608
datasets:
- facebook/multilingual_librispeech
tags:
- audio
- automatic-speech-recognition
- speech-to-text
- speech
- french
- moonshine
- asr
---

# Moonshine-Tiny-FR: French Speech Recognition Model

**Fine-tuned Moonshine ASR model for French language**

This is a fine-tuned version of [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny), optimized for French speech recognition. With 27M parameters, it reaches a 21.8% WER on the Multilingual LibriSpeech (MLS) French test set, ahead of the larger Whisper-tiny (see the Comparison section).

**Links:**
- [[Original Moonshine Blog]](https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/)
- [[Original Paper]](https://arxiv.org/abs/2410.15608)
- [[Fine-Tuning Guide]](https://github.com/pierre-cheneau/finetune-moonshine-asr)

## Usage

### Installation
```bash
pip install --upgrade pip
pip install --upgrade transformers torch torchaudio "datasets[audio]"
```

### Basic Usage

```python
from transformers import MoonshineForConditionalGeneration, AutoProcessor
import torch
import torchaudio

# Load model and processor
model = MoonshineForConditionalGeneration.from_pretrained('Cornebidouil/moonshine-tiny-fr')
processor = AutoProcessor.from_pretrained('Cornebidouil/moonshine-tiny-fr')

# Load audio, downmix to mono, and resample to 16 kHz
audio, sr = torchaudio.load("french_audio.wav")  # shape: (channels, samples)
audio = audio.mean(dim=0)  # average the channels to get a mono signal
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)
audio = audio.numpy()

# Prepare inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate transcription
# Scale max_new_tokens with the audio duration
# (5 tokens per second is the setting used here for French); keep at least 1 token
audio_duration = len(audio) / 16000
max_new_tokens = max(1, int(audio_duration * 5))

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
transcription = processor.decode(generated_ids[0], skip_special_tokens=True)
print(transcription)
```

### Advanced Usage

For production deployments with:
- **Live transcription** with Voice Activity Detection
- **ONNX optimization** (20-30% faster)
- **Batch processing** scripts
- **Complete inference pipeline**

See the included [`inference.py`](https://github.com/pierre-cheneau/finetune-moonshine-asr/blob/main/scripts/inference.py) script in the [fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).

## Model Details

### Model Description

- **Base Model:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Language:** French (fr)
- **Model Size:** 27M parameters
- **Fine-tuned on:** Multilingual LibriSpeech (MLS) French dataset, segmented to meet the requirements of the Moonshine model
- **Training Duration:** 8,000 steps
- **Optimizer:** Schedule-free AdamW
- **License:** MIT

### Model Architecture

Moonshine is a compact sequence-to-sequence ASR model designed for efficient on-device inference:
- **Encoder:** Convolutional feature extraction + Transformer blocks
- **Decoder:** Autoregressive Transformer decoder
- **Parameters:** 27M (tiny variant)
- **Input:** 16kHz mono audio
- **Output:** French text transcription
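
Since the model expects 16 kHz mono input, it can help to check a file before transcribing it. The following is a minimal sketch using only Python's standard `wave` module; the helper names `describe_wav` and `matches_model_input` are hypothetical and not part of this repository, and the sketch only handles uncompressed PCM WAV files.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / sample_rate
    return sample_rate, channels, duration


def matches_model_input(path, expected_rate=16000):
    """True when the file is already 16 kHz mono (the input format listed above)."""
    sample_rate, channels, _ = describe_wav(path)
    return sample_rate == expected_rate and channels == 1


# Demo: write one second of 16-bit silence at 16 kHz mono, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 2 bytes per sample = 16-bit audio
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

print(describe_wav(demo_path))
print(matches_model_input(demo_path))
```

Files that fail the check need to be downmixed and resampled first, as done in the Basic Usage snippet.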

## Performance

### Evaluation Metrics

Evaluated on Multilingual LibriSpeech (MLS) French test set:

| Metric | Score |
|--------|-------|
| **Word Error Rate (WER)** | 21.8% |
| **Character Error Rate (CER)** | ~10% |
| **Real-Time Factor (RTF)** | 0.11x (CPU) |

**Inference Speed:** ~9x faster than real-time on CPU, enabling live transcription.
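
The two figures above are related: the real-time factor is processing time divided by audio duration, and the speed-up is its inverse. A small sketch of that arithmetic follows; the timing values are hypothetical, chosen only to reproduce an RTF of 0.11, and are not measurements from this model.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent transcribing / duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup_over_real_time(rtf):
    """Seconds of audio processed per second of compute."""
    return 1.0 / rtf


# Hypothetical timing: 1.1 s of compute for a 10 s clip.
rtf = real_time_factor(1.1, 10.0)
print(f"RTF = {rtf:.2f}, speed-up = {speedup_over_real_time(rtf):.1f}x")
# prints "RTF = 0.11, speed-up = 9.1x"
```

An RTF below 1.0 means the model keeps up with incoming audio, which is the precondition for live transcription.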

### Comparison

| Model | Size | Language | WER (MLS-FR) |
|-------|------|----------|--------------|
| Whisper-tiny | 39M | Multilingual | ~25% |
| **Moonshine-tiny-fr** | 27M | French | **21.8%** |
| Whisper-base | 74M | Multilingual | ~18% |

*Moonshine-tiny-fr outperforms Whisper-tiny on this benchmark with about 30% fewer parameters (27M vs. 39M). It remains a proof of concept: more work is needed to build a proper, robust training dataset.*
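
For readers unfamiliar with the metrics, WER and CER are both edit distances normalized by the reference length, counted over words and characters respectively. The sketch below is a plain-Python illustration of those definitions, not the evaluation script behind the numbers above; it applies no text normalization (casing, punctuation), which real evaluations usually do, and the example sentences are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "le chat dort sur le canapé"
hypothesis = "le chat dort sur canapé"   # one word dropped
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

Here one of six reference words is missing, so the WER is 1/6, while the CER counts the three missing characters against the 26 in the reference.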

## Training Details / Fine-tuning

Please refer to my [GitHub repo](https://github.com/pierre-cheneau/finetune-moonshine-asr) for the training procedure.

## Use Cases

### Primary Applications

✅ **French Speech Recognition**
- Real-time transcription
- Audio file transcription
- Voice commands
- Accessibility tools

✅ **Resource-Constrained Environments**
- On-device transcription (mobile, edge devices)
- Low-latency applications
- Offline transcription

✅ **Hogwarts Legacy SpellCaster**
- Ultra-lightweight and low latency spell speech recognition
- https://github.com/pierre-cheneau/HogwartsLegacy-SpellCaster

## Limitations and Biases

### Known Limitations for this tiny model

- **Hallucination:** Like all seq2seq models, may generate text not present in audio
- **Repetition:** May repeat phrases, especially with greedy decoding (use beam search)
- **Short Segments:** Performance may degrade on very short audio clips (<0.5s)
- **Domain Specificity:** Trained primarily on audiobooks (read speech)
- **Accents:** Best performance on metropolitan French; regional accents may have higher WER
- **Background Noise:** Performance degrades with significant background noise
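
The repetition and hallucination issues above can often be reduced through decoding settings. The sketch below assembles keyword arguments for `model.generate`; `max_new_tokens`, `num_beams`, and `no_repeat_ngram_size` are standard `transformers` generation parameters, but the helper name `build_generation_kwargs` is hypothetical and the default values are illustrative assumptions, not settings tuned for this model.

```python
def build_generation_kwargs(audio_seconds, tokens_per_second=5.0, num_beams=4):
    """Assemble keyword arguments for `model.generate`.

    The defaults are illustrative assumptions, not tuned values.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return {
        # Cap output length in proportion to the audio so decoding cannot run away.
        "max_new_tokens": max(1, int(audio_seconds * tokens_per_second)),
        # Beam search instead of greedy decoding.
        "num_beams": num_beams,
        # Forbid any 3-gram from appearing twice in the output.
        "no_repeat_ngram_size": 3,
    }


print(build_generation_kwargs(4.0))
```

With the names from the Basic Usage snippet, this would be passed as `model.generate(**inputs, **build_generation_kwargs(audio_duration))`. Beam search costs more compute per token, so it trades some of the speed advantage for stability.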

## Model Card Author

**Pierre Chéneau (Cornebidouil)**

Geologist, developer, and maintainer of this fine-tuned French model.

**Links:**
- 🌐 [Personal Website](https://pcheneau.fr)
- 💼 [GitHub](https://github.com/pierre-cheneau)
- 📚 [Fine-tuning Guide](https://github.com/pierre-cheneau/finetune-moonshine-asr)

## Citations

### This Model

```bibtex
@misc{cheneau2026moonshine-tiny-fr,
  author = {Pierre Chéneau (Cornebidouil)},
  title = {Moonshine-Tiny-FR: Fine-tuned French Speech Recognition},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Cornebidouil/moonshine-tiny-fr}
}
```

### Fine-tuning Guide

```bibtex
@misc{cheneau2026moonshine-finetune,
  author = {Pierre Chéneau (Cornebidouil)},
  title = {Moonshine ASR Fine-Tuning Guide},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/pierre-cheneau/finetune-moonshine-asr}
}
```

### Original Moonshine Model

```bibtex
@misc{jeffries2024moonshinespeechrecognitionlive,
      title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
      author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
      year={2024},
      eprint={2410.15608},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2410.15608},
}
```

### Multilingual LibriSpeech Dataset

```bibtex
@inproceedings{pratap2020mls,
  title={MLS: A Large-Scale Multilingual Dataset for Speech Research},
  author={Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
  booktitle={Interspeech},
  year={2020}
}
```

## Additional Resources

- **Fine-Tuning Guide:** [Complete tutorial](https://github.com/pierre-cheneau/finetune-moonshine-asr)
- **Original Moonshine:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Dataset:** [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech)
- **Issues/Support:** [GitHub Issues](https://github.com/pierre-cheneau/finetune-moonshine-asr/issues)

## License

This model is released under the MIT License, consistent with the base Moonshine model.

```
MIT License

Copyright (c) 2026 Pierre Chéneau (Cornebidouil)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...
```

## Acknowledgments

- **Useful Sensors** for the original Moonshine architecture and pre-trained model
- **Meta AI** for the Multilingual LibriSpeech dataset
- **HuggingFace** for the transformers library and model hosting
- **Schedule-Free Learning** for the optimizer implementation

---

**Questions?** Open an issue on the [fine-tuning guide repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) or check the documentation.

**Want to fine-tune for your language?** See the [complete fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).