Initialize project; model provided by the ModelHub XC community

Model: Cornebidouil/moonshine-tiny-fr Source: Original Platform
2026-05-08 11:35:50 +08:00
commit 612685aa46
9 changed files with 290608 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,257 @@
+---
+license: mit
+language:
+- fr
+metrics:
+- wer
+- cer
+base_model:
+- UsefulSensors/moonshine-tiny
+pipeline_tag: automatic-speech-recognition
+library_name: transformers
+arxiv: https://arxiv.org/abs/2410.15608
+datasets:
+- facebook/multilingual_librispeech
+tags:
+- audio
+- automatic-speech-recognition
+- speech-to-text
+- speech
+- french
+- moonshine
+- asr
+---
+
+# Moonshine-Tiny-FR: French Speech Recognition Model
+
+**Fine-tuned Moonshine ASR model for French language**
+
+This is a fine-tuned version of [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny), optimized for French speech recognition. With 27M parameters, it reaches a 21.8% word error rate on the Multilingual LibriSpeech (MLS) French test set, which is competitive for its size.
+
+**Links:**
+- [[Original Moonshine Blog]](https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/)
+- [[Original Paper]](https://arxiv.org/abs/2410.15608)
+- [[Fine-Tuning Guide]](https://github.com/pierre-cheneau/finetune-moonshine-asr)
+
+## Usage
+
+### Installation
+```bash
+pip install --upgrade pip
+pip install --upgrade transformers datasets[audio]
+```
+
+### Basic Usage
+
+```python
+from transformers import MoonshineForConditionalGeneration, AutoProcessor
+import torch
+import torchaudio
+
+# Load model and processor
+model = MoonshineForConditionalGeneration.from_pretrained('Cornebidouil/moonshine-tiny-fr')
+processor = AutoProcessor.from_pretrained('Cornebidouil/moonshine-tiny-fr')
+
+# Load and resample audio to 16kHz
+audio, sr = torchaudio.load("french_audio.wav")
+if sr != 16000:
+    audio = torchaudio.functional.resample(audio, sr, 16000)
+audio = audio.mean(dim=0).numpy()  # Downmix to mono by averaging channels
+
+# Prepare inputs
+inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
+
+# Generate transcription
+# Scale max_new_tokens with audio length to avoid truncation
+# (about 5 tokens per second works well for French)
+audio_duration = len(audio) / 16000
+# max(1, ...) keeps the budget valid for clips shorter than 0.2 s
+max_new_tokens = max(1, int(audio_duration * 5))
+
+generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
+transcription = processor.decode(generated_ids[0], skip_special_tokens=True)
+print(transcription)
+```
+
+### Advanced Usage
+
+For production deployments with:
+- **Live transcription** with Voice Activity Detection
+- **ONNX optimization** (20-30% faster)
+- **Batch processing** scripts
+- **Complete inference pipeline**
+
+See the included [`inference.py`](https://github.com/pierre-cheneau/finetune-moonshine-asr/blob/main/scripts/inference.py) script in the [fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).
+
+## Model Details
+
+### Model Description
+
+- **Base Model:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
+- **Language:** French (fr)
+- **Model Size:** 27M parameters
+- **Fine-tuned on:** Multilingual LibriSpeech (MLS) French dataset, segmented to meet the input requirements of the Moonshine model
+- **Training Duration:** 8,000 steps
+- **Optimizer:** Schedule-free AdamW
+- **License:** MIT
+
+### Model Architecture
+
+Moonshine is a compact sequence-to-sequence ASR model designed for efficient on-device inference:
+- **Encoder:** Convolutional feature extraction + Transformer blocks
+- **Decoder:** Autoregressive Transformer decoder
+- **Parameters:** 27M (tiny variant)
+- **Input:** 16kHz mono audio
+- **Output:** French text transcription
+
+## Performance
+
+### Evaluation Metrics
+
+Evaluated on Multilingual LibriSpeech (MLS) French test set:
+
+| Metric | Score |
+|--------|-------|
+| **Word Error Rate (WER)** | 21.8% |
+| **Character Error Rate (CER)** | ~10% |
+| **Real-Time Factor (RTF)** | 0.11x (CPU) |
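+
+The WER and CER figures above are edit-distance ratios. As a minimal sketch (not the evaluation script used for this model, whose text normalization is not described here), assuming plain whitespace tokenization and the hypothetical helper names `edit_distance`, `wer`, and `cer`:
+
+```python
+def edit_distance(ref, hyp):
+    """Levenshtein distance between two sequences (each edit costs 1)."""
+    prev = list(range(len(hyp) + 1))
+    for i, r in enumerate(ref, start=1):
+        curr = [i]
+        for j, h in enumerate(hyp, start=1):
+            cost = 0 if r == h else 1
+            curr.append(min(prev[j] + 1,          # deletion
+                            curr[j - 1] + 1,      # insertion
+                            prev[j - 1] + cost))  # substitution or match
+        prev = curr
+    return prev[-1]
+
+
+def wer(reference, hypothesis):
+    """Word error rate: word-level edits divided by reference word count."""
+    ref_words = reference.split()
+    if not ref_words:
+        raise ValueError("reference must contain at least one word")
+    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
+
+
+def cer(reference, hypothesis):
+    """Character error rate: character-level edits divided by reference length."""
+    if not reference:
+        raise ValueError("reference must not be empty")
+    return edit_distance(list(reference), list(hypothesis)) / len(reference)
+
+
+# Illustrative sentences, not taken from the MLS test set
+reference = "le chat dort sur le canapé"
+hypothesis = "le chat dors sur canapé"
+print(f"WER: {wer(reference, hypothesis):.3f}")
+print(f"CER: {cer(reference, hypothesis):.3f}")
+```
+
+In practice, casing and punctuation are usually normalized before scoring, which can change the reported numbers noticeably.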
+
+**Inference Speed:** ~9x faster than real-time on CPU, enabling live transcription.
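+
+The real-time factor and the "~9x" speed figure are two views of the same ratio. A small sketch of the relationship, where the function names and the timing values are hypothetical and only the 0.11 figure comes from the table above:
+
+```python
+def real_time_factor(processing_seconds, audio_seconds):
+    """RTF = time spent transcribing / duration of the audio."""
+    if audio_seconds <= 0:
+        raise ValueError("audio_seconds must be positive")
+    return processing_seconds / audio_seconds
+
+
+def speedup(rtf):
+    """How many times faster than real time a given RTF corresponds to."""
+    if rtf <= 0:
+        raise ValueError("rtf must be positive")
+    return 1.0 / rtf
+
+
+# Hypothetical measurement: 1.1 s of compute for a 10 s clip
+rtf = real_time_factor(1.1, 10.0)
+print(f"RTF: {rtf:.2f}, speedup: {speedup(rtf):.1f}x")
+```
+
+An RTF below 1.0 means the model keeps up with incoming audio, which is the condition live transcription depends on.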
+
+### Comparison
+
+| Model | Size | Language | WER (MLS-FR) |
+|-------|------|----------|--------------|
+| Whisper-tiny | 39M | Multilingual | ~25% |
+| **Moonshine-tiny-fr** | 27M | French | **21.8%** |
+| Whisper-base | 74M | Multilingual | ~18% |
+
+*Moonshine-tiny-fr achieves competitive performance with about 30% fewer parameters than Whisper-tiny. It remains a proof of concept, and more work is needed to build a proper, robust dataset.*
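+
+The comparison with Whisper-tiny can be checked with simple arithmetic. The sketch below uses only the figures from the table; note that the Whisper WER is approximate, so the relative WER reduction is indicative at best:
+
+```python
+# Figures from the comparison table (Whisper-tiny WER is approximate)
+moonshine_params, whisper_tiny_params = 27e6, 39e6
+moonshine_wer, whisper_tiny_wer = 21.8, 25.0
+
+# Fraction of parameters saved relative to Whisper-tiny
+param_reduction = (whisper_tiny_params - moonshine_params) / whisper_tiny_params
+# Relative drop in WER compared with Whisper-tiny
+relative_wer_reduction = (whisper_tiny_wer - moonshine_wer) / whisper_tiny_wer
+
+print(f"Parameter reduction: {param_reduction:.1%}")            # 30.8%
+print(f"Relative WER reduction: {relative_wer_reduction:.1%}")  # 12.8%
+```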
+
+## Training Details / Fine-Tuning
+
+Please refer to the [fine-tuning guide repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) for the training procedure.
+
+## Use Cases
+
+### Primary Applications
+
+✅ **French Speech Recognition**
+- Real-time transcription
+- Audio file transcription
+- Voice commands
+- Accessibility tools
+
+✅ **Resource-Constrained Environments**
+- On-device transcription (mobile, edge devices)
+- Low-latency applications
+- Offline transcription
+
+✅ **Hogwarts Legacy SpellCaster**
+- Ultra-lightweight and low latency spell speech recognition
+- https://github.com/pierre-cheneau/HogwartsLegacy-SpellCaster
+
+## Limitations and Biases
+
+### Known Limitations for this tiny model
+
+- **Hallucination:** Like all seq2seq models, may generate text not present in audio
+- **Repetition:** May repeat phrases, especially with greedy decoding (use beam search)
+- **Short Segments:** Performance may degrade on very short audio clips (<0.5s)
+- **Domain Specificity:** Trained primarily on audiobooks (read speech)
+- **Accents:** Best performance on metropolitan French; regional accents may have higher WER
+- **Background Noise:** Performance degrades with significant background noise
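+
+Two of the limitations above (hallucination and repetition) can be reduced through generation settings. The helper below is a hedged sketch, not part of this repository: `build_generation_kwargs` is a hypothetical name, the default values for `num_beams` and `no_repeat_ngram_size` are assumptions, and the 5 tokens per second budget is the one from the basic usage example. `max_new_tokens`, `num_beams`, and `no_repeat_ngram_size` are standard `generate` arguments in transformers.
+
+```python
+TOKENS_PER_SECOND = 5  # token budget per second of audio, from the basic usage example
+
+
+def build_generation_kwargs(num_samples, sampling_rate=16000,
+                            num_beams=4, no_repeat_ngram_size=3):
+    """Build keyword arguments for model.generate() from the audio length.
+
+    num_beams and no_repeat_ngram_size defaults are assumed values,
+    not settings validated for this model.
+    """
+    if sampling_rate <= 0:
+        raise ValueError("sampling_rate must be positive")
+    duration_s = num_samples / sampling_rate
+    return {
+        # Cap output length so the decoder cannot run far past the audio
+        "max_new_tokens": max(1, int(duration_s * TOKENS_PER_SECOND)),
+        # Beam search instead of greedy decoding to reduce repeated phrases
+        "num_beams": num_beams,
+        # Forbid exact repeats of n-grams of this size
+        "no_repeat_ngram_size": no_repeat_ngram_size,
+    }
+
+
+gen_kwargs = build_generation_kwargs(num_samples=32000)  # 2 s of 16 kHz audio
+print(gen_kwargs)
+# With the model and inputs from the basic usage example:
+# generated_ids = model.generate(**inputs, **gen_kwargs)
+```
+
+Beam search increases compute per token, so the speed figures reported for this model may not hold with these settings.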
+
+## Model Card Author
+
+**Pierre Chéneau (Cornebidouil)**
+
+Geologist, developer, and maintainer of this fine-tuned French model.
+
+**Links:**
+- 🌐 [Personal Website](https://pcheneau.fr)
+- 💼 [GitHub](https://github.com/pierre-cheneau)
+- 📚 [Fine-tuning Guide](https://github.com/pierre-cheneau/finetune-moonshine-asr)
+
+## Citations
+
+### This Model
+
+```bibtex
+@misc{cheneau2026moonshine-tiny-fr,
+  author = {Pierre Chéneau (Cornebidouil)},
+  title = {Moonshine-Tiny-FR: Fine-tuned French Speech Recognition},
+  year = {2026},
+  publisher = {HuggingFace},
+  url = {https://huggingface.co/Cornebidouil/moonshine-tiny-fr}
+}
+```
+
+### Fine-Tuning Guide
+
+```bibtex
+@misc{cheneau2026moonshine-finetune,
+  author = {Pierre Chéneau (Cornebidouil)},
+  title = {Moonshine ASR Fine-Tuning Guide},
+  year = {2026},
+  publisher = {GitHub},
+  url = {https://github.com/pierre-cheneau/finetune-moonshine-asr}
+}
+```
+
+### Original Moonshine Model
+
+```bibtex
+@misc{jeffries2024moonshinespeechrecognitionlive,
+      title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
+      author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
+      year={2024},
+      eprint={2410.15608},
+      archivePrefix={arXiv},
+      primaryClass={cs.SD},
+      url={https://arxiv.org/abs/2410.15608},
+}
+```
+
+### Multilingual LibriSpeech Dataset
+
+```bibtex
+@inproceedings{pratap2020mls,
+  title={MLS: A Large-Scale Multilingual Dataset for Speech Research},
+  author={Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
+  booktitle={Interspeech},
+  year={2020}
+}
+```
+
+## Additional Resources
+
+- **Fine-Tuning Guide:** [Complete tutorial](https://github.com/pierre-cheneau/finetune-moonshine-asr)
+- **Original Moonshine:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
+- **Dataset:** [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech)
+- **Issues/Support:** [GitHub Issues](https://github.com/pierre-cheneau/finetune-moonshine-asr/issues)
+
+## License
+
+This model is released under the MIT License, consistent with the base Moonshine model.
+
+```
+MIT License
+
+Copyright (c) 2026 Pierre Chéneau (Cornebidouil)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction...
+```
+
+## Acknowledgments
+
+- **Useful Sensors** for the original Moonshine architecture and pre-trained model
+- **Meta AI** for the Multilingual LibriSpeech dataset
+- **HuggingFace** for the transformers library and model hosting
+- **Schedule-Free Learning** for the optimizer implementation
+
+---
+
+**Questions?** Open an issue on the [fine-tuning guide repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) or check the documentation.
+
+**Want to fine-tune for your language?** See the [complete fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).