Initialize project; model provided by the ModelHub XC community
Model: Cornebidouil/moonshine-tiny-fr — Source: Original Platform
.gitattributes (vendored, new file, 35 lines)
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md (new file, 257 lines)
---
license: mit
language:
- fr
metrics:
- wer
- cer
base_model:
- UsefulSensors/moonshine-tiny
pipeline_tag: automatic-speech-recognition
library_name: transformers
arxiv: https://arxiv.org/abs/2410.15608
datasets:
- facebook/multilingual_librispeech
tags:
- audio
- automatic-speech-recognition
- speech-to-text
- speech
- french
- moonshine
- asr
---
# Moonshine-Tiny-FR: French Speech Recognition Model

**Fine-tuned Moonshine ASR model for French**

This is a fine-tuned version of [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny), optimized specifically for French speech recognition. The model achieves state-of-the-art performance for its size (27M parameters) on French ASR tasks.

**Links:**
- [Original Moonshine Blog](https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/)
- [Original Paper](https://arxiv.org/abs/2410.15608)
- [Fine-Tuning Guide](https://github.com/pierre-cheneau/finetune-moonshine-asr)
## Usage

### Installation

```bash
pip install --upgrade pip
pip install --upgrade transformers datasets[audio]
```
### Basic Usage

```python
from transformers import MoonshineForConditionalGeneration, AutoProcessor
import torch
import torchaudio

# Load model and processor
model = MoonshineForConditionalGeneration.from_pretrained('Cornebidouil/moonshine-tiny-fr')
processor = AutoProcessor.from_pretrained('Cornebidouil/moonshine-tiny-fr')

# Load and resample audio to 16kHz
audio, sr = torchaudio.load("french_audio.wav")
if sr != 16000:
    audio = torchaudio.functional.resample(audio, sr, 16000)
audio = audio[0].numpy()  # keep the first channel (mono)

# Prepare inputs
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate transcription
# Size max_new_tokens to avoid truncation (5 tokens per second works well for French)
audio_duration = len(audio) / 16000
max_new_tokens = int(audio_duration * 5)

generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
transcription = processor.decode(generated_ids[0], skip_special_tokens=True)
print(transcription)
```
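The token-budget heuristic above can be wrapped in a small helper. A minimal sketch: the 5 tokens-per-second ratio comes from the comment in the snippet above, while the minimum floor of 8 tokens is an assumption added here to cover very short clips.

```python
def max_new_tokens_for(duration_s: float, tokens_per_second: float = 5.0, floor: int = 8) -> int:
    """Token budget for a clip of `duration_s` seconds.

    tokens_per_second=5 follows the French heuristic above; `floor`
    guards very short clips (an assumption, not from the model card).
    """
    return max(floor, int(duration_s * tokens_per_second))

# 10 s of audio -> budget of 50 tokens; a 0.2 s clip still gets the floor
budget = max_new_tokens_for(10.0)   # 50
tiny = max_new_tokens_for(0.2)      # 8
```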
### Advanced Usage

For production deployments with:
- **Live transcription** with Voice Activity Detection
- **ONNX optimization** (20-30% faster)
- **Batch processing** scripts
- **Complete inference pipeline**

see the [`inference.py`](https://github.com/pierre-cheneau/finetune-moonshine-asr/blob/main/scripts/inference.py) script included in the [fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).
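To give a flavor of what voice-activity gating involves, here is a toy energy-based VAD sketch. This is an illustration only: the linked `inference.py` may use a different, more robust method, and the frame length and threshold below are assumptions.

```python
import numpy as np

def speech_frames(audio: np.ndarray, sr: int = 16000,
                  frame_ms: int = 30, threshold: float = 0.01) -> list:
    """Return (start, end) sample ranges of frames whose RMS energy
    exceeds `threshold`. A toy energy gate, not a production VAD."""
    frame = int(sr * frame_ms / 1000)
    spans = []
    for start in range(0, len(audio) - frame + 1, frame):
        chunk = audio[start:start + frame]
        if np.sqrt(np.mean(chunk ** 2)) > threshold:
            spans.append((start, start + frame))
    return spans

# Example: 0.5 s of silence followed by 0.5 s of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.zeros(sr // 2), 0.1 * np.sin(2 * np.pi * 440 * t)])
spans = speech_frames(signal, sr)
# Only frames overlapping the tone (second half) are flagged
```

In a live pipeline, only the flagged spans would be concatenated and handed to the model, keeping latency and compute low during silence.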
## Model Details

### Model Description

- **Base Model:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Language:** French (fr)
- **Model Size:** 27M parameters
- **Fine-tuned on:** the Multilingual LibriSpeech (MLS) French dataset, re-segmented to fit the Moonshine model's input requirements
- **Training Duration:** 8,000 steps
- **Optimizer:** Schedule-free AdamW
- **License:** MIT

### Model Architecture

Moonshine is a compact sequence-to-sequence ASR model designed for efficient on-device inference:
- **Encoder:** Convolutional feature extraction + Transformer blocks
- **Decoder:** Autoregressive Transformer decoder
- **Parameters:** 27M (tiny variant)
- **Input:** 16kHz mono audio
- **Output:** French text transcription
## Performance

### Evaluation Metrics

Evaluated on the Multilingual LibriSpeech (MLS) French test set:

| Metric | Score |
|--------|-------|
| **Word Error Rate (WER)** | 21.8% |
| **Character Error Rate (CER)** | ~10% |
| **Real-Time Factor (RTF)** | 0.11x (CPU) |

**Inference Speed:** ~9x faster than real time on CPU, enabling live transcription.
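The relationship between RTF and real-time speedup can be checked with a two-line sketch; the 1.1 s / 10 s figures below are illustrative stand-ins, not reported measurements.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1 means faster than real time; the speedup is its inverse."""
    return processing_seconds / audio_seconds

# Illustrative numbers: 10 s of audio transcribed in 1.1 s of wall time
rtf = real_time_factor(1.1, 10.0)  # 0.11
speedup = 1 / rtf                  # ~9x faster than real time
```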
### Comparison

| Model | Size | Language | WER (MLS-FR) |
|-------|------|----------|--------------|
| Whisper-tiny | 39M | Multilingual | ~25% |
| **Moonshine-tiny-fr** | 27M | French | **21.8%** |
| Whisper-base | 74M | Multilingual | ~18% |

*Moonshine-tiny-fr achieves competitive performance with 30% fewer parameters than Whisper-tiny. It remains a proof of concept; more work is needed to build a properly robust dataset.*
## Training Details / Fine-tuning

Please refer to my [GitHub repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) for the full training procedure.
## Use Cases

### Primary Applications

✅ **French Speech Recognition**
- Real-time transcription
- Audio file transcription
- Voice commands
- Accessibility tools

✅ **Resource-Constrained Environments**
- On-device transcription (mobile, edge devices)
- Low-latency applications
- Offline transcription

✅ **Hogwarts Legacy SpellCaster**
- Ultra-lightweight, low-latency spoken-spell recognition
- [pierre-cheneau/HogwartsLegacy-SpellCaster](https://github.com/pierre-cheneau/HogwartsLegacy-SpellCaster)
## Limitations and Biases

### Known Limitations of This Tiny Model

- **Hallucination:** Like all seq2seq models, it may generate text not present in the audio
- **Repetition:** May repeat phrases, especially with greedy decoding (use beam search)
- **Short Segments:** Performance may degrade on very short audio clips (<0.5s)
- **Domain Specificity:** Trained primarily on audiobooks (read speech)
- **Accents:** Best performance on metropolitan French; regional accents may show higher WER
- **Background Noise:** Performance degrades with significant background noise
## Model Card Author

**Pierre Chéneau (Cornebidouil)**

Geologist; developer and maintainer of this fine-tuned French model.

**Links:**
- 🌐 [Personal Website](https://pcheneau.fr)
- 💼 [GitHub](https://github.com/pierre-cheneau)
- 📚 [Fine-tuning Guide](https://github.com/pierre-cheneau/finetune-moonshine-asr)
## Citations

### This Model

```bibtex
@misc{cheneau2026moonshine-tiny-fr,
  author = {Pierre Chéneau (Cornebidouil)},
  title = {Moonshine-Tiny-FR: Fine-tuned French Speech Recognition},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Cornebidouil/moonshine-tiny-fr}
}
```

### Fine-tuning Guide

```bibtex
@misc{cheneau2026moonshine-finetune,
  author = {Pierre Chéneau (Cornebidouil)},
  title = {Moonshine ASR Fine-Tuning Guide},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/pierre-cheneau/finetune-moonshine-asr}
}
```

### Original Moonshine Model

```bibtex
@misc{jeffries2024moonshinespeechrecognitionlive,
  title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
  author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
  year={2024},
  eprint={2410.15608},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2410.15608}
}
```

### Multilingual LibriSpeech Dataset

```bibtex
@inproceedings{pratap2020mls,
  title={MLS: A Large-Scale Multilingual Dataset for Speech Research},
  author={Pratap, Vineel and Xu, Qiantong and Sriram, Anuroop and Synnaeve, Gabriel and Collobert, Ronan},
  booktitle={Interspeech},
  year={2020}
}
```
## Additional Resources

- **Fine-Tuning Guide:** [Complete tutorial](https://github.com/pierre-cheneau/finetune-moonshine-asr)
- **Original Moonshine:** [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny)
- **Dataset:** [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech)
- **Issues/Support:** [GitHub Issues](https://github.com/pierre-cheneau/finetune-moonshine-asr/issues)
## License

This model is released under the MIT License, consistent with the base Moonshine model.

```
MIT License

Copyright (c) 2026 Pierre Chéneau (Cornebidouil)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...
```
## Acknowledgments

- **Useful Sensors** for the original Moonshine architecture and pre-trained model
- **Meta AI** for the Multilingual LibriSpeech dataset
- **HuggingFace** for the transformers library and model hosting
- **Schedule-Free Learning** for the optimizer implementation

---

**Questions?** Open an issue on the [fine-tuning guide repository](https://github.com/pierre-cheneau/finetune-moonshine-asr) or check the documentation.

**Want to fine-tune for your language?** See the [complete fine-tuning guide](https://github.com/pierre-cheneau/finetune-moonshine-asr).
config.json (new file, 33 lines)
{
  "architectures": [
    "MoonshineForConditionalGeneration"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "decoder_hidden_act": "silu",
  "decoder_num_attention_heads": 8,
  "decoder_num_hidden_layers": 6,
  "decoder_num_key_value_heads": 8,
  "decoder_start_token_id": 1,
  "dtype": "float32",
  "encoder_hidden_act": "gelu",
  "encoder_num_attention_heads": 8,
  "encoder_num_hidden_layers": 6,
  "encoder_num_key_value_heads": 8,
  "eos_token_id": 2,
  "hidden_size": 288,
  "initializer_range": 0.02,
  "intermediate_size": 1152,
  "is_encoder_decoder": true,
  "max_position_embeddings": 194,
  "model_type": "moonshine",
  "pad_head_dim_to_multiple_of": 8,
  "pad_token_id": 2,
  "partial_rotary_factor": 0.9,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "transformers_version": "4.57.6",
  "use_cache": false,
  "vocab_size": 32768
}
generation_config.json (new file, 16 lines)
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "decoder_start_token_id": 1,
  "early_stopping": true,
  "eos_token_id": [
    2
  ],
  "length_penalty": 1.2,
  "max_length": 194,
  "no_repeat_ngram_size": 2,
  "num_beams": 5,
  "pad_token_id": 2,
  "repetition_penalty": 1.2,
  "transformers_version": "4.57.6"
}
model.safetensors (new file, LFS pointer, 3 lines)
version https://git-lfs.github.com/spec/v1
oid sha256:f980d9f1b34ec113b6abe8cb79e4c30f5caf42aaa5af2714b41f14d298101b59
size 108389192
preprocessor_config.json (new file, 10 lines)
{
  "do_normalize": false,
  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
  "feature_size": 1,
  "padding_side": "right",
  "padding_value": 0.0,
  "processor_class": "Wav2Vec2Processor",
  "return_attention_mask": true,
  "sampling_rate": 16000
}
special_tokens_map.json (new file, 23 lines)
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json (new file, 284051 lines): diff suppressed because the file is too large
tokenizer_config.json (new file, 6180 lines): diff suppressed because the file is too large