Initialize project; model provided by the ModelHub XC community

Model: TalTechNLP/xls-r-300m-et Source: Original Platform
2026-05-08 11:39:09 +08:00
commit 068d7ac5c1
8 changed files with 242 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,93 @@
+---
+license: cc-by-4.0
+tags:
+- audio
+- automatic-speech-recognition
+- hf-asr-leaderboard
+language: et
+model-index:
+- name: xls-r-300m-et
+  results:
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Common Voice 6.1
+      type: common_voice
+      args: et
+    metrics:
+    - name: Test WER
+      type: wer
+      value: 12.520395591222402
+    - name: Test CER
+      type: cer
+      value: 2.7091152438624897
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Common Voice 8
+      type: mozilla-foundation/common_voice_8_0
+      args: et
+    metrics:
+    - name: Test WER
+      type: wer
+      value: 13.38447882323104
+    - name: Test CER
+      type: cer
+      value: 2.9816686199500255
+---
+
+
+# XLS-R-300m-ET
+
+This is an XLS-R-300M model ([facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m)) finetuned on around 800 hours of diverse Estonian speech data.
+
+## Model description
+This is a general-purpose Estonian ASR model trained by the Lab of Language Technology at TalTech. It consists only of the CTC-based end-to-end model; no language model is currently provided.
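+
+Because only the CTC model is provided, transcripts come from decoding its per-frame output distribution directly. The following is a minimal, illustrative sketch of greedy (best-path) CTC decoding in plain Python. The toy vocabulary, the `<pad>` blank symbol and the `|` word delimiter are assumptions that follow common wav2vec2 conventions, not values taken from this model's vocabulary.
+
+```python
+BLANK = "<pad>"   # assumed CTC blank symbol (wav2vec2 vocabularies usually reuse the pad token)
+WORD_DELIM = "|"  # assumed word delimiter, as in wav2vec2 character vocabularies
+
+
+def ctc_greedy_decode(frame_scores, vocab):
+    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
+    # 1. Highest-scoring label index for every frame.
+    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
+    chars = []
+    prev = None
+    for idx in best:
+        # 2. Keep a label only when it differs from the previous frame's label,
+        # 3. and never emit the blank itself.
+        if idx != prev and vocab[idx] != BLANK:
+            chars.append(vocab[idx])
+        prev = idx
+    # 4. Turn word delimiters back into spaces.
+    return "".join(chars).replace(WORD_DELIM, " ").strip()
+
+
+# Toy example with a hypothetical vocabulary and one-hot frame scores.
+vocab = ["<pad>", "|", "t", "e"]
+labels = [2, 2, 3, 0, 3, 3]  # t t e <blank> e e
+frames = [[1.0 if i == label else 0.0 for i in range(len(vocab))] for label in labels]
+print(ctc_greedy_decode(frames, vocab))  # tee
+```
+
+The blank between the two `e` runs is what lets the decoder emit a double letter; without it, the consecutive `e` frames would be merged into a single `e`.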
+
+## Intended uses & limitations
+
+This model is intended for general-purpose Estonian speech recognition, for example of broadcast conversations, interviews, and talks.
+
+## How to use
+
+
+TODO
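+
+A minimal sketch of transcription with the Hugging Face Transformers library, assuming the repository ships Transformers-compatible `Wav2Vec2ForCTC` weights together with `Wav2Vec2Processor` files, and that the input is a mono waveform already at 16 kHz (the sampling rate XLS-R expects). The helper names `load_model` and `transcribe` are hypothetical. Decoding is greedy (argmax per frame), since no language model is provided.
+
+```python
+from functools import lru_cache
+
+MODEL_ID = "TalTechNLP/xls-r-300m-et"
+SAMPLING_RATE = 16_000  # XLS-R expects 16 kHz mono input
+
+
+@lru_cache(maxsize=1)
+def load_model(model_id=MODEL_ID):
+    """Load the processor and CTC model once (downloaded on first use)."""
+    # Imported on first use so the heavy dependencies load only when needed.
+    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+
+    processor = Wav2Vec2Processor.from_pretrained(model_id)
+    model = Wav2Vec2ForCTC.from_pretrained(model_id)
+    model.eval()
+    return processor, model
+
+
+def transcribe(audio, sampling_rate=SAMPLING_RATE):
+    """Transcribe a 1-D float waveform with greedy CTC decoding (no language model)."""
+    import torch
+
+    processor, model = load_model()
+    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
+    with torch.no_grad():
+        logits = model(**inputs).logits
+    # Most likely token per frame; batch_decode merges repeats and drops blanks.
+    predicted_ids = torch.argmax(logits, dim=-1)
+    return processor.batch_decode(predicted_ids)[0]
+```
+
+For example, with the `soundfile` package installed, `audio, sr = soundfile.read("speech.wav")` followed by `transcribe(audio, sr)` should return the transcript, provided the (hypothetical) file is mono 16 kHz audio.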
+
+## Limitations and bias
+
+Since this model was trained mostly on broadcast speech and web texts, it may have difficulty correctly transcribing the following:
+  * Speech containing technical and other domain-specific terms
+  * Children's speech
+  * Non-native speech
+  * Speech recorded under very noisy conditions or with a microphone far from the speaker
+  * Very spontaneous and overlapping speech
+
+## Training data
+Acoustic training data:
+
+| Type                  | Amount (h) |
+|-----------------------|:------:|
+| Broadcast speech      |   591  |
+| Spontaneous speech    |   53   |
+| Elderly speech corpus |   53   |
+| Talks, lectures       |   49   |
+| Parliament speeches   |   31   |
+| *Total*               |   *761*  |
+
+
+## Training procedure
+
+Finetuned using Fairseq.
+
+## Evaluation results
+
+### WER
+
+| Dataset | WER (%) |
+|---|---|
+| jutusaated.devset | 7.9 |
+| jutusaated.testset | 6.1 |
+| Common Voice 6.1 | 12.5 |
+| Common Voice 8.0 | 13.4 |
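+
+WER is the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the number of reference words; CER is the same computation over characters. The sketch below computes both as percentages, the scale used in the table and in the metadata. It is illustrative only: the text normalization behind the reported numbers is not described here, and the example sentences are made up.
+
+```python
+def edit_distance(ref, hyp):
+    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
+    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
+    for i, r in enumerate(ref, start=1):
+        cur = [i] + [0] * len(hyp)
+        for j, h in enumerate(hyp, start=1):
+            cur[j] = min(
+                prev[j] + 1,             # deletion of r
+                cur[j - 1] + 1,          # insertion of h
+                prev[j - 1] + (r != h),  # substitution (or match if equal)
+            )
+        prev = cur
+    return prev[-1]
+
+
+def wer(reference, hypothesis):
+    """Word error rate in percent for one non-empty reference."""
+    ref = reference.split()
+    return 100.0 * edit_distance(ref, hypothesis.split()) / len(ref)
+
+
+def cer(reference, hypothesis):
+    """Character error rate in percent (spaces counted as characters)."""
+    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
+
+
+def corpus_wer(pairs):
+    """WER over (reference, hypothesis) pairs: total errors / total reference words."""
+    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
+    words = sum(len(r.split()) for r, _ in pairs)
+    return 100.0 * errors / words
+
+
+print(wer("see on väga hea mudel", "see on vaga hea"))  # 40.0
+print(cer("tere", "tore"))  # 25.0
+```
+
+Corpus-level figures such as those in the table are normally obtained by summing errors over all utterances before dividing, as `corpus_wer` does, rather than by averaging per-utterance rates.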