Initialize project; model provided by the ModelHub XC community

Model: reazon-research/japanese-hubert-base-k2-rs35kh Source: Original Platform
2026-05-08 11:40:38 +08:00
commit f2177f9d4e
8 changed files with 5490 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,91 @@
+---
+library_name: transformers
+tags:
+- automatic-speech-recognition
+- hubert
+- k2ssl
+datasets:
+- reazon-research/reazonspeech
+language:
+- ja
+metrics:
+- cer
+base_model:
+- reazon-research/japanese-hubert-base-k2
+license: apache-2.0
+pipeline_tag: automatic-speech-recognition
+---
+
+# `japanese-hubert-base-k2-rs35kh`
+
+This model is a [HuBERT Base](https://huggingface.co/reazon-research/japanese-hubert-base-k2) model fine-tuned on the large-scale Japanese ASR corpus [ReazonSpeech v2.0](https://huggingface.co/datasets/reazon-research/reazonspeech) using the k2 framework.
+
+## Usage
+
+You can use this model through the `transformers` library:
+```python
+import librosa
+import numpy as np
+import torch
+from transformers import AutoProcessor, HubertForCTC
+
+model = HubertForCTC.from_pretrained(
+    "reazon-research/japanese-hubert-base-k2-rs35kh",
+    torch_dtype=torch.bfloat16,
+    attn_implementation="flash_attention_2",  # requires the flash-attn package and a supported GPU
+).to("cuda")
+processor = AutoProcessor.from_pretrained("reazon-research/japanese-hubert-base-k2-rs35kh")
+
+audio_filepath = "path/to/audio.wav"  # placeholder: replace with your own audio file
+audio, _ = librosa.load(audio_filepath, sr=16_000)  # load and resample to 16 kHz
+audio = np.pad(audio, pad_width=int(0.5 * 16_000))  # pad 0.5 s of silence on each side; padding before inference is recommended
+input_values = processor(
+    audio, return_tensors="pt", sampling_rate=16_000
+).input_values.to("cuda").to(torch.bfloat16)
+
+with torch.inference_mode():
+    logits = model(input_values).logits.cpu()
+predicted_ids = torch.argmax(logits, dim=-1)[0]
+transcription = processor.decode(predicted_ids, skip_special_tokens=True)
+```
+
+## Test Results
+
+We report the character error rate (CER, lower is better) of this model alongside related wav2vec2 and HuBERT models.
+| Model                                              | #Parameters⬇ |  AVERAGE⬇  | JSUT-BASIC5000⬇ | Common Voice⬇ | TEDxJP-10K⬇ |
+| :------------------------------------------------- | :----------: | :--------: | :-------------: | :-----------: | :---------: |
+| reazon-research/japanese-wav2vec2-large-rs35kh     |     319M     |   16.25%   |     11.00%      |    18.23%     |   19.53%    |
+| reazon-research/japanese-wav2vec2-base-rs35kh      |    96.7M     |   20.40%   |     13.22%      |    23.76%     |   24.23%    |
+| reazon-research/japanese-hubert-base-k2-rs35kh     |    98.4M     |   11.23%   |      9.94%      |    11.59%     |   12.18%    |
+| reazon-research/japanese-hubert-base-k2-rs35kh-bpe |    98.4M     | **11.07%** |    **9.76%**    |  **11.36%**   | **12.10%**  |
+
+We also report the CER for long-form speech.
+| Model                                                   | #Parameters⬇ | JSUT-BOOK⬇ |
+| :------------------------------------------------------ | :----------: | :--------: |
+| reazon-research/japanese-wav2vec2-large-rs35kh          |     319M     |   30.98%   |
+| reazon-research/japanese-wav2vec2-base-rs35kh           |    96.7M     |   82.84%   |
+| reazon-research/japanese-hubert-base-k2-rs35kh          |    98.4M     | **27.05%** |
+|  + [Silero VAD](https://github.com/snakers4/silero-vad) |              | **19.59%** |
+| reazon-research/japanese-hubert-base-k2-rs35kh-bpe      |    98.4M     |   84.55%   |
+|  + [Silero VAD](https://github.com/snakers4/silero-vad) |              | **19.34%** |
+
+## Citation
+```bibtex
+@misc{japanese-hubert-base-k2-rs35kh,
+  title={japanese-hubert-base-k2-rs35kh},
+  author={Sasaki, Yuta},
+  url = {https://huggingface.co/reazon-research/japanese-hubert-base-k2-rs35kh},
+  year = {2025}
+}
+
+@article{yang2024k2ssl,
+  title={k2SSL: A faster and better framework for self-supervised speech representation learning},
+  author={Yang, Yifan and Zhuo, Jianheng and Jin, Zengrui and Ma, Ziyang and Yang, Xiaoyu and Yao, Zengwei and Guo, Liyong and Kang, Wei and Kuang, Fangjun and Lin, Long and others},
+  journal={arXiv preprint arXiv:2411.17100},
+  year={2024}
+}
+```
+
+## License
+
+[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)
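
Note on the CER figures in the Test Results tables of the README above: the README does not say which scoring tool or text normalization was used, nor whether the rate is averaged per utterance or computed over the whole corpus. As a minimal sketch of the metric itself, assuming corpus-level scoring and no normalization, CER can be computed as the total character-level edit distance divided by the total number of reference characters. The function names `edit_distance` and `cer` are illustrative and do not come from the repository.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference character
                    curr[j - 1] + 1,     # insertion of a hypothesis character
                    prev[j - 1] + cost,  # substitution, or match when cost is 0
                )
            )
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

Calling `cer(["こんにちは"], ["こんばんは"])` compares one reference with one hypothesis that differs in two of five characters. Results from a sketch like this will only match published numbers if the same normalization (punctuation, whitespace, character width) is applied to both sides, which is an open assumption here.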