Add LODR support to online and offline recognizers (#2026)

This PR integrates LODR (Low-Order Density Ratio) support from Icefall into both online and offline recognizers, enabling LODR for LM shallow fusion and LM rescoring.

- Extended OnlineLMConfig and OfflineLMConfig to include lodr_fst, lodr_scale, and lodr_backoff_id.
- Implemented LodrFst and LodrStateCost classes and wired them into RNN LM scoring in both online and offline code paths.
- Updated Python bindings, CLI entry points, examples, and CI test scripts to accept and exercise the new LODR options.
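
To illustrate what the two scales do, here is a minimal sketch of a LODR-style score combination. It assumes the usual log-linear form, in which the neural LM log-probability is added with `lm_scale` and the low-order n-gram log-probability from the LODR FST is added with a negative `lodr_scale`. The function name `combined_score` and its arguments are hypothetical. The actual sherpa-onnx implementation, including how `lodr_backoff_id` is used, is not shown in this excerpt.

```python
def combined_score(am_logp, nn_lm_logp, ngram_logp, lm_scale=0.1, lodr_scale=-0.1):
    """Combine per-hypothesis log-probabilities in a LODR-style log-linear sum.

    All names here are illustrative assumptions, not the sherpa-onnx API.
    am_logp:    log-probability from the transducer (acoustic) model
    nn_lm_logp: log-probability from the neural (RNN) LM
    ngram_logp: log-probability from the low-order n-gram LM (the LODR FST)
    """
    # lodr_scale is expected to be negative, so the n-gram score is subtracted.
    return am_logp + lm_scale * nn_lm_logp + lodr_scale * ngram_logp


if __name__ == "__main__":
    # Two hypothetical hypotheses with made-up scores.
    hyp_a = combined_score(am_logp=-1.0, nn_lm_logp=-2.0, ngram_logp=-3.0)
    hyp_b = combined_score(am_logp=-1.0, nn_lm_logp=-2.0, ngram_logp=-1.0)
    print(hyp_a, hyp_b)
```

With the default scales used in the examples (`0.1` and `-0.1`), a hypothesis that the low-order n-gram already finds likely gains less from the fusion than one it finds unlikely, which is the intended density-ratio effect.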
Askars Salimbajevs
2025-07-09 11:23:46 +03:00
committed by GitHub
parent 6122a678f5
commit f0960342ad
21 changed files with 613 additions and 14 deletions


@@ -35,6 +35,25 @@ file(s) with a non-streaming model.
/path/to/0.wav \
/path/to/1.wav
also with RNN LM rescoring and LODR (optional):
./python-api-examples/offline-decode-files.py \
--tokens=/path/to/tokens.txt \
--encoder=/path/to/encoder.onnx \
--decoder=/path/to/decoder.onnx \
--joiner=/path/to/joiner.onnx \
--num-threads=2 \
--decoding-method=modified_beam_search \
--debug=false \
--sample-rate=16000 \
--feature-dim=80 \
--lm=/path/to/lm.onnx \
--lm-scale=0.1 \
--lodr-fst=/path/to/lodr.fst \
--lodr-scale=-0.1 \
/path/to/0.wav \
/path/to/1.wav
(3) For CTC models from NeMo
python3 ./python-api-examples/offline-decode-files.py \
@@ -269,6 +288,39 @@ def get_args():
default="greedy_search",
help="Valid values are greedy_search and modified_beam_search",
)
parser.add_argument(
"--lm",
metavar="file",
type=str,
default="",
help="Path to RNN LM model",
)
parser.add_argument(
"--lm-scale",
metavar="lm_scale",
type=float,
default=0.1,
help="LM model scale for rescoring",
)
parser.add_argument(
"--lodr-fst",
metavar="file",
type=str,
default="",
help="Path to LODR FST model. Used only when --lm is given.",
)
parser.add_argument(
"--lodr-scale",
metavar="lodr_scale",
type=float,
default=-0.1,
help="LODR scale for rescoring. Used only when --lodr-fst is given.",
)
parser.add_argument(
"--debug",
type=bool,
@@ -364,6 +416,10 @@ def main():
num_threads=args.num_threads,
sample_rate=args.sample_rate,
feature_dim=args.feature_dim,
lm=args.lm,
lm_scale=args.lm_scale,
lodr_fst=args.lodr_fst,
lodr_scale=args.lodr_scale,
decoding_method=args.decoding_method,
hotwords_file=args.hotwords_file,
hotwords_score=args.hotwords_score,


@@ -21,6 +21,22 @@ rm sherpa-onnx-streaming-zipformer-en-2023-06-26.tar.bz2
./sherpa-onnx-streaming-zipformer-en-2023-06-26/test_wavs/1.wav \
./sherpa-onnx-streaming-zipformer-en-2023-06-26/test_wavs/8k.wav
or with RNN LM rescoring and LODR:
./python-api-examples/online-decode-files.py \
--tokens=./sherpa-onnx-streaming-zipformer-en-2023-06-26/tokens.txt \
--encoder=./sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-64.onnx \
--decoder=./sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-64.onnx \
--joiner=./sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-64.onnx \
--decoding-method=modified_beam_search \
--lm=/path/to/lm.onnx \
--lm-scale=0.1 \
--lodr-fst=/path/to/lodr.fst \
--lodr-scale=-0.1 \
./sherpa-onnx-streaming-zipformer-en-2023-06-26/test_wavs/0.wav \
./sherpa-onnx-streaming-zipformer-en-2023-06-26/test_wavs/1.wav \
./sherpa-onnx-streaming-zipformer-en-2023-06-26/test_wavs/8k.wav
(2) Streaming paraformer
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2
@@ -186,6 +202,22 @@ def get_args():
""",
)
parser.add_argument(
"--lodr-fst",
metavar="file",
type=str,
default="",
help="Path to LODR FST model. Used only when --lm is given.",
)
parser.add_argument(
"--lodr-scale",
metavar="lodr_scale",
type=float,
default=-0.1,
help="LODR scale for rescoring. Used only when --lodr-fst is given.",
)
parser.add_argument(
"--provider",
type=str,
@@ -320,6 +352,8 @@ def main():
max_active_paths=args.max_active_paths,
lm=args.lm,
lm_scale=args.lm_scale,
lodr_fst=args.lodr_fst,
lodr_scale=args.lodr_scale,
hotwords_file=args.hotwords_file,
hotwords_score=args.hotwords_score,
modeling_unit=args.modeling_unit,
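
The help strings above state that `--lodr-fst` is used only when `--lm` is given, and `--lodr-scale` only when `--lodr-fst` is given. The following standalone sketch reproduces just the four LM-related flags with the same defaults and shows one way a caller might check that dependency before building a recognizer. The helper `lodr_enabled` is a hypothetical name introduced for illustration; it is not part of the example scripts or of sherpa-onnx.

```python
import argparse


def build_parser():
    # Mirrors only the LM/LODR flags and defaults shown in the diff.
    parser = argparse.ArgumentParser()
    parser.add_argument("--lm", type=str, default="", help="Path to RNN LM model")
    parser.add_argument("--lm-scale", type=float, default=0.1)
    parser.add_argument("--lodr-fst", type=str, default="")
    parser.add_argument("--lodr-scale", type=float, default=-0.1)
    return parser


def lodr_enabled(args):
    # Hypothetical helper: LODR only applies when both an LM and a LODR FST are set.
    return bool(args.lm) and bool(args.lodr_fst)


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--lm=/path/to/lm.onnx", "--lodr-fst=/path/to/lodr.fst", "--lodr-scale=-0.1"]
    )
    print(lodr_enabled(args), args.lm_scale, args.lodr_scale)
```

Passing the negative scale in the `--lodr-scale=-0.1` form, as the usage examples do, avoids any ambiguity between a negative number and an option name.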