decoder for open vocabulary keyword spotting (#505)

* various fixes to ContextGraph to support open vocabulary keywords decoder

* Add keyword spotter runtime

* Add binary

* First version works

* Minor fixes

* update text2token

* default values

* Add jni for kws

* add kws android project

* Minor fixes

* Remove unused interface

* Minor fixes

* Add workflow

* handle extra info in texts

* Minor fixes

* Add more comments

* Fix ci

* fix cpp style

* Add input box in android demo so that users can specify their keywords

* Fix cpp style

* Fix comments

* Minor fixes

* Minor fixes

* minor fixes

* Minor fixes

* Minor fixes

* Add CI

* Fix code style

* cpplint

* Fix comments

* Fix error
Wei Kang
2024-01-20 22:52:41 +08:00
committed by GitHub
parent bf1dd3daf6
commit b6c020901a
77 changed files with 3316 additions and 68 deletions

@@ -26,7 +26,32 @@ namespace sherpa_onnx {
  * otherwise returns false.
  */
 bool EncodeHotwords(std::istream &is, const SymbolTable &symbol_table,
-                    std::vector<std::vector<int32_t>> *hotwords);
+                    std::vector<std::vector<int32_t>> *hotwords_id);
+
+/* Encode the keywords in an input stream into token ids.
+ *
+ * @param is The input stream. It contains several lines, one keyword per
+ *           line. For each keyword, the tokens (cjkchar or bpe) are separated
+ *           by spaces. A line may also contain a boosting score (starting
+ *           with :), a triggering threshold (starting with #) and a keyword
+ *           string (starting with @).
+ * @param symbol_table The tokens table mapping symbols to ids. All the symbols
+ *                     in the stream should be in the symbol_table; if not,
+ *                     this function returns false.
+ *
+ * @param keywords_id The encoded ids to be written to.
+ * @param keywords The original keyword string to be written to.
+ * @param boost_scores The boosting score for each keyword to be written to.
+ * @param threshold The triggering threshold for each keyword to be written to.
+ *
+ * @return If all the symbols from ``is`` are in the symbol_table, returns true
+ *         otherwise returns false.
+ */
+bool EncodeKeywords(std::istream &is, const SymbolTable &symbol_table,
+                    std::vector<std::vector<int32_t>> *keywords_id,
+                    std::vector<std::string> *keywords,
+                    std::vector<float> *boost_scores,
+                    std::vector<float> *threshold);
 
 }  // namespace sherpa_onnx
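
The line format described in the `EncodeKeywords` comment (space-separated tokens, plus optional `:` boost, `#` threshold and `@` keyword string) can be illustrated with a small parser for a single line. This is only a sketch of the format, not the implementation from this commit: `ToySymbolTable`, `ParsedKeyword` and `ParseKeywordLine` are hypothetical names, a plain `std::unordered_map` stands in for `SymbolTable`, and defaulting an absent boost or threshold to 0 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for sherpa_onnx::SymbolTable: maps a token to its id.
using ToySymbolTable = std::unordered_map<std::string, int32_t>;

// Hypothetical container for the fields of one keyword line.
struct ParsedKeyword {
  std::vector<int32_t> ids;  // token ids, in the order they appear
  std::string phrase;        // text after '@'; empty if absent
  float boost = 0.0f;        // value after ':'; 0 if absent (assumption)
  float threshold = 0.0f;    // value after '#'; 0 if absent (assumption)
};

// Parses one line such as "HE LLO :1.5 #0.25 @HELLO".
// Returns false if a token is missing from the table or a number is
// malformed; in that case *out is left unchanged.
bool ParseKeywordLine(const std::string &line, const ToySymbolTable &table,
                      ParsedKeyword *out) {
  assert(out != nullptr);
  ParsedKeyword result;
  std::istringstream iss(line);
  std::string word;
  while (iss >> word) {  // operator>> never yields an empty word
    const char tag = word[0];
    if (tag == ':' || tag == '#') {
      float value = 0.0f;
      try {
        std::size_t used = 0;
        value = std::stof(word.substr(1), &used);
        if (used != word.size() - 1) return false;  // trailing garbage
      } catch (const std::exception &) {
        return false;  // empty, non-numeric or out-of-range number
      }
      if (tag == ':') {
        result.boost = value;
      } else {
        result.threshold = value;
      }
    } else if (tag == '@') {
      result.phrase = word.substr(1);
    } else {
      auto it = table.find(word);
      if (it == table.end()) return false;  // unknown token
      result.ids.push_back(it->second);
    }
  }
  *out = std::move(result);
  return true;
}
```

With a table `{{"HE", 1}, {"LLO", 2}}`, the line `HE LLO :1.5 #0.25 @HELLO` yields the ids `1, 2`, a boost of `1.5`, a threshold of `0.25` and the phrase `HELLO`. A line containing a token that is not in the table is rejected, mirroring the "returns false" contract in the comment above.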