---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
language:
- ms
- en
- zh
- ta
tags:
- turn-detection
- call-center
- code-switching
- multilingual
pipeline_tag: text-generation
---

# Turn Detector Qwen3-1.7B

Fine-tuned [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) for **real-time turn-end detection** in multilingual call center conversations. The model predicts `P(<|im_end|>)`, the probability that a speaker has finished their turn. It is designed for low-latency voice agent pipelines (e.g. LiveKit) that need to decide when to respond.

## How It Works

Given the conversation so far, the model outputs the probability of `<|im_end|>` as the next token:

- **P(im_end) > 0.5** → speaker is done talking (turn complete)
- **P(im_end) < 0.5** → speaker is still talking (turn incomplete)

## Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Scicom-intl/Malaysian-Turn-Detector-Qwen3-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda().eval()

# Token id of the end-of-turn marker whose probability is read out.
IM_END_ID = tokenizer.convert_tokens_to_ids("<|im_end|>")

def get_turn_end_prob(text):
    # Strip a trailing <|im_end|> so the model has to predict it itself.
    if text.endswith("<|im_end|>"):
        text = text[:-len("<|im_end|>")]
    inputs = tokenizer(text, return_tensors="pt").to("cuda")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Softmax over the vocabulary at the last position gives the next-token distribution.
    prob = F.softmax(logits[0, -1], dim=-1)[IM_END_ID].item()
    return prob
```

## Eval Results

**Test set:** 1200 samples (600 positive + 600 negative), 50 conversations for each of the 12 language pairs.
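The headline metrics in this section follow directly from the confusion-matrix counts reported further down (561 true positives, 39 false negatives, 1 false positive, 599 true negatives at threshold 0.5). As a sanity check, here is a minimal sketch of that arithmetic. The helper name `binary_metrics` is illustrative and not part of the released code, and the sketch assumes non-zero denominators.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F1 as percentages.

    Illustrative helper; assumes tp + fp > 0 and tp + fn > 0.
    """
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": 100 * (tp + tn) / total,
        "precision": 100 * precision,
        "recall": 100 * recall,
        "f1": 100 * 2 * precision * recall / (precision + recall),
    }


# Counts taken from the confusion matrix at threshold 0.5.
metrics = binary_metrics(tp=561, fn=39, fp=1, tn=599)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Rounded to two decimals, the printed values agree with the Overall table. Recall is the lower of the two class-wise scores because almost all errors are complete turns that were scored below the threshold.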

### Overall (threshold = 0.5)

| Metric    | Score  |
| --------- | ------ |
| Accuracy  | 96.67% |
| Precision | 99.82% |
| Recall    | 93.50% |
| F1        | 96.56% |

### Per Language

| Language Pair   | Accuracy (overall) | Accuracy (positive) | Accuracy (negative) |
| --------------- | ------------------ | ------------------- | ------------------- |
| chinese-english | 95.00%             | 90.00%              | 100.00%             |
| chinese-malay   | 97.00%             | 94.00%              | 100.00%             |
| chinese-tamil   | 97.00%             | 94.00%              | 100.00%             |
| english-chinese | 97.00%             | 96.00%              | 98.00%              |
| english-malay   | 94.00%             | 88.00%              | 100.00%             |
| english-tamil   | 95.00%             | 90.00%              | 100.00%             |
| malay-chinese   | 97.00%             | 94.00%              | 100.00%             |
| malay-english   | 96.00%             | 92.00%              | 100.00%             |
| malay-tamil     | 97.00%             | 94.00%              | 100.00%             |
| tamil-chinese   | 100.00%            | 100.00%             | 100.00%             |
| tamil-english   | 97.00%             | 94.00%              | 100.00%             |
| tamil-malay     | 98.00%             | 96.00%              | 100.00%             |

### Threshold Sweep

| Threshold | Accuracy   | Precision  | Recall     | F1         |
| --------- | ---------- | ---------- | ---------- | ---------- |
| 0.1       | 99.00%     | 99.66%     | 98.33%     | 98.99%     |
| 0.2       | 98.67%     | 99.66%     | 97.67%     | 98.65%     |
| 0.3       | 98.00%     | 99.66%     | 96.33%     | 97.97%     |
| 0.4       | 97.58%     | 99.65%     | 95.50%     | 97.53%     |
| **0.5**   | **96.67%** | **99.82%** | **93.50%** | **96.56%** |
| 0.6       | 95.50%     | 99.82%     | 91.17%     | 95.30%     |
| 0.7       | 93.67%     | 99.81%     | 87.50%     | 93.25%     |
| 0.8       | 91.17%     | 100.00%    | 82.33%     | 90.31%     |
| 0.9       | 83.83%     | 100.00%    | 67.67%     | 80.72%     |

### Confusion Matrix (threshold = 0.5)

|            | Pred Pos | Pred Neg |
| ---------- | -------- | -------- |
| Actual Pos | 561      | 39       |
| Actual Neg | 1        | 599      |

### Probability Distribution

| Class                      | Mean   | Median | Min    | Max    |
| -------------------------- | ------ | ------ | ------ | ------ |
| Positive (turn complete)   | 0.8813 | 0.9673 | 0.0063 | 1.0000 |
| Negative (turn incomplete) | 0.0020 | 0.0000 | 0.0000 | 0.7022 |

## Dataset

Tokenized parquet datasets (chinidataset format) are available at [Scicom-intl/turn-detector-Qwen3-0.6B-dataset](https://huggingface.co/datasets/Scicom-intl/turn-detector-Qwen3-0.6B-dataset).

```
turn-detector-Qwen3-0.6B-dataset/
├── train-merged/
├── train/
└── test/
```

## Training

* **Base model:** Qwen/Qwen3-1.7B
* **Training data:** Positive samples only (complete conversations ending with `<|im_end|>`)
* **Loss:** Liger Fused Linear Cross Entropy
* **Attention:** Flash Attention 3
* **Precision:** bfloat16
* **Block size:** 8192 (multipacked)
* **Batch size:** 2 x 16 gradient accumulation
* **Learning rate:** 2e-5 (constant)
* **Epochs:** 1

### Training Data Sources

| Dataset                            | Source                                                                                                                                                         |
| ---------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Call Center Language Switching     | [https://huggingface.co/datasets/Scicom-intl/Call-Center-Language-Switching](https://huggingface.co/datasets/Scicom-intl/Call-Center-Language-Switching)       |
| Function Call                      | [https://huggingface.co/datasets/Scicom-intl/Function-Call](https://huggingface.co/datasets/Scicom-intl/Function-Call)                                         |
| Malaysian Multiturn Chat Assistant | [https://huggingface.co/datasets/mesolitica/Malaysian-Multiturn-Chat-Assistant](https://huggingface.co/datasets/mesolitica/Malaysian-Multiturn-Chat-Assistant) |
| Malaysian Speech Instructions      | [https://huggingface.co/datasets/mesolitica/Malaysian-Speech-Instructions](https://huggingface.co/datasets/mesolitica/Malaysian-Speech-Instructions)           |
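
As a closing note on deployment: the threshold sweep under Eval Results shows that 0.5 is only one possible operating point, since lower thresholds trade a little precision for noticeably higher recall. The sketch below picks the sweep row with the highest recall that still meets a precision floor. The `SWEEP` list is copied from the table in this card, while `pick_threshold` and its `min_precision` parameter are hypothetical names used for illustration only.

```python
# Rows copied from the Threshold Sweep table:
# (threshold, accuracy, precision, recall, f1), all metrics in percent.
SWEEP = [
    (0.1, 99.00, 99.66, 98.33, 98.99),
    (0.2, 98.67, 99.66, 97.67, 98.65),
    (0.3, 98.00, 99.66, 96.33, 97.97),
    (0.4, 97.58, 99.65, 95.50, 97.53),
    (0.5, 96.67, 99.82, 93.50, 96.56),
    (0.6, 95.50, 99.82, 91.17, 95.30),
    (0.7, 93.67, 99.81, 87.50, 93.25),
    (0.8, 91.17, 100.00, 82.33, 90.31),
    (0.9, 83.83, 100.00, 67.67, 80.72),
]


def pick_threshold(min_precision):
    """Return the sweep row with the highest recall whose precision is at
    least `min_precision` (in percent), or None if no row qualifies.

    Hypothetical helper for illustration; not part of the released model code.
    """
    candidates = [row for row in SWEEP if row[2] >= min_precision]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[3])


# Example: require at least 99.8% precision on this test set.
row = pick_threshold(99.8)
if row is not None:
    print(f"threshold={row[0]}, precision={row[2]}%, recall={row[3]}%")
```

These numbers come from a single 1200-sample test set, so a threshold chosen this way should be treated as a starting point and re-checked on traffic from the target deployment.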