license: apache-2.0
base_model: Qwen/Qwen3-1.7B
language: ms, en, zh, ta
tags: turn-detection, call-center, code-switching, multilingual
pipeline_tag: text-generation

Turn Detector Qwen3-1.7B

Fine-tuned Qwen3-1.7B for real-time turn-end detection in multilingual call center conversations.

The model predicts P(<|im_end|>), the probability that the speaker has finished their turn. It is designed for low-latency voice agent pipelines (e.g. LiveKit), where this probability determines when the agent should respond.

How It Works

Given the conversation so far, the model outputs the probability that <|im_end|> is the next token:

  • P(<|im_end|>) > 0.5 → the speaker has finished (turn complete)
  • P(<|im_end|>) < 0.5 → the speaker is still talking (turn incomplete)
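In a streaming pipeline, this rule is typically applied to each new partial transcript until one crosses the threshold. The sketch below assumes a caller-supplied `score` function that maps text to P(<|im_end|>) (for example, a wrapper around `get_turn_end_prob` from the Usage section that first formats the conversation); `first_complete_partial` is a hypothetical helper, not part of the released model.

```python
from typing import Callable, Iterable, Optional


def first_complete_partial(
    partials: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the first partial transcript judged to end the turn, else None.

    `score` is any function mapping text to P(<|im_end|>); the strict `>`
    mirrors the decision rule above.
    """
    for text in partials:
        # Turn complete: the agent may start responding now.
        if score(text) > threshold:
            return text
    # No partial crossed the threshold: keep listening.
    return None
```

Raising `threshold` makes the agent wait longer before interrupting; lowering it makes it respond sooner at the risk of cutting the speaker off.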

Usage

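import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Scicom-intl/Malaysian-Turn-Detector-Qwen3-1.7B"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device).eval()

IM_END = "<|im_end|>"
IM_END_ID = tokenizer.convert_tokens_to_ids(IM_END)

def get_turn_end_prob(text):
    """Return P(<|im_end|>) as the next token after `text`, a chat-formatted prompt."""
    # Leave the last turn open: drop a trailing end-of-turn marker (and the
    # newline the chat template writes after it) so the model scores it.
    text = text.rstrip("\n")
    if text.endswith(IM_END):
        text = text[:-len(IM_END)]
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        logits = model(**inputs).logits
    # Softmax over the vocabulary at the last position, in float32 for stability.
    prob = F.softmax(logits[0, -1].float(), dim=-1)[IM_END_ID].item()
    return prob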
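`get_turn_end_prob` expects the conversation already rendered in the model's chat format. A minimal sketch of that rendering, assuming Qwen3's ChatML layout (`<|im_start|>role\ncontent<|im_end|>\n`) for plain text turns and assuming the caller is the `user` role and the agent the `assistant` role; `build_prompt` is a hypothetical helper that ignores tool calls and thinking blocks.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_prompt(messages: List[Dict[str, str]]) -> str:
    """Render messages as ChatML, leaving the last turn open for scoring.

    Assumption: this approximates Qwen3's chat template for plain
    system/user/assistant text turns only.
    """
    if not messages:
        raise ValueError("need at least one message")
    rendered = "".join(
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages
    )
    # Remove the final "<|im_end|>\n": the model is asked whether it belongs here.
    return rendered[: -len(IM_END + "\n")]
```

When the tokenizer is loaded, `tokenizer.apply_chat_template(messages, tokenize=False)` renders the model's own template and is preferable; the sketch only shows the expected shape of the input.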

Eval Results

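Test set: 1,200 samples (600 positive = turn complete, 600 negative = turn incomplete), drawn from 50 conversations in each of 12 language pairs, i.e. 50 positive and 50 negative samples per pair.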

Overall (threshold = 0.5)

| Metric | Score |
|---|---|
| Accuracy | 96.67% |
| Precision | 99.82% |
| Recall | 93.50% |
| F1 | 96.56% |

Per Language

| Language pair | Overall accuracy | Positive accuracy (turn complete) | Negative accuracy (turn incomplete) |
|---|---|---|---|
| chinese-english | 95.00% | 90.00% | 100.00% |
| chinese-malay | 97.00% | 94.00% | 100.00% |
| chinese-tamil | 97.00% | 94.00% | 100.00% |
| english-chinese | 97.00% | 96.00% | 98.00% |
| english-malay | 94.00% | 88.00% | 100.00% |
| english-tamil | 95.00% | 90.00% | 100.00% |
| malay-chinese | 97.00% | 94.00% | 100.00% |
| malay-english | 96.00% | 92.00% | 100.00% |
| malay-tamil | 97.00% | 94.00% | 100.00% |
| tamil-chinese | 100.00% | 100.00% | 100.00% |
| tamil-english | 97.00% | 94.00% | 100.00% |
| tamil-malay | 98.00% | 96.00% | 100.00% |

Threshold Sweep

| Threshold | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| 0.1 | 99.00% | 99.66% | 98.33% | 98.99% |
| 0.2 | 98.67% | 99.66% | 97.67% | 98.65% |
| 0.3 | 98.00% | 99.66% | 96.33% | 97.97% |
| 0.4 | 97.58% | 99.65% | 95.50% | 97.53% |
| 0.5 | 96.67% | 99.82% | 93.50% | 96.56% |
| 0.6 | 95.50% | 99.82% | 91.17% | 95.30% |
| 0.7 | 93.67% | 99.81% | 87.50% | 93.25% |
| 0.8 | 91.17% | 100.00% | 82.33% | 90.31% |
| 0.9 | 83.83% | 100.00% | 67.67% | 80.72% |
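On this test set, lowering the threshold from 0.5 to 0.1 raises recall from 93.50% to 98.33% while precision dips only from 99.82% to 99.66%. A sweep like this can be computed from per-sample scores with the sketch below, assuming parallel lists of probabilities and 0/1 labels (1 = turn complete) and the strict `>` rule from How It Works; whether the reported evaluation used `>` or `>=` is not stated, and `sweep` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def sweep(
    probs: Sequence[float],
    labels: Sequence[int],
    thresholds: Sequence[float] = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9),
) -> List[Dict[str, float]]:
    """Accuracy, precision, recall and F1 at each threshold (positive = 1)."""
    rows = []
    for t in thresholds:
        # Assumption: a score strictly above the threshold means "turn complete".
        preds = [p > t for p in probs]
        tp = sum(1 for pr, y in zip(preds, labels) if pr and y == 1)
        fp = sum(1 for pr, y in zip(preds, labels) if pr and y == 0)
        fn = sum(1 for pr, y in zip(preds, labels) if not pr and y == 1)
        tn = len(labels) - tp - fp - fn
        # Report 0.0 when a ratio is undefined (no predicted or actual positives).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append({
            "threshold": t,
            "accuracy": (tp + tn) / len(labels),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        })
    return rows
```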

Confusion Matrix (threshold = 0.5)

| | Predicted positive | Predicted negative |
|---|---|---|
| Actual positive | 561 | 39 |
| Actual negative | 1 | 599 |
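As a consistency check, the overall metrics at threshold 0.5 follow directly from these counts, with turn complete as the positive class (TP = 561, FN = 39, FP = 1, TN = 599, N = 1200):

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{N} = \frac{561 + 599}{1200} \approx 96.67\% \\
\text{Precision} &= \frac{TP}{TP + FP} = \frac{561}{562} \approx 99.82\% \\
\text{Recall}    &= \frac{TP}{TP + FN} = \frac{561}{600} = 93.50\% \\
F_1              &= \frac{2\,TP}{2\,TP + FP + FN} = \frac{1122}{1162} \approx 96.56\%
\end{aligned}
```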

Probability Distribution of P(<|im_end|>)

| Class | Mean | Median | Min | Max |
|---|---|---|---|---|
| Positive (turn complete) | 0.8813 | 0.9673 | 0.0063 | 1.0000 |
| Negative (turn incomplete) | 0.0020 | 0.0000 | 0.0000 | 0.7022 |
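The negative-class maximum of 0.7022 is consistent with the single false positive that persists in the threshold sweep up to 0.7 and disappears at 0.8. A per-class summary like this can be reproduced from raw scores with the sketch below, assuming parallel lists of probabilities and labels (1 = turn complete, 0 = turn incomplete), each class non-empty; `class_stats` is a hypothetical helper using only the standard library.

```python
import statistics
from typing import Dict, Sequence


def class_stats(
    probs: Sequence[float], labels: Sequence[int]
) -> Dict[str, Dict[str, float]]:
    """Mean, median, min and max of P(<|im_end|>) for each class."""
    out = {}
    for name, cls in (("positive", 1), ("negative", 0)):
        vals = [p for p, y in zip(probs, labels) if y == cls]
        out[name] = {
            "mean": statistics.fmean(vals),
            "median": statistics.median(vals),
            "min": min(vals),
            "max": max(vals),
        }
    return out
```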

Dataset

Tokenized parquet datasets (chinidataset format) are available at Scicom-intl/turn-detector-Qwen3-0.6B-dataset:

turn-detector-Qwen3-0.6B-dataset/
├── train-merged/
├── train/
└── test/

Training

  • Base model: Qwen/Qwen3-1.7B
  • Training data: Positive samples only (complete conversations ending with <|im_end|>)
  • Loss: Liger Fused Linear Cross Entropy
  • Attention: Flash Attention 3
  • Precision: bfloat16
  • Block size: 8192 (multipacked)
  • Batch size: 2, with 16 gradient-accumulation steps
  • Learning rate: 2e-5 (constant)
  • Epochs: 1
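"Multipacked" blocks of 8192 tokens are commonly built by bin-packing several tokenized samples into one block and recording their boundaries, so that variable-length attention kernels keep each sample's attention within its own tokens. The sketch below shows that idea with first-fit-decreasing packing; it is an assumption about what multipacking means here, not the authors' pipeline, and `pack` and `cu_seqlens` are hypothetical helpers. With a batch of 2 blocks and 16 accumulation steps, each optimizer step covers 32 blocks, i.e. at most 32 × 8192 = 262,144 tokens (per device; the GPU count is not stated).

```python
from typing import List, Sequence


def pack(lengths: Sequence[int], block_size: int = 8192) -> List[List[int]]:
    """First-fit-decreasing packing of sample indices into fixed-size blocks."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    blocks: List[List[int]] = []
    free: List[int] = []  # remaining token capacity of each block
    for i in order:
        n = lengths[i]
        if n > block_size:
            raise ValueError(f"sample {i} has {n} tokens > block_size {block_size}")
        for b, room in enumerate(free):
            if n <= room:
                blocks[b].append(i)
                free[b] -= n
                break
        else:
            # No existing block has room: open a new one.
            blocks.append([i])
            free.append(block_size - n)
    return blocks


def cu_seqlens(block: Sequence[int], lengths: Sequence[int]) -> List[int]:
    """Cumulative sample boundaries within one packed block."""
    out = [0]
    for i in block:
        out.append(out[-1] + lengths[i])
    return out
```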

Training Data Sources

| Dataset | Source |
|---|---|
| Call Center Language Switching | https://huggingface.co/datasets/Scicom-intl/Call-Center-Language-Switching |
| Function Call | https://huggingface.co/datasets/Scicom-intl/Function-Call |
| Malaysian Multiturn Chat Assistant | https://huggingface.co/datasets/mesolitica/Malaysian-Multiturn-Chat-Assistant |
| Malaysian Speech Instructions | https://huggingface.co/datasets/mesolitica/Malaysian-Speech-Instructions |