Scicom-intl/Malaysian-Turn-Detector-Qwen3-1.7B

ModelHub XC 9fea2f6fd2 Initialize project; model provided by the ModelHub XC community

Model: Scicom-intl/Malaysian-Turn-Detector-Qwen3-1.7B
Source: Original Platform

2026-05-12 20:02:33 +08:00

Turn Detector Qwen3-1.7B

Fine-tuned Qwen3-1.7B for real-time turn-end detection in multilingual call center conversations.

The model predicts P(<|im_end|>), the probability that a speaker has finished their turn. It is designed for low-latency voice agent pipelines (e.g. LiveKit), where it decides when the agent should respond.

How It Works

Given a conversation so far, the model outputs the probability of <|im_end|> as the next token:

P(im_end) > 0.5 → speaker is done talking (turn complete)
P(im_end) < 0.5 → speaker is still talking (turn incomplete)
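
As a minimal sketch of the rule above, the snippet below turns a vector of next-token logits into P(im_end) with a softmax and applies the 0.5 cutoff. The toy four-entry vocabulary, the index `im_end_id = 3`, and the helper names are assumptions for illustration and are not part of the model's API. The rule above does not say how a probability of exactly 0.5 is treated, so this sketch counts it as incomplete.

```python
import math

def softmax_prob(logits, token_id):
    """Return the softmax probability of `token_id` given raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[token_id] / sum(exps)

def is_turn_complete(p_im_end, threshold=0.5):
    """Apply the decision rule: strictly above the threshold means turn complete."""
    return p_im_end > threshold

# Hypothetical 4-token vocabulary; index 3 plays the role of <|im_end|>.
toy_logits = [0.0, 0.0, 0.0, 2.0]
im_end_id = 3

p = softmax_prob(toy_logits, im_end_id)
print(round(p, 4), is_turn_complete(p))
```

In the real model the logits vector spans the full vocabulary and the index comes from the tokenizer, as shown in the Usage section.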

Usage

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Scicom-intl/Malaysian-Turn-Detector-Qwen3-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda().eval()

# Token id whose next-token probability serves as the turn-end score.
IM_END_ID = tokenizer.convert_tokens_to_ids("<|im_end|>")

def get_turn_end_prob(text):
    """Return P(<|im_end|>) as the next token after `text`."""
    # Strip a trailing <|im_end|> so the model predicts it rather than reads it.
    if text.endswith("<|im_end|>"):
        text = text[:-len("<|im_end|>")]
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Softmax over the vocabulary at the last position, then pick <|im_end|>.
    prob = F.softmax(logits[0, -1], dim=-1)[IM_END_ID].item()
    return prob
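
A voice agent would typically wrap the score in a yes/no decision. The sketch below is one hedged way to do that: `should_respond` and `fake_prob_fn` are hypothetical names, and the stand-in scorer only exists so the example runs without a GPU or the model weights. Its outputs are not model predictions.

```python
from typing import Callable

def should_respond(prob_fn: Callable[[str], float], transcript: str,
                   threshold: float = 0.5) -> bool:
    """Return True when the scored turn-end probability exceeds the threshold."""
    return prob_fn(transcript) > threshold

# Stand-in scorer (an assumption for illustration, NOT the model's behaviour).
def fake_prob_fn(text: str) -> float:
    return 0.9 if text.rstrip().endswith("?") else 0.1

print(should_respond(fake_prob_fn, "Can you check my order status?"))
print(should_respond(fake_prob_fn, "I would like to"))
```

With the model loaded, `get_turn_end_prob` can be passed as `prob_fn`. The Threshold Sweep section reports how cutoffs other than 0.5 trade precision against recall.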

Eval Results

Test set: 1200 samples (600 positive + 600 negative), 50 conversations per language pair.

Overall (threshold = 0.5)

Metric	Score
Accuracy	96.67%
Precision	99.82%
Recall	93.50%
F1	96.56%

Per Language

Language Pair	Overall Accuracy	Positive Accuracy	Negative Accuracy
chinese-english	95.00%	90.00%	100.00%
chinese-malay	97.00%	94.00%	100.00%
chinese-tamil	97.00%	94.00%	100.00%
english-chinese	97.00%	96.00%	98.00%
english-malay	94.00%	88.00%	100.00%
english-tamil	95.00%	90.00%	100.00%
malay-chinese	97.00%	94.00%	100.00%
malay-english	96.00%	92.00%	100.00%
malay-tamil	97.00%	94.00%	100.00%
tamil-chinese	100.00%	100.00%	100.00%
tamil-english	97.00%	94.00%	100.00%
tamil-malay	98.00%	96.00%	100.00%
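
The per-language rows can be cross-checked against the overall numbers. The sketch below converts each percentage back to a count, assuming 50 positive and 50 negative samples per pair (which is what 600 + 600 samples over 12 pairs implies), and sums them. The variable names are illustrative.

```python
# (overall, positive, negative) accuracy in percent, copied from the table above.
per_pair = {
    "chinese-english": (95.0, 90.0, 100.0),
    "chinese-malay": (97.0, 94.0, 100.0),
    "chinese-tamil": (97.0, 94.0, 100.0),
    "english-chinese": (97.0, 96.0, 98.0),
    "english-malay": (94.0, 88.0, 100.0),
    "english-tamil": (95.0, 90.0, 100.0),
    "malay-chinese": (97.0, 94.0, 100.0),
    "malay-english": (96.0, 92.0, 100.0),
    "malay-tamil": (97.0, 94.0, 100.0),
    "tamil-chinese": (100.0, 100.0, 100.0),
    "tamil-english": (97.0, 94.0, 100.0),
    "tamil-malay": (98.0, 96.0, 100.0),
}
SAMPLES_PER_CLASS = 50  # assumption: 600 samples per class / 12 language pairs

# Convert percentages to counts of correctly classified samples.
correct_pos = sum(round(pos / 100 * SAMPLES_PER_CLASS) for _, pos, _ in per_pair.values())
correct_neg = sum(round(neg / 100 * SAMPLES_PER_CLASS) for _, _, neg in per_pair.values())
total = 2 * SAMPLES_PER_CLASS * len(per_pair)

print(correct_pos, correct_neg, total)
print(f"{(correct_pos + correct_neg) / total:.2%}")
```

Under that assumption the counts agree with the confusion matrix reported for threshold 0.5.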

Threshold Sweep

Threshold	Accuracy	Precision	Recall	F1
0.1	99.00%	99.66%	98.33%	98.99%
0.2	98.67%	99.66%	97.67%	98.65%
0.3	98.00%	99.66%	96.33%	97.97%
0.4	97.58%	99.65%	95.50%	97.53%
0.5	96.67%	99.82%	93.50%	96.56%
0.6	95.50%	99.82%	91.17%	95.30%
0.7	93.67%	99.81%	87.50%	93.25%
0.8	91.17%	100.00%	82.33%	90.31%
0.9	83.83%	100.00%	67.67%	80.72%
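
The sweep can be used to pick an operating point. A false positive here means the agent responds while the speaker is still talking, and a false negative means the agent waits longer than necessary. The sketch below shows two hedged selection rules over the published rows; the function names and the rules themselves are assumptions, not the authors' procedure.

```python
# (threshold, accuracy, precision, recall, f1) in percent, copied from the sweep table.
sweep = [
    (0.1, 99.00, 99.66, 98.33, 98.99),
    (0.2, 98.67, 99.66, 97.67, 98.65),
    (0.3, 98.00, 99.66, 96.33, 97.97),
    (0.4, 97.58, 99.65, 95.50, 97.53),
    (0.5, 96.67, 99.82, 93.50, 96.56),
    (0.6, 95.50, 99.82, 91.17, 95.30),
    (0.7, 93.67, 99.81, 87.50, 93.25),
    (0.8, 91.17, 100.00, 82.33, 90.31),
    (0.9, 83.83, 100.00, 67.67, 80.72),
]

def best_f1(rows):
    """Row with the highest F1 score."""
    return max(rows, key=lambda r: r[4])

def best_recall_with_precision(rows, min_precision):
    """Highest-recall row whose precision meets the floor, or None if none does."""
    eligible = [r for r in rows if r[2] >= min_precision]
    return max(eligible, key=lambda r: r[3]) if eligible else None

print(best_f1(sweep)[0])
print(best_recall_with_precision(sweep, 100.0)[0])
```

These choices reflect this 1200-sample test set only; a threshold tuned here may behave differently on live audio transcripts.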

Confusion Matrix (threshold = 0.5)

	Pred Pos	Pred Neg
Actual Pos	561	39
Actual Neg	1	599
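
The overall metrics follow directly from these four counts. The short worked example below recomputes them using the standard definitions; only the variable names are introduced here.

```python
# Counts from the confusion matrix at threshold 0.5.
tp, fn = 561, 39   # actual positive: predicted positive / predicted negative
fp, tn = 1, 599    # actual negative: predicted positive / predicted negative

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

for name, value in [("Accuracy", accuracy), ("Precision", precision),
                    ("Recall", recall), ("F1", f1)]:
    print(f"{name}: {value:.2%}")
```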

Probability Distribution

Class	Mean	Median	Min	Max
Positive (turn complete)	0.8813	0.9673	0.0063	1.0000
Negative (turn incomplete)	0.0020	0.0000	0.0000	0.7022

Dataset

Tokenized parquet datasets (chinidataset format) are available at Scicom-intl/turn-detector-Qwen3-0.6B-dataset.

turn-detector-Qwen3-0.6B-dataset/
├── train-merged/
├── train/
└── test/

Training

Base model: Qwen/Qwen3-1.7B
Training data: Positive samples only (complete conversations ending with <|im_end|>)
Loss: Liger Fused Linear Cross Entropy
Attention: Flash Attention 3
Precision: bfloat16
Block size: 8192 (multipacked)
Batch size: 2 x 16 gradient accumulation
Learning rate: 2e-5 (constant)
Epochs: 1
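
The batch settings imply an upper bound on the number of tokens seen per optimizer step. The arithmetic below is a rough sketch that assumes the batch size of 2 is per device and that every packed block is completely full. The number of devices is not stated above, so `num_devices` is a hypothetical parameter defaulting to 1.

```python
BLOCK_SIZE = 8192   # tokens per packed sequence
MICRO_BATCH = 2     # sequences per forward/backward pass
GRAD_ACCUM = 16     # micro-batches accumulated before each optimizer step

def tokens_per_step(num_devices: int = 1) -> int:
    """Upper bound on tokens per optimizer step, assuming fully packed blocks."""
    return BLOCK_SIZE * MICRO_BATCH * GRAD_ACCUM * num_devices

print(tokens_per_step())
```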

Training Data Sources

Dataset	Source
Call Center Language Switching	https://huggingface.co/datasets/Scicom-intl/Call-Center-Language-Switching
Function Call	https://huggingface.co/datasets/Scicom-intl/Function-Call
Malaysian Multiturn Chat Assistant	https://huggingface.co/datasets/mesolitica/Malaysian-Multiturn-Chat-Assistant
Malaysian Speech Instructions	https://huggingface.co/datasets/mesolitica/Malaysian-Speech-Instructions