Model: RAS1981/qwen3-0.6b-turn-detection-v1
Source: Original Platform
2026-06-17 15:14:19 +08:00

base_model: unsloth/qwen3-0.6b-unsloth-bnb-4bit
tags: text-generation-inference, transformers, unsloth, qwen3
license: apache-2.0
language: en
library_name: transformers
datasets: RAS1981/turn-detection-probability-balanced

🇷🇺 Qwen3-0.6B Turn Detection (Probability-Based)

This model is a specialized conversational boundary detector for Russian real-estate dialogues.

It estimates the probability that the user has finished their turn, read off as the probability of the <|im_end|> token, as opposed to continuing the sentence. It was fine-tuned with Single-Token Loss Masking on a balanced dataset of about 20k complete and incomplete conversational turns.

🚀 Key Features

  • Base Model: unsloth/Qwen3-0.6B (fast, efficient, good Russian support).
  • Method: Probability-based Turn Detection. Instead of a binary classifier head, it uses the model's intrinsic next-token prediction.
  • Performance:
    • Complete Turns: Predicts <|im_end|> with high confidence (>90%).
    • Incomplete Turns: Predicts the continuation word (next token), assigning near-zero probability to <|im_end|>.
  • Latency: Low inference latency on CPU or GPU, thanks to the small 0.6B-parameter size.
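
The probability-based method above amounts to a softmax over the next-token logits followed by reading off a single entry. The sketch below illustrates this with a toy four-token vocabulary; the vocabulary, the logit values, and the names `TOY_EOS_INDEX` and `eos_probability` are invented for illustration and are not taken from the model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary (illustrative only); index 3 plays the role of <|im_end|>.
TOY_VOCAB = ["про", "квартиру", "цену", "<|im_end|>"]
TOY_EOS_INDEX = 3

def eos_probability(next_token_logits, eos_index=TOY_EOS_INDEX):
    """Probability mass the softmax assigns to the end-of-turn token."""
    return softmax(next_token_logits)[eos_index]

# Made-up logits: one case where the turn looks finished, one where it does not.
complete_logits = [0.1, 0.2, 0.0, 5.0]
incomplete_logits = [4.0, 3.5, 3.0, -6.0]

print(eos_probability(complete_logits))
print(eos_probability(incomplete_logits))
```

No separate classifier head is involved: the same distribution that would pick a continuation word also carries the end-of-turn signal.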

📊 Training Data

Trained on RAS1981/turn-detection-probability-balanced.

  • Contrastive Pairs: Each complete sentence has a corresponding incomplete version.
  • Balanced: 50% complete turns, 50% incomplete turns.
  • Domain: Russian real-estate inquiries (renting, buying, viewing).
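
To make the contrastive-pair idea concrete, here is a minimal sketch of how one complete sentence could be turned into a complete/incomplete pair by truncating it at a word boundary. The record layout (`text`, `label`, `next_word`) and the function name `make_contrastive_pair` are hypothetical; the actual construction of the dataset is not described in this card.

```python
def make_contrastive_pair(sentence, keep_words):
    """Build a (complete, incomplete) pair by cutting the sentence after keep_words words.

    Hypothetical record format, for illustration only.
    """
    words = sentence.rstrip(".!?…").split()
    if not 0 < keep_words < len(words):
        raise ValueError("keep_words must leave at least one word on each side of the cut")

    complete = {"text": sentence, "label": "complete"}
    incomplete = {
        "text": " ".join(words[:keep_words]),
        "label": "incomplete",
        # The word the speaker would have said next.
        "next_word": words[keep_words],
    }
    return complete, incomplete

complete, incomplete = make_contrastive_pair("Я хотел бы узнать про аренду квартиры.", keep_words=5)
print(complete["text"])
print(incomplete["text"], "->", incomplete["next_word"])
```

Generating both halves of a pair from the same sentence also yields the 50/50 balance by construction.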

🛠️ How to Use (Inference)

1. Load Model & Tokenizer

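from unsloth import FastLanguageModel
import torch

model_name = "RAS1981/qwen3-0.6b-turn-detection-probability-balanced"

# Load the fine-tuned model in 4-bit; dtype=None lets Unsloth choose the dtype.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

# Look up <|im_end|> explicitly instead of relying on tokenizer.eos_token_id.
EOS_ID = tokenizer.convert_tokens_to_ids("<|im_end|>")  # 151645 for Qwen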

2. Predict Turn Completion Probability

The core idea is to check the probability of the End-of-Sequence (EOS) token.

@torch.no_grad()
def get_eos_prob(text):
    """Return P(<|im_end|>) as the next token after the user's text."""
    # System prompt is kept in Russian, as used by the card:
    # "You determine the end of the user's turn by its meaning."
    messages = [
        {"role": "system", "content": "Ты определяешь конец реплики пользователя по смыслу."},
        {"role": "user", "content": text},
    ]

    # Format the prompt WITHOUT a generation prompt
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids

    # The Qwen template appends <|im_end|>\n after the user message.
    # Strip it (critical step!) so the model has to predict the boundary itself.
    NEWLINE_ID = 198  # "\n" in the Qwen tokenizer
    if len(prompt_ids) > 2 and prompt_ids[-1] == NEWLINE_ID and prompt_ids[-2] == EOS_ID:
        prompt_ids = prompt_ids[:-2]
    elif len(prompt_ids) > 1 and prompt_ids[-1] == EOS_ID:
        prompt_ids = prompt_ids[:-1]

    # Run on whichever device the model was loaded on instead of hard-coding "cuda"
    inputs = torch.tensor([prompt_ids], device=model.device)

    # Logits at the LAST position score the next token
    logits = model(inputs).logits[:, -1, :]

    # Softmax in float32, then read off the EOS entry
    prob = torch.softmax(logits.float(), dim=-1)[0, EOS_ID].item()
    return prob

# Example usage (inputs are Russian, matching the training domain)
print(get_eos_prob("До свидания."))          # "Goodbye." -> high prob (e.g., 0.96), turn complete
print(get_eos_prob("Я хотел бы узнать...")) # "I would like to know..." -> low prob (e.g., 0.00), turn incomplete

📈 Evaluation Results

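| Phrase | Type | EOS Probability | Interpretation |
|---|---|---|---|
| "До свидания." (Goodbye.) | Complete | 0.9626 | CONFIDENT END |
| "Алло, здравствуйте" (Hello, good day) | Ambiguous | 0.2599 | WAIT (User likely continues) |
| "Я хотел бы узнать про" (I would like to know about) | Incomplete | 0.0000 | CONFIDENT CONTINUE |
| "Нет, вы знаете, я наверное" (No, you know, I probably) | Incomplete | 0.0000 | CONFIDENT CONTINUE |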

Threshold Recommendation

  • Turn Complete: prob > 0.5 (Safe default)
  • Turn Incomplete: prob <= 0.5
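
The threshold rule above can be wrapped in a small helper. This is a minimal sketch; the function name `classify_turn` and the returned labels are illustrative choices, not part of the model's API.

```python
def classify_turn(eos_prob, threshold=0.5):
    """Map an EOS probability to a turn decision using the recommended threshold.

    Values strictly above the threshold count as a finished turn; values at or
    below it count as an unfinished turn.
    """
    if not 0.0 <= eos_prob <= 1.0:
        raise ValueError("eos_prob must be a probability in [0, 1]")
    return "complete" if eos_prob > threshold else "incomplete"

# Probabilities taken from the evaluation results reported in this card.
for phrase, prob in [("До свидания.", 0.9626), ("Алло, здравствуйте", 0.2599)]:
    print(phrase, "->", classify_turn(prob))
```

A voice agent that should interrupt less often could pass a higher `threshold`, at the cost of waiting longer on turns that really are complete.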

🧠 Methodology: Single-Token Loss Masking

We trained the model to optimize the loss only on the final token.

  • For complete examples, the target label is <|im_end|>.
  • For incomplete examples, the target label is the actual next word.
  • All earlier tokens get the label -100, which the cross-entropy loss ignores.

This forces the model to focus purely on the boundary condition: "Given this context, does the turn end here or continue?"
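
The masking scheme can be sketched as a label-building step. The sketch assumes the common Hugging Face convention in which `labels` has the same length as `input_ids` and the model shifts them internally, and relies on -100 being the default `ignore_index` of PyTorch's cross-entropy loss. The function name and the context token ids are made up for illustration; only 151645 (`<|im_end|>`) comes from this card, and the actual training script is not shown here.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def build_single_token_labels(context_ids, target_id):
    """Append the target token and supervise only that final position.

    Returns (input_ids, labels) of equal length; every context position is
    masked with IGNORE_INDEX so it contributes nothing to the loss.
    """
    input_ids = list(context_ids) + [target_id]
    labels = [IGNORE_INDEX] * len(context_ids) + [target_id]
    return input_ids, labels

IM_END_ID = 151645            # <|im_end|>, as stated in the inference code
context = [101, 102, 103]     # made-up ids standing in for a tokenized prompt

# Complete turn: the supervised target is <|im_end|>.
print(build_single_token_labels(context, IM_END_ID))

# Incomplete turn: the supervised target is the next word's token (made-up id 777).
print(build_single_token_labels(context, 777))
```

Because only one position carries a label, each training example contributes exactly one decision: end the turn here or continue.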

📜 License

Apache 2.0