---
language:
- hi
- bn
- ta
- te
- ml
- gu
- kn
- mr
- or
- pa
- as
- mai
- sa
- sd
license: apache-2.0
tags:
- causal-lm
- assistant
- reasoning
- multilingual
- indic
model_name: dheeyantra/dhee-nxtgen-qwen3-indic
library_name: transformers
---

# Dhee-NxtGen-Qwen3-Indic (4B)

## Model Description

**Dhee-NxtGen-Qwen3-Indic** is a **single, unified 4B-parameter multilingual large language model** developed by **DheeYantra** in collaboration with **NxtGen Cloud Technologies Pvt. Ltd.**

Built on the **Qwen3-4B architecture**, the model is designed to support **assistant-style conversations**, **reasoning**, and **function-calling–compatible workflows** across **14 Indian (Indic) languages** within one shared model.

The model is optimized for **native-script generation**, consistent multilingual behavior, and cross-lingual generalization.

---

## Supported Languages

This single model supports the following Indic languages:

- Hindi (hi)
- Bengali (bn)
- Tamil (ta)
- Telugu (te)
- Malayalam (ml)
- Gujarati (gu)
- Kannada (kn)
- Marathi (mr)
- Odia (or)
- Punjabi (pa)
- Assamese (as)
- Maithili (mai)
- Sanskrit (sa)
- Sindhi (sd)

> Best results are achieved when prompts are written entirely in the target language.

---

## Key Features

- Single multilingual 4B model (no per-language checkpoints)
- Fluent, native-script text generation across 14 Indic languages
- Optimized for assistant-style and reasoning-based dialogue
- Supports summarization, Q&A, and long-form generation
- Compatible with function-calling style prompting
- Fully compatible with Hugging Face Transformers
- Ready for high-throughput inference using vLLM

---

## Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# 1. Configuration
model_name = "dheeyantra/dhee-nxtgen-qwen3-indic"

# 2. Load Model and Tokenizer
print(f"Loading model: {model_name}...")
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",  # places the model on the GPU when one is available
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

# 3. Define the Prompt
# Using the ChatML format expected by Qwen-based architectures
prompt = """<|im_start|>system
You are a helpful multilingual assistant.<|im_end|>
<|im_start|>user
क्या आप मेरे लिए एक अपॉइंटमेंट बुक कर सकते हैं?
अगर हाँ, तो कृपया मुझसे ज़रूरी जानकारी जैसे तारीख, समय और उद्देश्य पूछिए।<|im_end|>
<|im_start|>assistant
"""

# 4. Process and Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("Generating response...")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# 5. Decode and Print
# Slice off the prompt tokens so that only the newly generated reply is decoded.
# (Splitting the full text on "assistant" breaks if the reply contains that word.)
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

print("-" * 30)
print(f"Assistant: {response}")
print("-" * 30)
```

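The ChatML prompt in the example is written out by hand. As a minimal sketch, the same layout can be assembled from a list of messages; `build_chatml_prompt` is a hypothetical helper introduced here for illustration, not part of the model repository or of Transformers.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string.

    Hypothetical helper: it only mirrors the hand-written prompt layout
    used in the example above.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "नमस्ते"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

If the tokenizer ships a chat template, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` from Transformers is usually the safer route, although its output may differ in detail from this sketch.
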
## Function/Tool Calling Example Usage

```python
import torch
import json
import re
from transformers import AutoTokenizer, AutoModelForCausalLM

# --- 1. MODEL SETUP ---
model_name = "dheeyantra/dhee-nxtgen-qwen3-indic"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

# --- 2. TOOLS & SYSTEM PROMPT ---
tools = [{
    "name": "book_appointment",
    "description": "Book an appointment for the user.",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "Date in YYYY-MM-DD format"},
            "time": {"type": "string", "description": "Time in HH:MM (24h) format"},
            "purpose": {"type": "string", "description": "The medical reason or department"}
        },
        "required": ["date", "time", "purpose"]
    }
}]

# We provide the current date so the model can resolve "tomorrow" or "next Monday"
SYSTEM_PROMPT = f"""You are a helpful AI assistant.
Today's Date: 2026-01-08 (Thursday).

Available Tools:
{json.dumps(tools, indent=2)}

Rules:
1. If details (date, time, purpose) are missing, ask the user in Hindi.
2. If all details are present, output ONLY a <tool_call> JSON.
3. After a tool result is provided, confirm the booking to the user in Hindi."""

# --- 3. BACKEND FUNCTION ---
def execute_booking(date, time, purpose):
    # Simulated backend logic
    if "10:00" in time:  # Simulate a busy slot
        return {"status": "error", "message": "यह समय पहले से बुक है। कृपया कोई और समय चुनें।"}
    return {"status": "success", "id": "APP-9921", "doctor": "Dr. Verma"}

# --- 4. THE INTERACTION ENGINE ---
def run_conversation(user_input, history):
    # Add user input to history
    history.append({"role": "user", "content": user_input})

    # Construct ChatML prompt
    prompt = f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
    for msg in history:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"

    # Generate
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            do_sample=True,  # temperature only takes effect when sampling is enabled
            temperature=0.1,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens (the assistant's reply)
    prompt_length = inputs["input_ids"].shape[-1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

    # Check if the model wants to call a tool
    tool_match = re.search(r"<tool_call>(.*?)</tool_call>", response, re.DOTALL)

    if tool_match:
        print(f"\n[MODEL REQUESTED TOOL]: {response}")
        call_data = json.loads(tool_match.group(1).strip())

        # Execute the function
        result = execute_booking(**call_data['arguments'])
        print(f"[TOOL RESULT]: {result}")

        # Feed result back to model for final response
        # (ensure_ascii=False keeps Hindi text readable instead of \u escapes)
        history.append({"role": "assistant", "content": response})
        history.append({"role": "system", "content": f"Tool Result: {json.dumps(result, ensure_ascii=False)}"})

        # Final confirmation generation
        return run_conversation("Please confirm the result to the user.", history)

    return response

# --- 5. TEST SCENARIO ---
chat_history = []

print("--- Chatbot Started (Today is 2026-01-08) ---")

# Step 1: User provides partial info
query_1 = "मेरे लिए डेंटिस्ट का अपॉइंटमेंट बुक करें।"
print(f"\nUser: {query_1}")
res_1 = run_conversation(query_1, chat_history)
chat_history.append({"role": "assistant", "content": res_1})
print(f"Assistant: {res_1}")

# Step 2: User provides the rest
query_2 = "कल दोपहर 2 बजे।"
print(f"\nUser: {query_2}")
res_2 = run_conversation(query_2, chat_history)
print(f"Assistant: {res_2}")
```

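The example above passes the model's `<tool_call>` payload straight to `json.loads` and to the backend function, which raises if the model emits malformed JSON or omits an argument. A more defensive parsing step might look like the sketch below; `parse_tool_call` and its `required_args` default are assumptions made for illustration, modelled on the `book_appointment` schema.

```python
import json
import re

TOOL_CALL_PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_call(response, required_args=("date", "time", "purpose")):
    """Return (name, arguments) for a well-formed tool call, otherwise None.

    Hypothetical helper: it validates the payload before any backend
    function is invoked, so a bad generation cannot crash the loop.
    """
    match = TOOL_CALL_PATTERN.search(response)
    if match is None:
        return None  # the model answered in plain text
    try:
        call = json.loads(match.group(1).strip())
    except json.JSONDecodeError:
        return None  # malformed JSON inside the tags
    if not isinstance(call, dict):
        return None
    arguments = call.get("arguments")
    if not isinstance(arguments, dict):
        return None
    if any(key not in arguments for key in required_args):
        return None  # a required argument is missing
    return call.get("name"), arguments


sample = (
    '<tool_call>\n{"name": "book_appointment", "arguments": '
    '{"date": "2026-01-09", "time": "14:00", "purpose": "dentist"}}\n</tool_call>'
)
print(parse_tool_call(sample))
```

When the helper returns `None`, the calling code can treat the response as an ordinary reply or ask the model to retry, instead of letting an exception end the conversation.
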

---

## Prompting Guidelines

- Use pure native-language prompts for best fluency
- Avoid heavy code-mixing (e.g., Hinglish-heavy inputs)
- Include a system prompt to stabilize multilingual behavior
- Ask explicitly for step-by-step reasoning when required

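One rough way to act on the code-mixing guideline is to check how much of a prompt is written in Latin script before sending it to the model. The sketch below is a heuristic only: `latin_letter_ratio`, `is_code_mixed`, and the 0.2 threshold are assumptions chosen for illustration, and the check cannot detect English words transliterated into a native script.

```python
import unicodedata


def latin_letter_ratio(text):
    """Fraction of alphabetic characters that belong to the Latin script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    latin = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith("LATIN")
    )
    return latin / len(letters)


def is_code_mixed(text, threshold=0.2):
    """Flag prompts whose Latin-script share exceeds an assumed threshold."""
    return latin_letter_ratio(text) > threshold


print(is_code_mixed("मेरे लिए डेंटिस्ट का अपॉइंटमेंट बुक करें।"))
print(is_code_mixed("Doctor ka appointment book करो"))
```

An application could use such a check to warn the user or to rewrite the prompt in native script before generation.
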

---

## Intended Uses & Limitations

### Intended Uses

- Multilingual Indic chatbots and AI assistants
- Education, governance, and public-sector AI applications
- Content generation and summarization in Indian languages
- Cross-lingual conversational and reasoning systems

### Limitations

- May occasionally hallucinate or produce inaccurate facts
- Performance may vary slightly across languages
- Not intended for medical, legal, or safety-critical use cases
- Code-mixed inputs may reduce output quality

---

## vLLM / High-Performance Serving

### Requirements

- NVIDIA GPU with compute capability ≥ 8.0 (A100 / H100 recommended)
- PyTorch 2.1+ with CUDA installed
- V100 (sm70) GPUs are not supported for vLLM GPU inference

### Installation

```bash
pip install torch transformers vllm sentencepiece
```

### Run vLLM Server

`vllm serve` takes the model as a positional argument:

```bash
vllm serve dheeyantra/dhee-nxtgen-qwen3-indic --host 0.0.0.0 --port 8000
```

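Once the server is running, it exposes an OpenAI-compatible HTTP API, including `/v1/chat/completions`. The sketch below builds a request payload and posts it with the standard library; the helper names, the `localhost:8000` base URL, and the sampling defaults are assumptions for illustration, and the chat endpoint only works if the model's tokenizer provides a chat template.

```python
import json
import urllib.request

MODEL_NAME = "dheeyantra/dhee-nxtgen-qwen3-indic"


def build_chat_payload(user_message,
                       system_message="You are a helpful multilingual assistant.",
                       max_tokens=256,
                       temperature=0.7):
    """Build the JSON body for an OpenAI-style chat completion request."""
    return {
        "model": MODEL_NAME,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to a running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        body = json.loads(reply.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


payload = build_chat_payload("भारत की राजधानी क्या है?")
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

With the server from the previous step running, `send_chat_request(payload)` returns the assistant's reply as a string.
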

---

## License

Released under the **Apache 2.0 License**.

---

Developed by **DheeYantra** in collaboration with **NxtGen Cloud Technologies Pvt. Ltd.**