---
language:
  - hi
  - bn
  - ta
  - te
  - ml
  - gu
  - kn
  - mr
  - or
  - pa
  - as
  - mai
  - sa
  - sd
license: apache-2.0
tags:
  - causal-lm
  - assistant
  - reasoning
  - multilingual
  - indic
model_name: dheeyantra/dhee-nxtgen-qwen3-indic
library_name: transformers
---
# Dhee-NxtGen-Qwen3-Indic (4B)

## Model Description
**Dhee-NxtGen-Qwen3-Indic** is a **single, unified 4B-parameter multilingual large language model** developed by **DheeYantra** in collaboration with **NxtGen Cloud Technologies Pvt. Ltd.**

Built on the **Qwen3-4B architecture**, the model is designed to support **assistant-style conversations**, **reasoning**, and **function-calling–compatible workflows** across **14 Indian (Indic) languages** within one shared model.

The model is optimized for **native-script generation**, consistent multilingual behavior, and cross-lingual generalization.

---
## Supported Languages
This single model supports the following Indic languages:

- Hindi (hi)
- Bengali (bn)
- Tamil (ta)
- Telugu (te)
- Malayalam (ml)
- Gujarati (gu)
- Kannada (kn)
- Marathi (mr)
- Odia (or)
- Punjabi (pa)
- Assamese (as)
- Maithili (mai)
- Sanskrit (sa)
- Sindhi (sd)

> Best results are achieved when prompts are written entirely in the target language.
---
## Key Features
- Single multilingual 4B model (no per-language checkpoints)
- Fluent, native-script text generation across 14 Indic languages
- Optimized for assistant-style and reasoning-based dialogue
- Supports summarization, Q&A, and long-form generation
- Compatible with function-calling style prompting
- Fully compatible with Hugging Face Transformers
- Ready for high-throughput inference using vLLM
---
## Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# 1. Configuration
model_name = "dheeyantra/dhee-nxtgen-qwen3-indic"
device = "cuda" if torch.cuda.is_available() else "cpu"

# 2. Load Model and Tokenizer
print(f"Loading model: {model_name}...")
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
)

# 3. Define the Prompt
# Using the ChatML format expected by Qwen-based architectures
prompt = """<|im_start|>system
You are a helpful multilingual assistant.<|im_end|>
<|im_start|>user
क्या आप मेरे लिए एक अपॉइंटमेंट बुक कर सकते हैं? 
अगर हाँ, तो कृपया मुझसे ज़रूरी जानकारी जैसे तारीख, समय और उद्देश्य पूछिए।<|im_end|>
<|im_start|>assistant
"""

# 4. Process and Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("Generating response...")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

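# 5. Decode and Print
# Decode only the newly generated tokens (the assistant's reply), skipping the prompt.
# Slicing by prompt length is more robust than splitting the text on "assistant",
# which breaks whenever that word appears in the system prompt or the reply.
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()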

print("-" * 30)
print(f"Assistant: {response}")
print("-" * 30)
```
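
The hand-written ChatML string above can also be assembled from a list of messages. The following is a minimal sketch: `build_chatml_prompt` is a hypothetical helper, not part of the model or of Transformers, and it assumes the same `<|im_start|>` / `<|im_end|>` layout used in the example.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "नमस्ते!"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

If the tokenizer ships a chat template (not confirmed here), `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the usual way to get an equivalent string without formatting it by hand.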
## Function/Tool Calling Example Usage

```python
import torch
import json
import re
from transformers import AutoTokenizer, AutoModelForCausalLM

# --- 1. MODEL SETUP ---
model_name = "dheeyantra/dhee-nxtgen-qwen3-indic"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
)

# --- 2. TOOLS & SYSTEM PROMPT ---
tools = [{
    "name": "book_appointment",
    "description": "Book an appointment for the user.",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "Date in YYYY-MM-DD format"},
            "time": {"type": "string", "description": "Time in HH:MM (24h) format"},
            "purpose": {"type": "string", "description": "The medical reason or department"}
        },
        "required": ["date", "time", "purpose"]
    }
}]

# We provide the current date so the model can resolve "tomorrow" or "next Monday"
SYSTEM_PROMPT = f"""You are a helpful AI assistant.
Today's Date: 2026-01-08 (Thursday).

Available Tools:
{json.dumps(tools, indent=2)}

Rules:
1. If details (date, time, purpose) are missing, ask the user in Hindi.
2. If all details are present, output ONLY a <tool_call> JSON.
3. After a tool result is provided, confirm the booking to the user in Hindi."""

# --- 3. BACKEND FUNCTION ---
def execute_booking(date, time, purpose):
    # Simulated backend logic
    if "10:00" in time: # Simulate a busy slot
        return {"status": "error", "message": "यह समय पहले से बुक है। कृपया कोई और समय चुनें।"}
    return {"status": "success", "id": "APP-9921", "doctor": "Dr. Verma"}

# --- 4. THE INTERACTION ENGINE ---
def run_conversation(user_input, history):
    # Add user input to history
    history.append({"role": "user", "content": user_input})
    
    # Construct ChatML prompt
    prompt = f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
    for msg in history:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"

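    # Generate (sampling must be enabled for `temperature` to take effect)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            do_sample=True,
            temperature=0.1,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, skipping the prompt
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()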

    # Check if the model wants to call a tool
    tool_match = re.search(r"<tool_call>(.*?)</tool_call>", response, re.DOTALL)
    
    if tool_match:
        print(f"\n[MODEL REQUESTED TOOL]: {response}")
        call_data = json.loads(tool_match.group(1).strip())
        
        # Execute the function
        result = execute_booking(**call_data['arguments'])
        print(f"[TOOL RESULT]: {result}")
        
        # Feed result back to model for final response
        history.append({"role": "assistant", "content": response})
        history.append({"role": "system", "content": f"Tool Result: {json.dumps(result)}"})
        
        # Final confirmation generation
        return run_conversation("Please confirm the result to the user.", history)
    
    return response

# --- 5. TEST SCENARIO ---
chat_history = []

print("--- Chatbot Started (Today is 2026-01-08) ---")

# Step 1: User provides partial info
query_1 = "मेरे लिए डेंटिस्ट का अपॉइंटमेंट बुक करें।"
print(f"\nUser: {query_1}")
res_1 = run_conversation(query_1, chat_history)
chat_history.append({"role": "assistant", "content": res_1})
print(f"Assistant: {res_1}")

# Step 2: User provides the rest
query_2 = "कल दोपहर 2 बजे।"
print(f"\nUser: {query_2}")
res_2 = run_conversation(query_2, chat_history)
print(f"Assistant: {res_2}")

```
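
Model output is not guaranteed to contain well-formed JSON, so the tool-call parsing step may benefit from validation before the backend function is invoked. The sketch below is one possible approach: `extract_tool_call` is a hypothetical helper, and the `{"name": ..., "arguments": {...}}` payload shape is an assumption inferred from how the example above reads `call_data['arguments']`.

```python
import json
import re

TOOL_CALL_PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_call(text):
    """Return (name, arguments) for the first well-formed tool call, else None."""
    match = TOOL_CALL_PATTERN.search(text)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(1).strip())
    except json.JSONDecodeError:
        # The model produced something that is not valid JSON.
        return None
    if not isinstance(payload, dict):
        return None
    name = payload.get("name")
    arguments = payload.get("arguments")
    if not isinstance(name, str) or not isinstance(arguments, dict):
        return None
    return name, arguments


sample = (
    '<tool_call>{"name": "book_appointment", "arguments": '
    '{"date": "2026-01-09", "time": "14:00", "purpose": "dentist"}}</tool_call>'
)
print(extract_tool_call(sample))
print(extract_tool_call("<tool_call>{not json}</tool_call>"))
```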
---
## Prompting Guidelines
- Use pure native-language prompts for best fluency
- Avoid heavy code-mixing (e.g., Hinglish-heavy inputs)
- Include a system prompt to stabilize multilingual behavior
- Ask explicitly for step-by-step reasoning when required
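
One rough way to notice heavily code-mixed or romanized input before it reaches the model is to measure how many of its letters are Latin-script. This is only an illustrative heuristic: `latin_letter_ratio`, `looks_code_mixed`, and the 0.3 threshold are hypothetical and untuned, and are not part of the model or its tooling.

```python
import unicodedata

# Assumption: 0.3 is an arbitrary illustrative threshold, not a tuned value.
CODE_MIX_THRESHOLD = 0.3


def latin_letter_ratio(text):
    """Fraction of alphabetic characters in `text` that are Latin-script letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    latin = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith("LATIN")
    )
    return latin / len(letters)


def looks_code_mixed(text, threshold=CODE_MIX_THRESHOLD):
    """Flag prompts whose share of Latin letters exceeds the threshold."""
    return latin_letter_ratio(text) > threshold


samples = [
    "मेरे लिए डेंटिस्ट का अपॉइंटमेंट बुक करें।",
    "mere liye dentist ka appointment book karo",
]
for sample in samples:
    print(f"{latin_letter_ratio(sample):.2f}", looks_code_mixed(sample))
```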
---
## Intended Uses & Limitations

### Intended Uses
- Multilingual Indic chatbots and AI assistants
- Education, governance, and public-sector AI applications
- Content generation and summarization in Indian languages
- Cross-lingual conversational and reasoning systems

### Limitations
- May occasionally hallucinate or produce inaccurate facts
- Performance may vary slightly across languages
- Not intended for medical, legal, or safety-critical use cases
- Code-mixed inputs may reduce output quality
---
## vLLM / High-Performance Serving

### Requirements
- NVIDIA GPU with compute capability ≥ 8.0 (A100 / H100 recommended)
- PyTorch 2.1+ with CUDA installed
- V100 (sm70) GPUs are not supported for vLLM GPU inference

### Installation
```bash
pip install torch transformers vllm sentencepiece
```

### Run vLLM Server
```bash
vllm serve dheeyantra/dhee-nxtgen-qwen3-indic --host 0.0.0.0 --port 8000
```
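
Once the server is running, it exposes an OpenAI-compatible HTTP API that can be queried with any HTTP client. The sketch below uses only the Python standard library. `BASE_URL`, `build_chat_request`, and `chat` are illustrative names, the URL assumes a local server on the port shown above, and the chat endpoint assumes the tokenizer provides a chat template.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local server on the port used above
MODEL = "dheeyantra/dhee-nxtgen-qwen3-indic"


def build_chat_request(user_text, system_text="You are a helpful multilingual assistant."):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": 256,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_text), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("नमस्ते!")
print(request.full_url)
# Uncomment once the server is running:
# print(chat("नमस्ते!"))
```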
---
## License
Released under the **Apache 2.0 License**.

---

Developed by **DheeYantra** in collaboration with **NxtGen Cloud Technologies Pvt. Ltd.**