---
language:
- hi
- bn
- ta
- te
- ml
- gu
- kn
- mr
- or
- pa
- as
- mai
- sa
- sd
license: apache-2.0
tags:
- causal-lm
- assistant
- reasoning
- multilingual
- indic
model_name: dheeyantra/dhee-nxtgen-qwen3-indic
library_name: transformers
---
# Dhee-NxtGen-Qwen3-Indic (4B)
## Model Description
**Dhee-NxtGen-Qwen3-Indic** is a **single, unified 4B-parameter multilingual large language model** developed by **DheeYantra** in collaboration with **NxtGen Cloud Technologies Pvt. Ltd.**

Built on the **Qwen3-4B architecture**, the model is designed to support **assistant-style conversations**, **reasoning**, and **function-calling-compatible workflows** across **14 Indian (Indic) languages** within one shared model.

The model is optimized for **native-script generation**, consistent multilingual behavior, and cross-lingual generalization.

---
## Supported Languages
This single model supports the following Indic languages:
- Hindi (hi)
- Bengali (bn)
- Tamil (ta)
- Telugu (te)
- Malayalam (ml)
- Gujarati (gu)
- Kannada (kn)
- Marathi (mr)
- Odia (or)
- Punjabi (pa)
- Assamese (as)
- Maithili (mai)
- Sanskrit (sa)
- Sindhi (sd)
> Best results are achieved when prompts are written entirely in the target language.
---
## Key Features
- Single multilingual 4B model (no per-language checkpoints)
- Fluent, native-script text generation across 14 Indic languages
- Optimized for assistant-style and reasoning-based dialogue
- Supports summarization, Q&A, and long-form generation
- Compatible with function-calling style prompting
- Fully compatible with Hugging Face Transformers
- Ready for high-throughput inference using vLLM
---
## Example Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# 1. Configuration
model_name = "dheeyantra/dhee-nxtgen-qwen3-indic"
device = "cuda" if torch.cuda.is_available() else "cpu"
# 2. Load Model and Tokenizer
print(f"Loading model: {model_name}...")
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" requires the `accelerate` package
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
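# 3. Define the Prompt
# Using the ChatML format expected by Qwen-based architectures.
# The user message (Hindi) reads: "Can you book an appointment for me?
# If yes, please ask me for the required details such as date, time and purpose."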
prompt = """<|im_start|>system
You are a helpful multilingual assistant.<|im_end|>
<|im_start|>user
क्या आप मेरे लिए एक अपॉइंटमेंट बुक कर सकते हैं?
अगर हाँ, तो कृपया मुझसे ज़रूरी जानकारी जैसे तारीख, समय और उद्देश्य पूछिए।<|im_end|>
<|im_start|>assistant
"""
# 4. Process and Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print("Generating response...")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
# 5. Decode and Print
# Slice off the prompt tokens so only the newly generated text (the assistant's reply) is decoded
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print("-" * 30)
print(f"Assistant: {response}")
print("-" * 30)
```
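Both examples in this card assemble the ChatML prompt by hand. The sketch below factors that into a small helper, assuming the model expects exactly the `<|im_start|>role\n...<|im_end|>\n` layout used above; `build_chatml_prompt` is a hypothetical name, not part of the model repository. If the tokenizer ships a chat template, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the usual alternative, but Qwen3-derived templates may insert extra reasoning markers, so compare its output with this layout before switching.

```python
from typing import Dict, List


def build_chatml_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages into the ChatML layout used in the examples (hypothetical helper)."""
    allowed_roles = {"system", "user", "assistant"}
    parts = []
    for msg in messages:
        role = msg["role"]
        if role not in allowed_roles:
            raise ValueError(f"unsupported role: {role!r}")
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{role}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_chatml_prompt([
        {"role": "system", "content": "You are a helpful multilingual assistant."},
        {"role": "user", "content": "नमस्ते!"},
    ]))
```

The returned string can be passed to `tokenizer(prompt, return_tensors="pt")` exactly like the hand-written prompt above.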
## Function/Tool Calling Example Usage
```python
import torch
import json
import re
from transformers import AutoTokenizer, AutoModelForCausalLM
# --- 1. MODEL SETUP ---
model_name = "dheeyantra/dhee-nxtgen-qwen3-indic"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
)
# --- 2. TOOLS & SYSTEM PROMPT ---
tools = [{
    "name": "book_appointment",
    "description": "Book an appointment for the user.",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "Date in YYYY-MM-DD format"},
            "time": {"type": "string", "description": "Time in HH:MM (24h) format"},
            "purpose": {"type": "string", "description": "The medical reason or department"}
        },
        "required": ["date", "time", "purpose"]
    }
}]
# We provide the current date so the model can resolve "tomorrow" or "next Monday"
SYSTEM_PROMPT = f"""You are a helpful AI assistant.
Today's Date: 2026-01-08 (Thursday).
Available Tools:
{json.dumps(tools, indent=2)}
Rules:
1. If details (date, time, purpose) are missing, ask the user in Hindi.
2. If all details are present, output ONLY a <tool_call> JSON.
3. After a tool result is provided, confirm the booking to the user in Hindi."""
# --- 3. BACKEND FUNCTION ---
def execute_booking(date, time, purpose):
    # Simulated backend logic
    if "10:00" in time:  # Simulate a busy slot
        # "This time is already booked. Please choose another time."
        return {"status": "error", "message": "यह समय पहले से बुक है। कृपया कोई और समय चुनें।"}
    return {"status": "success", "id": "APP-9921", "doctor": "Dr. Verma"}
# --- 4. THE INTERACTION ENGINE ---
def run_conversation(user_input, history, max_tool_rounds=3):
    # Add user input to history
    history.append({"role": "user", "content": user_input})
    # Construct ChatML prompt
    prompt = f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
    for msg in history:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    # Generate (sampling must be enabled for temperature to take effect)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            do_sample=True,
            temperature=0.1,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, not the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    # Check if the model wants to call a tool
    tool_match = re.search(r"<tool_call>(.*?)</tool_call>", response, re.DOTALL)
    if tool_match and max_tool_rounds > 0:
        print(f"\n[MODEL REQUESTED TOOL]: {response}")
        # Expects {"name": ..., "arguments": {...}} inside the tags
        call_data = json.loads(tool_match.group(1).strip())
        # Execute the function
        result = execute_booking(**call_data["arguments"])
        print(f"[TOOL RESULT]: {result}")
        # Feed result back to model for final response (keep Hindi text unescaped)
        history.append({"role": "assistant", "content": response})
        history.append({"role": "system", "content": f"Tool Result: {json.dumps(result, ensure_ascii=False)}"})
        # Final confirmation generation; the counter stops endless tool-call loops
        return run_conversation("Please confirm the result to the user.", history, max_tool_rounds - 1)
    return response
# --- 5. TEST SCENARIO ---
chat_history = []
print("--- Chatbot Started (Today is 2026-01-08) ---")
# Step 1: User provides partial info ("Book a dentist appointment for me.")
query_1 = "मेरे लिए डेंटिस्ट का अपॉइंटमेंट बुक करें।"
print(f"\nUser: {query_1}")
res_1 = run_conversation(query_1, chat_history)
chat_history.append({"role": "assistant", "content": res_1})
print(f"Assistant: {res_1}")
# Step 2: User provides the rest ("Tomorrow at 2 PM.")
query_2 = "कल दोपहर 2 बजे।"
print(f"\nUser: {query_2}")
res_2 = run_conversation(query_2, chat_history)
print(f"Assistant: {res_2}")
```
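The interaction loop above parses only the first `<tool_call>` match and passes `arguments` straight to the backend. A more defensive sketch follows. It assumes, as the example does, that the model emits `{"name": ..., "arguments": {...}}` inside `<tool_call>` tags, and it checks the arguments against the `required` list and the date/time formats declared in the `book_appointment` schema. The helper names are hypothetical.

```python
import json
import re
from datetime import datetime
from typing import Any, Dict, List

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Return every well-formed {"name", "arguments"} object found in <tool_call> tags."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            data = json.loads(raw.strip())
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of crashing the loop
        if isinstance(data, dict) and "name" in data and isinstance(data.get("arguments"), dict):
            calls.append(data)
    return calls


def validate_booking_args(args: Dict[str, Any], required: List[str]) -> List[str]:
    """Check arguments against the book_appointment schema; return a list of problems."""
    problems = [f"missing: {key}" for key in required if not args.get(key)]
    if args.get("date"):
        try:
            datetime.strptime(args["date"], "%Y-%m-%d")  # YYYY-MM-DD
        except (TypeError, ValueError):
            problems.append("date must be YYYY-MM-DD")
    if args.get("time"):
        try:
            datetime.strptime(args["time"], "%H:%M")  # HH:MM, 24-hour clock
        except (TypeError, ValueError):
            problems.append("time must be HH:MM (24h)")
    return problems


if __name__ == "__main__":
    reply = ('<tool_call>{"name": "book_appointment", "arguments": '
             '{"date": "2026-01-09", "time": "14:00", "purpose": "dentist"}}</tool_call>')
    for call in parse_tool_calls(reply):
        print(call["name"], validate_booking_args(call["arguments"], ["date", "time", "purpose"]))
```

If `validate_booking_args` returns problems, one option is to feed them back to the model as the tool result so it can ask the user for the missing or malformed details.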
---
## Prompting Guidelines
- Use pure native-language prompts for best fluency
- Avoid heavy code-mixing (e.g., Hinglish-heavy inputs)
- Include a system prompt to stabilize multilingual behavior
- Ask explicitly for step-by-step reasoning when required
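The guideline against heavy code-mixing can be checked before a prompt is sent. The sketch below is a rough heuristic, not something the model provides. It counts letters and combining marks by Unicode script with `unicodedata.category` and `unicodedata.name`, and reports the share that are Latin, on the assumption that a high Latin share in, say, a Hindi or Tamil prompt signals romanized or code-mixed input. The 0.3 threshold in `looks_code_mixed` is an arbitrary illustration.

```python
import unicodedata


def latin_letter_ratio(text: str) -> float:
    """Share of letters/marks in `text` that belong to the Latin script (0.0 to 1.0)."""
    # Letters (L*) and combining marks (M*), so Indic vowel signs are counted too
    chars = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not chars:
        return 0.0
    latin = sum(1 for ch in chars if unicodedata.name(ch, "").startswith("LATIN"))
    return latin / len(chars)


def looks_code_mixed(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical check: flag prompts whose Latin share exceeds `threshold`."""
    return latin_letter_ratio(text) > threshold


if __name__ == "__main__":
    for prompt in [
        "मेरे लिए डेंटिस्ट का अपॉइंटमेंट बुक करें।",
        "मेरे लिए dentist का appointment book करें।",
        "mere liye dentist ka appointment book karo",
    ]:
        print(f"{latin_letter_ratio(prompt):.2f}", looks_code_mixed(prompt), prompt)
```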
---
## Intended Uses & Limitations
### Intended Uses
- Multilingual Indic chatbots and AI assistants
- Education, governance, and public-sector AI applications
- Content generation and summarization in Indian languages
- Cross-lingual conversational and reasoning systems
### Limitations
- May occasionally hallucinate or produce inaccurate facts
- Performance may vary slightly across languages
- Not intended for medical, legal, or safety-critical use cases
- Code-mixed inputs may reduce output quality
---
## vLLM / High-Performance Serving
### Requirements
- NVIDIA GPU with compute capability ≥ 8.0 (A100 / H100 recommended)
- PyTorch 2.1+ with CUDA installed
- V100 (sm70) GPUs are not supported for vLLM GPU inference
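The GPU requirement above can be checked before installing vLLM. This sketch uses `torch.cuda.get_device_capability`, which returns a `(major, minor)` tuple, and compares it with the 8.0 threshold stated in this card; the helper names are illustrative.

```python
from typing import Optional, Tuple

MIN_CAPABILITY = (8, 0)  # threshold stated in this model card


def meets_requirement(capability: Tuple[int, int]) -> bool:
    """True if a (major, minor) compute capability is at least 8.0."""
    return tuple(capability) >= MIN_CAPABILITY


def detect_capability(device_index: int = 0) -> Optional[Tuple[int, int]]:
    """Return the CUDA compute capability of a device, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device detected.")
    else:
        status = "OK" if meets_requirement(cap) else "below 8.0 (a V100 reports 7.0)"
        print(f"Compute capability {cap[0]}.{cap[1]}: {status}")
```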
### Installation
```bash
pip install torch transformers vllm sentencepiece
```
### Run vLLM Server
```bash
vllm serve dheeyantra/dhee-nxtgen-qwen3-indic --host 0.0.0.0 --port 8000
```
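`vllm serve` exposes an OpenAI-compatible HTTP API under `/v1` and, by default, serves the model under the name it was launched with. The sketch below posts a request to `/v1/chat/completions` using only the Python standard library. It assumes the server from the command above is reachable at `localhost:8000` and that the repository's tokenizer files include a chat template, which the server needs to format `messages`.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "http://localhost:8000/v1"  # assumes the vllm serve command above
MODEL = "dheeyantra/dhee-nxtgen-qwen3-indic"


def build_chat_request(messages: List[Dict[str, str]], max_tokens: int = 256,
                       temperature: float = 0.7, top_p: float = 0.9) -> Dict[str, Any]:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": MODEL,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }


def chat(messages: List[Dict[str, str]], **kwargs: Any) -> str:
    """POST to /chat/completions and return the assistant message text."""
    payload = json.dumps(build_chat_request(messages, **kwargs), ensure_ascii=False).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

Once the server is running, `print(chat([{"role": "system", "content": "You are a helpful multilingual assistant."}, {"role": "user", "content": "नमस्ते!"}]))` sends a single turn; the official `openai` Python client pointed at the same base URL is an equally valid choice.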
---
## License
Released under the **Apache 2.0 License**.

---
Developed by **DheeYantra** in collaboration with **NxtGen Cloud Technologies Pvt. Ltd.**