Initialize project; model provided by the ModelHub XC community

Model: dheeyantra/dhee-nxtgen-qwen3-indic Source: Original Platform
2026-06-17 02:32:19 +08:00
commit f2668e68fe
14 changed files with 152537 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,268 @@
+---
+language:
+  - hi
+  - bn
+  - ta
+  - te
+  - ml
+  - gu
+  - kn
+  - mr
+  - or
+  - pa
+  - as
+  - mai
+  - sa
+  - sd
+license: apache-2.0
+tags:
+  - causal-lm
+  - assistant
+  - reasoning
+  - multilingual
+  - indic
+model_name: dheeyantra/dhee-nxtgen-qwen3-indic
+library_name: transformers
+---
+# Dhee-NxtGen-Qwen3-Indic (4B)
+
+## Model Description
+**Dhee-NxtGen-Qwen3-Indic** is a **single, unified 4B-parameter multilingual large language model** developed by **DheeYantra** in collaboration with **NxtGen Cloud Technologies Pvt. Ltd.**
+
+Built on the **Qwen3-4B architecture**, the model is designed to support **assistant-style conversations**, **reasoning**, and **function-calling–compatible workflows** across **14 Indian (Indic) languages** within one shared model.
+
+The model is optimized for **native-script generation**, consistent multilingual behavior, and cross-lingual generalization.
+
+---
+## Supported Languages
+This single model supports the following Indic languages:
+
+- Hindi (hi)
+- Bengali (bn)
+- Tamil (ta)
+- Telugu (te)
+- Malayalam (ml)
+- Gujarati (gu)
+- Kannada (kn)
+- Marathi (mr)
+- Odia (or)
+- Punjabi (pa)
+- Assamese (as)
+- Maithili (mai)
+- Sanskrit (sa)
+- Sindhi (sd)
+
+> Best results are achieved when prompts are written entirely in the target language.
+---
+## Key Features
+- Single multilingual 4B model (no per-language checkpoints)
+- Fluent, native-script text generation across 14 Indic languages
+- Optimized for assistant-style and reasoning-based dialogue
+- Supports summarization, Q&A, and long-form generation
+- Compatible with function-calling style prompting
+- Fully compatible with Hugging Face Transformers
+- Ready for high-throughput inference using vLLM
+---
+## Example Usage
+
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# 1. Configuration
+# Device placement is handled by device_map="auto" below.
+model_name = "dheeyantra/dhee-nxtgen-qwen3-indic"
+
+# 2. Load Model and Tokenizer
+print(f"Loading model: {model_name}...")
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    trust_remote_code=True,
+    device_map="auto",
+    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
+)
+
+# 3. Define the Prompt
+# Using the ChatML format expected by Qwen-based architectures
+prompt = """<|im_start|>system
+You are a helpful multilingual assistant.<|im_end|>
+<|im_start|>user
+क्या आप मेरे लिए एक अपॉइंटमेंट बुक कर सकते हैं? 
+अगर हाँ, तो कृपया मुझसे ज़रूरी जानकारी जैसे तारीख, समय और उद्देश्य पूछिए।<|im_end|>
+<|im_start|>assistant
+"""
+
+# 4. Process and Generate
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+print("Generating response...")
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=256,
+        temperature=0.7,
+        top_p=0.9,
+        do_sample=True,
+        pad_token_id=tokenizer.eos_token_id
+    )
+
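+# 5. Decode and Print
+# Decode only the newly generated tokens (everything after the prompt).
+# Splitting the full text on "assistant" would truncate the reply whenever
+# the reply itself contains that word.
+prompt_length = inputs["input_ids"].shape[-1]
+response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()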
+
+print("-" * 30)
+print(f"Assistant: {response}")
+print("-" * 30)
+```
+## Function/Tool Calling Example Usage
+
+```python
+import torch
+import json
+import re
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# --- 1. MODEL SETUP ---
+model_name = "dheeyantra/dhee-nxtgen-qwen3-indic"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    trust_remote_code=True,
+    device_map="auto",
+    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
+)
+
+# --- 2. TOOLS & SYSTEM PROMPT ---
+tools = [{
+    "name": "book_appointment",
+    "description": "Book an appointment for the user.",
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "date": {"type": "string", "description": "Date in YYYY-MM-DD format"},
+            "time": {"type": "string", "description": "Time in HH:MM (24h) format"},
+            "purpose": {"type": "string", "description": "The medical reason or department"}
+        },
+        "required": ["date", "time", "purpose"]
+    }
+}]
+
+# We provide the current date so the model can resolve "tomorrow" or "next Monday"
+SYSTEM_PROMPT = f"""You are a helpful AI assistant.
+Today's Date: 2026-01-08 (Thursday).
+
+Available Tools:
+{json.dumps(tools, indent=2)}
+
+Rules:
+1. If details (date, time, purpose) are missing, ask the user in Hindi.
+2. If all details are present, output ONLY a <tool_call> JSON.
+3. After a tool result is provided, confirm the booking to the user in Hindi."""
+
+# --- 3. BACKEND FUNCTION ---
+def execute_booking(date, time, purpose):
+    # Simulated backend logic
+    if "10:00" in time: # Simulate a busy slot
+        return {"status": "error", "message": "यह समय पहले से बुक है। कृपया कोई और समय चुनें।"}
+    return {"status": "success", "id": "APP-9921", "doctor": "Dr. Verma"}
+
+# --- 4. THE INTERACTION ENGINE ---
+def run_conversation(user_input, history):
+    # Add user input to history
+    history.append({"role": "user", "content": user_input})
+    
+    # Construct ChatML prompt
+    prompt = f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
+    for msg in history:
+        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
+    prompt += "<|im_start|>assistant\n"
+
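+    # Generate (temperature only takes effect when do_sample=True)
+    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+    outputs = model.generate(
+        **inputs, max_new_tokens=256, do_sample=True, temperature=0.1,
+        pad_token_id=tokenizer.eos_token_id,
+    )
+    # Decode only the new tokens; splitting on "assistant" is fragile because
+    # the reply itself may contain that word.
+    prompt_length = inputs["input_ids"].shape[-1]
+    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()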
+
+    # Check if the model wants to call a tool
+    tool_match = re.search(r"<tool_call>(.*?)</tool_call>", response, re.DOTALL)
+    
+    if tool_match:
+        print(f"\n[MODEL REQUESTED TOOL]: {response}")
+        call_data = json.loads(tool_match.group(1).strip())
+        
+        # Execute the function
+        result = execute_booking(**call_data['arguments'])
+        print(f"[TOOL RESULT]: {result}")
+        
+        # Feed result back to model for final response
+        history.append({"role": "assistant", "content": response})
+        history.append({"role": "system", "content": f"Tool Result: {json.dumps(result)}"})
+        
+        # Final confirmation generation
+        return run_conversation("Please confirm the result to the user.", history)
+    
+    return response
+
+# --- 5. TEST SCENARIO ---
+chat_history = []
+
+print("--- Chatbot Started (Today is 2026-01-08) ---")
+
+# Step 1: User provides partial info
+query_1 = "मेरे लिए डेंटिस्ट का अपॉइंटमेंट बुक करें।"
+print(f"\nUser: {query_1}")
+res_1 = run_conversation(query_1, chat_history)
+chat_history.append({"role": "assistant", "content": res_1})
+print(f"Assistant: {res_1}")
+
+# Step 2: User provides the rest
+query_2 = "कल दोपहर 2 बजे।"
+print(f"\nUser: {query_2}")
+res_2 = run_conversation(query_2, chat_history)
+print(f"Assistant: {res_2}")
+
+```
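+
+The example above passes the model's JSON straight to `json.loads` and `execute_booking`, so malformed output would raise an exception. The sketch below shows one possible way to validate a tool call first. The helper name `parse_tool_call` and the `REQUIRED_ARGS` tuple are hypothetical, and the `{"name": ..., "arguments": {...}}` shape is assumed from the `call_data['arguments']` access above rather than confirmed for this model.
+
+```python
+import json
+import re
+
+# Assumption: mirrors the "required" list of the book_appointment tool above.
+REQUIRED_ARGS = ("date", "time", "purpose")
+
+
+def parse_tool_call(response):
+    """Return (name, arguments) for a well-formed tool call, else None."""
+    match = re.search(r"<tool_call>(.*?)</tool_call>", response, re.DOTALL)
+    if match is None:
+        return None  # plain-text reply, no tool requested
+    try:
+        call = json.loads(match.group(1).strip())
+    except json.JSONDecodeError:
+        return None  # the model emitted invalid JSON
+    if not isinstance(call, dict):
+        return None
+    arguments = call.get("arguments")
+    if not isinstance(arguments, dict):
+        return None
+    if any(key not in arguments for key in REQUIRED_ARGS):
+        return None  # a required field is missing
+    return call.get("name"), arguments
+
+
+sample = (
+    '<tool_call>{"name": "book_appointment", "arguments": '
+    '{"date": "2026-01-09", "time": "14:00", "purpose": "dentist"}}</tool_call>'
+)
+print(parse_tool_call(sample))
+print(parse_tool_call("<tool_call>{not json}</tool_call>"))
+```
+
+When the helper returns `None`, a caller could treat the response as ordinary text or ask the model to retry, instead of crashing inside `run_conversation`.
+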
+---
+## Prompting Guidelines
+- Use pure native-language prompts for best fluency
+- Avoid heavy code-mixing (e.g., Hinglish-heavy inputs)
+- Include a system prompt to stabilize multilingual behavior
+- Ask explicitly for step-by-step reasoning when required
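+
+As a rough illustration of these guidelines, the sketch below assembles a ChatML prompt with a system prompt, a pure Hindi user message, and an explicit request for step-by-step reasoning. The helper `build_chatml_prompt` is a hypothetical name, and the layout simply mirrors the hand-written prompts in the examples above. If the repository ships a chat template, `tokenizer.apply_chat_template` from Transformers may be a more robust option.
+
+```python
+from typing import Dict, List
+
+DEFAULT_SYSTEM_PROMPT = "You are a helpful multilingual assistant."
+
+
+def build_chatml_prompt(
+    messages: List[Dict[str, str]],
+    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
+) -> str:
+    """Render chat messages into the ChatML layout used in this README."""
+    parts = [f"<|im_start|>system\n{system_prompt}<|im_end|>\n"]
+    for msg in messages:
+        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
+    # Leave the assistant turn open so the model writes the reply.
+    parts.append("<|im_start|>assistant\n")
+    return "".join(parts)
+
+
+# Pure Hindi prompt that asks for step-by-step reasoning.
+prompt = build_chatml_prompt(
+    [{"role": "user", "content": "भारत की राजधानी क्या है? चरण-दर-चरण समझाइए।"}]
+)
+print(prompt)
+```
+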
+---
+## Intended Uses & Limitations
+
+### Intended Uses
+- Multilingual Indic chatbots and AI assistants
+- Education, governance, and public-sector AI applications
+- Content generation and summarization in Indian languages
+- Cross-lingual conversational and reasoning systems
+
+### Limitations
+- May occasionally hallucinate or produce inaccurate facts
+- Performance may vary slightly across languages
+- Not intended for medical, legal, or safety-critical use cases
+- Code-mixed inputs may reduce output quality
+---
+## vLLM / High-Performance Serving
+
+### Requirements
+- NVIDIA GPU with compute capability ≥ 8.0 (A100 / H100 recommended)
+- PyTorch 2.1+ with CUDA installed
+- V100 (sm70) GPUs are not supported for vLLM GPU inference
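+
+A quick pre-flight check against the requirement above could look like the sketch below. The helper names and `MIN_CAPABILITY` are hypothetical. The 8.0 threshold is taken from this README, and `torch.cuda.get_device_capability` is the PyTorch call that reports the (major, minor) pair for a device.
+
+```python
+from typing import Tuple
+
+# Threshold stated in the requirements above (compute capability >= 8.0).
+MIN_CAPABILITY = (8, 0)
+
+
+def meets_requirement(
+    capability: Tuple[int, int],
+    minimum: Tuple[int, int] = MIN_CAPABILITY,
+) -> bool:
+    """Compare (major, minor) pairs; tuples compare lexicographically."""
+    return tuple(capability) >= tuple(minimum)
+
+
+def current_gpu_capability() -> Tuple[int, int]:
+    """Query the first visible CUDA device through PyTorch."""
+    import torch  # imported lazily so meets_requirement works without PyTorch
+
+    if not torch.cuda.is_available():
+        raise RuntimeError("No CUDA device is visible to PyTorch")
+    return torch.cuda.get_device_capability(0)
+
+
+print(meets_requirement((7, 0)))  # V100 (sm70): prints False
+print(meets_requirement((8, 0)))  # A100 (sm80): prints True
+print(meets_requirement((9, 0)))  # H100 (sm90): prints True
+```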
+
+### Installation
+```bash
+pip install torch transformers vllm sentencepiece
+```
+
+### Run vLLM Server
+```bash
+vllm serve dheeyantra/dhee-nxtgen-qwen3-indic \
+  --host 0.0.0.0 \
+  --port 8000
+```
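+
+`vllm serve` exposes an OpenAI-compatible HTTP API, so the running server can be queried over `/v1/chat/completions`. The following is a minimal client sketch using only the standard library. It assumes the server is reachable at `http://localhost:8000` and that the repository ships a chat template. `build_request` and `ask` are hypothetical helper names.
+
+```python
+import json
+import urllib.request
+
+# Assumption: the server runs on this machine with the port shown above.
+BASE_URL = "http://localhost:8000/v1"
+MODEL = "dheeyantra/dhee-nxtgen-qwen3-indic"
+
+
+def build_request(user_text: str) -> urllib.request.Request:
+    """Build a chat-completions request for the vLLM server."""
+    payload = {
+        "model": MODEL,
+        "messages": [
+            {"role": "system", "content": "You are a helpful multilingual assistant."},
+            {"role": "user", "content": user_text},
+        ],
+        "max_tokens": 256,
+        "temperature": 0.7,
+    }
+    return urllib.request.Request(
+        f"{BASE_URL}/chat/completions",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def ask(user_text: str) -> str:
+    """Send the request and return the assistant's reply text."""
+    with urllib.request.urlopen(build_request(user_text), timeout=60) as resp:
+        body = json.load(resp)
+    return body["choices"][0]["message"]["content"]
+
+
+# With the server running:
+# print(ask("नमस्ते! आप किन भाषाओं में मदद कर सकते हैं?"))
+```
+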
+---
+## License
+Released under the **Apache 2.0 License**.
+
+---
+
+Developed by **DheeYantra** in collaboration with **NxtGen Cloud Technologies Pvt. Ltd.**