Model: millat/mistral-7b-indian-university-guidance Source: Original Platform
| license | library_name | base_model | pipeline_tag |
|---|---|---|---|
| apache-2.0 | transformers | mistralai/Mistral-7B-Instruct-v0.3 | text-generation |
Mistral 7B - Indian University Guidance for Bangladeshi Students
A Mistral-7B model fine-tuned to guide Bangladeshi students seeking admission to Indian universities. It answers domain-specific questions about scholarships, admissions, visa processes, and degree equivalencies.
This model is trained on the contextual dataset introduced in our research paper: "Development of a Contextual Educational Dataset for Bangladeshi Students Studying in India" presented at Sharda University, Greater Noida, India.
| Resource | DOI |
|---|---|
| Model | 10.57967/hf/7639 |
| Dataset | 10.57967/hf/6295 |
Authors
- MD Millat Hosen
- Md Moudud Ahmed Misil
- Dr. Rohit Kumar Sachan
Model Highlights
- Low Perplexity: 1.4555 on the evaluation split (the exponential of the 0.3754 evaluation loss)
- Domain-Specialized: Trained on 7,044 curated Q&A pairs
- Practical Knowledge: Covers Sharda University scholarships, degree equivalencies, GPA conversions, and regulatory requirements (AICTE, NMC, BMDC)
- Efficient Training: QLoRA fine-tuning on Tesla T4 (Google Colab Free Tier)
Model Description
This model is fine-tuned from mistralai/Mistral-7B-Instruct-v0.3 using the Unsloth framework with QLoRA (Quantized Low-Rank Adaptation). It provides specialized guidance covering:
| Category | Topics Covered |
|---|---|
| Universities | Sharda University (primary focus), Galgotias, Amity, Noida International University (NIU) |
| Scholarships | Eligibility criteria, percentage waivers (20%/50%), retention requirements, program exclusions |
| Admissions | Requirements, documents, deadlines, lateral entry for diploma holders |
| Visa & Documentation | Student visa process, FRRO registration, required documents |
| Degree Equivalence | B.Sc. Engineering ↔ B.Tech, B.Com, BA, Polytechnic Diploma, Madrasa (Alim) recognition |
| GPA Conversion | HSC GPA (out of 5) to Indian percentage/CGPA (out of 10) system |
| Medical Education | MBBS requirements, NMC approval, BMDC recognition |
| Regulatory Bodies | AICTE, NMC, BMDC, UGC requirements and approval processes |
Note: This model is primarily trained on Sharda University data and general degree equivalence information. It does not contain specific information about IITs, NITs, or other public universities.
Training Details
Configuration
| Parameter | Value |
|---|---|
| Base Model | mistralai/Mistral-7B-Instruct-v0.3 |
| Framework | Unsloth + QLoRA |
| Quantization | 4-bit (bnb) |
| LoRA Rank (r) | 32 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0 |
| Max Sequence Length | 512 |
| Learning Rate | 2e-4 |
| Batch Size | 4 |
| Gradient Accumulation | 4 |
| Effective Batch Size | 16 |
| Epochs | 3 |
| Total Steps | 1,191 |
| Warmup Steps | 10 |
| Seed | 3407 |
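The effective batch size and total step count in the table can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming the 6,339 training samples listed in the Dataset section and assuming that the last, partially filled batch of each epoch still counts as one optimizer step; the trainer's exact step accounting is not stated in this card.

```python
import math

# Values copied from the configuration table and the dataset split.
train_samples = 6339
per_device_batch_size = 4
gradient_accumulation_steps = 4
epochs = 3

effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Assumption: one optimizer step per (possibly partial) effective batch.
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs

print(effective_batch_size, steps_per_epoch, total_steps)  # 16 397 1191
```

Under these assumptions the result agrees with the 1,191 total steps reported above.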
Training Results
| Metric | Value |
|---|---|
| Final Training Loss | 0.1016 |
| Evaluation Loss | 0.3754 |
| Perplexity | 1.4555 |
| Reported Training Duration | 66.47 minutes* |
| Actual Wall-Clock Time | ~3.5 hours** |
*Timer reset on Colab reconnections; this is cumulative GPU compute time only
**Actual elapsed time from first epoch start (07:32) to final epoch completion (10:58) was approximately 3 hours 26 minutes due to Colab session interruptions and reconnections
Training Progress
| Epoch | Eval Loss | Perplexity |
|---|---|---|
| 1 | 0.3820 | 1.4652 |
| 2 | 0.3757 | 1.4560 |
| 3 | 0.3754 | 1.4555 |
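Perplexity here is the exponential of the evaluation loss (mean cross-entropy in nats). The short check below recomputes it from the rounded losses in the table; small differences in the last digit are expected because the reported values were presumably computed from unrounded losses.

```python
import math

# Evaluation loss and reported perplexity per epoch, copied from the table.
eval_loss = {1: 0.3820, 2: 0.3757, 3: 0.3754}
reported_perplexity = {1: 1.4652, 2: 1.4560, 3: 1.4555}

recomputed = {epoch: math.exp(loss) for epoch, loss in eval_loss.items()}

for epoch, ppl in recomputed.items():
    diff = abs(ppl - reported_perplexity[epoch])
    print(f"epoch {epoch}: exp(loss) = {ppl:.4f}, difference from reported = {diff:.5f}")
```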
Hardware
| Resource | Specification |
|---|---|
| GPU | Tesla T4 |
| GPU Memory (Total) | 15.83 GB |
| GPU Memory (Peak) | 11.95 GB |
| GPU Memory (Average) | 5.19 GB |
| RAM (Peak) | 5.95 GB |
Dataset
The model was trained on millat/indian_university_guidance_for_bangladeshi_students, a custom dataset of 7,044 instruction-formatted Q&A pairs created by the same authors with the SetForge pipeline.
| Property | Value |
|---|---|
| Dataset DOI | 10.57967/hf/6295 |
| Format | JSONL (JSON Lines) |
| License | MIT |
| Total Samples | 7,044 |
| Training Samples | 6,339 (90%) |
| Evaluation Samples | 705 (10%) |
| Estimated Tokens | ~1,019,372 |
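The split sizes in the table are consistent with a shuffled 90/10 split in which the training share is rounded down. The sketch below illustrates one way to produce such a split; reusing the training seed 3407 for shuffling is an assumption, and the authors' actual splitting code is not shown in this card.

```python
import random

total_samples = 7044
seed = 3407  # seed from the training configuration; its use for the split is an assumption

# Shuffle record indices deterministically, then cut at 90%.
indices = list(range(total_samples))
random.Random(seed).shuffle(indices)

n_train = total_samples * 9 // 10  # integer floor of 90%
train_indices = indices[:n_train]
eval_indices = indices[n_train:]

print(len(train_indices), len(eval_indices))  # 6339 705
```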
Dataset Topics
| Category | Topics Covered |
|---|---|
| University Information | Private NCR universities: Sharda (primary), Galgotias, Amity, Noida International University |
| Scholarships | Eligibility criteria, percentage waivers (20%/50%), retention requirements, program exclusions |
| Admissions | Requirements, documents, deadlines, lateral entry for diploma holders |
| Visa & Documentation | Student visa process, e-FRRO registration, passport requirements |
| Degree Equivalence | Bangladesh to India credential conversion (B.Sc. Engineering ↔ B.Tech, Polytechnic Diploma, Madrasa Alim) |
| GPA Conversion | HSC GPA (out of 5) to Indian percentage/CGPA (out of 10) system |
| Regulatory Bodies | AICTE, NMC, BMDC, UGC requirements and approval processes |
Data Sources (from research paper)
| Category | Sources |
|---|---|
| Universities | Sharda University (4 portals), Galgotias University (3 portals), Amity University (3 portals), Noida International University (2 portals) |
| Government - Bangladesh | High Commission New Delhi, Deputy High Commission Chennai, e-Passport Portal |
| Government - India | High Commission Dhaka, e-FRRO, Indian Visa portals, MEA, MHA, Study in India portal |
⚠️ Note: This dataset does NOT include IITs, NITs, Central Universities, or other public/government institutions. It focuses exclusively on private universities in India's National Capital Region (NCR).
Data Fields
Each record in the dataset contains:
| Field | Description |
|---|---|
| question | Question from the perspective of a Bangladeshi student |
| answer | Comprehensive, direct answer to the question |
| context | Brief description of the topic the Q&A belongs to |
| source | Source of information used to generate the answer |
| metadata | Nested object with degree_equivalence, grading_conversion, country_origin, tone, cultural_sensitivity |
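To make the record layout concrete, the sketch below parses one JSONL line, checks that the documented fields are present, and renders the record in the instruction template used in the Usage section. The field names come from the table above; every value in the sample record is a made-up placeholder, the helper names `parse_record` and `to_prompt` are hypothetical, and whether training used exactly this template is an assumption.

```python
import json

REQUIRED_FIELDS = ("question", "answer", "context", "source", "metadata")

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n{question}\n"
    "### Response:\n{answer}"
)

# Hypothetical record: field names are documented, values are placeholders.
sample_line = json.dumps({
    "question": "Is a Bangladeshi B.Sc. in Engineering equivalent to an Indian B.Tech?",
    "answer": "Placeholder answer text.",
    "context": "Degree equivalence",
    "source": "placeholder-source",
    "metadata": {
        "degree_equivalence": "placeholder",
        "grading_conversion": "placeholder",
        "country_origin": "Bangladesh",
        "tone": "placeholder",
        "cultural_sensitivity": "placeholder",
    },
})

def parse_record(jsonl_line: str) -> dict:
    """Parse one JSONL line and verify that all documented fields exist."""
    record = json.loads(jsonl_line)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record

def to_prompt(record: dict) -> str:
    """Render a record in the instruction/response template."""
    return PROMPT_TEMPLATE.format(question=record["question"], answer=record["answer"])

record = parse_record(sample_line)
print(to_prompt(record))
```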
Data Statistics
| Metric | Questions | Answers |
|---|---|---|
| Min Characters | 18 | 65 |
| Max Characters | 307 | 1,151 |
| Average Characters | 128.5 | 391.7 |
| Average Tokens | 38.9 | 105.8 |
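The average token counts can be used to sanity-check the estimated token total reported in the dataset properties. This is a rough cross-check only, since the averages in the table are rounded to one decimal place.

```python
total_samples = 7044
avg_question_tokens = 38.9
avg_answer_tokens = 105.8
reported_total_tokens = 1_019_372

# Approximate total: (average question tokens + average answer tokens) per sample.
approx_total_tokens = (avg_question_tokens + avg_answer_tokens) * total_samples
relative_gap = abs(approx_total_tokens - reported_total_tokens) / reported_total_tokens

print(f"approximate total: {approx_total_tokens:,.0f} tokens")
print(f"relative gap to reported total: {relative_gap:.4%}")
```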
Data Curation Pipeline
The dataset was created through the SetForge pipeline (as described in our research paper):
- Source Selection: Identification of relevant sources from NCR private universities and government portals
- Data Extraction: Content collected using WebScrape Chrome extension with support for dynamic content and PDF extraction via pdf.js
- Data Preprocessing: NLP and RegEx-based cleaning, filtering, duplicate removal, and domain categorization
- Data Categorization & Standardization: AI-powered document triage using LLMs for semantic chunking and schema-guided structuring with resilient error handling
- Dataset Generation: Context-aware Q&A pair generation with instruction-based formatting using asyncio for concurrent processing, stored in JSONL format
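The final step, concurrent Q&A generation written out as JSONL, can be sketched with `asyncio` as follows. This is an illustration under stated assumptions and not the SetForge implementation: `generate_qa` is a hypothetical stand-in for an LLM call, and the concurrency limit and error handling are illustrative choices.

```python
import asyncio
import json

async def generate_qa(chunk: str) -> dict:
    """Hypothetical stand-in for an LLM call that turns a text chunk into a Q&A pair."""
    await asyncio.sleep(0)  # placeholder for network latency
    return {"question": f"What does this passage describe? {chunk[:30]}", "answer": chunk}

async def generate_all(chunks: list[str], max_concurrency: int = 4) -> list[dict]:
    """Generate Q&A pairs concurrently, skipping chunks whose generation fails."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(chunk: str):
        async with semaphore:  # limit the number of in-flight requests
            try:
                return await generate_qa(chunk)
            except Exception:
                return None  # a failed chunk does not abort the whole run

    results = await asyncio.gather(*(guarded(chunk) for chunk in chunks))
    return [result for result in results if result is not None]

def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)

chunks = ["Placeholder passage about scholarships.", "Placeholder passage about visas."]
records = asyncio.run(generate_all(chunks))
jsonl_text = to_jsonl(records)
print(jsonl_text)
```

The semaphore keeps the number of simultaneous requests bounded, which matters when the generator is a rate-limited API.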
Usage
With Unsloth (Recommended - Fastest)
```python
from unsloth import FastLanguageModel

# Load the fine-tuned model in 4-bit precision
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="millat/mistral-7b-indian-university-guidance",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's inference mode

prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
What scholarships are available at Sharda University for Bangladeshi students?
### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
With Transformers
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "millat/mistral-7b-indian-university-guidance"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the 16-bit weights and place them on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Can I get lateral entry to B.Tech 2nd year with my Polytechnic Diploma from Bangladesh?
### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Sample up to 256 new tokens with nucleus sampling
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
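Because `generate` returns the prompt tokens followed by the completion, the decoded string contains the full instruction template as well as the answer. A small helper like the one below keeps only the text after the response marker; `extract_response` is a hypothetical name, and the sample string stands in for real model output.

```python
RESPONSE_MARKER = "### Response:"

def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, or the whole string if it is absent."""
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if marker else decoded.strip()

# Placeholder string shaped like a decoded generation (not real model output).
decoded_example = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\nWhat is the FRRO registration process?\n"
    "### Response:\nPlaceholder answer text.\n"
)
print(extract_response(decoded_example))
```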
With GGUF (llama.cpp)
```bash
# Download the GGUF files (narrow the --include pattern to fetch a single quantization)
huggingface-cli download millat/mistral-7b-indian-university-guidance \
  --include "*.gguf" \
  --local-dir ./models

# Run with llama.cpp (the CLI binary is llama-cli in current builds; older builds name it main)
./llama-cli -m ./models/mistral-7b-indian-university-guidance-q8_0.gguf \
  -p "What are the requirements for MBBS admission in India for Bangladeshi students?" \
  -n 256
```
With Ollama
```bash
# Create Modelfile
cat << 'EOF' > Modelfile
FROM ./mistral-7b-indian-university-guidance-q8_0.gguf
TEMPLATE """Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{{ .Prompt }}
### Response:
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Create and run
ollama create indian-uni-guide -f Modelfile
ollama run indian-uni-guide "What is the FRRO registration process?"
```
Example Outputs
Scholarship Information
Question: "I have a GPA of 3.8 out of 5.0 in my HSC from Bangladesh. What scholarship can I get at Sharda University?"
Response: "With a GPA of 3.8/5, you are eligible for a 50% tuition fee scholarship at Sharda University for programs like B.Tech, BBA, MBA, BCA, and others listed in the 50% scholarship category. This is because your GPA falls within the 3.5-5.0 range. Exclusions from scholarships include Pharmacy, M.Sc. Nursing, MPT, Medical M.Sc., BDS & MBBS. To retain this scholarship after first year, maintain 75% attendance and pass all exams without backlogs."
Degree Equivalency
Question: "I have a B.Sc. in Engineering from Bangladesh. Is it equivalent to B.Tech in India?"
Response: "Yes, your 4-year B.Sc. in engineering (Bangladeshi university curriculum) is considered equivalent to an Indian B.Tech or B.E. degree. The key requirement for recognition in Bangladesh is that the Indian engineering program must be AICTE-approved."
Medical Degree Recognition
Question: "Will my MBBS degree from India be recognized by BMDC in Bangladesh?"
Response: "For your MBBS or BDS degree from an Indian university to be recognized and valid for practice in Bangladesh, it must be NMC-approved (National Medical Commission) in India and subsequently recognized by the BMDC (Bangladesh Medical and Dental Council). Ensure that the program is full-time and on campus. Recognition typically requires completion of specific regulatory requirements by both parties."
Lateral Entry
Question: "Can I get lateral entry to B.Tech 2nd year with my Polytechnic Diploma from Bangladesh?"
Response: "Yes, your 4-year Polytechnic Diploma from Bangladesh (completed after SSC) makes you eligible for lateral entry into the second year of B.Tech programs at Indian universities like Sharda University. The Indian B.Tech degree is equivalent to B.Sc. in Engineering in Bangladesh. Ensure the Indian institution is AICTE-approved for your degree to be recognized back in Bangladesh."
FRRO Registration
Question: "What documents do I need for FRRO registration in India as a Bangladeshi student?"
Response: "You will need: original passport with visa, bonafide certificate from university/college, CPR certificate (compulsory pregnancy report if applicable), marriage certificate (if married), medical certificate (AIDS test result) and 4 photographs."
Model Comparison
The table below is a qualitative comparison of the fine-tuned model with the base model:
| Aspect | Base Model | Fine-tuned Model |
|---|---|---|
| Domain Knowledge | Generic | Specialized for BD→India education |
| Response Length | Verbose, often off-topic | Concise and focused |
| Accuracy | Generic advice | Specific criteria, percentages, requirements |
| Hallucinations | Common (fake URLs, HTML artifacts) | Minimal |
| Regulatory Knowledge | Limited | Accurate (AICTE, NMC, BMDC) |
Available Formats
| Format | Size | Use Case |
|---|---|---|
| Safetensors (16-bit) | ~14.5 GB | Unquantized (half-precision) inference |
| GGUF Q8_0 | ~7.7 GB | High-quality local inference |
| GGUF Q4_K_M | ~4.37 GB | Balanced quality/size |
| GGUF Q4_0 | ~4.11 GB | Smaller footprint |
| GGUF Q3_K_M | ~3.52 GB | Memory-constrained environments |
| GGUF Q2_K | ~2.72 GB | Minimum size |
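The larger file sizes follow roughly from parameter count times bits per weight. The estimate below is a back-of-the-envelope sketch, assuming about 7.25 billion parameters for the base model and the block layouts of Q8_0 and Q4_0 (32 weights plus one 16-bit scale per block). Real files differ somewhat because some tensors are stored at other precisions and the file carries metadata.

```python
# Assumption: the base model has roughly 7.25 billion parameters.
PARAMS = 7.25e9

# Bits per weight: Q8_0 is (32*8 + 16)/32 = 8.5, Q4_0 is (32*4 + 16)/32 = 4.5.
BITS_PER_WEIGHT = {"fp16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}

def estimated_size_gb(fmt: str) -> float:
    """Estimate file size in GB (10^9 bytes) from parameter count and bits per weight."""
    return PARAMS * BITS_PER_WEIGHT[fmt] / 8 / 1e9

estimates = {fmt: round(estimated_size_gb(fmt), 2) for fmt in BITS_PER_WEIGHT}
print(estimates)
```

The fp16 and Q8_0 estimates land close to the ~14.5 GB and ~7.7 GB listed above, while the Q4_0 estimate comes out slightly below the listed ~4.11 GB.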
Limitations
- Geographic Scope: Focused on Bangladeshi students applying to Indian universities
- Temporal: Information reflects 2025 admission cycles; verify current requirements
- University Coverage:
  - Primary: Sharda University (most data: scholarships, programs, eligibility)
  - Secondary: Galgotias, Amity, Noida International University (NIU)
  - Not Covered: IITs (Indian Institutes of Technology), NITs (National Institutes of Technology), Central Universities, and other public/government universities
- Topic Bias: Heavy emphasis on scholarship eligibility criteria; limited information on campus life, placements, or research opportunities
- Not Legal Advice: For official processes, always verify with respective authorities
Ethical Considerations
- This model provides educational guidance and should not replace official university or government sources
- Users should verify scholarship amounts, eligibility criteria, and visa requirements with official sources
- The model may occasionally generate plausible but outdated information
Citation
If you use this model or dataset, please cite the research paper:
Paper Citation
@inproceedings{hosen2025contextual,
author = {MD Millat Hosen and Md Moudud Ahmed Misil and Dr. Rohit Kumar Sachan},
title = {Development of a Contextual Educational Dataset for Bangladeshi Students Studying in India},
booktitle = {Proceedings of Sharda University},
year = {2025},
address = {Greater Noida, India},
institution = {School of Computer Science and Engineering (SSCSE), Sharda University},
keywords = {Bangladeshi, Large language model, Natural language processing, ChatBot, JSON}
}
Model Citation
@misc{hosen2025mistral_indian_uni,
author = {MD Millat Hosen and Md Moudud Ahmed Misil and Dr. Rohit Kumar Sachan},
title = {Mistral 7B - Indian University Guidance for Bangladeshi Students},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/millat/mistral-7b-indian-university-guidance},
doi = {10.57967/hf/7639},
note = {Fine-tuned with Unsloth + QLoRA on 7,044 domain-specific Q&A pairs}
}
Dataset Citation
@misc{hosen2025indian_university_dataset,
author = {MD Millat Hosen and Md Moudud Ahmed Misil and Dr. Rohit Kumar Sachan},
title = {indian_university_guidance_for_bangladeshi_students},
year = {2025},
url = {https://huggingface.co/datasets/millat/indian_university_guidance_for_bangladeshi_students},
doi = {10.57967/hf/6295},
publisher = {Hugging Face}
}
Acknowledgments
- Unsloth for efficient fine-tuning framework
- Mistral AI for the base model
- Hugging Face for model hosting and datasets infrastructure
- SetForge for the dataset generation pipeline
- WebScrape for data collection
License
Apache 2.0 - See LICENSE for details.
Model Card Last Updated: January 25, 2026