---
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
license: apache-2.0
language:
- id
---

<div align="center">
  <h1>Qwen 3 8B HPC UG Assistant Persona</h1>
  <p><b>Empathetic & Professional AI Assistant for Universitas Gunadarma HPC Lab.</b></p>

  [![Unsloth](https://img.shields.io/badge/Unsloth-2x_Faster-blue?style=for-the-badge&logo=unsloth)](https://github.com/unslothai/unsloth)
  [![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Models-orange?style=for-the-badge)](https://huggingface.co/felixhrdyn)
  [![License](https://img.shields.io/badge/License-Apache%202.0-red?style=for-the-badge)](https://opensource.org/licenses/Apache-2.0)
</div>

---

## Model Overview

**Qwen 3 8B HPC UG Assistant Persona** is a behaviorally fine-tuned version of Qwen3-8B, designed to serve as a digital assistant for the High-Performance Computing (HPC) lab at Universitas Gunadarma.

Unlike a general-purpose chat model, this version is trained with a **humanistic persona** that emphasizes empathy, professional Indonesian communication, and adherence to the lab's support protocol. It is "RAG-ready": it is trained to take context supplied by a retrieval-augmented generation (RAG) pipeline and turn it into accurate yet friendly answers.

## Persona Traits
- **Time-Awareness**: Greets users appropriately (Morning/Afternoon/Evening).
- **Empathy-First**: Calms users during technical failures or stressful moments.
- **Clarification First**: Asks for missing details (e.g., screenshots for errors) before providing solutions.
- **Natural Paraphrasing**: Converts technical FAQ data into conversational, easy-to-understand language.
- **Survey Footer**: Automatically includes feedback links only when the session is complete.
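
The time-aware greeting depends on the model knowing the current time, which a language model cannot observe on its own. A minimal sketch of how a calling application might supply it is shown here; the hour boundaries, the `time_hint` format, and both function names are assumptions made for illustration, not part of the released model.

```python
from datetime import datetime

def greeting_for_hour(hour: int) -> str:
    """Return an Indonesian time-of-day greeting for an hour in 0..23.

    The boundaries (11, 15, 18) are a common convention and an assumption here.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if hour < 11:
        return "Selamat pagi"
    if hour < 15:
        return "Selamat siang"
    if hour < 18:
        return "Selamat sore"
    return "Selamat malam"

def time_hint(now: datetime) -> str:
    """Build a one-line hint that an application could prepend to the user turn."""
    return f"Waktu saat ini: {now:%H:%M} ({greeting_for_hour(now.hour)})"
```

Calling `time_hint(datetime.now())` and adding the result to the prompt gives the model the information it needs to pick a matching greeting.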

---

## Technical Specifications

This model was fine-tuned using the **Unsloth** library on a synthetic dataset of 126 multi-turn conversations reflecting various student emotional states.

| Parameter | Value |
| :--- | :--- |
| **Base Model** | `unsloth/qwen3-8b-unsloth-bnb-4bit` |
| **Method** | LoRA (PEFT) |
| **LoRA Rank (r)** | 16 |
| **LoRA Alpha** | 16 |
| **Target Modules** | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| **Max Seq Length** | 1536 tokens |
| **Epochs** | 3 |
| **Optimizer** | `adamw_8bit` |
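
To put the LoRA settings in perspective, the sketch below estimates how many adapter weights this configuration trains. The layer dimensions are taken from the published Qwen3-8B configuration and are an assumption here, since this card does not list them; the scaling factor assumes standard LoRA (alpha / r) rather than rank-stabilized LoRA.

```python
# Assumed Qwen3-8B dimensions (from the published model config, not from this card).
HIDDEN = 4096
INTERMEDIATE = 12288
NUM_LAYERS = 36
Q_OUT = 32 * 128   # attention heads * head_dim
KV_OUT = 8 * 128   # key/value heads * head_dim

LORA_R = 16
LORA_ALPHA = 16

# (in_features, out_features) for each target module in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A is (r x in) and B is (out x r), so each adapter adds r * (in + out) weights."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, LORA_R) for i, o in TARGET_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R  # standard LoRA scaling; rsLoRA is not assumed

print(f"LoRA scaling factor: {scaling}")
print(f"Trainable adapter parameters: {total_trainable:,}")
```

Under these assumptions the adapters add roughly 43.6 million trainable weights, a small fraction of the 8B base model, and alpha equal to the rank gives a scaling factor of 1.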

---

## Usage

### Prompt Template (ChatML)
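The model expects the following format for optimal persona performance. The system prompt is written in Indonesian, the model's target language, and should be used as written: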

```
<|im_start|>system
Kamu adalah Asisten Praktikum AI Universitas Gunadarma. Ikuti panduan gaya berikut dengan ketat:
- Gunakan sapaan sesuai waktu: "Selamat pagi/siang/sore Kak" (variasikan sesuai konteks)
- Tanya klarifikasi jika pertanyaan ambigu SEBELUM menjawab — jangan langsung dump informasi
- Parafrase informasi dari konteks FAQ — JANGAN copy-paste verbatim
- Tutup dengan footer survey HANYA jika mahasiswa menyatakan sudah selesai/cukup/tidak ada pertanyaan lagi
- Gunakan "Kak" sebagai honorifik untuk mahasiswa
- Tawarkan follow-up setelah menjawab: "Apakah ada yang ingin ditanyakan kembali?"
- Untuk error teknis: minta detail/screenshot dulu, lalu berikan solusi langkah demi langkah
- Jika konteks tersedia dalam tag <konteks>, gunakan untuk menjawab tapi PARAFRASE, bukan salin
<|im_end|>
<|im_start|>user
{query}<|im_end|>
<|im_start|>assistant
```
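
The template above can also be assembled programmatically. The following is a minimal sketch, not the author's implementation: `build_chatml_prompt` is a hypothetical helper, the system prompt is shortened to a placeholder, and placing retrieved FAQ text in the user turn inside `<konteks>` tags is an assumption, since the card only says the context arrives in that tag. The sketch follows the usual ChatML convention of putting `<|im_end|>` directly after the message content.

```python
from typing import Optional

# Shortened placeholder; in practice paste the full Indonesian system prompt here.
SYSTEM_PROMPT = "Kamu adalah Asisten Praktikum AI Universitas Gunadarma."

def build_chatml_prompt(query: str, context: Optional[str] = None,
                        system_prompt: str = SYSTEM_PROMPT) -> str:
    """Assemble a single-turn ChatML prompt that ends with an open assistant turn."""
    user_content = query.strip()
    if context:
        # Assumption: retrieved FAQ text goes in the user turn inside <konteks> tags.
        user_content = f"<konteks>\n{context.strip()}\n</konteks>\n\n{user_content}"
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_content}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```

For example, `build_chatml_prompt("Bagaimana cara reset password?", context=faq_text)` returns a string that can be tokenized and passed to the model, where `faq_text` stands for whatever the retrieval step returned.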

### Inference with Unsloth (Recommended)
```python
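from unsloth import FastLanguageModel

# Load the merged checkpoint together with its tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "felixhrdyn/Qwen3-8B-HPC-UG-Persona-Merged", # Use the merged version
    max_seq_length = 1536,  # Same limit as used during fine-tuning
    load_in_4bit = True,    # Quantize to 4-bit at load time to reduce memory use
)
FastLanguageModel.for_inference(model)  # Switch Unsloth to inference mode

# Your chat logic here: build a prompt in the ChatML format above, then call model.generate(...)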
```

---

## Available Formats

The model is released in two primary formats to cater to different deployment needs:

### 1. Merged 16-bit (DGX/Server Ready)
The LoRA adapter is merged into the base weights and saved in 16-bit precision, intended for server deployments such as DGX systems.
- **Model Card**: [felixhrdyn/Qwen3-8B-HPC-UG-Persona-Merged](https://huggingface.co/felixhrdyn/Qwen3-8B-HPC-UG-Persona-Merged)

### 2. GGUF (Local / Edge Ready)
Converted using **Unsloth** for lightweight deployment on local machines (macOS, Windows, Linux).
- **Model Repository**: [felixhrdyn/Qwen3-8B-HPC-UG-Persona-GGUF](https://huggingface.co/felixhrdyn/Qwen3-8B-HPC-UG-Persona-GGUF)
- **Files**: `qwen3-8b.Q8_0.gguf`

#### GGUF Usage (llama-cli)
```bash
# Download the GGUF from Hugging Face and start an interactive chat.
# --jinja applies the chat template embedded in the GGUF file.
llama-cli -hf felixhrdyn/Qwen3-8B-HPC-UG-Persona-GGUF --jinja
```

---

## Ollama Support

An **Ollama Modelfile** is included in the GGUF repository for easy deployment.
- **Efficiency**: This model was fine-tuned with Unsloth, which advertises roughly 2x faster training.
- **Deployment**: Pull the model, or create it from the provided Modelfile (for example with `ollama create`), to run it in your Ollama environment.
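
Once the model is registered in Ollama, it can be queried over Ollama's local REST API. The sketch below uses only the Python standard library; the model name `hpc-ug-persona` is hypothetical (it is whatever name was chosen when creating the model), and the endpoint assumes Ollama's default local port.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL_NAME = "hpc-ug-persona"  # hypothetical name chosen at model creation time

def build_chat_payload(query: str, system_prompt: str, model: str = MODEL_NAME) -> dict:
    """Build the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
        "stream": False,
    }

def chat(query: str, system_prompt: str) -> str:
    """Send the request to a running Ollama server and return the assistant's reply."""
    payload = json.dumps(build_chat_payload(query, system_prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]
```

`chat(...)` requires a running Ollama server; `build_chat_payload(...)` can be inspected on its own to check the request before sending it.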

---

## Evaluation
Qualitatively, the model's behavior differs clearly from the base model: it maintains a **Professional, Formal, and Humanistic** tone even when faced with informal or frustrated user inputs.

### Training Metrics
Training ran for 3 epochs, with loss convergence monitored as an indicator of stable persona behavior.

| Metric | Value |
| :--- | :--- |
| Final Training Loss | 0.3802 |
| Validation Split | 10% |
| Training Epochs | 3 |
| Batch Size | 1 (Grad Accum: 4) |
| Convergence State | Achieved stable loss after Step 60 |
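
To relate "Step 60" to the rest of the table, the sketch below estimates the number of optimizer steps in the run. It assumes the 126 conversations are split with the validation set rounded up, a single device, and that a partial final accumulation window still counts as one step; the actual trainer may round differently, so treat the result as approximate.

```python
import math

NUM_CONVERSATIONS = 126
VALIDATION_FRACTION = 0.10
PER_DEVICE_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4
EPOCHS = 3

# Assumption: the validation set is rounded up, leaving the rest for training.
n_val = math.ceil(NUM_CONVERSATIONS * VALIDATION_FRACTION)
n_train = NUM_CONVERSATIONS - n_val

# Each optimizer step accumulates gradients over this many conversations.
effective_batch = PER_DEVICE_BATCH_SIZE * GRAD_ACCUM_STEPS

# Assumption: a partial final accumulation window still counts as one step.
steps_per_epoch = math.ceil(n_train / effective_batch)
total_steps = steps_per_epoch * EPOCHS

print(f"train/val split: {n_train}/{n_val}")
print(f"effective batch size: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

Under these assumptions the run has about 87 optimizer steps, so the loss stabilizing after step 60 corresponds to the start of the third epoch.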

## Author
**Felix Hardyan**
- [Hugging Face](https://huggingface.co/felixhrdyn)
- [GitHub](https://github.com/flxhrdyn)