Model: Mathieu-Thomas-JOSSET/joke-finetome-model-gguf-phi4-20260112-081758 (source: original platform)

| pipeline_tag | tags | base_model | datasets |
|---|---|---|---|
| text-generation | | | |
joke-finetome-model-gguf-phi4-20260112-081758: GGUF
This model was finetuned and converted to GGUF format using Unsloth.
Example usage:
- For text-only LLMs:
  ./llama.cpp/llama-cli -hf Mathieu-Thomas-JOSSET/joke-finetome-model-gguf-phi4-20260112-081758 --jinja
- For multimodal models:
  ./llama.cpp/llama-mtmd-cli -hf Mathieu-Thomas-JOSSET/joke-finetome-model-gguf-phi4-20260112-081758 --jinja
Available Model files:
phi-4.Q8_0.gguf
Ollama
An Ollama Modelfile is included for easy deployment.
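The Modelfile itself ships with the repository; the commands below are a minimal sketch of registering and running it, assuming Ollama is installed, the repo files (the Modelfile and phi-4.Q8_0.gguf) have been downloaded locally, and using `joke-phi4` as an arbitrary local model name.

```shell
# Register the model with Ollama from the repo's Modelfile
# (run from the directory containing the Modelfile and the .gguf file).
ollama create joke-phi4 -f ./Modelfile

# Chat with the registered model.
ollama run joke-phi4 "Tell me a joke about compilers."
```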
This was trained 2x faster with Unsloth

Training artifacts
- Plot (interactive): reports/training_loss_step.html
- Run manifest: reports/run_manifest.json
- Inference sample: reports/inference_sample.json
- Config snapshot: reports/config_snapshot.json
Inference
This repository contains a GGUF model intended to be used with llama.cpp and/or deployed on Hugging Face Inference Endpoints (llama.cpp container).
Recommended Inference Endpoints knobs:
- Max tokens / request: 1024
- Max concurrent requests: 2
Local llama.cpp (Phi-4 template)
llama-cli -hf Mathieu-Thomas-JOSSET/joke-finetome-model-gguf-phi4-20260112-081758:q8_0 -cnv --chat-template phi4
Hugging Face Inference Endpoint (llama.cpp)
When creating an endpoint, select this repo and the GGUF file phi-4.Q8_0.gguf (quant: q8_0).
Recommended settings are stored in: inference/endpoint_recipe.json.
Python client example: inference/hf_endpoint_client.py
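Since inference/hf_endpoint_client.py is not reproduced here, the following is a minimal sketch of such a client, assuming the endpoint exposes the OpenAI-compatible /v1/chat/completions route that llama.cpp's server provides, and that the `HF_ENDPOINT_URL` and `HF_TOKEN` environment variables (names chosen here for illustration) are set by the user.

```python
import json
import os
import urllib.request

# Placeholder endpoint URL and token, read from the environment.
ENDPOINT_URL = os.environ.get("HF_ENDPOINT_URL", "https://<your-endpoint>.endpoints.huggingface.cloud")
HF_TOKEN = os.environ.get("HF_TOKEN", "")


def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat payload; max_tokens matches the 1024/request cap recommended above."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }


def query(prompt: str) -> str:
    """POST the payload to the endpoint and return the assistant's reply text."""
    req = urllib.request.Request(
        ENDPOINT_URL.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Keeping requests sequential on the client side (or capping a pool at 2 workers) lines up with the recommended concurrency limit above.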
Description
Model synced from source: Mathieu-Thomas-JOSSET/joke-finetome-model-gguf-phi4-20260112-081758