Recommended default: Q4_K_M (4-bit, K-quant medium). For memory-constrained deployment, try Q2_K or Q3_K_M. For maximum fidelity, use Q8_0.
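If you only want a single quantization rather than the whole repository, `huggingface_hub` can filter files by pattern. The sketch below is illustrative; the glob assumes the GGUF files follow the usual `*Q4_K_M.gguf` naming, so adjust it to the actual filenames in the repo.

```python
from huggingface_hub import snapshot_download

# Download only the Q4_K_M quantization (the pattern is an assumption about
# the file naming in this repo; change it to match the listed filenames).
local_dir = snapshot_download(
    repo_id="ermiaazarkhalili/LFM2.5-1.2B-Function-Calling-xLAM-Unsloth-GGUF",
    allow_patterns=["*Q4_K_M.gguf"],
)
print(local_dir)  # folder containing the downloaded GGUF file
```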
## Usage

### llama.cpp
```bash
# Text-only
llama-cli -hf ermiaazarkhalili/LFM2.5-1.2B-Function-Calling-xLAM-Unsloth-GGUF --jinja -p "Find flights from SFO to NYC on December 25th" -n 256

# Interactive chat
llama-cli -hf ermiaazarkhalili/LFM2.5-1.2B-Function-Calling-xLAM-Unsloth-GGUF --jinja -cnv
```
### Ollama
```bash
ollama run hf.co/ermiaazarkhalili/LFM2.5-1.2B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M
```
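If you prefer to drive Ollama from Python instead of the CLI, the official `ollama` client package exposes a chat API. This is a minimal sketch assuming a local Ollama server is running and the model has already been pulled with the command above; the exact return type varies by client version.

```python
import ollama  # pip install ollama

response = ollama.chat(
    model="hf.co/ermiaazarkhalili/LFM2.5-1.2B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M",
    messages=[{"role": "user", "content": "Find flights from SFO to NYC on December 25th"}],
)
# Recent client versions support both attribute and dict-style access.
print(response["message"]["content"])
```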
### llama-cpp-python
```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ermiaazarkhalili/LFM2.5-1.2B-Function-Calling-xLAM-Unsloth-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=2048,
)

out = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Find flights from SFO to NYC on December 25th"}
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```
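Because this is a function-calling fine-tune, you will usually want to pass tool definitions as well. `create_chat_completion` accepts OpenAI-style `tools`; the `search_flights` schema below is a hypothetical example (not part of this repo), and how tool calls are emitted depends on the chat template baked into the GGUF.

```python
# Hypothetical tool schema for illustration only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_flights",
            "description": "Search for flights between two airports on a given date",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string", "description": "IATA code, e.g. SFO"},
                    "destination": {"type": "string", "description": "IATA code, e.g. JFK"},
                    "date": {"type": "string", "description": "Departure date, YYYY-MM-DD"},
                },
                "required": ["origin", "destination", "date"],
            },
        },
    }
]

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Find flights from SFO to NYC on December 25th"}],
    tools=tools,
    max_tokens=256,
)
# The response may contain tool_calls rather than plain text content.
print(out["choices"][0]["message"])
```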
## Intended use
For research and non-commercial experimentation only. Outputs should be independently verified before any downstream use.
## Limitations
GGUF quantizations have unavoidable quality loss relative to the source bfloat16 checkpoint. Use Q5_K_M or Q8_0 for best fidelity.