### Local inference (llama.cpp)

```bash
llama-cli -hf {REPO_ID}:q8_0 -cnv --chat-template phi4
```

### Server (OpenAI-compatible)

```bash
llama-server -hf {REPO_ID}:q8_0
# The OpenAI-compatible API will be available at /v1/chat/completions
```
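Once the server is up, any OpenAI-style client can talk to it. Below is a minimal stdlib-only sketch, assuming llama-server's default bind of `127.0.0.1:8080` (adjust `BASE_URL` if you passed `--host`/`--port`); the `build_chat_request` helper and the `"model": "local"` placeholder are illustrative, not part of llama.cpp itself.

```python
import json
import urllib.request

# Assumes llama-server's default bind; change if you used --host/--port.
BASE_URL = "http://127.0.0.1:8080"


def build_chat_request(messages, temperature=0.7):
    """Build an OpenAI-style chat completion payload (hypothetical helper)."""
    return {
        # llama-server serves a single loaded model, so the name is nominal.
        "model": "local",
        "messages": messages,
        "temperature": temperature,
    }


def chat(messages, temperature=0.7):
    """POST to /v1/chat/completions and return the assistant's reply text."""
    payload = build_chat_request(messages, temperature)
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running llama-server):
#   reply = chat([{"role": "user", "content": "Hello!"}])
#   print(reply)
```

The response follows the standard OpenAI chat-completion shape, so the reply text lives at `choices[0].message.content`.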