Running Local GGUF Models with llamafile
This guide provides a step-by-step approach to running GGUF models like Llama 2, Phi-3, and Gemma 3 locally using llamafile, covering installation and usage across different operating systems.
Run a GGUF Model with llamafile (Windows / Linux)
Downloading llamafile
Visit the llamafile releases page (https://github.com/Mozilla-Ocho/llamafile/releases) and download the appropriate executable for your operating system (Windows or Linux).
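For example, on Linux the release can be fetched and made executable from the terminal. This is a sketch that assumes the release assets follow the usual llamafile-<version> naming and uses the 0.9.1 version referenced later in this guide; if the link does not match, copy the current one from the releases page:
curl -L -o llamafile https://github.com/Mozilla-Ocho/llamafile/releases/download/0.9.1/llamafile-0.9.1
chmod +x llamafile
On Windows, rename the downloaded file so it ends in .exe (for example llamafile-0.9.1.exe) before running it.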
Downloading GGUF Models
Llama 2 (7B Q8_0):
https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q8_0.gguf
Phi-3 mini 4k instruct (Full Precision 16-bit floating point):
https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-fp16.gguf
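Both files can also be fetched from the command line with wget (or curl -L -O on systems without wget), using the same URLs as above:
wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q8_0.gguf
wget https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-fp16.gguf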
Running Models
Windows
Llama 2:
.\llamafile-0.9.1.exe --server --v2 -m llama-2-7b.Q8_0.gguf
Phi-3:
.\llamafile-0.9.1.exe --server --v2 -m Phi-3-mini-4k-instruct-fp16.gguf
Linux
Llama 2:
./llamafile --server --v2 -m llama-2-7b.Q8_0.gguf
Phi-3:
./llamafile --server --v2 -m Phi-3-mini-4k-instruct-fp16.gguf
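Once the server is up, it can be queried over HTTP. The request below is a sketch that assumes the default listen address of localhost:8080 and an OpenAI-compatible /v1/chat/completions endpoint; the model field is a placeholder that a single-model server generally ignores. Adjust the host and port if you changed them:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'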
Optional Parameters
-c N, --ctx-size N: Sets the maximum context size in tokens for chat mode. The default is 8192 tokens, but it can be adjusted. If set to 0, it will use the model's maximum context size.
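For example, to serve Llama 2 on Linux using the model's full trained context window:
./llamafile --server --v2 -m llama-2-7b.Q8_0.gguf -c 0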
Building llamafile from Source (Linux)
Install Dependencies
Red Hat (Fedora, CentOS):
sudo dnf install make unzip
Debian/Ubuntu:
sudo apt-get update && sudo apt-get install build-essential unzip
Clone Repository
git clone https://github.com/Mozilla-Ocho/llamafile.git
Compile and Install
cd llamafile
sudo make install PREFIX=/usr/local
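After installation, you can confirm the binary is available (assuming /usr/local/bin is on your PATH):
llamafile --version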
Important Note: By using PREFIX=/usr/local, you are instructing the make install command to place the executable file (llamafile) in the /usr/local/bin directory.
Download and Store GGUF in models Folder
As an example, to download the Gemma 3 model into a models folder:
mkdir -p models
wget https://huggingface.co/unsloth/gemma-3-1b-it-GGUF/resolve/main/gemma-3-1b-it-BF16.gguf -O models/gemma-3-1b-it-BF16.gguf
Converting Models
Use llamafile-convert to turn a GGUF weights file into a self-contained llamafile executable:
llamafile-convert models/gemma-3-1b-it-BF16.gguf
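The conversion should produce a standalone executable in the current directory; assuming it is named gemma-3-1b-it-BF16.llamafile after the input file, it can then be run directly, without the -m flag:
./gemma-3-1b-it-BF16.llamafile --server --v2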