Running Local GGUF Models with llamafile

This guide provides a step-by-step approach to running GGUF models like Llama 2, Phi-3, and Gemma 3 locally using llamafile, covering installation and usage across different operating systems.

Run a GGUF Model with llamafile (Windows / Linux)

  1. Downloading llamafile

    Visit the llamafile releases page and download the appropriate executable for your OS (Windows, Linux).
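
    For example, on Linux (the version number below is illustrative; check the releases page for the current asset name):

    wget https://github.com/Mozilla-Ocho/llamafile/releases/download/0.9.1/llamafile-0.9.1
    chmod +x llamafile-0.9.1

    On Windows, rename the downloaded file to add a .exe extension before running it.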

  2. Downloading GGUF Models

    Llama 2 (7B Q8_0):

    https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q8_0.gguf

    Phi-3 Mini 4K Instruct (fp16, unquantized 16-bit floating point):

    https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-fp16.gguf
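
    To fetch a model, for example the Llama 2 weights from the URL above:

    wget https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q8_0.gguf
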
  3. Running Models

    1. Windows

      Llama 2:

      .\llamafile-0.9.1.exe --server --v2 -m llama-2-7b.Q8_0.gguf

      Phi-3:

      .\llamafile-0.9.1.exe --server --v2 -m Phi-3-mini-4k-instruct-fp16.gguf
    2. Linux

      Llama 2:

      ./llamafile --server --v2 -m llama-2-7b.Q8_0.gguf

      Phi-3:

      ./llamafile --server --v2 -m Phi-3-mini-4k-instruct-fp16.gguf
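
    Once the server is running, it exposes an OpenAI-compatible HTTP API, by default on port 8080 (the default port may vary between versions). A minimal test request, assuming that port:

    curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
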
  4. Optional Parameters

    -c N, --ctx-size N: Sets the maximum context size in tokens for chat mode. The default is 8192 tokens; setting it to 0 uses the model's maximum context size.
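
    For example, to launch the server with the model's full context window (a value of 0, as described above):

    ./llamafile --server --v2 -m llama-2-7b.Q8_0.gguf -c 0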

Building llamafile from Source (Linux)

  1. Install Dependencies

    Red Hat (Fedora, CentOS):

    sudo dnf install make unzip

    Debian/Ubuntu:

    sudo apt-get update && sudo apt-get install build-essential unzip
  2. Clone Repository

    git clone https://github.com/Mozilla-Ocho/llamafile.git
    cd llamafile
  3. Compile and Install

    make -j8
    sudo make install PREFIX=/usr/local

    Important Note: By passing PREFIX=/usr/local, the make install command places the llamafile executables (including llamafile and llamafile-convert) in the /usr/local/bin directory, which is typically on your PATH.
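
    To confirm the binary is on your PATH:

    which llamafile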

  4. Download and Store GGUF in models Folder

    As an example, to download the Gemma 3 model into a models folder:

    mkdir -p models
    wget https://huggingface.co/unsloth/gemma-3-1b-it-GGUF/resolve/main/gemma-3-1b-it-BF16.gguf -O models/gemma-3-1b-it-BF16.gguf
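
    The downloaded model can be run the same way as the prebuilt releases above:

    llamafile --server --v2 -m models/gemma-3-1b-it-BF16.gguf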
  5. Converting Models

    Use llamafile-convert to package a GGUF weights file into a single, self-contained executable:

    llamafile-convert models/gemma-3-1b-it-BF16.gguf
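
    This produces a file such as gemma-3-1b-it-BF16.llamafile (the output name is derived from the input file), which can then be run directly:

    ./gemma-3-1b-it-BF16.llamafile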