Summary and Setup

This lesson covers various strategies for deploying Generative AI models locally on your laptop or workstation. In particular the lesson will cover how to deploy and utilize Generative AI models on your laptop or workstation using the following tools.

LLaMA C++: Enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.
LlamaFile: Make open-source LLMs more accessible to both developers and end users. Combines LLaMA C++ with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a “llamafile”) that runs locally on most computers, with no installation.
Ollama (GitHub): Get up and running with Llama 3, Mistral, Gemma 2, and other large language models. Uses LLaMA C++ as the backend.
Open WebUI (GitHub): Extensible, self-hosted interface for AI that adapts to your workflow, all while operating entirely offline.
Jupyter AI: A generative AI extension for JupyterLab.

FIXME: Setup instructions live in this document. Please specify the tools and the data sets the Learner needs to have installed.

Data Sets

Download the data zip file and unzip it to your Desktop

Install Miniforge, Conda, Mamba

If you haven’t already done so, install Miniforge. Miniforge provides minimal installers for Conda and Mamba specific to conda-forge, with the following features pre-configured:

Packages in the base environment are obtained from the conda-forge channel.
The conda-forge channel is set as the default (and only) channel.

Conda/mamba will be the primary package managers used to install the required Python dependencies. For convenience, a script is included that will download and install Miniforge, Conda, and Mamba.

BASH

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh

BASH

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh

TODO

LLaMA C++, Llamafile, Ollama, and Friends

The exact software stack that you need to install depends on which episodes on the lesson that you plan to work through.

LLaMA C++

The instructions below will walk you through the process of installing the software required for the episodes of this lesson that focus on LLaMA C++ using the environment configuration files and build scripts in the kaust-generative-ai/local-deployment-llama-cpp repository on GitHub.

Clone the Git Repository

BASH

git clone https://github.com/kaust-generative-ai/local-deployment-llama-cpp.git
cd local-deployment-llama-cpp/

Create the Conda Environment

Creating the Conda environment for LLaMA C++ depends on your operating system, whether your CPU supports specific hardware acceleration, and whether you have access to a GPU. Create a Conda environment in a sub-directory env/ of your project directory by running one of following shell scripts.

BASH

./bin/create-conda-env.sh

In order to support Metal GPU acceleration available on Macs with ARM CPUs you need to install a few extra dependecies. These dependencies are added to the environment-metal-gpu.yml and requirements-metal-gpu.txt files.

BASH

./bin/create-conda-env.sh environment-metal-gpu.yml

In order to support NVIDIA GPU acceleration you need to install a few extra dependecies. These dependencies are added to the environment-nvidia-gpu.yml and requirements-nvidia-gpu.txt files.

BASH

./bin/create-conda-env.sh environment-nvidia-gpu.yml

Install LLaMA C++

For convenience there is an installer script which can be used to download pre-compiled LLaMA C++(https://github.com/ggerganov/llama.cpp) binaries for various OS, CPU, and GPU architectures and install the binaries into the bin/ directory of the Conda environment. You can find the latest release for LLaMA C++ on GitHub and pass the link to the zip archive for your desired release to the script as a command line argument.

BASH

./bin/install-llama-cpp.sh "$RELEASE_URL"

For reference, here are a few examples.

BASH

DOWNLOAD_URL=https://github.com/ggerganov/llama.cpp/releases/download/
TAG=b3868
RELEASE_ARCHIVE=llama-b3868-bin-ubuntu-x64.zip
RELEASE_URL="$DOWNLOAD_URL"/"$TAG"/"$RELEASE_ARCHIVE"
./bin/install-llama-cpp.sh "$RELEASE_URL"

BASH

DOWNLOAD_URL=https://github.com/ggerganov/llama.cpp/releases/download/
TAG=b3868
RELEASE_ARCHIVE=lllama-b3868-bin-macos-arm64.zip
RELEASE_URL="$DOWNLOAD_URL"/"$TAG"/"$RELEASE_ARCHIVE"
./bin/install-llama-cpp.sh "$RELEASE_URL"

BASH

DOWNLOAD_URL=https://github.com/ggerganov/llama.cpp/releases/download/
TAG=b3868
RELEASE_ARCHIVE=llama-b3868-bin-macos-x64.zip
RELEASE_URL="$DOWNLOAD_URL"/"$TAG"/"$RELEASE_ARCHIVE"
./bin/install-llama-cpp.sh "$RELEASE_URL"

BASH

DOWNLOAD_URL=https://github.com/ggerganov/llama.cpp/releases/download/
TAG=b3868
RELEASE_ARCHIVE=llama-b3868-bin-win-avx512-x64.zip
RELEASE_URL="$DOWNLOAD_URL"/"$TAG"/"$RELEASE_ARCHIVE"
./bin/install-llama-cpp.sh "$RELEASE_URL"

Build LLaMA C++ from Source (Optional)

If there isn’t an official release available for your target operating system, then you will need to build LLaMA C++ from source. Don’t worry, we have created build scripts to make this process as painless as possible. After creating the Conda environment using the instructions above you can build LLaMA C++ by running a command similar to the following to build LLaMA C++ from source within the activated Conda environment so that the build process has access to the required build tools compiler toolchain.

BASH

conda run --prefix ./env --live-stream ./bin/build-llama-cpp.sh

This command does the following.

Properly configures the Conda environment using the conda run command prior to running the build-llama-cpp.sh script.
Clones LLaMA C++ into ./src/llama-cpp.
Builds LLaMA C++ with support for CPU acceleration using OpenBlas in ./build/llama-cpp.
Installs the binaries into the bin/ directory of the Conda environment.
Removes the ./src/llama-cpp directory as it is no longer needed.
Removes the ./build/llama-cpp directory as it is no longer needed.

Depending on what CPU and GPU hardware you have available, there are other build scripts that use different compiler flags to optimize LLaMA C++ binaries.

If you are using Mac OS, then you will need to install XCode and then run the following command to install XCode Command Line Tools in order to build from source.

BASH

xcode-select --install

After creating the Conda environment you can build LLaMA C++ by running the following command.

BASH

conda run --prefix ./env --live-stream ./bin/build-llama-cpp-metal-gpu.sh

After creating the Conda environment you can build LLaMA C++ by with support for GPU acceleration by running the following command.

BASH

conda run --prefix ./env --live-stream ./bin/build-llama-cpp-nvidia-gpu.sh

For a detailed discussion of additional NVIDIA GPU compilation options that might improve performance on particular GPU architectures see the LLaMA C++ build documentation.

Llamafile

The instructions below will walk you through the process of installing the software required for the episodes of this lesson that focus on Llamafile using the environment configuration files and build scripts in the kaust-generative-ai/local-deployment-llamafile repository on GitHub.

Clone the Git Repository

BASH

git clone https://github.com/kaust-generative-ai/local-deployment-llamafile.git
cd local-deployment-llamafile/

Create the Conda Environment

TODO

Install Llamafile

TODO

Build Llamafile from Source (Optional)

TODO

Ollama

The instructions below will walk you through the process of installing the software required for the episodes of this lesson that focus on Ollama using the environment configuration files and build scripts in the kaust-generative-ai/local-deployment-ollama repository on GitHub.

Clone the Git Repository

BASH

git clone https://github.com/kaust-generative-ai/local-deployment-ollama.git
cd local-deployment-ollama/

Create the Conda Environment

TODO

Install Ollama

TODO