Summary and Setup
This lesson covers strategies for deploying Generative AI models locally on your laptop or workstation. In particular, it will show you how to deploy and use Generative AI models with the following tools.
- LLaMA C++: Enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.
- LlamaFile: Make open-source LLMs more accessible to both developers and end users. Combines LLaMA C++ with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a “llamafile”) that runs locally on most computers, with no installation.
- Ollama ([GitHub](https://github.com/ollama/ollama)): Get up and running with Llama 3, Mistral, Gemma 2, and other large language models. Uses LLaMA C++ as the backend.
- Open WebUI ([GitHub](https://github.com/open-webui/open-webui)): Extensible, self-hosted interface for AI that adapts to your workflow, all while operating entirely offline.
- Jupyter AI: A generative AI extension for JupyterLab.
Data Sets
Download the data zip file and unzip it to your Desktop.
Install Miniforge, Conda, Mamba
If you haven’t already done so, install Miniforge. Miniforge provides minimal installers for Conda and Mamba specific to conda-forge, with the following features pre-configured:
- Packages in the base environment are obtained from the `conda-forge` channel.
- The `conda-forge` channel is set as the default (and only) channel.
Conda/mamba will be the primary package managers used to install the required Python dependencies. For convenience, a script is included that will download and install Miniforge, Conda, and Mamba.
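If you prefer to install Miniforge manually rather than via the included script, the commands below are a minimal sketch based on the installation instructions in the upstream Miniforge README; the download URL resolves to the installer matching your operating system and CPU architecture.

```bash
# Download the Miniforge installer for this OS and CPU architecture,
# then run it (accept the license and the default install location,
# or adjust the prompts to taste).
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash "Miniforge3-$(uname)-$(uname -m).sh"
```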
LLaMA C++, Llamafile, Ollama, and Friends
The exact software stack that you need to install depends on which episodes of the lesson you plan to work through.
The instructions below will walk you through the process of installing the software required for the episodes of this lesson that focus on LLaMA C++ using the environment configuration files and build scripts in the kaust-generative-ai/local-deployment-llama-cpp repository on GitHub.
Create the Conda Environment
Creating the Conda environment for LLaMA C++ depends on your operating system, whether your CPU supports specific hardware acceleration, and whether you have access to a GPU. Create a Conda environment in a sub-directory `env/` of your project directory by running one of the following shell scripts.
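The script names vary by platform, but each one wraps a command along these lines (a sketch; the exact name of the environment configuration file in the repository may differ):

```bash
# Create the Conda environment in the env/ sub-directory of the
# project, using the environment configuration file shipped with
# the repository.
conda env create --prefix ./env --file environment.yml
```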
Install LLaMA C++
For convenience, there is an installer script which can be used to download pre-compiled [LLaMA C++](https://github.com/ggerganov/llama.cpp) binaries for various OS, CPU, and GPU architectures and install the binaries into the `bin/` directory of the Conda environment. You can find the latest release of LLaMA C++ on GitHub and pass the link to the zip archive for your desired release to the script as a command line argument.
For reference, here are a few examples.
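An invocation might look like the following (a sketch: the installer script's name and the release asset URL are assumptions, so substitute the actual script from the repository and the zip archive for your platform from the llama.cpp releases page):

```bash
# Hypothetical script name; pass the URL of the pre-compiled release
# zip archive for your platform as the command line argument.
./bin/install-llama-cpp.sh "https://github.com/ggerganov/llama.cpp/releases/download/b4689/llama-b4689-bin-macos-arm64.zip"
```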
Build LLaMA C++ from Source (Optional)
If there isn’t an official release available for your target operating system, then you will need to build LLaMA C++ from source. Don’t worry, we have created build scripts to make this process as painless as possible. After creating the Conda environment using the instructions above, you can build LLaMA C++ by running a command similar to the one sketched below. Running the build script inside the Conda environment ensures that the build process has access to the required compiler toolchain and build tools.
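A sketch of such a command (the `./bin/` location of the build script is an assumption; check the repository for the exact path):

```bash
# Run the build script inside the Conda environment so the compiler
# toolchain installed in ./env is on the PATH; --live-stream streams
# the build output to the terminal as it happens.
conda run --prefix ./env --live-stream ./bin/build-llama-cpp.sh
```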
This command does the following.
- Properly configures the Conda environment using the `conda run` command prior to running the `build-llama-cpp.sh` script.
- Clones LLaMA C++ into `./src/llama-cpp`.
- Builds LLaMA C++ with support for CPU acceleration using OpenBLAS in `./build/llama-cpp`.
- Installs the binaries into the `bin/` directory of the Conda environment.
- Removes the `./src/llama-cpp` directory as it is no longer needed.
- Removes the `./build/llama-cpp` directory as it is no longer needed.
Depending on what CPU and GPU hardware you have available, there are other build scripts that use different compiler flags to optimize LLaMA C++ binaries.
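For example, such variants might look like the following (hypothetical script names, shown only to illustrate the pattern; check the repository for the actual scripts):

```bash
# Hypothetical hardware-specific build script variants:
conda run --prefix ./env --live-stream ./bin/build-llama-cpp-cuda.sh   # NVIDIA GPUs (CUDA)
conda run --prefix ./env --live-stream ./bin/build-llama-cpp-metal.sh  # Apple Silicon (Metal)
```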
The instructions below will walk you through the process of installing the software required for the episodes of this lesson that focus on Llamafile using the environment configuration files and build scripts in the kaust-generative-ai/local-deployment-llamafile repository on GitHub.
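As a preview of what running a llamafile looks like, here is a minimal sketch based on the quickstart in the Llamafile README (the model file shown is just an example):

```bash
# Download a llamafile (a single-file executable bundling the model
# weights and the inference engine), make it executable, and run it.
# On Windows, rename the file to add a .exe extension instead.
curl -L -O "https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile/resolve/main/llava-v1.5-7b-q4.llamafile"
chmod +x llava-v1.5-7b-q4.llamafile
./llava-v1.5-7b-q4.llamafile
```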
The instructions below will walk you through the process of installing the software required for the episodes of this lesson that focus on Ollama using the environment configuration files and build scripts in the kaust-generative-ai/local-deployment-ollama repository on GitHub.
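For orientation, once Ollama is installed, downloading and chatting with a model takes only a couple of commands using the standard Ollama CLI:

```bash
# Start the Ollama server in the background
# (skip this if Ollama is already running as a system service).
ollama serve &

# Download a model from the Ollama library and start an interactive chat.
ollama run llama3
```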