🤖 Model Configuration

Supported Models

At a minimum, a Large Language Model (LLM) must be configured in the Oracle AI Microservices Sandbox for basic functionality. For Retrieval-Augmented Generation (RAG), an Embedding Model must also be configured.

10-Sept-2024: Documentation In-Progress…

If there is a specific model that you would like to use with the Oracle AI Microservices Sandbox, please open an issue in GitHub.

Model                            Type   API                            On-Premises
-------------------------------  -----  -----------------------------  -----------
llama3.1                         LLM    ChatOllama                     X
gpt-3.5-turbo                    LLM    OpenAI
gpt-4o-mini                      LLM    OpenAI
gpt-4                            LLM    OpenAI
gpt-4o                           LLM    OpenAI
llama-3-sonar-small-128k-chat    LLM    ChatPerplexity
llama-3-sonar-small-128k-online  LLM    ChatPerplexity
command-r                        LLM    Cohere
mxbai-embed-large                Embed  OllamaEmbeddings               X
nomic-embed-text                 Embed  OllamaEmbeddings               X
all-minilm                       Embed  OllamaEmbeddings               X
thenlper/gte-base                Embed  HuggingFaceEndpointEmbeddings  X
text-embedding-3-small           Embed  OpenAIEmbeddings
text-embedding-3-large           Embed  OpenAIEmbeddings
embed-english-v3.0               Embed  CohereEmbeddings
embed-english-light-v3.0         Embed  CohereEmbeddings

Configuration

Models can be configured either through environment variables or through the Sandbox interface. To configure models using environment variables, please read the Additional Information for the specific model you would like to configure.
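
For example, to pre-configure an On-Premises Ollama endpoint and an OpenAI API Key before starting the Sandbox, you could export the variables covered in the sections below (the values shown are placeholders):

# Enable Ollama models served at this URL (see the Ollama section below)
export ON_PREM_OLLAMA_URL=http://127.0.0.1:11434
# Enable OpenAI models with your account's key (see the OpenAI section below)
export OPENAI_API_KEY=<super-secret API Key>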

To configure an LLM or embedding model from the Sandbox, navigate to Configuration -> Models:

[Image: Model Config]

Here you can configure both Large Language Models and Embedding Models. Set the API Keys and API URL as required. You can also Enable and Disable models.

API Keys

Third-Party cloud models, such as OpenAI and Perplexity AI, require API Keys. These keys are tied to registered, funded accounts on these platforms. For more information on creating an account, funding it, and generating API Keys for third-party cloud models, please visit their respective sites.

[Image: Cloud Model API Keys]

On-Premises models, such as those from Ollama or HuggingFace, usually do not require API Keys. These values can be left blank.

[Image: On-Premises Model API Keys]

API URL

When using an on-premises model, for performance purposes, it should run on a host with GPUs. As the Sandbox itself does not require GPUs, it is often the case that the API URL for these models will be the IP address or hostname of a remote host. Specify the API URL and Port of the remote host.

[Image: On-Premises Model API URL]
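
Before configuring a remote model, you can optionally verify that the Sandbox host can reach it. A minimal sketch, assuming Ollama is serving on a remote host at 192.0.2.10 (a placeholder address; substitute your own IP and Port):

# Query the remote Ollama server for its pulled models
curl http://192.0.2.10:11434/api/tags
# A JSON list of models confirms the API URL is reachable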

Additional Information

Ollama

Ollama is an open-source project that simplifies running LLMs and Embedding Models On-Premises.

When configuring an Ollama model in the Sandbox, set the API Server URL (e.g., http://127.0.0.1:11434) and leave the API Key blank. Substitute the IP address with that of the host where Ollama is running.
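
If Ollama runs on a different host than the Sandbox, it must listen on an external interface rather than only on localhost. A minimal sketch using Ollama's OLLAMA_HOST environment variable (the bind address is an assumption; adjust for your environment):

# Bind Ollama to all interfaces so remote hosts, such as the Sandbox, can connect
export OLLAMA_HOST=0.0.0.0:11434
ollama serve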

Skip the GUI!

You can set the following environment variable to automatically set the API Server URL and enable Ollama models (change the IP address and Port, as applicable to your environment):

export ON_PREM_OLLAMA_URL=http://127.0.0.1:11434

Quick-start

Example of running llama3.1 on a Linux host:

  1. Install Ollama:
    curl -fsSL https://ollama.com/install.sh | sh
  2. Pull the llama3.1 model:
    ollama pull llama3.1
  3. Start Ollama:
    ollama serve
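
Once Ollama is serving, you can optionally confirm the model responds before configuring it in the Sandbox. A minimal sketch using Ollama's REST API (the prompt is purely illustrative):

# Send a single, non-streaming chat request to the local Ollama server
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'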

For more information and instructions on running Ollama on other platforms, please visit the Ollama GitHub Repository.

HuggingFace

HuggingFace is a platform where the machine learning community collaborates on models, datasets, and applications. It provides a large selection of models that can be run both in the cloud and On-Premises.

Skip the GUI!

You can set the following environment variable to automatically set the API Server URL and enable HuggingFace models (change the IP address and Port, as applicable to your environment):

export ON_PREM_HF_URL=http://127.0.0.1:8080

Quick-start

Example of running thenlper/gte-base in a container:

  1. Set the Model Image based on CPU or GPU:

    For CPUs: export HF_IMAGE=ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
    For GPUs: export HF_IMAGE=ghcr.io/huggingface/text-embeddings-inference:0.6

  2. Define a Temporary Volume

    export TMP_VOLUME=/tmp/hf_data
    mkdir -p $TMP_VOLUME
  3. Define the Model

    export HF_MODEL=thenlper/gte-base
  4. Start the Container

    podman run -d -p 8080:80 -v $TMP_VOLUME:/data --name hftei-gte-base \
        --pull always ${HF_IMAGE} --model-id $HF_MODEL --max-client-batch-size 5024
  5. Determine the IP

    podman inspect hftei-gte-base | grep IPA

    NOTE: if there is no IP, use 127.0.0.1
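
To optionally confirm the embedding server is up, you can call the text-embeddings-inference /embed endpoint directly (the input string is purely illustrative):

# Request an embedding for a sample string from the local container
curl 127.0.0.1:8080/embed \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Deep Learning?"}'
# A JSON array of floats confirms the model is serving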

Cohere

Cohere is an AI platform providing large language and embedding models. To use Cohere, you will need to sign up and provide the Sandbox an API Key. Cohere offers a free-trial, rate-limited API Key.

WARNING: Cohere models are hosted in the cloud; you should familiarize yourself with Cohere's Privacy Policies before using them to experiment with private, sensitive data in the Sandbox.

Skip the GUI!

You can set the following environment variable to automatically set the API Key and enable Cohere models:

export COHERE_API_KEY=<super-secret API Key>
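
To optionally verify the key, you can call Cohere's List Models endpoint (a sketch, assuming curl is available and the key is exported as above):

# List the models available to your account; an authorization error indicates a bad key
curl https://api.cohere.com/v1/models \
    -H "Authorization: Bearer $COHERE_API_KEY"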

OpenAI

OpenAI is the AI research organization behind the popular ChatGPT chatbot. To use OpenAI models, you will need to sign up, purchase credits, and provide the Sandbox an API Key.

WARNING: OpenAI models are hosted in the cloud; you should familiarize yourself with OpenAI's Privacy Policies before using them to experiment with private, sensitive data in the Sandbox.

Skip the GUI!

You can set the following environment variable to automatically set the API Key and enable OpenAI models:

export OPENAI_API_KEY=<super-secret API Key>
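
To optionally check that the key is valid before starting the Sandbox, you can query OpenAI's models endpoint (a sketch, assuming curl is available and the key is exported as above):

# List models available to your account; an HTTP 401 response indicates a bad key
curl https://api.openai.com/v1/models \
    -H "Authorization: Bearer $OPENAI_API_KEY"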

Perplexity AI

Perplexity AI is an AI-powered answer engine. To use Perplexity AI models, you will need to sign up, purchase credits, and provide the Sandbox an API Key.

WARNING: Perplexity AI models are hosted in the cloud; you should familiarize yourself with Perplexity's Privacy Policies before using them to experiment with private, sensitive data in the Sandbox.

Skip the GUI!

You can set the following environment variable to automatically set the API Key and enable Perplexity models:

export PPLX_API_KEY=<super-secret API Key>
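
To optionally verify the key, you can send a minimal request to Perplexity's OpenAI-compatible chat completions endpoint (a sketch; the model name is taken from the supported models table above):

# Minimal chat completion; a JSON reply confirms the key works
curl https://api.perplexity.ai/chat/completions \
    -H "Authorization: Bearer $PPLX_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
          "model": "llama-3-sonar-small-128k-chat",
          "messages": [{"role": "user", "content": "Hello"}]
        }'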