🤖 Model Configuration
Supported Models
At a minimum, a Large Language Model (LLM) must be configured in the AI Optimizer for basic functionality. For Retrieval-Augmented Generation (RAG), an Embedding Model will also need to be configured.
Model APIs
If there is a specific model API that you would like to use, please open an issue on GitHub.
Type | API | Location
---|---|---
LLM | ChatOCIGenAI | Private Cloud
LLM | ChatOllama | On-Premises
LLM | CompatOpenAI | On-Premises
LLM | OpenAI | Third-Party
LLM | ChatPerplexity | Third-Party
LLM | Cohere | Third-Party
Embed | OCIGenAIEmbeddings | Private Cloud
Embed | OllamaEmbeddings | On-Premises
Embed | HuggingFaceEndpointEmbeddings | On-Premises
Embed | CompatOpenAIEmbeddings | On-Premises
Embed | OpenAIEmbeddings | Third-Party
Embed | CohereEmbeddings | Third-Party
Configuration
The models can either be configured using environment variables or through the AI Optimizer interface. To configure models through environment variables, please read the Additional Information about the specific model you would like to configure.
To configure an LLM or embedding model from the AI Optimizer, navigate to Configuration -> Models:
Here you can add and/or configure both Large Language Models and Embedding Models.
Add/Edit
Set the API, API Keys, API URL, and other parameters as required. Parameters such as Default Temperature, Context Length, and Penalties can often be found on the model card. If they are not listed, the defaults are usually sufficient.
API
The AI Optimizer supports a number of model APIs. When adding a model, choose the most appropriate Model API. If unsure, or if the specific API is not listed, try CompatOpenAI or CompatOpenAIEmbeddings before opening an issue requesting support for an additional model API.
A number of local AI model runners, such as LM Studio, expose OpenAI-compatible APIs. When using these local runners, select the appropriate OpenAI-compatible API (Language: CompatOpenAI; Embeddings: CompatOpenAIEmbeddings).
API URL
The API URL for the model will either be the URL of a locally running model, including the IP address or hostname and port; or the remote URL of a Third-Party or Cloud model.
Examples:
- Third-Party: OpenAI - https://api.openai.com
- On-Premises: Ollama - http://localhost:11434
- On-Premises: LM Studio - http://localhost:1234/v1
API Keys
Third-Party cloud models, such as OpenAI and Perplexity AI, require API Keys. These keys are tied to registered, funded accounts on these platforms. For more information on creating an account, funding it, and generating API Keys for third-party cloud models, please visit their respective sites.
On-Premises models, such as those from Ollama or HuggingFace, usually do not require API Keys. These values can be left blank.
Additional Information
OCI GenAI
OCI GenAI is a fully managed service in Oracle Cloud Infrastructure (OCI) for seamlessly integrating versatile language models into a wide range of use cases, including writing assistance, summarization, analysis, and chat.
Please follow the Getting Started guide for deploying the service in your OCI tenancy.
To use OCI GenAI, the AI Optimizer must be configured for OCI access, including the Compartment OCID for the OCI GenAI service.
Skip the GUI!
You can set the following environment variables to automatically enable OCI GenAI models:
export OCI_GENAI_SERVICE_ENDPOINT=<OCI GenAI Service Endpoint>
export OCI_GENAI_COMPARTMENT_ID=<OCI Compartment OCID of the OCI GenAI Service>
Alternatively, you can specify the following in the ~/.oci/config file under the appropriate OCI profile:
service_endpoint=<OCI GenAI Service Endpoint>
compartment_id=<OCI Compartment OCID of the OCI GenAI Service>
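For example, a minimal ~/.oci/config entry might look like the following (the endpoint and compartment OCID are illustrative placeholders; the profile's standard authentication entries are assumed to already be in place):

[DEFAULT]
service_endpoint=https://inference.generativeai.us-chicago-1.oci.oraclecloud.com
compartment_id=ocid1.compartment.oc1..aaaaaaaaexample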
Ollama
Ollama is an open-source project that simplifies running LLMs and Embedding Models On-Premises.
When configuring an Ollama model in the AI Optimizer, set the API Server URL (e.g., http://127.0.0.1:11434) and leave the API Key blank. Substitute the IP address with that of the host where Ollama is running.
Skip the GUI!
You can set the following environment variable to automatically set the API Server URL and enable Ollama models (change the IP address and Port, as applicable to your environment):
export ON_PREM_OLLAMA_URL=http://127.0.0.1:11434
Quick-start
Example of running llama3.1 on a Linux host:
- Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
- Pull the llama3.1 model:
ollama pull llama3.1
- Start Ollama:
ollama serve
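To verify the server is reachable before configuring it in the AI Optimizer, you can query its model list (adjust the host and port to your environment):
curl http://127.0.0.1:11434/api/tags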
For more information and instructions on running Ollama on other platforms, please visit the Ollama GitHub Repository.
HuggingFace
HuggingFace is a platform where the machine learning community collaborates on models, datasets, and applications. It provides a large selection of models that can be run both in the cloud and On-Premises.
Skip the GUI!
You can set the following environment variable to automatically set the API Server URL and enable HuggingFace models (change the IP address and Port, as applicable to your environment):
export ON_PREM_HF_URL=http://127.0.0.1:8080
Quick-start
Example of running thenlper/gte-base in a container:
Set the Image based on CPU or GPU
- For CPUs:
export HF_IMAGE=ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
- For GPUs:
export HF_IMAGE=ghcr.io/huggingface/text-embeddings-inference:0.6
Define a Temporary Volume
export TMP_VOLUME=/tmp/hf_data
mkdir -p $TMP_VOLUME
Define the Model
export HF_MODEL=thenlper/gte-base
Start the Container
podman run -d -p 8080:80 -v $TMP_VOLUME:/data --name hftei-gte-base \
  --pull always $HF_IMAGE --model-id $HF_MODEL --max-client-batch-size 5024
Determine the IP
podman inspect hftei-gte-base | grep IPA
NOTE: If there is no IP, use 127.0.0.1
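Once the container is running, you can sanity-check the service by requesting an embedding (a quick check; the text-embeddings-inference service exposes an /embed route on the mapped port):
curl http://127.0.0.1:8080/embed -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is an embedding?"}'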
Cohere
Cohere is an AI platform that provides Large Language Models and Embedding Models. To use Cohere, you will need to sign up and provide the AI Optimizer an API Key. Cohere offers a free-trial, rate-limited API Key.
WARNING: Cohere is a cloud service and you should familiarize yourself with their Privacy Policies if using it to experiment with private, sensitive data in the AI Optimizer.
Skip the GUI!
You can set the following environment variable to automatically set the API Key and enable Cohere models:
export COHERE_API_KEY=<super-secret API Key>
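To confirm the key is valid, you can list the models available to your account (a quick check against Cohere's public REST API):
curl https://api.cohere.com/v1/models -H "Authorization: Bearer $COHERE_API_KEY"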
OpenAI
OpenAI is the AI research organization behind the popular ChatGPT chatbot. To use OpenAI models, you will need to sign up, purchase credits, and provide the AI Optimizer an API Key.
WARNING: OpenAI is a cloud service and you should familiarize yourself with their Privacy Policies if using it to experiment with private, sensitive data in the AI Optimizer.
Skip the GUI!
You can set the following environment variable to automatically set the API Key and enable OpenAI models:
export OPENAI_API_KEY=<super-secret API Key>
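To confirm the key works, you can list the models available to your account:
curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"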
Compatible OpenAI
Many “AI Runners” provide OpenAI-compatible APIs. These can be used by selecting the CompatOpenAI (or CompatOpenAIEmbeddings) API. The API URL will normally be a local address and the API Key can be left blank.
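For example, a runner such as LM Studio can be exercised with the standard OpenAI chat completions route before configuring it in the AI Optimizer (the port and model name below are illustrative and depend on your runner):
curl http://localhost:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "Hello"}]}'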
Perplexity AI
Perplexity AI is an AI-powered answer engine. To use Perplexity AI models, you will need to sign up, purchase credits, and provide the AI Optimizer an API Key.
WARNING: Perplexity AI is a cloud service and you should familiarize yourself with their Privacy Policies if using it to experiment with private, sensitive data in the AI Optimizer.
Skip the GUI!
You can set the following environment variable to automatically set the API Key and enable Perplexity models:
export PPLX_API_KEY=<super-secret API Key>
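To confirm the key works, you can send a minimal request to Perplexity's OpenAI-compatible chat completions endpoint (the model name below is an assumption; check Perplexity's documentation for the currently available models):
curl https://api.perplexity.ai/chat/completions \
  -H "Authorization: Bearer $PPLX_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "sonar", "messages": [{"role": "user", "content": "Hello"}]}'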