🤖 Model Configuration
Supported Models
At a minimum, a Large Language Model (LLM) must be configured in the AI Optimizer for basic functionality. For Retrieval-Augmented Generation (RAG), an Embedding Model will also need to be configured.
Model APIs
If there is a specific model API that you would like to use, please open an issue on GitHub.
Type | API | Location
---|---|---
LLM | ChatOCIGenAI | Private Cloud
LLM | ChatOllama | On-Premises
LLM | CompatOpenAI | On-Premises
LLM | OpenAI | Third-Party
LLM | ChatPerplexity | Third-Party
LLM | Cohere | Third-Party
Embed | OCIGenAIEmbeddings | Private Cloud
Embed | OllamaEmbeddings | On-Premises
Embed | HuggingFaceEndpointEmbeddings | On-Premises
Embed | CompatOpenAIEmbeddings | On-Premises
Embed | OpenAIEmbeddings | Third-Party
Embed | CohereEmbeddings | Third-Party
Configuration
The models can either be configured using environment variables or through the AI Optimizer interface. To configure models through environment variables, please read the Additional Information about the specific model you would like to configure.
To configure an LLM or embedding model from the AI Optimizer, navigate to Configuration -> Models:
Here you can add and/or configure both Large Language Models and Embedding Models.
Add/Edit
Set the API, API Keys, API URL, and other parameters as required. Parameters such as Default Temperature, Context Length, and Penalties can often be found on the model card. If they are not listed, the defaults are usually sufficient.
API
The AI Optimizer supports a number of model APIs. When adding a model, choose the most appropriate Model API. If unsure, or if the specific API is not listed, try CompatOpenAI or CompatOpenAIEmbeddings before opening an issue requesting support for an additional model API.
A number of local AI model runners, such as LM Studio, expose OpenAI-compatible APIs. When using these local runners, select the appropriate OpenAI-compatible API (Language: CompatOpenAI; Embeddings: CompatOpenAIEmbeddings).
API URL
The API URL for the model will either be the URL of a locally running model, including the IP address or hostname and port; or the remote URL of a Third-Party or Cloud model.
Examples:
- Third-Party: OpenAI - https://api.openai.com
- On-Premises: Ollama - http://localhost:11434
- On-Premises: LM Studio - http://localhost:1234/v1
API Keys
Third-Party cloud models, such as OpenAI and Perplexity AI, require API Keys. These keys are tied to registered, funded accounts on these platforms. For more information on creating an account, funding it, and generating API Keys for third-party cloud models, please visit their respective sites.
On-Premises models, such as those from Ollama or HuggingFace, usually do not require API Keys. These values can be left blank.
Additional Information
OCI GenAI
OCI GenAI is a fully managed service in Oracle Cloud Infrastructure (OCI) for seamlessly integrating versatile language models into a wide range of use cases, including writing assistance, summarization, analysis, and chat.
Please follow the Getting Started guide for deploying the service in your OCI tenancy.
To use OCI GenAI, the AI Optimizer must be configured for OCI access, including the Compartment OCID for the OCI GenAI service.
Skip the GUI!
You can set the following environment variables to automatically enable OCI GenAI models:
export OCI_GENAI_SERVICE_ENDPOINT=<OCI GenAI Service Endpoint>
export OCI_GENAI_COMPARTMENT_ID=<OCI Compartment OCID of the OCI GenAI Service>
Alternatively, you can specify the following in the ~/.oci/config file under the appropriate OCI profile:
service_endpoint=<OCI GenAI Service Endpoint>
compartment_id=<OCI Compartment OCID of the OCI GenAI Service>
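For example, a minimal ~/.oci/config entry might look like the following (the endpoint and compartment OCID are illustrative placeholders; the profile's standard authentication entries are assumed to already be in place):

[DEFAULT]
service_endpoint=https://inference.generativeai.us-chicago-1.oci.oraclecloud.com
compartment_id=ocid1.compartment.oc1..aaaaaaaaexample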
Ollama
Ollama is an open-source project that simplifies running LLMs and Embedding Models On-Premises.
When configuring an Ollama model in the AI Optimizer, set the API Server URL (e.g., http://127.0.0.1:11434) and leave the API Key blank. Substitute the IP address with that of the host where Ollama is running.
Skip the GUI!
You can set the following environment variable to automatically set the API Server URL and enable Ollama models (change the IP address and Port, as applicable to your environment):
export ON_PREM_OLLAMA_URL=http://127.0.0.1:11434
Quick-start
Example of running llama3.1 on a Linux host:
- Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
- Pull the llama3.1 model:
ollama pull llama3.1
- Start Ollama:
ollama serve
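To verify the server is reachable before configuring it in the AI Optimizer, you can query its model list (adjust the host and port to your environment):
curl http://127.0.0.1:11434/api/tags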
For more information and instructions on running Ollama on other platforms, please visit the Ollama GitHub Repository.
HuggingFace
HuggingFace is a platform where the machine learning community collaborates on models, datasets, and applications. It provides a large selection of models that can be run both in the cloud and On-Premises.
Skip the GUI!
You can set the following environment variable to automatically set the API Server URL and enable HuggingFace models (change the IP address and Port, as applicable to your environment):
export ON_PREM_HF_URL=http://127.0.0.1:8080
Quick-start
Example of running thenlper/gte-base in a container:
Set the Image based on CPU or GPU
- For CPUs:
export HF_IMAGE=ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
- For GPUs:
export HF_IMAGE=ghcr.io/huggingface/text-embeddings-inference:0.6
Define a Temporary Volume
export TMP_VOLUME=/tmp/hf_data
mkdir -p $TMP_VOLUME
Define the Model
export HF_MODEL=thenlper/gte-base
Start the Container
podman run -d -p 8080:80 -v $TMP_VOLUME:/data --name hftei-gte-base \
  --pull always $HF_IMAGE --model-id $HF_MODEL --max-client-batch-size 5024
Determine the IP
podman inspect hftei-gte-base | grep IPA
NOTE: If there is no IP, use 127.0.0.1
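Once the container is running, you can sanity-check the service by requesting an embedding (a quick check; the text-embeddings-inference service exposes an /embed route on the mapped port):
curl http://127.0.0.1:8080/embed -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is an embedding?"}'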
Cohere
Cohere is an AI platform that provides Large Language Models and Embedding Models. To use Cohere, you will need to sign up and provide the AI Optimizer an API Key. Cohere offers a free-trial, rate-limited API Key.
WARNING: Cohere is a cloud service and you should familiarize yourself with their Privacy Policies if using it to experiment with private, sensitive data in the AI Optimizer.
Skip the GUI!
You can set the following environment variable to automatically set the API Key and enable Cohere models:
export COHERE_API_KEY=<super-secret API Key>
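To confirm the key is valid, you can list the models available to your account (a quick check against Cohere's public REST API):
curl https://api.cohere.com/v1/models -H "Authorization: Bearer $COHERE_API_KEY"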
OpenAI
OpenAI is the AI research organization behind the popular ChatGPT chatbot. To use OpenAI models, you will need to sign up, purchase credits, and provide the AI Optimizer an API Key.
WARNING: OpenAI is a cloud service and you should familiarize yourself with their Privacy Policies if using it to experiment with private, sensitive data in the AI Optimizer.
Skip the GUI!
You can set the following environment variable to automatically set the API Key and enable OpenAI models:
export OPENAI_API_KEY=<super-secret API Key>
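To confirm the key works, you can list the models available to your account:
curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"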
Compatible OpenAI
Many “AI Runners” provide OpenAI-compatible APIs. These can be used by selecting the CompatOpenAI (or CompatOpenAIEmbeddings) API. The API URL will normally be a local address and the API Key can be left blank.
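For example, a runner such as LM Studio can be exercised with the standard OpenAI chat completions route before configuring it in the AI Optimizer (the port and model name below are illustrative and depend on your runner):
curl http://localhost:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "Hello"}]}'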
Perplexity AI
Perplexity AI is an AI-powered answer engine. To use Perplexity AI models, you will need to sign up, purchase credits, and provide the AI Optimizer an API Key.
WARNING: Perplexity AI is a cloud service and you should familiarize yourself with their Privacy Policies if using it to experiment with private, sensitive data in the AI Optimizer.
Skip the GUI!
You can set the following environment variable to automatically set the API Key and enable Perplexity models:
export PPLX_API_KEY=<super-secret API Key>
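To confirm the key works, you can send a minimal request to Perplexity's OpenAI-compatible chat completions endpoint (the model name below is an assumption; check Perplexity's documentation for the currently available models):
curl https://api.perplexity.ai/chat/completions \
  -H "Authorization: Bearer $PPLX_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "sonar", "messages": [{"role": "user", "content": "Hello"}]}'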