Deploy¶
AgentServer is a drop-in FastAPI wrapper, so it deploys anywhere FastAPI
runs. This guide covers the four most common targets: OCI Functions,
OCI Container Instances, OKE / Kubernetes, and OCI Compute.
The shape you ship¶
# server.py
from locus import Agent
from locus.server import AgentServer
from locus.memory.backends import OCIBucketBackend

agent = Agent(
    model="oci:openai.gpt-5.5",
    tools=[...],
    system_prompt="...",
    checkpointer=OCIBucketBackend(
        bucket_name="locus-threads",
        namespace="<your-namespace>",
    ),
)

server = AgentServer(
    agent=agent,
    title="Booking concierge",
    cors_origins=["https://app.example.com"],
)

if __name__ == "__main__":
    server.run(host="0.0.0.0", port=8080)
You get out of the box:
- POST /invoke — synchronous run, full AgentResult JSON.
- POST /stream — Server-Sent Events of every typed event.
- GET/DELETE /threads/{id} — conversation persistence.
- GET /health — liveness probe.
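For example, assuming the server above is listening locally on port 8080, the endpoints can be exercised with curl (the prompt text is illustrative):

```shell
# Synchronous run — returns the full AgentResult as JSON
curl -X POST http://localhost:8080/invoke \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Book a table for two"}'

# Streaming run — -N disables curl's buffering so SSE events print as they arrive
curl -N -X POST http://localhost:8080/stream \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Book a table for two"}'

# Liveness probe
curl http://localhost:8080/health
```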
OCI Functions — serverless, scale to zero¶
Best for low-frequency or bursty traffic. Pay only when the function runs.
# Dockerfile
FROM fnproject/python:3.11-fdk
COPY requirements.txt /function/
RUN pip install -r /function/requirements.txt
COPY server.py /function/
CMD ["server.handler"]
# func.yaml
schema_version: 20180708
name: locus-concierge
version: 0.1.0
runtime: python
build_image: fnproject/python:3.11-fdk-build
run_image: fnproject/python:3.11-fdk
entrypoint: /python/bin/fdk /function/server.py handler
Deploy:
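A typical deployment with the Fn CLI looks like the following sketch; the application name and subnet OCID are placeholders you must supply:

```shell
# Create the Functions application (once per app), then build, push, and deploy
fn create app concierge-app \
  --annotation 'oracle.com/oci/subnetIds=["<subnet-ocid>"]'
fn deploy --app concierge-app
```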
Functions inherit OCI workload identity, so the agent
authenticates to OCI Generative AI without explicit credentials. Set
OCI_AUTH_TYPE=resource_principal in the function configuration.
OCI Container Instances — managed, no cluster¶
Best when you want a long-running container without operating Kubernetes.
# 1. Build and push
docker build -t \
iad.ocir.io/$NAMESPACE/locus-concierge:0.1.0 .
docker push \
iad.ocir.io/$NAMESPACE/locus-concierge:0.1.0
# 2. Create the container instance
oci container-instances container-instance create \
--availability-domain "$AD" \
--compartment-id "$COMPARTMENT" \
--containers '[{
"image-url": "iad.ocir.io/'$NAMESPACE'/locus-concierge:0.1.0",
"display-name": "concierge",
"command": ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8080"]
}]' \
--shape "CI.Standard.E4.Flex" \
--shape-config '{"ocpus":1,"memoryInGBs":4}'
Container Instances also support instance_principal auth — the
running container can call OCI services without a stored API key.
Set OCI_AUTH_TYPE=instance_principal in the container env.
OKE — Kubernetes for production¶
Best for multi-replica, autoscaled, multi-region production. The
quickest path is the bundled Helm chart at
deploy/helm/locus-agent/:
helm install locus-agent ./deploy/helm/locus-agent \
--set image.repository=iad.ocir.io/$NAMESPACE/locus-concierge \
--set image.tag=0.1.0 \
--set auth.apiKey=$(openssl rand -hex 16) \
--set ociBucket.enabled=true \
--set ociBucket.bucketName=locus-threads-prod \
--set ociBucket.namespace=$NAMESPACE \
--set autoscaling.enabled=true \
--set autoscaling.minReplicas=2 \
--set autoscaling.maxReplicas=10
The chart ships a Deployment, Service, ServiceAccount (with
workload-identity annotation hooks), Secret, HPA, and Ingress, all
driven by values.yaml. See deploy/helm/locus-agent/README.md
for the full value reference.
The container image is built from the root Dockerfile — multi-stage, non-root user, HEALTHCHECK on /health. Build it with:
docker build -t iad.ocir.io/$NAMESPACE/locus-concierge:0.1.0 .
docker push iad.ocir.io/$NAMESPACE/locus-concierge:0.1.0
If you need raw YAML instead of Helm, the equivalent is:
apiVersion: apps/v1
kind: Deployment
metadata: { name: concierge }
spec:
  replicas: 3
  selector: { matchLabels: { app: concierge } }
  template:
    metadata: { labels: { app: concierge } }
    spec:
      serviceAccountName: locus-workload  # OCI workload identity
      containers:
        - name: concierge
          image: iad.ocir.io/$NAMESPACE/locus-concierge:0.1.0
          ports: [{ containerPort: 8080 }]
          env:
            - { name: OCI_AUTH_TYPE, value: instance_principal }
            - { name: LOCUS_THREAD_BUCKET, value: locus-threads-prod }
          readinessProbe:
            httpGet: { path: /health, port: 8080 }
          resources:
            requests: { cpu: 500m, memory: 1Gi }
            limits: { cpu: 2, memory: 4Gi }
---
apiVersion: v1
kind: Service
metadata: { name: concierge }
spec:
  type: LoadBalancer
  selector: { app: concierge }
  ports: [{ port: 80, targetPort: 8080 }]
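Assuming the manifests above are saved as concierge.yaml (the filename is illustrative), apply and verify with kubectl:

```shell
kubectl apply -f concierge.yaml
kubectl rollout status deployment/concierge
kubectl get svc concierge   # note the external IP once the LoadBalancer is provisioned
```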
For SSE streaming, ensure your ingress or load balancer doesn't
buffer the response (X-Accel-Buffering: no on nginx, or the
equivalent response-buffering setting on OCI Load Balancer).
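With the nginx ingress controller, for example, buffering can be disabled per-Ingress via annotations — a sketch, with an illustrative hostname:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: concierge
  annotations:
    # Pass SSE events through immediately instead of buffering the response
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    # Keep long-lived streams open past the default read timeout
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
spec:
  rules:
    - host: agent.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: concierge, port: { number: 80 } }
```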
OCI Compute — full VM control¶
Best when you need raw VM access or run the agent alongside other local services.
# On the compute instance:
pip install "locus-sdk[oci]"
git clone https://github.com/oracle-samples/locus.git ~/concierge
cd ~/concierge
# Launch under systemd
sudo tee /etc/systemd/system/concierge.service <<EOF
[Unit]
Description=Locus concierge agent
After=network.target

[Service]
Type=simple
User=opc
# Run from the checkout so uvicorn can import server:app
WorkingDirectory=/home/opc/concierge
Environment=OCI_AUTH_TYPE=instance_principal
ExecStart=/home/opc/.local/bin/uvicorn server:app --host 0.0.0.0 --port 8080
Restart=always

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable --now concierge
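A quick smoke test confirms the unit is running and the health endpoint answers:

```shell
systemctl status concierge --no-pager
curl -s http://localhost:8080/health
```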
Auth — one surface, four targets¶
| Deployment target | Recommended OCI_AUTH_TYPE |
|---|---|
| OCI Functions | resource_principal |
| OCI Container Instances | instance_principal |
| OKE | instance_principal (via workload identity) |
| OCI Compute | instance_principal |
| Laptop / CI | api_key or session_token |
Locus picks the right OCI signer from the OCI_AUTH_TYPE env var.
You don't change application code between environments.
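Conceptually the dispatch is a lookup from the env var to an oci SDK signer. A minimal sketch of that idea — the helper and mapping below are illustrative, not Locus internals, though the signer names are the real ones in the oci Python SDK:

```python
import os

# Illustrative mapping from OCI_AUTH_TYPE to the oci SDK signer used.
# (Locus performs an equivalent selection internally.)
AUTH_SIGNERS = {
    "api_key": "oci.signer.Signer",
    "session_token": "oci.auth.signers.SecurityTokenSigner",
    "instance_principal": "oci.auth.signers.InstancePrincipalsSecurityTokenSigner",
    "resource_principal": "oci.auth.signers.get_resource_principals_signer",
}

def pick_signer(env=None):
    """Return the signer implied by OCI_AUTH_TYPE (defaults to api_key)."""
    env = os.environ if env is None else env
    auth_type = env.get("OCI_AUTH_TYPE", "api_key")
    try:
        return AUTH_SIGNERS[auth_type]
    except KeyError:
        raise ValueError(f"Unsupported OCI_AUTH_TYPE: {auth_type!r}")
```

The point of the single env var is exactly this: the same server.py resolves to a different signer per target with no code change.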
Sessions — X-Session-ID for chat UIs¶
When the underlying agent has a checkpointer, AgentServer
honours the X-Session-ID header (or thread_id in the body) for
cross-request continuity. Same browser tab → same thread → same
context. Drop the header, drop the thread.
POST /invoke
X-Session-ID: user-c42-support
Content-Type: application/json
{"prompt": "What were we discussing?"}
Observability¶
Wire TelemetryHook to your OTLP collector for traces and metrics.
Set the exporter target via the standard OpenTelemetry environment
variables before the agent starts:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
from locus.hooks.builtin import TelemetryHook

agent = Agent(
    ...,
    hooks=[TelemetryHook(service_name="my-agent")],
)
OCI APM accepts OTLP. So do Honeycomb, Tempo, Grafana Cloud, and every other backend that speaks the spec. See Observability.
See also¶
- Agent Server — the FastAPI wrapper in detail.
- Conversation Management — how thread_id survives across requests and restarts.
- OCI GenAI models — auth and transport selection.