Voice Output¶
A real agent often needs to talk, not just type. This notebook pairs a regular chat-completions agent (text in, text out) with an audio-capable model on OCI Generative AI so the response can be spoken aloud.
Pipeline::
user prompt ──▶ Agent (OCIChatCompletionsModel · chat model)
│
│ reply text
▼
OCI /openai/v1/audio/speech
(openai.gpt-4o-mini-tts)
│
│ mp3 bytes
▼
./notebook_66_response.mp3
- Same OCI v1 transport as the rest of the notebooks — one signer, one base URL, one set of credentials. No separate audio service to configure.
- Bring-your-own-voice via the
voice=parameter (alloy, ash, ballad, coral, echo, sage, shimmer, verse). - Output is a normal MP3 you can pipe into a frontend
<audio>element, an IVR system, or a podcast feed.
Prerequisites: an audio-capable model on OCI Generative AI. The
notebook uses openai.gpt-4o-mini-tts for synthesis.
Run it:
LOCUS_MODEL_PROVIDER=oci \
LOCUS_OCI_PROFILE=MY_PROFILE \
LOCUS_OCI_REGION=us-chicago-1 \
LOCUS_OCI_AUTH_TYPE=security_token \
LOCUS_OCI_COMPARTMENT=ocid1.compartment.oc1..… \
python examples/notebook_66_audio_response.py
afplay notebook_66_response.mp3 # macOS
This notebook does not run under LOCUS_MODEL_PROVIDER=mock — it
builds an OCI signer directly, so it needs real OCI credentials.
Source¶
#!/usr/bin/env python3
# Copyright (c) 2025, 2026 Oracle and/or its affiliates.
# Licensed under the Universal Permissive License v1.0 as shown at
# https://oss.oracle.com/licenses/upl/
"""Notebook 61: Voice output — turn an agent's reply into speech.
A real agent often needs to talk, not just type. This notebook pairs a
regular chat-completions agent (text in, text out) with an audio-capable
model on OCI Generative AI so the response can be spoken aloud.
Pipeline::
user prompt ──▶ Agent (OCIChatCompletionsModel · chat model)
│
│ reply text
▼
OCI /openai/v1/audio/speech
(openai.gpt-4o-mini-tts)
│
│ mp3 bytes
▼
./notebook_66_response.mp3
- Same OCI v1 transport as the rest of the notebooks — one signer, one
base URL, one set of credentials. No separate audio service to
configure.
- Bring-your-own-voice via the voice= parameter (alloy, ash, ballad,
coral, echo, sage, shimmer, verse).
- Output is a normal MP3 you can pipe into a frontend <audio> element,
an IVR system, or a podcast feed.
Prerequisites: an audio-capable model on OCI Generative AI. The
notebook uses openai.gpt-4o-mini-tts for synthesis.
Run it
LOCUS_MODEL_PROVIDER=oci \\
LOCUS_OCI_PROFILE=MY_PROFILE \\
LOCUS_OCI_REGION=us-chicago-1 \\
LOCUS_OCI_AUTH_TYPE=security_token \\
LOCUS_OCI_COMPARTMENT=ocid1.compartment.oc1..… \\
python examples/notebook_66_audio_response.py
afplay notebook_66_response.mp3 # macOS
# or open it in any media player
Note: this notebook does not run under LOCUS_MODEL_PROVIDER=mock —
it builds an OCI signer directly, so it needs real OCI credentials.
The smoke test for mock environments is `python -m py_compile <file>`.
"""
from __future__ import annotations
import asyncio
import os
from pathlib import Path
from config import get_model
from locus.agent import Agent, AgentConfig
PROMPT = (
"Give me a 25-word elevator pitch for the locus SDK aimed at a senior "
"platform engineer. Speak it in the second person."
)
TTS_MODEL = "openai.gpt-4o-mini-tts"
TTS_VOICE = "alloy"
OUT_PATH = Path(__file__).resolve().parent / "notebook_66_response.mp3"
def _build_oci_audio_client():
"""Reuse the OCI v1 signer to talk to /openai/v1/audio/speech.
OCIChatCompletionsModel wraps chat completions; for audio.speech.create we
attach the same signer to a fresh openai.AsyncOpenAI so the audio
endpoint goes through the same authenticated transport.
"""
import httpx
import openai
from locus.models.providers.oci._signing import OCIRequestSigner
from locus.models.providers.oci.openai_compat import build_oci_openai_base_url
profile = os.environ.get("LOCUS_OCI_PROFILE", "DEFAULT")
region = os.environ.get("LOCUS_OCI_REGION", "us-chicago-1")
compartment_id = os.environ.get("LOCUS_OCI_COMPARTMENT")
# Build the signer the same way OCIChatCompletionsModel does internally.
import oci # noqa: PLC0415 — optional dep, lives behind the [oci] extra
cfg = oci.config.from_file(profile_name=profile)
auth_type = os.environ.get("LOCUS_OCI_AUTH_TYPE", "api_key")
if auth_type == "security_token":
token_file = os.path.expanduser(cfg["security_token_file"])
key_file = os.path.expanduser(cfg["key_file"])
with open(token_file, encoding="utf-8") as fh:
token = fh.read().strip()
private_key = oci.signer.load_private_key_from_file(key_file)
signer = oci.auth.signers.SecurityTokenSigner(token, private_key)
else:
signer = oci.signer.Signer.from_config(cfg)
http_client = httpx.AsyncClient(
auth=OCIRequestSigner(signer, compartment_id=compartment_id),
timeout=httpx.Timeout(60.0, connect=10.0),
)
return openai.AsyncOpenAI(
api_key="not-used",
base_url=build_oci_openai_base_url(region),
http_client=http_client,
)
async def main() -> None:
print("Notebook 61: Voice output via OCI Generative AI text-to-speech")
print("=" * 60)
# Step 1: a regular Locus Agent answers the prompt as text.
agent = Agent(
config=AgentConfig(
agent_id="elevator-pitch",
model=get_model(max_tokens=600),
system_prompt=(
"You are a senior developer-relations engineer. Reply in "
"natural spoken English, no markdown, no bullet points."
),
max_iterations=2,
)
)
print(f"\n→ asking the agent: {PROMPT!r}")
result = agent.run_sync(PROMPT)
reply = (result.message or "").strip()
if not reply:
msg = "Agent returned no text — check provider creds + max_tokens"
raise RuntimeError(msg)
print(f"\n← agent reply ({len(reply)} chars):\n{reply}\n")
# Step 2: synthesise speech through the OCI v1 audio.speech endpoint.
print(f"→ synthesising speech with model={TTS_MODEL!r} voice={TTS_VOICE!r}")
client = _build_oci_audio_client()
speech = await client.audio.speech.create(
model=TTS_MODEL,
voice=TTS_VOICE,
input=reply,
response_format="mp3",
)
audio_bytes = await speech.aread()
OUT_PATH.write_bytes(audio_bytes)
print(f"\n✓ wrote {len(audio_bytes):,} bytes of mp3 → {OUT_PATH}")
print(" Play it on macOS: afplay notebook_66_response.mp3")
print(" Linux (mpg123): mpg123 notebook_66_response.mp3")
print(" Browser (file:// URL): open notebook_66_response.mp3")
if __name__ == "__main__":
asyncio.run(main())