
Multi-modal providers

Protocols for non-LLM providers: web search, web fetch, image generation, text-to-speech, and speech recognition. Setting any of them on AgentConfig (web_search=, web_fetch=, image_generator=, speech_provider=) auto-registers a matching @tool, so the model can call the capability the same way it calls any other tool.

For LLM providers, see Models. For embedding providers and vector stores, see RAG.

Web search

BaseWebSearchProvider

Bases: Protocol

Protocol every web-search provider must implement.

search async

search(query: str, *, max_results: int = 5) -> list[SearchResult]

Return max_results (or fewer) search hits for query.

Source code in src/locus/providers/web_search.py
async def search(
    self,
    query: str,
    *,
    max_results: int = 5,
) -> list[SearchResult]:
    """Return ``max_results`` (or fewer) search hits for ``query``."""
    ...
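
Because the Protocol only requires an async search method with this signature, any class that provides one should qualify structurally. The following is a minimal sketch of such a provider over an in-memory corpus. The SearchResult dataclass is a local stand-in with assumed field names (title, url, snippet), and StaticSearchProvider is a hypothetical name, not part of locus.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Local stand-in for locus's SearchResult; field names are assumed."""
    title: str
    url: str
    snippet: str


class StaticSearchProvider:
    """Hypothetical toy provider that searches a fixed in-memory corpus."""

    def __init__(self, corpus: list[SearchResult]) -> None:
        self._corpus = corpus

    async def search(self, query: str, *, max_results: int = 5) -> list[SearchResult]:
        # Case-insensitive substring match over title and snippet.
        needle = query.lower()
        hits = [
            r for r in self._corpus
            if needle in r.title.lower() or needle in r.snippet.lower()
        ]
        # Honor the contract: never return more than max_results.
        return hits[:max_results]


corpus = [
    SearchResult("Python asyncio", "https://example.com/asyncio", "Event loops and coroutines."),
    SearchResult("Python typing", "https://example.com/typing", "Protocols and generics."),
    SearchResult("Rust ownership", "https://example.com/rust", "Borrowing rules."),
]
provider = StaticSearchProvider(corpus)
results = asyncio.run(provider.search("python", max_results=1))
```

A stub like this can be handy as a test double when exercising agent logic without network access.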

OpenAISearchPreviewProvider

OpenAISearchPreviewProvider(model: OpenAIModel, *, max_chars_per_snippet: int = 500)

Web search via OpenAI's gpt-4o-search-preview model.

Search-preview chat completions return annotated URLs and snippets inline; the provider asks the model to return a strict JSON list and parses it.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | OpenAIModel | An OpenAIModel instance. The caller picks the search-capable model id (e.g. "gpt-4o-search-preview"). | required |
| max_chars_per_snippet | int | Cap each snippet to this length so the agent context doesn't blow up. | 500 |

Source code in src/locus/providers/web_search.py
def __init__(
    self,
    model: OpenAIModel,
    *,
    max_chars_per_snippet: int = 500,
) -> None:
    self._model = model
    self._cap = max_chars_per_snippet
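
The "strict JSON list, then parse" step described above can be sketched as follows. This is an illustration only, not the provider's actual code: parse_search_reply is a hypothetical helper, and the title/url/snippet keys are an assumed reply shape.

```python
import json


def parse_search_reply(reply: str, *, max_chars_per_snippet: int = 500) -> list[dict[str, str]]:
    """Parse a JSON list of hits and cap each snippet (hypothetical helper)."""
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []  # The model did not return valid JSON; treat as no results.
    if not isinstance(items, list):
        return []  # Valid JSON, but not the list we asked for.
    hits = []
    for item in items:
        # Skip entries that are not objects or that lack a URL.
        if not isinstance(item, dict) or not item.get("url"):
            continue
        hits.append({
            "title": str(item.get("title", "")),
            "url": str(item["url"]),
            # Truncate the snippet so one long hit cannot dominate the context.
            "snippet": str(item.get("snippet", ""))[:max_chars_per_snippet],
        })
    return hits


reply = (
    '[{"title": "Docs", "url": "https://example.com", "snippet": "abcdefghij"},'
    ' {"title": "no url"}]'
)
hits = parse_search_reply(reply, max_chars_per_snippet=4)
```

Returning an empty list on malformed output is one possible design choice; a real provider might instead retry or raise.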

Web fetch

BaseWebFetchProvider

Bases: Protocol

Protocol every web-fetch provider must implement.

fetch async

fetch(url: str, *, max_chars: int = 50000, keep_html: bool = False) -> WebPage

Fetch url and return a normalized WebPage.

Implementations should follow redirects, time out within a reasonable budget, and cap the returned text at max_chars to keep it agent-context friendly.

Source code in src/locus/providers/web_fetch.py
async def fetch(
    self,
    url: str,
    *,
    max_chars: int = 50_000,
    keep_html: bool = False,
) -> WebPage:
    """Fetch ``url`` and return a normalized :class:`WebPage`.

    Implementations should follow redirects, time out within a
    reasonable budget, and cap the returned ``text`` at ``max_chars``
    to keep it agent-context friendly.
    """
    ...
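
To illustrate the max_chars and keep_html parameters, here is a minimal sketch of a provider that serves pages from a dictionary rather than the network. The WebPage dataclass is a local stand-in with assumed fields (url, text, truncated), and InMemoryFetcher is a hypothetical name.

```python
import asyncio
import re
from dataclasses import dataclass


@dataclass
class WebPage:
    """Local stand-in for locus's WebPage; field names are assumed."""
    url: str
    text: str
    truncated: bool


class InMemoryFetcher:
    """Hypothetical toy provider that serves pages from a dict."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages

    async def fetch(self, url: str, *, max_chars: int = 50_000, keep_html: bool = False) -> WebPage:
        body = self._pages[url]  # Raises KeyError for unknown URLs.
        if not keep_html:
            # Crude tag stripping, adequate for a test double only.
            body = re.sub(r"<[^>]+>", "", body)
        # Cap the text and record whether anything was cut off.
        return WebPage(url=url, text=body[:max_chars], truncated=len(body) > max_chars)


fetcher = InMemoryFetcher({"https://example.com": "<p>Hello, world!</p>"})
page = asyncio.run(fetcher.fetch("https://example.com", max_chars=5))
raw = asyncio.run(fetcher.fetch("https://example.com", keep_html=True))
```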

HTTPXWebFetcher

HTTPXWebFetcher(*, timeout_seconds: float = 10.0, user_agent: str = 'locus-web-fetch/1.0', follow_redirects: bool = True)

Default web-fetch provider using httpx + a stdlib HTML→text shim.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| timeout_seconds | float | Per-request timeout. Default 10s. | 10.0 |
| user_agent | str | User-Agent header. Default locus-web-fetch/1.0. | 'locus-web-fetch/1.0' |
| follow_redirects | bool | Whether to follow redirects. Default True. | True |

Source code in src/locus/providers/web_fetch.py
def __init__(
    self,
    *,
    timeout_seconds: float = 10.0,
    user_agent: str = "locus-web-fetch/1.0",
    follow_redirects: bool = True,
) -> None:
    self._timeout = timeout_seconds
    self._ua = user_agent
    self._follow = follow_redirects
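
The source shown here does not include the HTML→text shim itself. As a rough idea of what a stdlib-only conversion can look like, the sketch below uses html.parser; html_to_text and _TextExtractor are hypothetical names, and the real shim may behave differently.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    _SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is outside script/style elements.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def get_text(self) -> str:
        # Collapse every run of whitespace into a single space.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str, *, max_chars: int = 50_000) -> str:
    """Convert HTML to plain text and cap its length (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()[:max_chars]


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello &amp; welcome.</p>"
    "<script>alert(1)</script></body></html>"
)
text = html_to_text(sample)
```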

Image generation

BaseImageGenerationProvider

Bases: Protocol

Protocol every image-generation provider must implement.

generate async

generate(prompt: str, *, size: str = '1024x1024', n: int = 1, **kwargs: Any) -> list[ImageResult]

Generate n images for prompt and return their refs.

Source code in src/locus/providers/image.py
async def generate(
    self,
    prompt: str,
    *,
    size: str = "1024x1024",
    n: int = 1,
    **kwargs: Any,
) -> list[ImageResult]:
    """Generate ``n`` images for ``prompt`` and return their refs."""
    ...
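
A minimal sketch of a class matching this generate signature follows. It returns placeholder URLs instead of calling an image model. The ImageResult dataclass is a local stand-in with assumed fields (url, b64_png), and PlaceholderImageProvider is a hypothetical name.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ImageResult:
    """Local stand-in for locus's ImageResult; field names are assumed."""
    url: Optional[str] = None
    b64_png: Optional[str] = None


class PlaceholderImageProvider:
    """Hypothetical toy provider that returns placeholder image URLs."""

    async def generate(
        self,
        prompt: str,
        *,
        size: str = "1024x1024",
        n: int = 1,
        **kwargs: Any,
    ) -> list[ImageResult]:
        # Validate the "WIDTHxHEIGHT" size string before doing any work.
        width, _, height = size.partition("x")
        if not (width.isdigit() and height.isdigit()):
            raise ValueError(f"size must look like '1024x1024', got {size!r}")
        if n < 1:
            raise ValueError("n must be at least 1")
        # One result per requested image, each exposed as a URL reference.
        return [
            ImageResult(url=f"https://example.invalid/{width}x{height}/{i}.png")
            for i in range(n)
        ]


images = asyncio.run(PlaceholderImageProvider().generate("a cat", size="512x512", n=2))
```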

ImageResult

Bases: BaseModel

One generated image, exposed as either a URL or a base64-encoded PNG.

Speech (TTS + ASR)

BaseSpeechProvider

Bases: Protocol

Protocol every speech provider must implement.

A provider may implement only one direction (e.g. TTS-only) by raising NotImplementedError from the unsupported method; callers can detect which directions are supported via the capabilities attribute.

capabilities instance-attribute

capabilities: frozenset[str]

The supported directions: a non-empty subset of {"tts", "stt"} (either or both).

speak async

speak(text: str, *, voice: str | None = None, **kwargs: Any) -> SynthesizedAudio

Synthesize text to audio bytes.

Source code in src/locus/providers/speech.py
async def speak(
    self,
    text: str,
    *,
    voice: str | None = None,
    **kwargs: Any,
) -> SynthesizedAudio:
    """Synthesize ``text`` to audio bytes."""
    ...

transcribe async

transcribe(audio_bytes: bytes, *, content_type: str = 'audio/mpeg', **kwargs: Any) -> SpeechTranscript

Transcribe raw audio bytes to text.

Source code in src/locus/providers/speech.py
async def transcribe(
    self,
    audio_bytes: bytes,
    *,
    content_type: str = "audio/mpeg",
    **kwargs: Any,
) -> SpeechTranscript:
    """Transcribe raw audio bytes to text."""
    ...
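
The one-direction pattern described for this Protocol (declare capabilities, raise NotImplementedError from the unsupported method) can be sketched as below. SynthesizedAudio is a local stand-in with assumed fields (audio_bytes, content_type); SilentTTSProvider and transcribe_if_supported are hypothetical names.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SynthesizedAudio:
    """Local stand-in for locus's SynthesizedAudio; field names are assumed."""
    audio_bytes: bytes
    content_type: str


class SilentTTSProvider:
    """Hypothetical TTS-only provider that emits placeholder bytes."""

    capabilities: frozenset[str] = frozenset({"tts"})

    async def speak(self, text: str, *, voice: Optional[str] = None, **kwargs: Any) -> SynthesizedAudio:
        # One zero byte per input character stands in for real audio.
        return SynthesizedAudio(
            audio_bytes=b"\x00" * len(text),
            content_type="application/octet-stream",
        )

    async def transcribe(self, audio_bytes: bytes, *, content_type: str = "audio/mpeg", **kwargs: Any):
        raise NotImplementedError("this provider is TTS-only")


async def transcribe_if_supported(provider, audio: bytes):
    # Check capabilities up front rather than catching NotImplementedError.
    if "stt" not in provider.capabilities:
        return None
    return await provider.transcribe(audio)


provider = SilentTTSProvider()
audio = asyncio.run(provider.speak("hi"))
transcript = asyncio.run(transcribe_if_supported(provider, b"\x00\x00"))
```

Checking capabilities first lets a caller skip unsupported operations without relying on exceptions for control flow.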

SynthesizedAudio

Bases: BaseModel

Output of speak(text): the audio bytes and their content type.

SpeechTranscript

Bases: BaseModel

Output of transcribe(audio_bytes): the recognized text plus metadata.

Shared types

SearchResult

Bases: BaseModel

One result from a web-search provider.

WebPage

Bases: BaseModel

A fetched web page, normalized to text.