Multi-modal providers
Non-LLM provider Protocols for web search, web fetch, image generation,
text-to-speech (TTS) and speech recognition (ASR). Setting any of them on
`AgentConfig` (`web_search=`, `web_fetch=`, `image_generator=`,
`speech_provider=`) auto-registers a matching `@tool`, so the model
can call the capability the same way it calls any other tool.
For LLM providers, see Models. For embedding providers and vector stores, see RAG.
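The auto-registration rule can be pictured with a small stand-in. This is a hypothetical sketch, not locus's `AgentConfig`. The `AgentConfigSketch` class and its `registered_tools()` helper are assumptions. They show only the idea that each provider field you set contributes a tool, while unset fields contribute nothing.

```python
# Hypothetical sketch, not locus's AgentConfig: models only the four provider fields.
from dataclasses import dataclass, fields
from typing import Any, List, Optional


@dataclass
class AgentConfigSketch:
    web_search: Optional[Any] = None
    web_fetch: Optional[Any] = None
    image_generator: Optional[Any] = None
    speech_provider: Optional[Any] = None

    def registered_tools(self) -> List[str]:
        # Each provider field that is set contributes one tool; unset fields add nothing.
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


config = AgentConfigSketch(web_search=object(), speech_provider=object())
print(config.registered_tools())  # ['web_search', 'speech_provider']
```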
Web search
BaseWebSearchProvider
OpenAISearchPreviewProvider
Web search via OpenAI's `gpt-4o-search-preview` model.
The search-preview chat completions return annotated URLs and snippets inline; we ask the model to return a strict JSON list and parse it.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `OpenAIModel` | An :class: | required |
| `max_chars_per_snippet` | `int` | Cap each snippet to this many characters so results don't overflow the agent's context. | `500` |
Source code in src/locus/providers/web_search.py
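Below is a minimal sketch of the parse-and-cap step. It assumes the model's reply is a JSON array of objects with `title`, `url` and `snippet` keys. Those key names are assumptions. So are `SearchResultSketch`, a stdlib stand-in for the pydantic `SearchResult`, and the `parse_search_reply` helper. The real provider's prompt and schema may differ.

```python
# Hypothetical sketch: key names and helper are assumptions, not the locus implementation.
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SearchResultSketch:
    """Stdlib stand-in for SearchResult (field names assumed)."""

    title: str
    url: str
    snippet: str


def parse_search_reply(reply: str, max_chars_per_snippet: int = 500) -> List[SearchResultSketch]:
    """Parse the model's strict-JSON list reply and truncate every snippet."""
    data = json.loads(reply)  # raises json.JSONDecodeError if the model strayed from JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of results")
    return [
        SearchResultSketch(
            title=str(item.get("title", "")),
            url=str(item["url"]),  # a result without a URL is treated as malformed (KeyError)
            snippet=str(item.get("snippet", ""))[:max_chars_per_snippet],
        )
        for item in data
    ]


reply = '[{"title": "Example", "url": "https://example.com", "snippet": "abcdefghij"}]'
print(parse_search_reply(reply, max_chars_per_snippet=4))
# [SearchResultSketch(title='Example', url='https://example.com', snippet='abcd')]
```

Rejecting anything that is not a list keeps a chatty or malformed reply from silently producing zero results.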
Web fetch
BaseWebFetchProvider
Bases: Protocol
Protocol every web-fetch provider must implement.
`fetch` (async)
Fetch `url` and return a normalized `WebPage`.
Implementations should follow redirects, time out within a
reasonable budget, and cap the returned text at `max_chars`
to keep it agent-context friendly.
Source code in src/locus/providers/web_fetch.py
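The contract can be sketched with `typing.Protocol`. The `fetch(url, *, max_chars)` signature, the `WebPageSketch` fields and the in-memory `DictFetcher` are illustrative assumptions, since the real signature is not shown here. The fake fetcher honors the cap on returned text. A real implementation would also follow redirects and enforce a timeout.

```python
# Hypothetical sketch: the fetch() signature and WebPageSketch fields are assumptions.
import asyncio
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass
class WebPageSketch:
    """Stdlib stand-in for WebPage (field names assumed)."""

    url: str
    text: str


class WebFetchProviderSketch(Protocol):
    async def fetch(self, url: str, *, max_chars: int) -> WebPageSketch: ...


class DictFetcher:
    """Offline fetcher backed by a dict; conforms to the protocol without inheriting from it."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self._pages = pages

    async def fetch(self, url: str, *, max_chars: int) -> WebPageSketch:
        text = self._pages.get(url, "")
        # Cap the text so one large page cannot flood the agent's context.
        return WebPageSketch(url=url, text=text[:max_chars])


async def main() -> WebPageSketch:
    fetcher: WebFetchProviderSketch = DictFetcher({"https://example.com": "hello world"})
    return await fetcher.fetch("https://example.com", max_chars=5)


page = asyncio.run(main())
print(page)  # WebPageSketch(url='https://example.com', text='hello')
```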
HTTPXWebFetcher
`HTTPXWebFetcher(*, timeout_seconds: float = 10.0, user_agent: str = 'locus-web-fetch/1.0', follow_redirects: bool = True)`
Default web-fetch provider using httpx + a stdlib HTML→text shim.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `timeout_seconds` | `float` | Per-request timeout. Default 10s. | `10.0` |
| `user_agent` | `str` | | `'locus-web-fetch/1.0'` |
| `follow_redirects` | `bool` | Whether to follow redirects. Default True. | `True` |
Source code in src/locus/providers/web_fetch.py
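`HTTPXWebFetcher` pairs httpx with a stdlib HTML→text shim. The shim itself is not shown here. The sketch below is a hypothetical version built on `html.parser.HTMLParser`, not the locus code. The `html_to_text` name and its `max_chars` default are assumptions. It drops `<script>` and `<style>` contents and collapses whitespace.

```python
# Hypothetical HTML-to-text shim using only the standard library (not the locus implementation).
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""

    _SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)


def html_to_text(html: str, max_chars: int = 10_000) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Collapse runs of whitespace (including newlines) into single spaces, then cap length.
    text = " ".join(" ".join(parser._chunks).split())
    return text[:max_chars]


print(html_to_text("<html><head><style>p{}</style></head><body><h1>Hi</h1><p>there  you</p></body></html>"))
# Hi there you
```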
Image generation
BaseImageGenerationProvider
Bases: Protocol
Protocol every image-generation provider must implement.
`generate` (async)
Generate `n` images for `prompt` and return references to them.
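Because the base class is a `Protocol`, any object with a matching async `generate` method qualifies, and no inheritance is needed. The sketch below makes several assumptions:

- a `generate(prompt, *, n=1)` signature;
- plain URL strings returned in place of `ImageResult` objects, to keep it short;
- a hypothetical `PlaceholderImages` provider;
- `runtime_checkable` as one possible way to check conformance. It verifies only that the method exists, not its signature.

```python
# Hypothetical sketch: signature, return type and provider are assumptions, not the locus API.
import asyncio
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class ImageGeneratorSketch(Protocol):
    async def generate(self, prompt: str, *, n: int = 1) -> List[str]: ...


class PlaceholderImages:
    """Offline provider that returns fake image URLs on the reserved .invalid domain."""

    async def generate(self, prompt: str, *, n: int = 1) -> List[str]:
        slug = "-".join(prompt.lower().split())
        return [f"https://images.invalid/{slug}/{i}.png" for i in range(n)]


provider = PlaceholderImages()
print(isinstance(provider, ImageGeneratorSketch))  # True (method presence only)
print(asyncio.run(provider.generate("a red fox", n=2)))
# ['https://images.invalid/a-red-fox/0.png', 'https://images.invalid/a-red-fox/1.png']
```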
ImageResult
Bases: BaseModel
One generated image, exposed as either a URL or a base64-encoded PNG.
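One way to model "either a URL or a base64 PNG" is to require exactly one of two fields and decode on demand. The field names `url` and `b64_png` and the `to_bytes` helper are hypothetical. The real `ImageResult` is a pydantic model whose fields are not listed here, so this is a stdlib dataclass sketch.

```python
# Hypothetical sketch of an either/or image result; field names are assumptions.
import base64
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageResultSketch:
    url: Optional[str] = None
    b64_png: Optional[str] = None

    def __post_init__(self) -> None:
        # Exactly one representation must be present.
        if (self.url is None) == (self.b64_png is None):
            raise ValueError("set exactly one of url or b64_png")

    def to_bytes(self) -> bytes:
        """Return decoded PNG bytes; a URL-backed result would need a download step instead."""
        if self.b64_png is None:
            raise ValueError("URL-backed image; fetch it to get bytes")
        return base64.b64decode(self.b64_png)


png_magic = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG file signature
inline = ImageResultSketch(b64_png=base64.b64encode(png_magic).decode("ascii"))
print(inline.to_bytes() == png_magic)  # True
```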
Speech (TTS + ASR)
BaseSpeechProvider
Bases: Protocol
Protocol every speech provider must implement.
A provider may implement only one direction (e.g. TTS-only) by
raising `NotImplementedError` from the unsupported method;
callers can detect which directions are supported via the `capabilities` attribute.
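A TTS-only provider under this convention might look like the sketch below. Several details are assumptions:

- the `capabilities` value, shown as a `frozenset` of `"tts"`/`"asr"` strings;
- the method signatures, shown as synchronous methods for brevity;
- the `BeepSpeech` class and the `try_transcribe` helper.

What the sketch illustrates is the pattern: raise `NotImplementedError` for the unsupported direction, and check `capabilities` before calling.

```python
# Hypothetical sketch: capabilities format and method signatures are assumptions.
from typing import Optional


class BeepSpeech:
    """TTS-only provider: it can speak but not transcribe."""

    capabilities = frozenset({"tts"})

    def speak(self, text: str) -> bytes:
        # Stand-in "audio": one zero byte per character.
        return bytes(len(text))

    def transcribe(self, audio_bytes: bytes) -> str:
        raise NotImplementedError("BeepSpeech is TTS-only")


def try_transcribe(provider, audio: bytes) -> Optional[str]:
    # Check the advertised capabilities instead of relying on catching NotImplementedError.
    if "asr" not in provider.capabilities:
        return None
    return provider.transcribe(audio)


provider = BeepSpeech()
print(len(provider.speak("hello")))          # 5
print(try_transcribe(provider, b"\x00\x01"))  # None
```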
SynthesizedAudio
Bases: BaseModel
Output of `speak(text)`: the audio bytes and their content type.
SpeechTranscript
Bases: BaseModel
Output of `transcribe(audio_bytes)`: the recognized text and metadata.
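The two speech result types each pair a payload with descriptive data. This is a stdlib sketch of both; the real classes are pydantic models. All of the following are hypothetical:

- the field names `audio`, `content_type`, `text` and `metadata`;
- the `extension_for` helper and its small content-type table.

The sketch shows how a caller might use `content_type` to pick a file suffix when saving audio.

```python
# Hypothetical stand-ins for SynthesizedAudio / SpeechTranscript; field names are assumptions.
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SynthesizedAudioSketch:
    audio: bytes
    content_type: str  # e.g. "audio/mpeg" or "audio/wav"


@dataclass
class SpeechTranscriptSketch:
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# Hypothetical table used to pick a file extension when saving audio.
_EXTENSIONS = {"audio/mpeg": ".mp3", "audio/wav": ".wav", "audio/ogg": ".ogg"}


def extension_for(result: SynthesizedAudioSketch) -> str:
    # Ignore parameters such as "; codecs=opus" and fall back to a generic suffix.
    base = result.content_type.split(";", 1)[0].strip().lower()
    return _EXTENSIONS.get(base, ".bin")


clip = SynthesizedAudioSketch(audio=b"ID3", content_type="audio/mpeg")
print(extension_for(clip))  # .mp3
transcript = SpeechTranscriptSketch(text="hello", metadata={"language": "en"})
print(transcript.text, transcript.metadata["language"])  # hello en
```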
Shared types
SearchResult
Bases: BaseModel
One result from a web-search provider.
WebPage
Bases: BaseModel
A fetched web page, normalized to text.