Evaluation¶
Treat the agent like any other piece of code. Declare cases, run them, read the report. Locus ships a small, dependency-free harness so you don't need an external eval framework for the common cases.
EvalCasedeclares prompt plus expected substrings (positive or negative).EvalRunnerruns the agent against every case.EvalReportsummarises pass/fail counts and an average score.
Run it (OCI Generative AI is the default; auto-detected from ~/.oci/config):
python examples/notebook_55_evaluation.py
Offline:
LOCUS_MODEL_PROVIDER=mock python examples/notebook_55_evaluation.py
Source¶
# Copyright (c) 2025, 2026 Oracle and/or its affiliates.
# Licensed under the Universal Permissive License v1.0 as shown at
# https://oss.oracle.com/licenses/upl/
"""Notebook 50: Evaluation — score an agent against a test suite.
Treat the agent like any other piece of code. Declare cases, run them,
read the report. Locus ships a small, dependency-free harness so you
don't need an external eval framework for the common cases.
- EvalCase declares prompt plus expected substrings (positive or negative).
- EvalRunner runs the agent against every case.
- EvalReport summarises pass/fail counts and an average score.
Run it
# Default: OCI Generative AI auto-detected from ~/.oci/config
python examples/notebook_55_evaluation.py
# Offline / no credentials:
LOCUS_MODEL_PROVIDER=mock python examples/notebook_55_evaluation.py
"""
from config import get_model
from locus.agent import Agent, AgentConfig
from locus.evaluation import EvalCase, EvalRunner
def example_evaluation():
"""Run a systematic evaluation of an agent."""
print("=== Agent Evaluation ===\n")
model = get_model()
agent = Agent(
config=AgentConfig(
system_prompt="You are a helpful assistant. Answer concisely.",
max_iterations=3,
model=model,
)
)
cases = [
EvalCase(
name="basic_knowledge",
prompt="What is the capital of France?",
expected_output_contains=["paris"],
max_iterations=3,
),
EvalCase(
name="math",
prompt="What is 15 * 7?",
expected_output_contains=["105"],
),
EvalCase(
name="no_hallucination",
prompt="What is the capital of France?",
expected_output_not_contains=["berlin", "london"],
),
]
runner = EvalRunner(agent=agent)
report = runner.run(cases)
print(report.summary())
print(f"\nTotal: {report.total_cases}, Passed: {report.passed}, Failed: {report.failed}")
print(f"Average score: {report.avg_score:.2f}")
if __name__ == "__main__":
example_evaluation()