🧪 Agent Test Harness | The Owl's Perch

You unit test your code. Why not your agents?

The Problem

Agent behavior is hard to test. Tools have side effects, LLM responses vary, and multi-step workflows fail in unpredictable ways. Most agent developers either skip testing entirely or write brittle scripts that break on every model update.

The Solution

Agent Test Harness brings the discipline of unit testing to agent development. Mock any tool, assert on any behavior, replay recorded sessions, write tests in YAML, generate CI-ready reports.

4 modules:

1. Core Framework — @agent_test decorator, mock_tool(), 12 assertion types, failure injection (rate limits, network errors, timeouts), argument matching operators (__contains, __regex, __startswith).

2. Record / Replay — Capture real agent sessions, save as fixtures, replay deterministically. Catch regressions before they hit production.

3. YAML DSL — Write test suites in plain YAML. No Python required for simple behavioral tests.

4. Reports — Self-contained HTML reports, JUnit XML for GitHub Actions / Jenkins / CircleCI.

Usage

from agent_test_harness import agent_test, mock_tool, assert_tool_called

@agent_test
def test_search_agent(ctx):
    mock_tool(ctx, "web_search", returns={"results": ["Cat facts"]})
    ctx.run_agent("find information about cats")
    assert_tool_called(ctx, "web_search")
    assert_contains(ctx.output, "cat")

# tests/search.yaml
suite: "Search Agent Tests"
tests:
  - name: basic search
    prompt: "find information about cats"
    mocks:
      web_search:
        returns: {results: ["Cat facts"]}
    assertions:
      - tool_called: web_search
      - output_contains: cat

agent-test run tests/          # Run all tests
agent-test run tests/ --report # Generate HTML + JUnit reports

Status

✅ Complete — 71/71 tests passing

Feature	Status
Core framework + mocking	✅
Record / Replay	✅
YAML DSL	✅
HTML + JUnit reports	✅

No external dependencies. Python 3.9+. MIT License.