Agent reliability, scored in real time

Catch AI agents failing silently — before your users do.

88% of agent pilots die because teams can't tell when a run quietly went wrong. Sentinel watches every run, scores its health 0 → 1, and explains exactly what broke. Framework-agnostic. Zero dependencies. Free, locally, in under 5 minutes.

trace · answer_question
scored
0.00health
steps6
cost$0.0060
tokens1,420
findings1 critical
[CRIT] REPEATED_TOOL_CALL — tool 'web_search' called 6× with identical args; the agent is stuck in a loop.
0of agent pilots fail on silent errors
0failure detectors + judges
0runtime dependencies
0to first scored failure
The silent failure problem

Your agent didn't crash. It just went wrong.

A run "succeeds," returns nothing useful, loops 40 times, hallucinates a fact, or burns $40 — and nobody notices until a customer does. Uptime and error rates are blind to it, because nothing errored. Sentinel sees what they can't.

What it catches

Eight failure modes, on every single run.

Six heuristic detectors run free and locally in the SDK. Two model-graded judges add semantic checks on the hosted tier — same Trace in → findings out contract.

🔁
REPEATED_TOOL_CALL

Same tool + args N times — the agent is stuck in a loop.

🕳️
EMPTY_OUTPUT

"Succeeded" but returned nothing — the classic silent failure.

💸
COST_SPIKE

A run that crosses an absolute or relative cost threshold.

♾️
RUNAWAY_STEPS

The run blew past its step ceiling and won't terminate.

🐌
LATENCY_HIGH

Wall-clock exceeded threshold — a run that times out on users.

⚠️
SPAN_ERROR

A tool or step raised an exception mid-chain.

🎭
HALLUCINATION

Judge: the final answer asserts facts no step could support.

🎯
GOAL_INCOMPLETE

Judge: the agent stopped early or answered a different question.

Five minutes to value

Install. Instrument. Watch.

Install the SDK

One package, zero dependencies — Python or TypeScript. No account, no backend.

Instrument your agent

One import for LangChain, LlamaIndex, CrewAI, or your OpenAI/Anthropic client. It never crashes your app.

Watch the score

See failures locally for free, or set an API key to sync to your dashboard and gate CI on regressions.

my-agent · python
Pricing

Free where you build. Paid when your team depends on it.

Free

$0

For developers — local, offline, forever.

  • All 6 heuristic detectors
  • Console scoring + CI gate script
  • Framework auto-instrumentation
  • Zero dependencies, never crashes
Start free
MOST POPULAR

Team

$99/mo

For teams shipping agents to real users.

  • Everything in Free
  • Hosted dashboard + score history
  • Model-graded judges (hallucination, goal)
  • Hosted CI gate + PR comments
  • 50,000 traces / month
Start Team trial

Enterprise

Custom

For orgs with security + compliance needs.

  • Everything in Team
  • SSO (SAML / OIDC) + RBAC
  • Audit export + data residency
  • On-prem / self-hosted
  • SLA + priority support
Contact us
Works with your stack

If it runs in Python or Node, Sentinel can watch it.

LangChainLlamaIndexCrewAI OpenAIAnthropicLangGraphRaw / custom
Stop shipping blind

Start watching your agents in 5 minutes.

Free locally, forever. Add a key when your team needs the dashboard and CI gate.