Agent reliability, scored in real time

Catch AI agents failing silently — before your users do.

88% of agent pilots die because teams can't tell when a run quietly went wrong. Sentinel watches every run, scores its health 0 → 1, and explains exactly what broke. Framework-agnostic. Zero dependencies. Free, locally, in under 5 minutes.

Start free → Open dashboard

trace · answer_question

scored

0.00health

steps6

cost$0.0060

tokens1,420

findings1 critical

[CRIT] REPEATED_TOOL_CALL — tool 'web_search' called 6× with identical args; the agent is stuck in a loop.

0of agent pilots fail on silent errors

0failure detectors + judges

0runtime dependencies

0to first scored failure

The silent failure problem

Your agent didn't crash. It just went wrong.

A run "succeeds," returns nothing useful, loops 40 times, hallucinates a fact, or burns $40 — and nobody notices until a customer does. Uptime and error rates are blind to it, because nothing errored. Sentinel sees what they can't.

What it catches

Eight failure modes, on every single run.

Six heuristic detectors run free and locally in the SDK. Two model-graded judges add semantic checks on the hosted tier — same Trace in → findings out contract.

🔁

REPEATED_TOOL_CALL

Same tool + args N times — the agent is stuck in a loop.

🕳️

EMPTY_OUTPUT

"Succeeded" but returned nothing — the classic silent failure.

💸

COST_SPIKE

A run that crosses an absolute or relative cost threshold.

♾️

RUNAWAY_STEPS

The run blew past its step ceiling and won't terminate.

🐌

LATENCY_HIGH

Wall-clock exceeded threshold — a run that times out on users.

⚠️

SPAN_ERROR

A tool or step raised an exception mid-chain.

🎭

HALLUCINATION

Judge: the final answer asserts facts no step could support.

🎯

GOAL_INCOMPLETE

Judge: the agent stopped early or answered a different question.

Five minutes to value

Install. Instrument. Watch.

Install the SDK

One package, zero dependencies — Python or TypeScript. No account, no backend.

Instrument your agent

One import for LangChain, LlamaIndex, CrewAI, or your OpenAI/Anthropic client. It never crashes your app.

Watch the score

See failures locally for free, or set an API key to sync to your dashboard and gate CI on regressions.

my-agent · python

Pricing

Free where you build. Paid when your team depends on it.

Free

For developers — local, offline, forever.

All 6 heuristic detectors
Console scoring + CI gate script
Framework auto-instrumentation
Zero dependencies, never crashes

Start free

Team

$99/mo

For teams shipping agents to real users.

Everything in Free
Hosted dashboard + score history
Model-graded judges (hallucination, goal)
Hosted CI gate + PR comments
50,000 traces / month

Start Team trial

Enterprise

Custom

For orgs with security + compliance needs.

Everything in Team
SSO (SAML / OIDC) + RBAC
Audit export + data residency
On-prem / self-hosted
SLA + priority support

Works with your stack

If it runs in Python or Node, Sentinel can watch it.

LangChainLlamaIndexCrewAI OpenAIAnthropicLangGraphRaw / custom

Stop shipping blind

Start watching your agents in 5 minutes.

Free locally, forever. Add a key when your team needs the dashboard and CI gate.

Get started free → Log in