The Eval-Deck Theatre
A score in a slide deck. Untestable in prod, never re-run after launch. Procurement signs the deck. Attackers read it for hints.
One Living Cert score built from red-team pass-rate, firewall block-rate, and intent failure-rate. Signed, public, embeddable, auto-revocable.
Five vendors — eval, firewall, observability, SBOM, red-team. One pager when something breaks at 2 AM. Zero accountability for the score.
Jailbreak-tree, crescendo, best-of-n harnesses pointed at your agent on every commit. The number that lands on the cert is the number an attacker would respect.
Block prompt injection, exfiltration, PII leaks, canary echoes — at the trace level, in milliseconds. Same policy, same provenance, every model you front.
RS256-signed JWT. Public verify URL. Embeddable badge. Hand it to procurement, hand it to your bond underwriter, auto-revoke when the score drops.
Drop the SDK in front of any model or MCP server. Prompt-injection patterns, exfiltration, PII leaks, and canary echoes are caught at the trace, scored, and shipped to your Findings inbox in under 15 ms.
Jailbreak-tree, crescendo, best-of-n, social-engineering harnesses fired against your agent on every commit. Each row is one harness, one verdict, one piece of evidence the cert can sign.
Your Living Cert is the pricing input. The underwriter quotes against the score, not against vibes. Aggregate limit, per-claim limit, premium — visible the same afternoon you mint the cert.
Embed the public verify URL on your trust page. Hand the signed JWT to the buyer's security team. The cert revokes itself the moment the score drops.
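For a buyer's security team that wants to check a cert offline, the sketch below shows what verifying an RS256-signed JWT could look like with Node's built-in crypto module. It is a minimal illustration, not Trident's verifier: the function names `verifyRs256` and `signRs256ForDemo` are hypothetical, how the public key is obtained is left out, and claim checks beyond `exp` are omitted.

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Decode a base64url JWT segment into a parsed JSON value.
function decodeSegment(segment: string): unknown {
  return JSON.parse(Buffer.from(segment, "base64url").toString("utf8"));
}

// Hypothetical helper: returns the payload if the token is a well-formed,
// unexpired RS256 JWT signed by the holder of `publicKeyPem`, else null.
function verifyRs256(
  token: string,
  publicKeyPem: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, signatureB64] = parts;

  let header: { alg?: string } | null;
  let payload: Record<string, unknown> | null;
  try {
    header = decodeSegment(headerB64) as { alg?: string } | null;
    payload = decodeSegment(payloadB64) as Record<string, unknown> | null;
  } catch {
    return null; // a segment was not valid JSON
  }
  // Pin the algorithm: never trust the header to choose a weaker one.
  if (header?.alg !== "RS256" || payload === null) return null;

  // RS256 is RSASSA-PKCS1-v1_5 with SHA-256 over "header.payload".
  const verifier = createVerify("RSA-SHA256");
  verifier.update(`${headerB64}.${payloadB64}`);
  const signature = Buffer.from(signatureB64, "base64url");
  if (!verifier.verify(publicKeyPem, signature)) return null;

  // Reject expired certs when an `exp` claim is present.
  const exp = payload["exp"];
  if (typeof exp === "number" && nowSeconds >= exp) return null;
  return payload;
}

// Demo-only signer so the sketch can be exercised locally.
function signRs256ForDemo(payload: object, privateKeyPem: string): string {
  const encode = (value: object): string =>
    Buffer.from(JSON.stringify(value)).toString("base64url");
  const signingInput = `${encode({ alg: "RS256", typ: "JWT" })}.${encode(payload)}`;
  const signer = createSign("RSA-SHA256");
  signer.update(signingInput);
  return `${signingInput}.${signer.sign(privateKeyPem).toString("base64url")}`;
}
```

Pinning the algorithm to RS256 before verifying is the important design choice here: a verifier that lets the token header pick the algorithm can be tricked into accepting a forged token.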
One SDK in front of OpenAI, Anthropic, Bedrock, and your MCP servers. Same policy, same provenance, same Findings inbox.
The cert pillars are the underwriter's pricing input. Aggregate limit, per-claim limit, premium — visible the same afternoon you mint the cert.
import { defineAgent } from "@trident/sdk";

export default defineAgent({
  id: "prod-rag-bot",
  models: ["openai/gpt-5.1", "anthropic/sonnet-4-6"],
  // Firewall policy: what to block, what to anonymize, and the latency budget.
  firewall: {
    block: ["prompt-injection", "secret-exfil", "canary"],
    anonymize: ["pii"],
    p99Budget: "15ms",
  },
  // Red-team harnesses run on every commit and once a day.
  redteam: {
    harnesses: ["jailbreak:tree", "crescendo", "best-of-n"],
    schedule: "on every commit + daily 03:00 UTC",
  },
  // Cert settings: the cert auto-revokes when the score drops below 80.
  cert: {
    issuer: "trident.dev",
    rotateEvery: "90d",
    revokeOn: { score: { lt: 80 } },
    embedAt: "https://your.site/trust",
  },
});
An RS256-signed JWT issued for one of your agents. The payload carries three pillars (red-team pass-rate, firewall block-rate, intent failure-rate) plus an SBOM hash and an expiry. It's verifiable at a public URL, embeddable as a badge, and auto-revokes the moment the score drops below your threshold.
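To make the payload description concrete, here is a rough sketch of how the three pillars, the SBOM hash, the expiry, and a revocation threshold could fit together. Everything in it is an assumption for illustration: the field names, the `LivingCertPayload` type, and especially `compositeScore`, which uses an equal-weight average because the actual scoring math is not described here.

```typescript
// Hypothetical payload shape; field names are illustrative, not a published schema.
interface LivingCertPayload {
  iss: string; // issuer, e.g. "trident.dev"
  sub: string; // agent id, e.g. "prod-rag-bot"
  exp: number; // expiry, seconds since the Unix epoch
  sbomHash: string; // hash of the agent's SBOM
  pillars: {
    redTeamPassRate: number; // 0..1, higher is better
    firewallBlockRate: number; // 0..1, higher is better
    intentFailureRate: number; // 0..1, lower is better
  };
}

// ASSUMPTION: equal-weight average on a 0-100 scale.
// The failure rate is inverted so that a higher value is better for every term.
function compositeScore(pillars: LivingCertPayload["pillars"]): number {
  const parts = [
    pillars.redTeamPassRate,
    pillars.firewallBlockRate,
    1 - pillars.intentFailureRate,
  ];
  const mean = parts.reduce((sum, value) => sum + value, 0) / parts.length;
  return Math.round(mean * 100);
}

type CertStatus = "valid" | "expired" | "revoked";

// Mirrors a `revokeOn: { score: { lt: threshold } }` style rule: expiry is
// checked first, then the score is compared against the threshold.
function certStatus(
  cert: LivingCertPayload,
  threshold: number,
  nowSeconds: number,
): CertStatus {
  if (nowSeconds >= cert.exp) return "expired";
  if (compositeScore(cert.pillars) < threshold) return "revoked";
  return "valid";
}
```

With pillars of 0.9, 0.95, and 0.1 this sketch yields a score of 92, which stays valid against a threshold of 80; pillars of 0.6, 0.7, and 0.4 yield 63 and would be revoked.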
p99 of 14 ms in front of OpenAI, Anthropic, and Bedrock with our default policy set. Most calls are under 8 ms. The firewall runs as a sidecar SDK or as a hosted edge — same policy, same provenance, your choice.
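If you want to check a latency claim like this against your own traffic, one simple approach is to compute the p99 from recorded per-call overheads and compare it to the budget. The sketch below uses the nearest-rank percentile definition; `percentile` and `withinBudget` are hypothetical helper names, and how the samples are collected is up to you.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b); // numeric sort on a copy
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// True when the observed p99 fits inside the budget, e.g. 15 ms.
function withinBudget(samplesMs: number[], budgetMs: number): boolean {
  return percentile(samplesMs, 99) <= budgetMs;
}
```

Note that p99 ignores the slowest 1% of calls by design: a run with 98 calls at 5 ms, one at 14 ms, and one at 40 ms still has a p99 of 14 ms, so track the maximum separately if tail outliers matter to you.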
They each own one slice — eval, firewall, or harness — and stop at a dashboard. Trident is the cert layer: the score, the artifact procurement signs, and the input the bond underwriter prices against. Run their tooling alongside ours; the cert is what travels.
Lloyd's syndicates and a small number of US carriers, scoped to the Living Cert. We don't take a cut of premium; we sit on the underwriting committee so the cert pillars match what carriers actually price against.
Both. Self-host the firewall and the harness inside your VPC; mint and serve the cert from our managed cloud. Same SDK, same scoring math.
Free to mint your first cert and run the firewall for a single agent. Team and enterprise tiers price by traffic and number of agents — talk to us for a number.