ai agent trust scoring

Run one command_

Get the full score.

Fix the gaps.

Agent Maturity Compass checks how ready your AI agent is for real work. No account. No setup maze. Run amc for the 244-question default score, then expand into lifecycle proof, Watch, Fleet, and compliance receipts.

$ npx agent-maturity-compass
244 default questions
264 lifecycle questions
1,144 CLI command paths
147 assurance packs
41 industry packs
8,150 collected tests

install

Start with one command_

Copy this into your agent project. AMC creates the workspace, runs the full score, and prints the result.

recommended

No-install first run

$ npx agent-maturity-compass

Best for trying AMC once. It runs the full score immediately.

daily use

Install globally

$ npm i -g agent-maturity-compass
$ amc

Install once, then run amc in any agent project.


desktop studio

Prefer buttons? Run Studio.

AMC includes a local desktop-style control panel for scores, evidence, reports, assurance packs, Industry Packs, and run history. Start it with one command.

local app · no account · score dashboard · signed evidence
$ amc up
Studio: http://127.0.0.1:3212
Run full score: button + CLI
Assurance packs: 147 adversarial packs
Industry Packs: $9.99/month unlock
Reports: audit-ready exports

score

See the trust baseline

Overall level, layer bars, evidence coverage, and top gaps.

packs

Browse free and paid depth

Assurance is free core. Industry Packs show locked and unlocked states.

reports

Share with the team

Generate outputs for engineering, compliance, and leadership review.

current capabilities

More than a score.
Proof you can operate_

AMC now connects the default score to proof receipts, runtime Watch signals, fleet topology, policy enforcement, compliance binders, and portable evidence. These are AMC-owned surfaces, not source-specific wrappers.

score + proof

244 default, 264 lifecycle

Run the stable default bank, or opt into lifecycle questions for governance, evidence binding, runtime watch, proof exports, memory, and fleet operation.

domain proof lane

amc proof check

Source-to-rule checks emit amcproof artifacts and fail closed as unsupported when correctness proof is missing.

watch

Session, SLO, incident receipts

Correlate sessions, risk, cost, latency, override near-misses, and incident-to-regression closure with signed evidence refs.

fleet

Overview + trust graph

Use fleet overview, graph validation, SLO status, and trust-graph exports before scoring multi-agent systems.

enforce + shield

Policy and safe proof

Runtime firewall decisions, resource snapshots, safe confirmation proof, and 147 assurance packs stay in the free core.

docs + API

1,144 CLI paths

The generated command inventory, API reference, OpenAPI playground, and docs hub expose the same product surface users run locally.

self-reported 100 - evidence-verified 16 = trust gap 84

AMC separates what an agent claims from what execution evidence proves.

0.4x self-reported weight
1.0x observed evidence weight
2.5x trust multiplier
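The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustration only, not AMC's scoring code: the function names are hypothetical, and reading the 2.5x multiplier as the ratio of the two weights (1.0 / 0.4) is an assumption, since the page does not define it.

```python
# Illustrative sketch only; names are hypothetical and not part of AMC.
SELF_REPORTED_WEIGHT = 0.4  # weight quoted on this page for self-reported evidence
OBSERVED_WEIGHT = 1.0       # weight quoted on this page for observed evidence


def trust_gap(self_reported: float, evidence_verified: float) -> float:
    """Difference between the claimed score and the evidence-verified score."""
    return self_reported - evidence_verified


def weight_ratio() -> float:
    """Assumed reading of the multiplier: observed weight over self-reported weight."""
    return OBSERVED_WEIGHT / SELF_REPORTED_WEIGHT


if __name__ == "__main__":
    print(trust_gap(100, 16))  # gap between the two scores shown above
    print(weight_ratio())      # ratio of the two quoted weights
```

With the figures shown above, the gap is 100 - 16 = 84, and the ratio of the two weights is 2.5.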

the trust stack

Eight named surfaces_
One complete trust platform

The top of the page is intentionally simple. This is the deeper AMC map: score, harden, enforce, prove, monitor, comply, govern fleets, and issue portable trust identity.

platform surfaces

Score trust before you ship

Execution-backed scoring, assurance, enforcement, evidence, monitoring, compliance, fleet management, and portable trust identity.

amc score

integrations

Works with your stack_

14 built-in adapters. Works with OpenAI-compatible endpoints. No framework rewrite required.

LangChain
CrewAI
Anthropic
Google
Meta
Claude
Gemini
Python SDK
Hugging Face
Google Cloud
Databricks
Docker
PyTorch
GitHub

our approach

Expertise built on evidence_

01

Score and analyze

Point AMC at any agent. The default 244-question full score runs against execution evidence, while the optional 264-question lifecycle set adds runtime, proof, memory, and fleet coverage.

02

Fix and harden

AMC identifies high-signal gaps, generates targeted guardrail work, and uses 147 assurance packs for prompt injection, exfiltration, memory risk, sycophancy, and other adversarial failures.

03

Ship and monitor

Add CI gates, keep signed local evidence, prevent trust regressions, and generate compliance artifacts for EU AI Act, ISO 42001, NIST AI RMF, SOC 2, and OWASP LLM Top 10 review.
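A CI gate of the kind described in step 03 comes down to comparing an achieved maturity level against a required minimum and failing the build when it falls short. The sketch below is a hypothetical illustration of that comparison on the L0-L5 scale; it does not call AMC or reproduce how `amc gate` works, and the function names are assumptions.

```python
# Hypothetical sketch of a minimum-level gate; not AMC's implementation.
LEVELS = ("L0", "L1", "L2", "L3", "L4", "L5")  # maturity scale named on this page


def level_rank(level: str) -> int:
    """Return the position of a level label on the L0-L5 scale."""
    label = level.strip().upper()
    if label not in LEVELS:
        raise ValueError(f"unknown maturity level: {level!r}")
    return LEVELS.index(label)


def gate_exit_code(actual: str, minimum: str) -> int:
    """Return 0 when the gate passes, 1 when the build should fail."""
    return 0 if level_rank(actual) >= level_rank(minimum) else 1


if __name__ == "__main__":
    print(gate_exit_code("L2", "L3"))  # below the threshold
    print(gate_exit_code("L3", "L3"))  # meets the threshold
```

A non-zero exit code is what most CI systems treat as a failed step, which is why the sketch returns 1 rather than raising an error.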

research foundation

Built on primary sources_

AMC is grounded in AI safety, security, governance, compliance, and agent-system research. The modules map papers and standards into scoring controls, assurance packs, and operational evidence.

NIST

NIST AI Risk Management Framework

Core governance structure for trustworthy AI deployment and evaluation.

maps to governance, risk, and cross-framework evidence
nist.gov →

ICML 2024

Levels of AGI — Operationalizing Progress

Maturity levels framework for AI capabilities and autonomy.

maps to L0-L5 maturity and autonomy scoring
arxiv 2311.02462 →

NeurIPS 2024

Connecting the Dots — LLMs Infer Latent Info

Frontier models infer information from distributed evidence.

maps to context leakage and information barriers
arxiv 2406.14546 →

memory

Persistent Memory Injection

Cross-session memory poisoning through self-reinforcing payloads.

maps to memory integrity and persistence checks
arxiv 2602.15654 →

agents

Monitor Bypass and Agent-as-a-Proxy Risk

Shows why monitoring-only defenses can fail without runtime controls.

maps to monitor bypass resistance and enforcement
arxiv 2602.05066 →

security

MCP Security Bench

Evaluation coverage for MCP-specific attack vectors and tool-boundary failures.

maps to MCP compliance and security resilience packs
arxiv 2510.15994 →

cost

Economic Denial of Service

Cost amplification through tool-calling and hidden runtime loops.

maps to budget controls and cost predictability
arxiv 2601.10955 →

alignment

Sycophancy and Objective Drift

Agent behavior can optimize social approval while drifting from ground truth.

maps to sycophancy, alignment, and refusal checks
arxiv 2602.08092 →

Agent Maturity Compass whitepaper → / see research notes and gap analysis →

pricing

Free core. Paid domain depth_

free / open source

AMC Core

CLI, Desktop Studio, full score, assurance, evidence, reports, and CI gates.

244 default questions · 264 lifecycle questions · 147 assurance packs · 14 adapters · MIT licensed
start free →

$9.99 / month

Industry Packs

All 41 Industry Domain Packs for regulated verticals.

Health · Wealth · Education · Mobility · Technology · Governance · Environment

faq

Common questions_

No. The default amc command generates the full 244-question score. Use amc run --question-set lifecycle for the 264-question lifecycle-expanded set.

Self-reported documentation can claim 100/100 while execution-verified evidence shows a much weaker score. AMC closes that gap with observed evidence, signed artifacts, and repeatable proof chains.

AMC is a trusted observer. Self-reported evidence is capped at 0.4x weight; observed runtime evidence carries 1.0x weight. It scores maturity, finds gaps, runs assurance packs, and preserves verifiable evidence.

14 adapters: LangChain, LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, LlamaIndex, Semantic Kernel, Claude Code, Gemini, OpenClaw, OpenHands, Python SDK, CLI, and OpenAI-compatible endpoints.

Yes. Use gates such as amc gate --min-score L3 to fail builds below threshold and prevent trust regressions.

No account is needed for the free core. Industry Packs require paid access.

No. Use npx agent-maturity-compass or install with npm. Docker is optional.

Yes. Run amc up and use the local Studio dashboard.

The core trust stack is MIT licensed: Score, Shield, Enforce, Vault, Watch, Comply, Fleet, Passport, adapters, Studio, reports, and CI gates. The paid add-on is $9.99/month access to all 41 Industry Domain Packs.

start today

Score your agent with one command_

install amc →