Silicon Psyche Labs — Behavioral telemetry for AI

The lab

Instruments for AI you can't see inside

Organizations deploy language models they cannot inspect — the model is a black box. Silicon Psyche Labs builds the instruments to classify, measure and track behavior over time, without access to weights, logits or training data. Why it matters: most failures don't announce themselves in the input. They show up in how the output behaves.

For developers & AI teams

Make one API call after your model responds and get back deterministic behavioral scores: drift, sycophancy and hallucination risk. Your first report takes about five minutes and needs no access to the model's internals.

For trust & safety

Detect when a conversation turns risky — suicidality, dissociation, crisis — and when your AI is under adversarial attack: prompt injection, jailbreaking, manipulation. Get real-time alerts so you can take corrective action. Fully deterministic, with auditable named-rule scoring.

For enterprise & compliance

Audit vendors, catch silent model updates, and keep a privacy-safe behavioral record — posture sequences only, no raw text retained, GDPR erasure in a single row.

Products

One platform, seven instruments

Each one answers a different question about model behavior. They share a single fine-tuned encoder and a common scoring model.

DSA

Dyadic Sequence Analysis

Reads how an AI behaves in a one-to-one conversation with a person: posture on every reply — sycophancy, capitulation under pressure, hallucination, whether it keeps its boundaries — plus DRM risk in the human-AI pair.

ASA

Agent Swarm Analysis

Models a group of collaborating agents as a graph and shows how one agent's bad behavior spreads to the others (contagion) and where the weakest link in the chain is.
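To make the contagion idea concrete, here is a minimal sketch. It is not ASA's actual algorithm, which this page does not describe. It assumes a hypothetical four-agent pipeline, a made-up rule that exposure halves at every hop, and a "weakest link" defined as the agent whose misbehavior produces the largest total downstream exposure.

```python
from collections import deque

def spread(edges, source, decay=0.5):
    """Exposure per agent when `source` misbehaves.

    edges maps each agent to the agents that consume its output.
    Exposure is multiplied by `decay` at every hop (an assumed rule),
    and each agent keeps the value from its shortest path to the source.
    """
    exposure = {source: 1.0}
    queue = deque([source])
    while queue:
        agent = queue.popleft()
        for downstream in edges.get(agent, []):
            if downstream not in exposure:
                exposure[downstream] = exposure[agent] * decay
                queue.append(downstream)
    return exposure

def weakest_link(edges):
    """Agent whose misbehavior yields the largest total exposure."""
    agents = set(edges) | {d for ds in edges.values() for d in ds}
    return max(sorted(agents), key=lambda a: sum(spread(edges, a).values()))

# Hypothetical swarm: a planner feeds two workers, both feed a reviewer.
edges = {
    "planner": ["coder", "researcher"],
    "coder": ["reviewer"],
    "researcher": ["reviewer"],
    "reviewer": [],
}

print(spread(edges, "planner"))
print(weakest_link(edges))
```

In this toy graph the planner is the weakest link because every other agent sits downstream of it. A per-agent check would not show that, which is why the graph is the unit of analysis.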

SDA

Summary Distortion Analysis

Compares an original text with its summary and flags where the summary changes the meaning: a hedge turned into a fact, a key number dropped, or a claim added that was never there.
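The three distortion types above can be illustrated with a deliberately crude sketch. It assumes a hand-picked hedge word list and plain regular expressions. The product's own classifiers are not described on this page, so treat the function name `distortion_flags` and its rules as hypothetical.

```python
import re

# Assumed, incomplete hedge vocabulary for illustration only.
HEDGES = {"may", "might", "could", "possibly", "reportedly", "suggests"}
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")

def words(text):
    """Lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def distortion_flags(original, summary):
    """Return (rule_name, evidence) pairs for simple meaning changes."""
    flags = []
    src_nums = set(NUMBER.findall(original))
    sum_nums = set(NUMBER.findall(summary))
    for n in sorted(src_nums - sum_nums):
        flags.append(("number_dropped", n))
    for n in sorted(sum_nums - src_nums):
        flags.append(("number_added", n))
    # A hedge counts as removed only if the word vanishes entirely.
    for h in sorted((HEDGES & words(original)) - words(summary)):
        flags.append(("hedge_removed", h))
    return flags

original = "The trial suggests the drug may reduce symptoms by 12% in 340 patients."
summary = "The drug reduces symptoms in 340 patients."
print(distortion_flags(original, summary))
```

Each flag carries the name of the rule that fired, so a reviewer can see why a summary was marked. Word matching like this misses paraphrased hedges and reworded figures, so it illustrates the output shape rather than a usable detector.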

RDM

Retrieval Drift Monitor

For assistants that search documents to answer (RAG), detects when a loaded question pushes the search toward the wrong documents — biasing the answer before the model writes a word.

DED

Distributed Exfiltration Detection

Watches a swarm in both directions: a secret leaked in innocent-looking pieces across many agents (exfiltration), and false data fed into their knowledge (poisoning). The signal lives in the sum — invisible to any per-message check.
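A small sketch shows why the signal only appears in the sum. It assumes the defender already knows the secret (for example a planted canary string) and uses a made-up coverage measure. DED's real method is not specified here, and the secret and messages below are invented.

```python
def coverage(secret, messages, min_len=3):
    """Fraction of the secret's characters that appear, in chunks of at
    least `min_len` characters, somewhere in the given messages."""
    covered = [False] * len(secret)
    for start in range(len(secret) - min_len + 1):
        chunk = secret[start:start + min_len]
        if any(chunk in m for m in messages):
            for i in range(start, start + min_len):
                covered[i] = True
    return sum(covered) / len(secret)

# Hypothetical canary secret and three messages from three agents.
secret = "sk-ALPHA-7731"
messages = [
    "ref code sk-AL for ticket",
    "see item PHA-7 in list",
    "order 7731 shipped",
]

# Per-message check: no single message contains the secret.
print(any(secret in m for m in messages))

# Aggregate check: together the messages cover the whole secret.
print(coverage(secret, messages))
```

Checked one at a time, each message covers well under half of the secret. Only the combined view reaches full coverage, which is the property the paragraph above describes.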

CPF3

Psychological risk profile

The Cybersecurity Psychology Framework: a 100-indicator behavioral risk profile across 10 categories, for human, AI or hybrid subjects (Canale, 2025).

SIGTRACK

Incident archive

Privacy-safe forensic memory: posture-sequence snapshots, zero raw text, single-row GDPR erasure.
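One way to read "single-row erasure" is a store that keeps one row per subject, holding only posture labels. The sketch below assumes that layout. The table name, column names and posture labels are hypothetical and are not taken from SIGTRACK.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshots (subject_id TEXT PRIMARY KEY, postures TEXT NOT NULL)"
)

def record(subject_id, postures):
    """Store a sequence of posture labels; no raw text is kept."""
    conn.execute(
        "INSERT OR REPLACE INTO snapshots VALUES (?, ?)",
        (subject_id, json.dumps(postures)),
    )

def erase(subject_id):
    """Delete the subject's row and return how many rows were removed."""
    cur = conn.execute("DELETE FROM snapshots WHERE subject_id = ?", (subject_id,))
    return cur.rowcount

record("subject-1", ["neutral", "sycophantic", "boundary_held"])
print(erase("subject-1"))
```

Because everything about a subject lives in one row, an erasure request maps to a single `DELETE`, and there is no raw text elsewhere to track down.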

Resources

Explore the platform

Documentation, examples, and live demos — everything you need to get started.

Knowledge Base

Full encyclopedia of metrics, classifiers, API, and workflows. Search in any language.

The Manual

A visual, page-by-page guide to reading every PSA dashboard — annotated screenshots and symptom → where-to-look troubleshooting.

Distributed Exfiltration Detection

Catch a secret stolen in innocent-looking pieces across many AI agents: the leak no single message reveals, and the swarm-graph signal that per-message DLP cannot see.

How LLM Inference Works

The full pipeline from your prompt to the streamed answer, drawn stage by stage — and why no stage inside it can enforce a rule. Visual + in-depth versions.

How LLM Training Works

From raw text to an “aligned” model — pretraining, loss, RLHF — and why alignment is a tilt in the weights, not a rule. Visual + in-depth versions.

Case Studies

Real-world behavioral incidents — annotated with PSA metrics and forensic traces.

Calibration Sessions

Browse 149 calibration sessions with full conversation transcripts — our AI behavioral calibration data library.

PSA in Action

15 curated sessions with full PSA analysis — posture grids, DRM badges, behavioral signals, and per-session scores.

CPF → PSA Mapping

How every PSA metric maps to a specific Cybersecurity Psychology Framework indicator.

Standards & Compliance

How PSA maps onto the AI governance frameworks and laws of 2026 — EU AI Act, NIST AI RMF, ISO 42001, OECD, MITRE ATLAS and more.

How it works

From text to insight in three steps

1. Send text

Submit any model response through our API or the web app. You need no API keys for the model itself and no access to it.

2. Analyze

Posture classifiers and agentic graph analysis, computed deterministically in real time.

3. Get insights

Drift, anomalies, crisis signals and forecasts — each with a named, auditable reason.

Research

Grounded in published science

Every PSA metric traces to a specific indicator in the Cybersecurity Psychology Framework — a published taxonomy of 100 pre-cognitive vulnerabilities (Canale, 2025).

CPF Framework: Canale (2025)

The Silicon Psyche: arXiv:2601.00867

The Silicon Psyche: SSRN

CPF → PSA mapping

Public spec on GitHub

The evidence

Why this matters — backed by 2026 research

Independent, peer-reviewed research in 2026 keeps reaching the conclusions PSA was built on. Three of them hold across the whole field, and they matter most in two areas: healthcare AI and agent systems.

Content filters aren't enough

Monitoring how a model reasons in real time catches risks that input/output filters miss entirely.

arXiv · 2026

An AI grading an AI is unreliable

When one model judges another, accuracy can fall to ~52% on real material, with documented systematic biases.

arXiv · medRxiv · 2026

Behavioral observability is the missing layer

The literature names turning behavioral signals into operational control as the open infrastructure gap — the category PSA builds.

arXiv · 2026

Healthcare AI

For teams deploying patient-facing & crisis AI

Crisis detection belongs in its own layer

Research argues risk detection must be independent of the chat model, with near-zero misses in under a second — PSA's exact design.

medRxiv · 2026

Don't let the bot grade its own safety

LLM judges score only ~52% on clinical safety; a fixed, auditable classifier is the dependable alternative.

arXiv · 2026

Emotional dependency is measurable

Studies catalog the harms of over-attachment to companion and therapy bots — the relationship risk PSA tracks.

arXiv · 2026

Agent systems

For teams running autonomous agents in production

Risk spreads across agents

Errors cascade through a multi-agent system along its dependency graph — you have to watch the graph, not single steps.

arXiv · 2026

Misbehavior is visible from outside

Black-box monitoring catches agents that drift or scheme with no access to model internals — PSA's exact regime.

arXiv · 2026

Proof of what your agents did

A tamper-evident trail of agent behavior is the emerging trust requirement — PSA's forensic archive provides it.

arXiv · 2026

See it on real, measured incidents →

Start measuring your models today

Free to start. 37 deterministic metrics. Real-time analysis. No model access required.

Get started free

Read the API docs

See what your AI is actually doing.
