Glossary — terms & acronyms

Plain-language definitions of the terms PSA (Posture Sequence Analysis) uses to measure language-model and multi-agent behavior. For the full metric reference with formulas and thresholds, see the Field Guide.

System & sub-products

PSA Posture Sequence Analysis

A black-box method that measures what a language model is doing from the outside — classifying each response into behavioral postures — with no access to model weights or logits.

PSAv2 single-conversation layer

The single-conversation layer of PSA: it classifies each user turn and model response and scores the health of one human–AI dialogue (Behavioral Health Score, Input Risk Score, Dyadic Risk Module).

PSAv3 multi-agent layer

The multi-agent layer of PSA: it models a graph of cooperating AI agents and scores systemic risk across the whole swarm (Posture Propagation Index, Cross-Agent Health Score, Swiss Cheese Score).

CPF3 Cybersecurity Psychology Framework

An independent PSA component that analyzes structured snapshots of network and filesystem activity to forecast insider-threat risk, as a green / yellow / red alert.

Forge data-generation pipeline

The pipeline that generates labeled training scenarios for every PSA classifier — the single source of truth for posture definitions.

RDM Retrieval Drift Module

A component for retrieval (RAG) pipelines that detects when behavioral drift in a conversation has contaminated the user's search query before documents are even retrieved.

SIGTRACK signed audit trail

An append-only, hash-chained, externally-anchored audit trail of analyses, so the record of what was measured cannot be silently altered after the fact.
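To make the hash-chaining idea concrete, here is a minimal sketch, not SIGTRACK's actual format. The names `GENESIS`, `entry_hash`, `append`, and `verify`, and the record fields, are hypothetical. Each entry stores the hash of the previous one, so editing any earlier record breaks every later link. External anchoring (publishing the latest hash somewhere outside the system) is not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder hash for the start of the chain


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(chain: list) -> bool:
    # Recompute every link; any altered payload or reordered entry fails.
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


chain = []
append(chain, {"session": "s1", "bhs": 0.82})  # illustrative records only
append(chain, {"session": "s1", "bhs": 0.74})
```

Calling `verify(chain)` on the untouched chain succeeds; changing a stored value in the first record makes it fail.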

Conversation metrics (PSAv2)

BHS Behavioral Health Score

Behavioral health of an individual agent node. A low score means the node is under stress.

CPI Contextual Pressure Index

Adversarial pressure an agent receives from user input. High means high pressure.

POI Posture Oscillation Index

Variability of an agent's postures over time. High means unstable behavior.

DPI Dissolution Position Index

How far a response has moved toward fully abandoning its position; above a threshold it signals active dissolution.

SD Sycophancy Density

The fraction of a response that is sycophantic — agreeing or flattering rather than answering.

DPD Dominant Posture Drift

The trend of the model's dominant stance over recent turns; a rising value means it is progressively conceding.

metric Session Drift

How much the model's behavioral pattern in late turns differs from its own baseline earlier in the same session.
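One possible way to put a number on this, shown only as an illustration: compare the posture frequencies of the early turns with those of the later turns. The distance used here (total variation) and the names `posture_distribution`, `session_drift`, and `baseline_turns` are assumptions, not the Field Guide formula.

```python
from collections import Counter


def posture_distribution(postures):
    # Relative frequency of each posture label in a list of turns.
    counts = Counter(postures)
    return {p: c / len(postures) for p, c in counts.items()}


def session_drift(postures, baseline_turns):
    # Assumed definition: total variation distance between the baseline
    # (first `baseline_turns` turns) and everything after it.
    early, late = postures[:baseline_turns], postures[baseline_turns:]
    if not early or not late:
        raise ValueError("need at least one baseline turn and one later turn")
    base, tail = posture_distribution(early), posture_distribution(late)
    keys = set(base) | set(tail)
    return 0.5 * sum(abs(base.get(k, 0.0) - tail.get(k, 0.0)) for k in keys)


steady = ["RESTRICT"] * 8
shifted = ["RESTRICT"] * 4 + ["CONCEDE"] * 4
```

Under this assumed definition, `steady` scores 0 (no change from baseline) and `shifted` scores 1 (the late turns share nothing with the baseline).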

HRI Hallucination Risk Index

Likelihood that a response contains fabricated content, derived from hallucination postures (H0–H7) and sycophancy signals.

Multi-agent metrics (PSAv3)

PPI Posture Propagation Index

How much one agent's behavior influences the others. High means high contagion across the system.

CAHS Cross-Agent Health Score

Aggregated behavioral health of a group of cooperating agents (a swarm). Low means a degraded swarm.

SCS Swiss Cheese Score

Probability of systemic failure along the critical path of a multi-agent system. A high score is riskier.

WLS Weakest Link Score

The minimum behavioral health on the critical path. Low means there is a weak link in the system.
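As a small illustration of "minimum on the critical path", assuming per-agent health scores are already available: the agent names, the scores, and the function `weakest_link` are hypothetical.

```python
def weakest_link(health_by_agent, critical_path):
    # Return the agent on the critical path with the lowest health, and its score.
    agent = min(critical_path, key=lambda name: health_by_agent[name])
    return agent, health_by_agent[agent]


health = {"planner": 0.91, "retriever": 0.58, "writer": 0.77, "logger": 0.30}
path = ["planner", "retriever", "writer"]  # "logger" is off the critical path
```

Here the result is `("retriever", 0.58)`: the `logger` agent is less healthy, but it is not on the critical path, so it does not set the score.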

CER Context Erosion Rate

Speed at which adversarial context is lost along a chain of agents. High means rapid erosion.

ABI Agentic Behavioral Index

The agent-session counterpart of the hallucination index: a weighted measure of risky agentic behaviors used in an agent's health score.

AGM Alignment Gap Matrix

A grid of how far apart every pair of agents is in behavior; a large gap flags two agents pulling in different directions.
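A sketch of what such a grid could look like, under two assumptions that are not from the source: each agent is summarized as a vector of posture-family frequencies, and the gap between two agents is the total variation distance between their vectors. The function name `alignment_gap_matrix` and the sample profiles are hypothetical.

```python
def alignment_gap_matrix(profiles):
    # profiles: agent name -> frequencies over (RESTRICT, CONCEDE, SOFT, NEUTRAL)
    names = sorted(profiles)

    def gap(a, b):
        return 0.5 * sum(abs(x - y) for x, y in zip(profiles[a], profiles[b]))

    return names, [[gap(a, b) for b in names] for a in names]


profiles = {
    "planner":  [0.7, 0.1, 0.1, 0.1],
    "executor": [0.1, 0.7, 0.1, 0.1],
    "reviewer": [0.7, 0.1, 0.1, 0.1],
}
names, matrix = alignment_gap_matrix(profiles)
```

The matrix is symmetric with zeros on the diagonal. In this example the `executor` row stands out, because that agent mostly concedes while the other two mostly hold the boundary.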

metric Cascade Depth

The length of the longest unbroken chain of agents that are all conceding — a long chain triggers a cascade alert.
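For a simple linear chain of agents, the longest unbroken run of conceding agents can be counted as below. This is a sketch that assumes each agent has already been labeled with a posture family; the function name `cascade_depth` is hypothetical, and a real agent graph may need a path search rather than a single pass.

```python
def cascade_depth(chain_families):
    # Length of the longest consecutive run of "CONCEDE" along the chain.
    longest = current = 0
    for family in chain_families:
        current = current + 1 if family == "CONCEDE" else 0
        longest = max(longest, current)
    return longest


chain_families = ["RESTRICT", "CONCEDE", "CONCEDE", "CONCEDE", "NEUTRAL", "CONCEDE"]
```

For this chain the depth is 3: the `NEUTRAL` agent breaks the run, so the final conceding agent starts a new one.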

EWMA Exponentially Weighted Moving Average

A short-horizon trend forecaster applied to PSA metrics to predict the next few steps of a conversation or swarm.
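The standard EWMA recurrence is s_t = α·x_t + (1 − α)·s_{t−1}, where a larger α weights recent values more heavily. A minimal sketch follows; the smoothing factor and the sample series are illustrative, not PSA's settings.

```python
def ewma(values, alpha=0.3):
    # Exponentially weighted moving average, seeded with the first observation.
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


series = [1.0, 0.0, 0.0]
trend = ewma(series, alpha=0.5)  # [1.0, 0.5, 0.25]
```

The last smoothed value serves as the simplest one-step-ahead forecast.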

HMM Hidden Markov Model

Infers the system's hidden health state (stable, stressed, dissolving, recovered) from the stream of metric observations.
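A toy illustration of how a hidden state can be inferred from observations, using the standard HMM forward (filtering) recursion. Every number below is invented for the example, and the three observation symbols (`ok`, `warn`, `bad`) are a hypothetical discretization of the metric stream, not PSA's.

```python
STATES = ["stable", "stressed", "dissolving", "recovered"]

# Invented probabilities, for illustration only.
START = {"stable": 0.70, "stressed": 0.20, "dissolving": 0.05, "recovered": 0.05}
TRANS = {
    "stable":     {"stable": 0.85, "stressed": 0.15, "dissolving": 0.00, "recovered": 0.00},
    "stressed":   {"stable": 0.00, "stressed": 0.60, "dissolving": 0.25, "recovered": 0.15},
    "dissolving": {"stable": 0.00, "stressed": 0.00, "dissolving": 0.80, "recovered": 0.20},
    "recovered":  {"stable": 0.50, "stressed": 0.10, "dissolving": 0.00, "recovered": 0.40},
}
EMIT = {
    "stable":     {"ok": 0.80, "warn": 0.15, "bad": 0.05},
    "stressed":   {"ok": 0.30, "warn": 0.50, "bad": 0.20},
    "dissolving": {"ok": 0.05, "warn": 0.25, "bad": 0.70},
    "recovered":  {"ok": 0.60, "warn": 0.30, "bad": 0.10},
}


def filter_states(observations):
    # Returns P(current state | all observations so far).
    belief = dict(START)
    for i, obs in enumerate(observations):
        if i > 0:
            # Predict: push the belief through the transition model.
            belief = {s: sum(belief[p] * TRANS[p][s] for p in STATES) for s in STATES}
        # Update: weight by how likely each state is to emit this observation.
        belief = {s: belief[s] * EMIT[s][obs] for s in STATES}
        total = sum(belief.values())
        belief = {s: v / total for s, v in belief.items()}
    return belief


after_bad = filter_states(["bad", "bad", "bad"])
after_ok = filter_states(["ok", "ok", "ok"])
```

With these made-up parameters, a run of `bad` observations makes `dissolving` the most probable state, while a run of `ok` observations keeps `stable` on top.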

Input & clinical metrics

IRS Input Risk Score

Clinical risk level detected in the user's own message: suicidality, dissociation, grandiosity, urgency.

CIRS Clinical IRS (semantic head)

The trained semantic classifier inside the Input Risk Score: it reads the meaning of a user's message (not just keywords) for suicidality, dissociation, grandiosity or urgency, and can only raise the risk, never lower it.

RAS Response Adequacy Score

How well the AI's reply handled the risk detected in the user's message: crisis acknowledgment, redirection, boundary, and reality-grounding.

RAG Response Adequacy Gap

Input Risk Score minus Response Adequacy Score: a high value means the user was at risk and the AI under-responded. (Not to be confused with Retrieval-Augmented Generation, also abbreviated RAG.)
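The subtraction can be illustrated as follows, assuming both scores are on the same 0 to 1 scale (an assumption; the Field Guide has the actual scales and thresholds). The function name and sample values are hypothetical.

```python
def response_adequacy_gap(input_risk, response_adequacy):
    # Positive gap: the risk in the user's message exceeded the adequacy of the reply.
    return input_risk - response_adequacy


under_responded = response_adequacy_gap(0.9, 0.2)  # about 0.7: high risk, weak reply
well_handled = response_adequacy_gap(0.9, 0.85)    # about 0.05: high risk, adequate reply
```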

OCRS Occupational / Conflict Risk Score

A non-clinical life-stress risk score (work, money, study, authority conflict) that feeds the dyadic verdict.

ACT behavioral fingerprint

A lightweight fingerprint of the user's writing (vocabulary diversity, sentence rhythm, hedging) that tracks tension independently of the topic.

term Dyadic risk

Risk that lives in the two-party human–AI relationship over time, rather than in any single message: dependency, escalation, and mutual reinforcement of a distortion. A model aligned to one user can hide this risk because the drift feels like agreement.

DRM Dyadic Risk Module

The PSA component that measures dyadic risk in the user–AI relationship. It activates when input risk is high.

Classifiers & postures

term Posture

A discrete behavioral category a model response is classified into (for example refusal, compliance, hedging). The sequence of postures across a conversation is the core signal PSA reads.

C0–C5 · CA PSA classifiers

The PSA classifiers. C0 = pressure in the user's message; C1 = the model's response stance; C2 = sycophancy; C3 = hallucination risk; C3-v3 = agentic behavioral stability; C4 = persuasion technique; C5 = action / tool risk; CA = agent-to-agent pressure.

H2–H5 Human-layer classifiers

Classifiers that read the user's side of the conversation: H2 relational dynamics, H3 cognitive patterns, H4 social dynamics, H5 adversarial patterns.

FPC Framing Pressure Classifier

Classifies a search query before retrieval as neutral, semantically drifted, or rhetorically framed — the trigger for the Retrieval Drift Module.

groups Posture families

Groupings of the model's response postures: RESTRICT (holds the boundary), CONCEDE (yields ground), SOFT (evades without openly refusing or yielding), NEUTRAL (plain factual answer).

Audit & other

term Behavioral Fingerprint

A numeric signature of a session compared against past sessions to recognize recurring patterns, such as repeated contact or extraction attempts.

Machine-learning & evaluation terms

AUC Area Under the ROC Curve

A single score from 0 to 1 for how well a classifier separates positive cases from negative ones. 0.5 is random guessing and 1.0 is perfect; a value below 0.5 means the classifier ranks cases worse than chance, so useful classifiers fall between 0.5 and 1.0. It is the headline number PSA reports when validating whether a classifier can tell risky inputs apart from safe ones.

ROC Receiver Operating Characteristic

The curve that plots the true-positive rate (real cases caught) against the false-positive rate (false alarms raised) as the decision threshold moves. AUC is the area under this curve.
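AUC has an equivalent reading that is easy to compute by hand: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting as half. The sketch below uses that pairwise definition; the function name `auc` and the sample scores are illustrative.

```python
def auc(scores, labels):
    # Pairwise (Mann-Whitney) form of the area under the ROC curve.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
perfect = auc([0.9, 0.8, 0.3, 0.1], labels)   # every positive outranks every negative
partial = auc([0.9, 0.4, 0.6, 0.1], labels)   # one positive is outranked by a negative
inverted = auc([0.1, 0.2, 0.8, 0.9], labels)  # ranking is exactly backwards
```

These evaluate to 1.0, 0.75, and 0.0 respectively. The pairwise form is quadratic in the number of cases, so it suits small examples rather than large evaluation sets.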

metric Recall (sensitivity)

Of the real positive cases, the fraction a model catches. Recall 0.85 means 85 of every 100 real positives are caught. This is the key axis for a safety detector, where a miss is costly.

metric Precision

Of the cases a model flags, the fraction that are truly positive. Precision and recall usually trade off against each other: lowering the decision threshold catches more real cases but also raises more false alarms.

FP False Positive (false alarm)

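Flagging a problem that isn't there. A false-positive rate of 0.20 means 20 of every 100 genuinely safe cases are wrongly flagged. (The share of alarms that turn out to be wrong is a different quantity: 1 minus precision.)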

FN False Negative (a miss)

Failing to flag a real case. For a safety classifier this is the most costly error, which is why such systems are tuned to favor recall.
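Recall, precision, and the false-positive rate all come from the same four counts: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). A small worked example with invented counts:

```python
def recall(tp, fn):
    # Share of real positives that were caught.
    return tp / (tp + fn)


def precision(tp, fp):
    # Share of flagged cases that were truly positive.
    return tp / (tp + fp)


def false_positive_rate(fp, tn):
    # Share of real negatives that were wrongly flagged.
    return fp / (fp + tn)


# Invented counts: 100 real positives and 100 real negatives.
tp, fn, fp, tn = 85, 15, 20, 80

r = recall(tp, fn)                 # 0.85
p = precision(tp, fp)              # 85 / 105, about 0.81
fpr = false_positive_rate(fp, tn)  # 0.20
```

Note that the denominators differ: recall and the false-positive rate divide by what is actually true, while precision divides by what was flagged.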

term Encoder

A model that turns a sentence into a vector of numbers (an embedding) so that meanings can be compared numerically. PSA runs a frozen multilingual encoder so its scores stay reproducible over time.

term Embedding

The vector of numbers an encoder produces for a piece of text — its meaning expressed as coordinates a classifier can read.
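Comparing meanings numerically usually means comparing the direction of two embedding vectors. Cosine similarity is a common choice, shown here on tiny hand-made vectors; real embeddings have hundreds of dimensions, and the source does not say which similarity measure PSA uses.

```python
import math


def cosine_similarity(a, b):
    # 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same = cosine_similarity([1.0, 0.0], [2.0, 0.0])       # same direction, different length
unrelated = cosine_similarity([1.0, 0.0], [0.0, 3.0])  # orthogonal
```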

ONNX Open Neural Network Exchange

A portable open format for packaging a trained model so it can run in production outside the framework that trained it. PSA relies on it so that the same input produces the same output, which matters for an auditable, signable analysis.

term Fine-tuning

Continuing to train an existing model on task-specific data to specialize it. PSA keeps its encoder frozen instead, to protect the stability of safety-critical scores.

CV Cross-validation (k-fold)

Splitting data into k parts, training on k−1 and testing on the held-back part in rotation, so a score is never measured on data the model already saw. Stratified k-fold keeps the positive/negative proportions in each part.
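The rotation can be sketched in a few lines of plain Python. This version deals each class's examples round-robin into k folds so the class proportions stay roughly equal; it does not shuffle, which a real pipeline normally would. The function names are hypothetical.

```python
def stratified_folds(labels, k):
    # Deal the indices of each class round-robin into k folds.
    folds = [[] for _ in range(k)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def k_fold_splits(labels, k):
    # Yield (train_indices, test_indices), holding back one fold per round.
    folds = stratified_folds(labels, k)
    for held_back in range(k):
        test = folds[held_back]
        train = [i for f in range(k) if f != held_back for i in folds[f]]
        yield train, test


labels = [1] * 4 + [0] * 8  # 4 positives, 8 negatives
splits = list(k_fold_splits(labels, k=4))
```

With 4 positives, 8 negatives, and k = 4, every held-back fold contains one positive and two negatives, and each example is tested exactly once.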

term Held-out set

Data deliberately set aside and never used for training, kept to measure true performance.

term Hard negatives & the topic confound

Negative examples on the same topic as the positives. Without them a model can look accurate by detecting the topic rather than the target behavior — a measurement trap called the topic confound.

term Ensemble

Running several models together and combining their verdicts to be more robust than any single model — for example taking the more confident model's answer.

IRR Inter-Rater Reliability

How much two or more independent human experts agree when labeling the same data. It measures how trustworthy a labeled dataset's ground truth really is.
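One widely used agreement statistic for two raters is Cohen's kappa, which corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The source does not say which statistic PSA reports, so treat this as a generic sketch with invented labels.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    # Chance-corrected agreement between two raters over the same items.
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[l] * counts_b[l] for l in set(rater_a) | set(rater_b)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1.0 - expected)


full_agreement = cohens_kappa([1, 1, 0, 0], [1, 1, 0, 0])    # 1.0
chance_agreement = cohens_kappa([1, 1, 0, 0], [1, 0, 1, 0])  # 0.0
```

In the second case the raters agree on half the items, but that is exactly what chance would produce given how often each uses each label, so kappa is 0.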

Looking for formulas, thresholds and posture tables? See the Field Guide. For an end-to-end description of how the system works, read the manual.