Glossary — terms & acronyms
Plain-language definitions of the terms PSA (Posture Sequence Analysis) uses to measure language-model and multi-agent behavior. For the full metric reference with formulas and thresholds, see the Field Guide.
System & sub-products
PSA Posture Sequence Analysis
A black-box method that measures what a language model is doing from the outside — classifying each response into behavioral postures — with no access to model weights or logits.
PSAv2 single-conversation layer
The single-conversation layer of PSA: it classifies each user turn and model response and scores the health of one human–AI dialogue (Behavioral Health Score, Input Risk Score, Dyadic Risk Module).
PSAv3 multi-agent layer
The multi-agent layer of PSA: it models a graph of cooperating AI agents and scores systemic risk across the whole swarm (Posture Propagation Index, Cross-Agent Health Score, Swiss Cheese Score).
CPF3 Cybersecurity Psychology Framework
An independent PSA component that analyzes structured snapshots of network and filesystem activity to forecast insider-threat risk, as a green / yellow / red alert.
Forge data-generation pipeline
The pipeline that generates labeled training scenarios for every PSA classifier — the single source of truth for posture definitions.
RDM Retrieval Drift Module
A component for retrieval (RAG) pipelines that detects when behavioral drift in a conversation has contaminated the user's search query before documents are even retrieved.
SIGTRACK signed audit trail
An append-only, hash-chained, externally-anchored audit trail of analyses, so the record of what was measured cannot be silently altered after the fact.
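To illustrate the hash-chaining idea only, the Python sketch below links each audit entry to the hash of the entry before it, so that editing any earlier record makes verification fail. The function names, the entry layout, the all-zero genesis value and the choice of SHA-256 are assumptions made for this example, not SIGTRACK's actual format. External anchoring is not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> None:
    """Append a new entry that points at the hash of the current last entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify_chain(chain: list) -> bool:
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


# Hypothetical analyses recorded in order.
trail = []
append_entry(trail, {"analysis": "a1", "verdict": "green"})
append_entry(trail, {"analysis": "a2", "verdict": "red"})
print(verify_chain(trail))   # chain is intact

trail[0]["payload"]["verdict"] = "red"   # silent edit after the fact
print(verify_chain(trail))   # verification now fails
```

A chain like this only detects tampering by someone who cannot also rewrite every later hash, which is why the definition above also calls for external anchoring.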
Conversation metrics (PSAv2)
BHS Behavioral Health Score
Behavioral health of an individual agent node. A low score means the node is under stress.
CPI Contextual Pressure Index
Adversarial pressure an agent receives from user input. High means high pressure.
POI Posture Oscillation Index
Variability of an agent's postures over time. High means unstable behavior.
DPI Dissolution Position Index
How far a response has moved toward fully abandoning its position; above a threshold it signals active dissolution.
SD Sycophancy Density
The fraction of a response that is sycophantic — agreeing or flattering rather than answering.
DPD Dominant Posture Drift
The trend of the model's dominant stance over recent turns; a rising value means it is progressively conceding.
metric Session Drift
How much the model's behavioral pattern in late turns differs from its own baseline earlier in the same session.
HRI Hallucination Risk Index
Likelihood that a response contains fabricated content, derived from hallucination postures (H0–H7) and sycophancy signals.
Multi-agent metrics (PSAv3)
PPI Posture Propagation Index
How much one agent's behavior influences the others. High means high contagion across the system.
CAHS Cross-Agent Health Score
Aggregated behavioral health of a group of cooperating agents (a swarm). Low means a degraded swarm.
SCS Swiss Cheese Score
Probability of systemic failure along the critical path of a multi-agent system. A high score is riskier.
WLS Weakest Link Score
The minimum behavioral health on the critical path. Low means there is a weak link in the system.
CER Context Erosion Rate
Speed at which adversarial context is lost along a chain of agents. High means rapid erosion.
ABI Agentic Behavioral Index
The agent-session counterpart of the hallucination index: a weighted measure of risky agentic behaviors used in an agent's health score.
AGM Alignment Gap Matrix
A grid of how far apart every pair of agents is in behavior; a large gap flags two agents pulling in different directions.
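As a rough sketch of what such a grid could look like, the Python example below describes each agent by the share of its responses in each posture family and measures the gap between two agents as the total variation distance between those shares. The distance measure, the agent names and the profile layout are assumptions for illustration; the Field Guide defines the real computation.

```python
from typing import Dict, List


def total_variation(p: List[float], q: List[float]) -> float:
    """Half the summed absolute difference: 0 = identical mix, 1 = no overlap."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def alignment_gap_matrix(profiles: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Return the pairwise gap for every ordered pair of agents."""
    names = sorted(profiles)
    return {a: {b: total_variation(profiles[a], profiles[b]) for b in names} for a in names}


# Hypothetical agents. Each list is the share of responses classified as
# RESTRICT, CONCEDE, SOFT, NEUTRAL (this ordering is an assumption).
profiles = {
    "planner":  [0.50, 0.00, 0.00, 0.50],
    "executor": [0.00, 0.50, 0.00, 0.50],
    "reviewer": [0.50, 0.00, 0.00, 0.50],
}
gaps = alignment_gap_matrix(profiles)
print(gaps["planner"]["executor"])  # the pair that pulls in different directions
print(gaps["planner"]["reviewer"])  # the pair that behaves alike
```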
metric Cascade Depth
The length of the longest unbroken chain of agents that are all conceding — a long chain triggers a cascade alert.
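For a simple linear chain of agents, the idea can be sketched in Python as the longest run of consecutive conceding agents. The flat list representation and the function name are assumptions for this example; a real agent graph may branch, and the alert threshold is defined in the Field Guide.

```python
from typing import Sequence


def cascade_depth(conceding: Sequence[bool]) -> int:
    """Length of the longest unbroken run of True values (conceding agents)."""
    longest = current = 0
    for is_conceding in conceding:
        current = current + 1 if is_conceding else 0  # a holding agent resets the run
        longest = max(longest, current)
    return longest


# Hypothetical chain, listed in pipeline order: True means the agent is conceding.
chain = [True, True, False, True, True, True, False]
print(cascade_depth(chain))
```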
EWMA Exponentially Weighted Moving Average
A short-horizon trend forecaster applied to PSA metrics to predict the next few steps of a conversation or swarm.
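The standard EWMA recurrence is s_t = alpha * x_t + (1 - alpha) * s_(t-1), where a larger alpha weights recent observations more heavily. The Python sketch below applies it to a metric series and uses the last smoothed value as a naive next-step forecast. The default alpha, the function names and that forecasting rule are assumptions; the source does not state which variant PSA uses.

```python
from typing import List, Sequence


def ewma(values: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponentially weighted moving average, seeded with the first observation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [float(values[0])]
    for x in values[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def forecast_next(values: Sequence[float], alpha: float = 0.3) -> float:
    """Naive forecast: the next value is assumed to equal the last smoothed value."""
    return ewma(values, alpha)[-1]


# Hypothetical per-turn metric readings.
readings = [1.0, 0.0, 1.0]
print(ewma(readings, alpha=0.5))
print(forecast_next(readings, alpha=0.5))
```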
HMM Hidden Markov Model
Infers the system's hidden health state (stable, stressed, dissolving, recovered) from the stream of metric observations.
Input & clinical metrics
IRS Input Risk Score
Clinical risk level detected in the user's own message: suicidality, dissociation, grandiosity, urgency.
CIRS Clinical IRS (semantic head)
The trained semantic classifier inside the Input Risk Score: it reads the meaning of a user's message (not just keywords) for suicidality, dissociation, grandiosity or urgency, and can only raise the risk, never lower it.
RAS Response Adequacy Score
How well the AI's reply handled the risk detected in the user's message: crisis acknowledgment, redirection, boundary, and reality-grounding.
RAG Response Adequacy Gap
Input Risk Score minus Response Adequacy Score: a high value means the user was at risk and the AI under-responded. (Not to be confused with Retrieval-Augmented Generation, also abbreviated RAG.)
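The subtraction and its sign convention can be shown in a few lines of Python. The 0-to-1 scale for both scores and the function name are assumptions for this example; the Field Guide gives the actual ranges and thresholds.

```python
def response_adequacy_gap(irs: float, ras: float) -> float:
    """Input Risk Score minus Response Adequacy Score."""
    return irs - ras


# Hypothetical scores on an assumed 0-to-1 scale.
under_responded = response_adequacy_gap(irs=0.75, ras=0.25)  # risky input, weak reply
well_handled = response_adequacy_gap(irs=0.75, ras=0.75)     # risky input, adequate reply
print(under_responded, well_handled)
```

A positive gap means risk outran the response; a gap at or below zero means the reply was at least as strong as the detected risk.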
OCRS Occupational / Conflict Risk Score
A non-clinical life-stress risk score (work, money, study, authority conflict) that feeds the dyadic verdict.
ACT behavioral fingerprint
A lightweight fingerprint of the user's writing (vocabulary diversity, sentence rhythm, hedging) that tracks tension independently of the topic.
term Dyadic risk
Risk that lives in the two-party human–AI relationship over time, rather than in any single message: dependency, escalation, and mutual reinforcement of a distortion. A model aligned to one user can hide this risk because the drift feels like agreement.
DRM Dyadic Risk Module
The PSA component that measures dyadic risk in the user–AI relationship. It activates when input risk is high.
Classifiers & postures
term Posture
A discrete behavioral category a model response is classified into (for example refusal, compliance, hedging). The sequence of postures across a conversation is the core signal PSA reads.
C0–C5 · CA PSA classifiers
The PSA classifiers. C0 = pressure in the user's message; C1 = the model's response stance; C2 = sycophancy; C3 = hallucination risk; C3-v3 = agentic behavioral stability; C4 = persuasion technique; C5 = action / tool risk; CA = agent-to-agent pressure.
H2–H5 Human-layer classifiers
Classifiers that read the user's side of the conversation: H2 relational dynamics, H3 cognitive patterns, H4 social dynamics, H5 adversarial patterns. (Not to be confused with the hallucination postures H0–H7 used by the Hallucination Risk Index.)
FPC Framing Pressure Classifier
Classifies a search query before retrieval as neutral, semantically drifted, or rhetorically framed — the trigger for the Retrieval Drift Module.
groups Posture families
Groupings of the model's response postures: RESTRICT (holds the boundary), CONCEDE (yields ground), SOFT (evades without openly refusing or yielding), NEUTRAL (plain factual answer).
Audit & other
term Behavioral Fingerprint
A numeric signature of a session compared against past sessions to recognize recurring patterns, such as repeated contact or extraction attempts.
Machine-learning & evaluation terms
AUC Area Under the ROC Curve
A single score from 0 to 1 for how well a classifier separates positive cases from negative ones. 0.5 is random guessing and 1.0 is perfect; a score below 0.5 means the classifier ranks cases worse than chance. It is the headline number PSA reports when validating whether a classifier can tell risky inputs apart from safe ones.
ROC Receiver Operating Characteristic
The curve that plots the true-positive rate (share of real cases caught) against the false-positive rate (share of safe cases wrongly flagged) as the decision threshold moves. AUC is the area under this curve.
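The area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting as half. The Python sketch below computes AUC directly from that pairwise definition. The function name and sample data are illustrative, and this brute-force form is meant for small examples rather than production use.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: 1 = risky input, 0 = safe input.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly
```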
metric Recall (sensitivity)
Of the real positive cases, the fraction a model catches. Recall 0.85 means 85 of every 100 real positives are caught. It is the key axis for a safety detector, where a miss is costly.
metric Precision
Of the cases a model flags, the fraction that are truly positive. Precision and recall usually trade off against each other: lowering the decision threshold catches more real cases but also raises more false alarms.
FP False Positive (false alarm)
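Flagging a problem that isn't there. A false-positive rate of 0.20 means 20 of every 100 genuinely safe cases are wrongly flagged. (The share of alarms that are wrong is a different quantity: 1 minus precision.)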
FN False Negative (a miss)
Failing to flag a real case. For a safety classifier this is the most costly error, which is why such systems are tuned to favor recall.
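The four outcomes and the rates built from them can be made concrete with a small Python example. The function names and the sample labels are illustrative only, and the code assumes every denominator is non-zero.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives (1 = risky, 0 = safe)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            counts["tp"] += 1      # real case, flagged
        elif pred:
            counts["fp"] += 1      # safe case, flagged (false alarm)
        elif truth:
            counts["fn"] += 1      # real case, not flagged (a miss)
        else:
            counts["tn"] += 1      # safe case, not flagged
    return counts


def rates(c: Dict[str, int]) -> Dict[str, float]:
    """Recall, precision and false-positive rate from the four counts."""
    return {
        "recall": c["tp"] / (c["tp"] + c["fn"]),               # share of real cases caught
        "precision": c["tp"] / (c["tp"] + c["fp"]),            # share of alarms that are real
        "false_positive_rate": c["fp"] / (c["fp"] + c["tn"]),  # share of safe cases flagged
    }


# Hypothetical ground truth and predictions: 4 risky cases, 6 safe ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
print(rates(counts))
```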
term Encoder
A model that turns a sentence into a vector of numbers (an embedding) so that meanings can be compared numerically. PSA runs a frozen multilingual encoder so its scores stay reproducible over time.
term Embedding
The vector of numbers an encoder produces for a piece of text — its meaning expressed as coordinates a classifier can read.
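One common way to compare two embeddings numerically is cosine similarity, which is close to 1 when the vectors point the same way and close to 0 when they are unrelated. The Python sketch below uses tiny made-up vectors; real embeddings have hundreds of dimensions, and the source does not say which comparison PSA's classifiers use.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 2-dimensional "embeddings" for illustration.
same_direction = cosine_similarity([3.0, 4.0], [3.0, 4.0])
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, unrelated)
```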
ONNX Open Neural Network Exchange
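A portable, open format for exchanging and running trained models in production. PSA uses it so that the same input produces the same output, which is important for an auditable, signable analysis. Bit-for-bit reproducibility also depends on keeping the runtime and hardware configuration fixed.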
term Fine-tuning
Continuing to train an existing model on task-specific data to specialize it. PSA keeps its encoder frozen instead, to protect the stability of safety-critical scores.
CV Cross-validation (k-fold)
Splitting data into k parts, training on k−1 and testing on the held-back part in rotation, so a score is never measured on data the model already saw. Stratified k-fold keeps the positive/negative proportions in each part.
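The rotation can be sketched in plain Python. The example below deals the indices of each class round-robin into k folds, so every fold keeps roughly the same positive/negative mix, then yields each fold once as the test part. The function name is hypothetical and there is no shuffling, which keeps the sketch deterministic; it is not a description of PSA's evaluation code.

```python
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def stratified_k_fold(labels: Sequence[Hashable], k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k stratified folds."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)  # deal each class evenly across folds

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Hypothetical dataset: 4 positive and 8 negative examples, split 4 ways.
labels = [1] * 4 + [0] * 8
for train, test in stratified_k_fold(labels, k=4):
    positives_in_test = sum(labels[i] for i in test)
    print(len(train), len(test), positives_in_test)
```

Each example lands in the test part exactly once, and the training and test indices of any one split never overlap.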
term Held-out set
Data deliberately set aside and never used for training, kept to measure true performance.
term Hard negatives & the topic confound
Negative examples on the same topic as the positives. Without them a model can look accurate by detecting the topic rather than the target behavior — a measurement trap called the topic confound.
term Ensemble
Running several models together and combining their verdicts to be more robust than any single model — for example taking the more confident model's answer.
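The "more confident model wins" rule mentioned above can be sketched in a few lines of Python. The verdict format (a label paired with a confidence between 0 and 1) and the function name are assumptions for this example; other combination rules, such as voting or averaging, are equally common.

```python
from typing import List, Tuple

Verdict = Tuple[str, float]  # (label, confidence); an assumed format


def most_confident(verdicts: List[Verdict]) -> Verdict:
    """Return the verdict with the highest confidence (first one wins a tie)."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    return max(verdicts, key=lambda verdict: verdict[1])


# Hypothetical outputs from two models scoring the same input.
combined = most_confident([("safe", 0.6), ("risky", 0.9)])
print(combined)
```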
IRR Inter-Rater Reliability
How much two independent human experts agree when labeling the same data — the measure of how trustworthy a labeled dataset's ground truth really is.
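One widely used statistic for two raters is Cohen's kappa, which corrects raw agreement for the agreement expected by chance: kappa = (observed - expected) / (1 - expected). The Python sketch below computes it for two hypothetical annotators; the source does not say which reliability statistic PSA reports, so treat the choice as an assumption.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (observed - expected) / (1.0 - expected)


# Hypothetical posture-family labels from two annotators.
expert_1 = ["RESTRICT", "RESTRICT", "CONCEDE", "CONCEDE"]
expert_2 = ["RESTRICT", "RESTRICT", "CONCEDE", "CONCEDE"]
print(cohens_kappa(expert_1, expert_2))  # full agreement
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance.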
Looking for formulas, thresholds and posture tables? See the Field Guide. For an end-to-end description of how the system works, read the manual.