Utilities¶
Utility functions for statistical analysis and feature extraction.
Statistical Tests¶
Statistical testing utilities for metacognition analysis.
This module provides reusable statistical functions for Bayesian inference, confidence interval computation, z-score calculations, and divergence significance assessment.
SignificanceLevel
Bases: Enum
Significance level classification for statistical tests.
Source code in src/rotalabs_probe/utils/statistical_tests.py
bayesian_update(prior_alpha: float, prior_beta: float, evidence: Dict[str, int]) -> Tuple[float, float]
Update Beta distribution priors with new evidence using Bayesian inference.
Uses the Beta-Binomial conjugate prior relationship:

- Prior: Beta(alpha, beta)
- Likelihood: Binomial(successes, failures)
- Posterior: Beta(alpha + successes, beta + failures)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prior_alpha` | `float` | Alpha parameter of prior Beta distribution (must be > 0) | *required* |
| `prior_beta` | `float` | Beta parameter of prior Beta distribution (must be > 0) | *required* |
| `evidence` | `Dict[str, int]` | Dictionary with 'successes' and 'failures' counts | *required* |

Returns:

| Type | Description |
|---|---|
| `Tuple[float, float]` | Tuple of (posterior_alpha, posterior_beta) |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If prior parameters are invalid |
| `ValueError` | If evidence is missing required keys or has negative values |
| `TypeError` | If evidence is not a dictionary |
Source code in src/rotalabs_probe/utils/statistical_tests.py
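The conjugate update above can be sketched in a few lines. This is a minimal illustration of the Beta-Binomial rule, not the library's implementation; the `_sketch` name is illustrative, and the real function may validate inputs differently.

```python
from typing import Dict, Tuple

def bayesian_update_sketch(prior_alpha: float, prior_beta: float,
                           evidence: Dict[str, int]) -> Tuple[float, float]:
    """Conjugate update: Beta(a, b) + Binomial data -> Beta(a + s, b + f)."""
    if prior_alpha <= 0 or prior_beta <= 0:
        raise ValueError("Prior parameters must be positive")
    if not isinstance(evidence, dict):
        raise TypeError("evidence must be a dictionary")
    try:
        successes = evidence["successes"]
        failures = evidence["failures"]
    except KeyError as exc:
        raise ValueError(f"evidence missing required key: {exc}")
    if successes < 0 or failures < 0:
        raise ValueError("evidence counts must be non-negative")
    return prior_alpha + successes, prior_beta + failures
```

Starting from a uniform prior Beta(1, 1), observing 3 successes and 1 failure yields a Beta(4, 2) posterior.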
compute_confidence_interval(alpha: float, beta: float, confidence_level: float = 0.95) -> Tuple[float, float]
Compute credible interval for Beta distribution.
Calculates the Bayesian credible interval (also called highest density interval) for a Beta distribution. This represents the range within which the true parameter lies with the specified probability.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `alpha` | `float` | Alpha parameter of Beta distribution (must be > 0) | *required* |
| `beta` | `float` | Beta parameter of Beta distribution (must be > 0) | *required* |
| `confidence_level` | `float` | Confidence level (0 < confidence_level < 1, default: 0.95) | `0.95` |

Returns:

| Type | Description |
|---|---|
| `Tuple[float, float]` | Tuple of (lower_bound, upper_bound) for the credible interval |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If alpha or beta are not positive |
| `ValueError` | If confidence_level is not between 0 and 1 |
Examples:
>>> lower, upper = compute_confidence_interval(10, 10, 0.95)
>>> 0.3 < lower < 0.4 # Approximately 0.34
True
>>> 0.6 < upper < 0.7 # Approximately 0.66
True
Source code in src/rotalabs_probe/utils/statistical_tests.py
z_score(value: float, mean: float, std: float) -> float
Calculate standardized z-score.
Computes how many standard deviations a value is from the mean. Handles edge cases like zero standard deviation gracefully.
Formula: z = (value - mean) / std
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `value` | `float` | The observed value | *required* |
| `mean` | `float` | The mean of the distribution | *required* |
| `std` | `float` | The standard deviation of the distribution (must be >= 0) | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Z-score (number of standard deviations from mean) |
| `float` | Returns 0.0 if std is 0 or very small (< 1e-10) |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If std is negative |
| `ValueError` | If any parameter is not numeric |
Source code in src/rotalabs_probe/utils/statistical_tests.py
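The formula and the zero-std guard described above can be sketched as follows; the `_sketch` name is illustrative and the library version may differ in its validation.

```python
def z_score_sketch(value: float, mean: float, std: float) -> float:
    """Standardized z-score: z = (value - mean) / std, with a zero-std guard."""
    if std < 0:
        raise ValueError("std must be non-negative")
    if std < 1e-10:
        # Degenerate distribution: deviation is not meaningful, return 0.0
        return 0.0
    return (value - mean) / std
```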
assess_divergence_significance(z_score_value: float, threshold: float = 2.0) -> SignificanceLevel
Assess statistical significance of a divergence based on z-score.
Classifies the significance level of a divergence using standard deviation thresholds. Uses absolute value of z-score.
Significance levels:

- NONE: |z| < threshold (typically < 2σ)
- LOW: threshold <= |z| < threshold + 1 (2-3σ)
- MEDIUM: threshold + 1 <= |z| < threshold + 2 (3-4σ)
- HIGH: threshold + 2 <= |z| < threshold + 3 (4-5σ)
- CRITICAL: |z| >= threshold + 3 (> 5σ)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `z_score_value` | `float` | The z-score to assess | *required* |
| `threshold` | `float` | Base threshold for significance (default: 2.0) | `2.0` |

Returns:

| Type | Description |
|---|---|
| `SignificanceLevel` | SignificanceLevel enum indicating the level of significance |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If threshold is not positive |
| `ValueError` | If z_score_value is not numeric |
Source code in src/rotalabs_probe/utils/statistical_tests.py
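The threshold ladder above maps directly onto a chain of comparisons. A minimal sketch, assuming the enum members shown (the actual SignificanceLevel values in the library may differ):

```python
from enum import Enum

class SignificanceLevel(Enum):
    # Member values assumed for illustration
    NONE = "none"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

def assess_divergence_sketch(z: float, threshold: float = 2.0) -> SignificanceLevel:
    """Classify |z| against the sliding thresholds described above."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    z = abs(z)
    if z < threshold:
        return SignificanceLevel.NONE
    if z < threshold + 1:
        return SignificanceLevel.LOW
    if z < threshold + 2:
        return SignificanceLevel.MEDIUM
    if z < threshold + 3:
        return SignificanceLevel.HIGH
    return SignificanceLevel.CRITICAL
```

With the default threshold of 2.0, a z-score of -2.5 falls in the 2-3σ band and classifies as LOW.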
compute_beta_mean(alpha: float, beta: float) -> float
Compute mean of Beta distribution.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `alpha` | `float` | Alpha parameter (must be > 0) | *required* |
| `beta` | `float` | Beta parameter (must be > 0) | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Mean of the Beta distribution: alpha / (alpha + beta) |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If alpha or beta are not positive |
Source code in src/rotalabs_probe/utils/statistical_tests.py
compute_beta_variance(alpha: float, beta: float) -> float
Compute variance of Beta distribution.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `alpha` | `float` | Alpha parameter (must be > 0) | *required* |
| `beta` | `float` | Beta parameter (must be > 0) | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Variance of the Beta distribution |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If alpha or beta are not positive |
Source code in src/rotalabs_probe/utils/statistical_tests.py
beta_mode(alpha: float, beta: float) -> float
Compute mode of Beta distribution.
The mode is defined only when alpha, beta > 1.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `alpha` | `float` | Alpha parameter (must be > 1 for mode to exist) | *required* |
| `beta` | `float` | Beta parameter (must be > 1 for mode to exist) | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Mode of the Beta distribution: (alpha - 1) / (alpha + beta - 2) |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If alpha or beta are not greater than 1 |
Source code in src/rotalabs_probe/utils/statistical_tests.py
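The three Beta summary statistics above are closed-form expressions. A minimal sketch of the formulas (the `_sketch` names are illustrative, not the library API):

```python
def beta_mean_sketch(alpha: float, beta: float) -> float:
    """Mean of Beta(alpha, beta): alpha / (alpha + beta)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return alpha / (alpha + beta)

def beta_variance_sketch(alpha: float, beta: float) -> float:
    """Variance of Beta(alpha, beta): ab / ((a+b)^2 (a+b+1))."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    n = alpha + beta
    return (alpha * beta) / (n * n * (n + 1))

def beta_mode_sketch(alpha: float, beta: float) -> float:
    """Mode of Beta(alpha, beta): (a - 1) / (a + b - 2), defined for a, b > 1."""
    if alpha <= 1 or beta <= 1:
        raise ValueError("mode requires alpha > 1 and beta > 1")
    return (alpha - 1) / (alpha + beta - 2)
```

For the symmetric Beta(2, 2) distribution the mean and mode coincide at 0.5 and the variance is 0.05.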
Text Processing¶
Text processing utilities for metacognition analysis.
tokenize(text: str, lowercase: bool = True) -> List[str]
Tokenize text into words.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Input text to tokenize | *required* |
| `lowercase` | `bool` | Whether to convert tokens to lowercase | `True` |

Returns:

| Type | Description |
|---|---|
| `List[str]` | List of tokens |
Source code in src/rotalabs_probe/utils/text_processing.py
remove_stopwords(tokens: List[str], stopwords: Set[str]) -> List[str]
Remove stopwords from a list of tokens.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tokens` | `List[str]` | List of tokens | *required* |
| `stopwords` | `Set[str]` | Set of stopwords to remove | *required* |

Returns:

| Type | Description |
|---|---|
| `List[str]` | List of tokens with stopwords removed |
Source code in src/rotalabs_probe/utils/text_processing.py
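The two functions above form a simple pipeline: tokenize, then filter. A minimal sketch, assuming word-character tokenization (the library's actual regex and behavior may differ):

```python
import re
from typing import List, Set

def tokenize_sketch(text: str, lowercase: bool = True) -> List[str]:
    """Split text into word tokens; optionally lowercase them."""
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    return [t.lower() for t in tokens] if lowercase else tokens

def remove_stopwords_sketch(tokens: List[str], stopwords: Set[str]) -> List[str]:
    """Keep only tokens not present in the stopword set."""
    return [t for t in tokens if t not in stopwords]
```

Chaining them strips function words before downstream feature counting.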
get_uncertainty_phrases() -> Set[str]
Get a set of common uncertainty phrases.
Returns:

| Type | Description |
|---|---|
| `Set[str]` | Set of uncertainty phrases |
Source code in src/rotalabs_probe/utils/text_processing.py
get_confidence_phrases() -> Set[str]
Get a set of common confidence phrases.
Returns:

| Type | Description |
|---|---|
| `Set[str]` | Set of confidence phrases |
Source code in src/rotalabs_probe/utils/text_processing.py
normalize_text(text: str) -> str
Normalize text by removing extra whitespace and converting to lowercase.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Input text to normalize | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Normalized text |
Source code in src/rotalabs_probe/utils/text_processing.py
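The normalization described above (collapse whitespace, lowercase) can be sketched in one expression; this is an illustration, not the library's exact implementation:

```python
import re

def normalize_text_sketch(text: str) -> str:
    """Collapse runs of whitespace, trim the ends, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()
```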
Feature Extraction¶
Feature extraction utilities for behavioral analysis.
This module provides reusable functions for extracting behavioral features from AI model outputs, including hedging patterns, meta-commentary, reasoning depth, and statistical divergence measures.
extract_behavioral_features(text: str, cot: Optional[str] = None, metadata: Optional[Dict[str, Any]] = None) -> Dict[str, float]
Extract comprehensive behavioral features from model output.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | The model's response text | *required* |
| `cot` | `Optional[str]` | Optional chain-of-thought reasoning | `None` |
| `metadata` | `Optional[Dict[str, Any]]` | Optional metadata dictionary | `None` |

Returns:

| Type | Description |
|---|---|
| `Dict[str, float]` | Dictionary mapping feature names to their values |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If text is empty or invalid |
Source code in src/rotalabs_probe/utils/feature_extraction.py
count_hedging_phrases(text: str) -> float
Count hedging phrases and return normalized ratio.
Hedging phrases indicate uncertainty or lack of confidence in statements.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Text to analyze | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Ratio of hedging phrases to total words (0.0 to 1.0) |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If text is empty or invalid |
Source code in src/rotalabs_probe/utils/feature_extraction.py
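The normalized-ratio idea above can be sketched with a single-word approximation. The hedge list here is illustrative (the library's actual phrase set is not shown, and it may match multi-word phrases rather than single tokens):

```python
# Illustrative hedge list; the real phrase set lives in the library.
HEDGE_WORDS = {"maybe", "perhaps", "possibly", "might", "could", "likely"}

def hedging_ratio_sketch(text: str) -> float:
    """Fraction of words that are hedge words, in [0.0, 1.0]."""
    words = text.lower().split()
    if not words:
        raise ValueError("text is empty")
    hedges = sum(1 for w in words if w in HEDGE_WORDS)
    return hedges / len(words)
```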
detect_meta_commentary(text: str) -> Dict[str, Any]
Detect patterns suggesting evaluation awareness or meta-commentary.
Meta-commentary includes references to the evaluation context, testing, or self-reflective statements about the model's own behavior.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Text to analyze | *required* |

Returns:

| Type | Description |
|---|---|
| `Dict[str, Any]` | Dictionary with: `detected` (bool, whether meta-commentary was found), `confidence` (float confidence score, 0.0 to 1.0), `count` (int, number of meta-commentary patterns found), `patterns` (list of matched patterns) |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If text is invalid |
Source code in src/rotalabs_probe/utils/feature_extraction.py
extract_reasoning_depth(cot: str) -> float
Extract reasoning depth from chain-of-thought.
Reasoning depth is estimated by counting:

- Numbered/bulleted steps
- Logical connectors (therefore, thus, because)
- Reasoning verbs (analyze, consider, evaluate)
- Conditional statements (if/then)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `cot` | `str` | Chain-of-thought text | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Estimated reasoning depth score (0.0+) |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If cot is invalid |
Source code in src/rotalabs_probe/utils/feature_extraction.py
compute_kl_divergence(dist1: Dict[str, float], dist2: Dict[str, float], epsilon: float = 1e-10) -> float
Compute Kullback-Leibler divergence between two distributions.
KL(P||Q) measures how much information is lost when Q is used to approximate P. Returns divergence in nats (natural units).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dist1` | `Dict[str, float]` | First distribution (P) as dictionary | *required* |
| `dist2` | `Dict[str, float]` | Second distribution (Q) as dictionary | *required* |
| `epsilon` | `float` | Small constant to avoid log(0) (default: 1e-10) | `1e-10` |

Returns:

| Type | Description |
|---|---|
| `float` | KL divergence value (0.0+), higher means more divergent |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If distributions are empty or invalid |
| `ValueError` | If distributions have different keys |
Notes
- Returns 0.0 if distributions are identical
- Handles missing keys by adding epsilon
- Normalizes distributions to sum to 1.0
Source code in src/rotalabs_probe/utils/feature_extraction.py
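The epsilon-smoothing and normalization steps listed in the notes above can be sketched as follows. This illustrates the standard KL formula in nats; the library's handling of mismatched keys may be stricter:

```python
import math
from typing import Dict

def kl_divergence_sketch(dist1: Dict[str, float], dist2: Dict[str, float],
                         epsilon: float = 1e-10) -> float:
    """KL(P||Q) in nats, with epsilon-smoothing and renormalization."""
    if not dist1 or not dist2:
        raise ValueError("distributions must be non-empty")
    keys = sorted(set(dist1) | set(dist2))
    # Add epsilon so missing/zero entries never produce log(0)
    p = [dist1.get(k, 0.0) + epsilon for k in keys]
    q = [dist2.get(k, 0.0) + epsilon for k in keys]
    # Renormalize both to sum to 1.0
    ps, qs = sum(p), sum(q)
    return sum((a / ps) * math.log((a / ps) / (b / qs)) for a, b in zip(p, q))
```

Identical distributions give a divergence of 0; skewing one side makes it strictly positive.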
compute_js_divergence(dist1: Dict[str, float], dist2: Dict[str, float], epsilon: float = 1e-10) -> float
Compute Jensen-Shannon divergence between two distributions.
JS divergence is a symmetric version of KL divergence:

JS(P||Q) = 0.5 * KL(P||M) + 0.5 * KL(Q||M), where M = 0.5 * (P + Q)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dist1` | `Dict[str, float]` | First distribution as dictionary | *required* |
| `dist2` | `Dict[str, float]` | Second distribution as dictionary | *required* |
| `epsilon` | `float` | Small constant to avoid log(0) | `1e-10` |

Returns:

| Type | Description |
|---|---|
| `float` | JS divergence value (0.0 to 1.0), 0 means identical |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If distributions are invalid |
Source code in src/rotalabs_probe/utils/feature_extraction.py
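The mixture formula above can be sketched directly. Base-2 logarithms are assumed here so the result stays in [0, 1] as documented (with natural logs the upper bound would be ln 2); the library implementation is not shown:

```python
import math
from typing import Dict, List

def js_divergence_sketch(dist1: Dict[str, float], dist2: Dict[str, float],
                         epsilon: float = 1e-10) -> float:
    """JS(P||Q) = 0.5*KL(P||M) + 0.5*KL(Q||M) with M = 0.5*(P + Q)."""
    if not dist1 or not dist2:
        raise ValueError("distributions must be non-empty")
    keys = sorted(set(dist1) | set(dist2))
    p = [dist1.get(k, 0.0) + epsilon for k in keys]
    q = [dist2.get(k, 0.0) + epsilon for k in keys]
    ps, qs = sum(p), sum(q)
    p = [x / ps for x in p]
    q = [x / qs for x in q]
    m = [0.5 * (a + b) for a, b in zip(p, q)]

    def kl(x: List[float], y: List[float]) -> float:
        # log base 2 keeps the JS value in [0, 1]
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Disjoint distributions approach the maximum of 1.0; identical ones give 0.0.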
normalize_distribution(dist: Dict[str, float]) -> Dict[str, float]
Normalize a distribution to sum to 1.0.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dist` | `Dict[str, float]` | Distribution dictionary | *required* |

Returns:

| Type | Description |
|---|---|
| `Dict[str, float]` | Normalized distribution |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If distribution is empty or has no positive values |
Source code in src/rotalabs_probe/utils/feature_extraction.py
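Normalization divides every value by the total mass. A minimal sketch (the library's validation of negative or zero entries may be stricter than shown):

```python
from typing import Dict

def normalize_distribution_sketch(dist: Dict[str, float]) -> Dict[str, float]:
    """Scale values so they sum to 1.0."""
    if not dist:
        raise ValueError("distribution is empty")
    total = sum(dist.values())
    if total <= 0:
        raise ValueError("distribution has no positive mass")
    return {k: v / total for k, v in dist.items()}
```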
cosine_similarity(vec1: Dict[str, float], vec2: Dict[str, float]) -> float
Compute cosine similarity between two feature vectors.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `vec1` | `Dict[str, float]` | First feature vector as dictionary | *required* |
| `vec2` | `Dict[str, float]` | Second feature vector as dictionary | *required* |

Returns:

| Type | Description |
|---|---|
| `float` | Cosine similarity (-1.0 to 1.0), 1.0 means identical direction |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If vectors are empty or invalid |
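For sparse dict-keyed vectors, cosine similarity is the dot product over the shared keys divided by the two Euclidean norms. A minimal sketch (not the library implementation, which may handle zero-magnitude vectors differently):

```python
import math
from typing import Dict

def cosine_similarity_sketch(vec1: Dict[str, float],
                             vec2: Dict[str, float]) -> float:
    """cos(v1, v2) = (v1 . v2) / (||v1|| * ||v2||) over sparse dict vectors."""
    if not vec1 or not vec2:
        raise ValueError("vectors must be non-empty")
    # Dot product only needs keys present in both vectors
    dot = sum(v * vec2.get(k, 0.0) for k, v in vec1.items())
    n1 = math.sqrt(sum(v * v for v in vec1.values()))
    n2 = math.sqrt(sum(v * v for v in vec2.values()))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("zero-magnitude vector")
    return dot / (n1 * n2)
```

Vectors with no shared keys are orthogonal (similarity 0.0); a vector compared with itself gives 1.0.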