
Utilities

Utility functions for statistical analysis, text processing, and feature extraction.

Statistical Tests

Statistical testing utilities for metacognition analysis.

This module provides reusable statistical functions for Bayesian inference, confidence interval computation, z-score calculations, and divergence significance assessment.

SignificanceLevel

Bases: Enum

Significance level classification for statistical tests.

Source code in src/rotalabs_probe/utils/statistical_tests.py
class SignificanceLevel(Enum):
    """Significance level classification for statistical tests."""

    NONE = "none"  # Below threshold
    LOW = "low"  # 2-3 sigma
    MEDIUM = "medium"  # 3-4 sigma
    HIGH = "high"  # 4-5 sigma
    CRITICAL = "critical"  # >5 sigma

bayesian_update(prior_alpha: float, prior_beta: float, evidence: Dict[str, int]) -> Tuple[float, float]

Update Beta distribution priors with new evidence using Bayesian inference.

Uses the Beta-Binomial conjugate prior relationship where:

- Prior: Beta(alpha, beta)
- Likelihood: Binomial(successes, failures)
- Posterior: Beta(alpha + successes, beta + failures)

Parameters:

- prior_alpha (float, required): Alpha parameter of prior Beta distribution (must be > 0)
- prior_beta (float, required): Beta parameter of prior Beta distribution (must be > 0)
- evidence (Dict[str, int], required): Dictionary with 'successes' and 'failures' counts

Returns:

- Tuple[float, float]: Tuple of (posterior_alpha, posterior_beta)

Raises:

- ValueError: If prior parameters are invalid
- ValueError: If evidence is missing required keys or has negative values
- TypeError: If evidence is not a dictionary

Examples:

>>> bayesian_update(1.0, 1.0, {'successes': 5, 'failures': 3})
(6.0, 4.0)
>>> bayesian_update(10.0, 10.0, {'successes': 8, 'failures': 2})
(18.0, 12.0)
Source code in src/rotalabs_probe/utils/statistical_tests.py
def bayesian_update(
    prior_alpha: float, prior_beta: float, evidence: Dict[str, int]
) -> Tuple[float, float]:
    """Update Beta distribution priors with new evidence using Bayesian inference.

    Uses the Beta-Binomial conjugate prior relationship where:
    - Prior: Beta(alpha, beta)
    - Likelihood: Binomial(successes, failures)
    - Posterior: Beta(alpha + successes, beta + failures)

    Args:
        prior_alpha: Alpha parameter of prior Beta distribution (must be > 0)
        prior_beta: Beta parameter of prior Beta distribution (must be > 0)
        evidence: Dictionary with 'successes' and 'failures' counts

    Returns:
        Tuple of (posterior_alpha, posterior_beta)

    Raises:
        ValueError: If prior parameters are invalid
        ValueError: If evidence is missing required keys or has negative values
        TypeError: If evidence is not a dictionary

    Examples:
        >>> bayesian_update(1.0, 1.0, {'successes': 5, 'failures': 3})
        (6.0, 4.0)

        >>> bayesian_update(10.0, 10.0, {'successes': 8, 'failures': 2})
        (18.0, 12.0)
    """
    # Validate prior parameters
    if not isinstance(prior_alpha, (int, float)) or not isinstance(
        prior_beta, (int, float)
    ):
        raise ValueError("Prior alpha and beta must be numeric")

    if prior_alpha <= 0 or prior_beta <= 0:
        raise ValueError("Prior alpha and beta must be positive")

    # Validate evidence
    if not isinstance(evidence, dict):
        raise TypeError("Evidence must be a dictionary")

    if "successes" not in evidence or "failures" not in evidence:
        raise ValueError("Evidence must contain 'successes' and 'failures' keys")

    successes = evidence["successes"]
    failures = evidence["failures"]

    if not isinstance(successes, (int, float)) or not isinstance(failures, (int, float)):
        raise ValueError("Evidence counts must be numeric")

    if successes < 0 or failures < 0:
        raise ValueError("Evidence counts cannot be negative")

    # Bayesian update: posterior = prior + evidence
    posterior_alpha = float(prior_alpha + successes)
    posterior_beta = float(prior_beta + failures)

    return posterior_alpha, posterior_beta
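
Because the update is conjugate, evidence composes: folding in one large batch or several smaller batches yields the same posterior. A minimal sketch (assuming the module imports as rotalabs_probe.utils.statistical_tests, per the source path above):

from rotalabs_probe.utils.statistical_tests import bayesian_update

# One batch of evidence...
posterior = bayesian_update(1.0, 1.0, {"successes": 5, "failures": 3})

# ...matches the same evidence applied in two sequential batches.
step1 = bayesian_update(1.0, 1.0, {"successes": 2, "failures": 1})
step2 = bayesian_update(step1[0], step1[1], {"successes": 3, "failures": 2})

assert posterior == step2 == (6.0, 4.0)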

compute_confidence_interval(alpha: float, beta: float, confidence_level: float = 0.95) -> Tuple[float, float]

Compute credible interval for Beta distribution.

Calculates a Bayesian credible interval (here, an equal-tailed interval computed from Beta quantiles) for a Beta distribution. This represents the range within which the true parameter lies with the specified probability. Note that an equal-tailed interval coincides with the highest density interval only when the distribution is symmetric.

Parameters:

- alpha (float, required): Alpha parameter of Beta distribution (must be > 0)
- beta (float, required): Beta parameter of Beta distribution (must be > 0)
- confidence_level (float, default: 0.95): Confidence level (0 < confidence_level < 1)

Returns:

- Tuple[float, float]: Tuple of (lower_bound, upper_bound) for the credible interval

Raises:

- ValueError: If alpha or beta are not positive
- ValueError: If confidence_level is not between 0 and 1

Examples:

>>> lower, upper = compute_confidence_interval(10, 10, 0.95)
>>> 0.25 < lower < 0.35  # Approximately 0.29
True
>>> 0.65 < upper < 0.75  # Approximately 0.71
True
>>> lower, upper = compute_confidence_interval(100, 10, 0.95)
>>> 0.8 < lower < 0.95  # Approximately 0.85
True
Source code in src/rotalabs_probe/utils/statistical_tests.py
def compute_confidence_interval(
    alpha: float, beta: float, confidence_level: float = 0.95
) -> Tuple[float, float]:
    """Compute credible interval for Beta distribution.

    Calculates a Bayesian credible interval (here, an equal-tailed interval
    computed from Beta quantiles) for a Beta distribution. This represents the
    range within which the true parameter lies with the specified probability.

    Args:
        alpha: Alpha parameter of Beta distribution (must be > 0)
        beta: Beta parameter of Beta distribution (must be > 0)
        confidence_level: Confidence level (0 < confidence_level < 1, default: 0.95)

    Returns:
        Tuple of (lower_bound, upper_bound) for the credible interval

    Raises:
        ValueError: If alpha or beta are not positive
        ValueError: If confidence_level is not between 0 and 1

    Examples:
        >>> lower, upper = compute_confidence_interval(10, 10, 0.95)
        >>> 0.25 < lower < 0.35  # Approximately 0.29
        True
        >>> 0.65 < upper < 0.75  # Approximately 0.71
        True

        >>> lower, upper = compute_confidence_interval(100, 10, 0.95)
        >>> 0.8 < lower < 0.95  # Approximately 0.85
        True
    """
    # Validate parameters
    if not isinstance(alpha, (int, float)) or not isinstance(beta, (int, float)):
        raise ValueError("Alpha and beta must be numeric")

    if alpha <= 0 or beta <= 0:
        raise ValueError("Alpha and beta must be positive")

    if not isinstance(confidence_level, (int, float)):
        raise ValueError("Confidence level must be numeric")

    if confidence_level <= 0 or confidence_level >= 1:
        raise ValueError("Confidence level must be between 0 and 1")

    # Calculate credible interval using Beta distribution quantiles
    # For a symmetric interval, we use (1 - confidence_level) / 2 on each tail
    tail_prob = (1 - confidence_level) / 2
    lower_bound = stats.beta.ppf(tail_prob, alpha, beta)
    upper_bound = stats.beta.ppf(1 - tail_prob, alpha, beta)

    return float(lower_bound), float(upper_bound)
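
A typical flow pairs bayesian_update with compute_confidence_interval: fold observed counts into a Beta prior, then report a credible interval for the underlying rate. A short sketch (the counts are illustrative):

from rotalabs_probe.utils.statistical_tests import (
    bayesian_update,
    compute_confidence_interval,
)

# Start from a uniform Beta(1, 1) prior and fold in observed counts.
alpha, beta = bayesian_update(1.0, 1.0, {"successes": 42, "failures": 8})

# 95% equal-tailed credible interval for the underlying success rate.
lower, upper = compute_confidence_interval(alpha, beta, confidence_level=0.95)
print(f"rate in [{lower:.3f}, {upper:.3f}] with 95% probability")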

z_score(value: float, mean: float, std: float) -> float

Calculate standardized z-score.

Computes how many standard deviations a value is from the mean. Handles edge cases like zero standard deviation gracefully.

Formula: z = (value - mean) / std

Parameters:

- value (float, required): The observed value
- mean (float, required): The mean of the distribution
- std (float, required): The standard deviation of the distribution (must be >= 0)

Returns:

- float: Z-score (number of standard deviations from mean). Returns 0.0 if std is 0 or very small (< 1e-10).

Raises:

- ValueError: If std is negative
- ValueError: If any parameter is not numeric

Examples:

>>> z_score(100, 90, 10)
1.0
>>> z_score(85, 100, 5)
-3.0
>>> z_score(50, 50, 0)  # Edge case: zero std
0.0
Source code in src/rotalabs_probe/utils/statistical_tests.py
def z_score(value: float, mean: float, std: float) -> float:
    """Calculate standardized z-score.

    Computes how many standard deviations a value is from the mean.
    Handles edge cases like zero standard deviation gracefully.

    Formula: z = (value - mean) / std

    Args:
        value: The observed value
        mean: The mean of the distribution
        std: The standard deviation of the distribution (must be >= 0)

    Returns:
        Z-score (number of standard deviations from mean)
        Returns 0.0 if std is 0 or very small (< 1e-10)

    Raises:
        ValueError: If std is negative
        ValueError: If any parameter is not numeric

    Examples:
        >>> z_score(100, 90, 10)
        1.0

        >>> z_score(85, 100, 5)
        -3.0

        >>> z_score(50, 50, 0)  # Edge case: zero std
        0.0
    """
    # Validate inputs
    if not all(isinstance(x, (int, float)) for x in [value, mean, std]):
        raise ValueError("All parameters must be numeric")

    if std < 0:
        raise ValueError("Standard deviation cannot be negative")

    # Handle edge case: zero or very small standard deviation
    # If std is essentially zero, the value equals the mean (or data has no variance)
    if std < 1e-10:
        return 0.0

    # Standard z-score calculation
    z = (value - mean) / std

    return float(z)

assess_divergence_significance(z_score_value: float, threshold: float = 2.0) -> SignificanceLevel

Assess statistical significance of a divergence based on z-score.

Classifies the significance level of a divergence using standard deviation thresholds. Uses absolute value of z-score.

Significance levels:

- NONE: |z| < threshold (typically < 2σ)
- LOW: threshold <= |z| < threshold + 1 (2-3σ)
- MEDIUM: threshold + 1 <= |z| < threshold + 2 (3-4σ)
- HIGH: threshold + 2 <= |z| < threshold + 3 (4-5σ)
- CRITICAL: |z| >= threshold + 3 (>5σ)

Parameters:

- z_score_value (float, required): The z-score to assess
- threshold (float, default: 2.0): Base threshold for significance

Returns:

- SignificanceLevel: Enum member indicating the level of significance

Raises:

- ValueError: If threshold is not positive
- ValueError: If z_score_value is not numeric

Examples:

>>> assess_divergence_significance(1.5)
<SignificanceLevel.NONE: 'none'>
>>> assess_divergence_significance(2.5)
<SignificanceLevel.LOW: 'low'>
>>> assess_divergence_significance(3.5)
<SignificanceLevel.MEDIUM: 'medium'>
>>> assess_divergence_significance(-4.5)  # Absolute value used
<SignificanceLevel.HIGH: 'high'>
>>> assess_divergence_significance(6.0)
<SignificanceLevel.CRITICAL: 'critical'>
Source code in src/rotalabs_probe/utils/statistical_tests.py
def assess_divergence_significance(
    z_score_value: float, threshold: float = 2.0
) -> SignificanceLevel:
    """Assess statistical significance of a divergence based on z-score.

    Classifies the significance level of a divergence using standard
    deviation thresholds. Uses absolute value of z-score.

    Significance levels:
    - NONE: |z| < threshold (typically < 2σ)
    - LOW: threshold <= |z| < threshold + 1 (2-3σ)
    - MEDIUM: threshold + 1 <= |z| < threshold + 2 (3-4σ)
    - HIGH: threshold + 2 <= |z| < threshold + 3 (4-5σ)
    - CRITICAL: |z| >= threshold + 3 (>5σ)

    Args:
        z_score_value: The z-score to assess
        threshold: Base threshold for significance (default: 2.0)

    Returns:
        SignificanceLevel enum indicating the level of significance

    Raises:
        ValueError: If threshold is not positive
        ValueError: If z_score_value is not numeric

    Examples:
        >>> assess_divergence_significance(1.5)
        <SignificanceLevel.NONE: 'none'>

        >>> assess_divergence_significance(2.5)
        <SignificanceLevel.LOW: 'low'>

        >>> assess_divergence_significance(3.5)
        <SignificanceLevel.MEDIUM: 'medium'>

        >>> assess_divergence_significance(-4.5)  # Absolute value used
        <SignificanceLevel.HIGH: 'high'>

        >>> assess_divergence_significance(6.0)
        <SignificanceLevel.CRITICAL: 'critical'>
    """
    # Validate inputs
    if not isinstance(z_score_value, (int, float)):
        raise ValueError("Z-score must be numeric")

    if not isinstance(threshold, (int, float)):
        raise ValueError("Threshold must be numeric")

    if threshold <= 0:
        raise ValueError("Threshold must be positive")

    # Use absolute value for significance assessment
    abs_z = abs(z_score_value)

    # Classify based on thresholds
    if abs_z < threshold:
        return SignificanceLevel.NONE
    elif abs_z < threshold + 1:
        return SignificanceLevel.LOW
    elif abs_z < threshold + 2:
        return SignificanceLevel.MEDIUM
    elif abs_z < threshold + 3:
        return SignificanceLevel.HIGH
    else:
        return SignificanceLevel.CRITICAL
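
z_score and assess_divergence_significance compose naturally: standardize an observation against a baseline, then classify how far it diverges. A minimal sketch with made-up baseline numbers:

from rotalabs_probe.utils.statistical_tests import (
    SignificanceLevel,
    assess_divergence_significance,
    z_score,
)

# Illustrative baseline: a metric averages 0.05 with std 0.02 on a corpus.
z = z_score(value=0.14, mean=0.05, std=0.02)  # 4.5 standard deviations
level = assess_divergence_significance(z)  # default threshold of 2.0

assert level is SignificanceLevel.HIGH  # 4 <= |z| < 5 at threshold 2.0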

compute_beta_mean(alpha: float, beta: float) -> float

Compute mean of Beta distribution.

Parameters:

- alpha (float, required): Alpha parameter (must be > 0)
- beta (float, required): Beta parameter (must be > 0)

Returns:

- float: Mean of the Beta distribution: alpha / (alpha + beta)

Raises:

- ValueError: If alpha or beta are not positive

Source code in src/rotalabs_probe/utils/statistical_tests.py
def compute_beta_mean(alpha: float, beta: float) -> float:
    """Compute mean of Beta distribution.

    Args:
        alpha: Alpha parameter (must be > 0)
        beta: Beta parameter (must be > 0)

    Returns:
        Mean of the Beta distribution: alpha / (alpha + beta)

    Raises:
        ValueError: If alpha or beta are not positive
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("Alpha and beta must be positive")

    return float(alpha / (alpha + beta))

compute_beta_variance(alpha: float, beta: float) -> float

Compute variance of Beta distribution.

Parameters:

- alpha (float, required): Alpha parameter (must be > 0)
- beta (float, required): Beta parameter (must be > 0)

Returns:

- float: Variance of the Beta distribution: alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))

Raises:

- ValueError: If alpha or beta are not positive

Source code in src/rotalabs_probe/utils/statistical_tests.py
def compute_beta_variance(alpha: float, beta: float) -> float:
    """Compute variance of Beta distribution.

    Args:
        alpha: Alpha parameter (must be > 0)
        beta: Beta parameter (must be > 0)

    Returns:
        Variance of the Beta distribution

    Raises:
        ValueError: If alpha or beta are not positive
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("Alpha and beta must be positive")

    numerator = alpha * beta
    denominator = (alpha + beta) ** 2 * (alpha + beta + 1)

    return float(numerator / denominator)

beta_mode(alpha: float, beta: float) -> float

Compute mode of Beta distribution.

The mode is defined only when alpha, beta > 1.

Parameters:

- alpha (float, required): Alpha parameter (must be > 1 for mode to exist)
- beta (float, required): Beta parameter (must be > 1 for mode to exist)

Returns:

- float: Mode of the Beta distribution: (alpha - 1) / (alpha + beta - 2)

Raises:

- ValueError: If alpha or beta are not greater than 1

Source code in src/rotalabs_probe/utils/statistical_tests.py
def beta_mode(alpha: float, beta: float) -> float:
    """Compute mode of Beta distribution.

    The mode is defined only when alpha, beta > 1.

    Args:
        alpha: Alpha parameter (must be > 1 for mode to exist)
        beta: Beta parameter (must be > 1 for mode to exist)

    Returns:
        Mode of the Beta distribution: (alpha - 1) / (alpha + beta - 2)

    Raises:
        ValueError: If alpha or beta are not greater than 1
    """
    if alpha <= 1 or beta <= 1:
        raise ValueError("Mode is only defined for alpha, beta > 1")

    return float((alpha - 1) / (alpha + beta - 2))
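
Together, these helpers summarize a Beta posterior. A worked example for the Beta(6, 4) posterior produced in the bayesian_update doctest above:

from rotalabs_probe.utils.statistical_tests import (
    beta_mode,
    compute_beta_mean,
    compute_beta_variance,
)

alpha, beta = 6.0, 4.0  # posterior from the bayesian_update example

print(compute_beta_mean(alpha, beta))      # 6 / 10 = 0.6
print(compute_beta_variance(alpha, beta))  # 24 / (100 * 11) ≈ 0.0218
print(beta_mode(alpha, beta))              # 5 / 8 = 0.625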

Text Processing

Text processing utilities for metacognition analysis.

tokenize(text: str, lowercase: bool = True) -> List[str]

Tokenize text into words.

Parameters:

- text (str, required): Input text to tokenize
- lowercase (bool, default: True): Whether to convert tokens to lowercase

Returns:

- List[str]: List of tokens

Source code in src/rotalabs_probe/utils/text_processing.py
def tokenize(text: str, lowercase: bool = True) -> List[str]:
    """Tokenize text into words.

    Args:
        text: Input text to tokenize
        lowercase: Whether to convert tokens to lowercase

    Returns:
        List of tokens
    """
    if lowercase:
        text = text.lower()
    # Simple word tokenization
    tokens = re.findall(r"\b\w+\b", text)
    return tokens

remove_stopwords(tokens: List[str], stopwords: Set[str]) -> List[str]

Remove stopwords from a list of tokens.

Parameters:

- tokens (List[str], required): List of tokens
- stopwords (Set[str], required): Set of stopwords to remove

Returns:

- List[str]: List of tokens with stopwords removed

Source code in src/rotalabs_probe/utils/text_processing.py
def remove_stopwords(tokens: List[str], stopwords: Set[str]) -> List[str]:
    """Remove stopwords from a list of tokens.

    Args:
        tokens: List of tokens
        stopwords: Set of stopwords to remove

    Returns:
        List of tokens with stopwords removed
    """
    return [token for token in tokens if token not in stopwords]
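
A small sketch chaining the two helpers (the stopword set here is illustrative; the module does not ship one):

from rotalabs_probe.utils.text_processing import remove_stopwords, tokenize

tokens = tokenize("The model might be uncertain about this answer.")
# ['the', 'model', 'might', 'be', 'uncertain', 'about', 'this', 'answer']

content = remove_stopwords(tokens, stopwords={"the", "be", "about", "this"})
# ['model', 'might', 'uncertain', 'answer']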

get_uncertainty_phrases() -> Set[str]

Get a set of common uncertainty phrases.

Returns:

- Set[str]: Set of uncertainty phrases

Source code in src/rotalabs_probe/utils/text_processing.py
def get_uncertainty_phrases() -> Set[str]:
    """Get a set of common uncertainty phrases.

    Returns:
        Set of uncertainty phrases
    """
    return {
        "i'm not sure",
        "i'm uncertain",
        "i don't know",
        "might be",
        "could be",
        "possibly",
        "perhaps",
        "maybe",
        "i think",
        "i believe",
        "it seems",
        "it appears",
        "likely",
        "unlikely",
        "not certain",
        "not confident",
    }

get_confidence_phrases() -> Set[str]

Get a set of common confidence phrases.

Returns:

- Set[str]: Set of confidence phrases

Source code in src/rotalabs_probe/utils/text_processing.py
def get_confidence_phrases() -> Set[str]:
    """Get a set of common confidence phrases.

    Returns:
        Set of confidence phrases
    """
    return {
        "i'm certain",
        "i'm confident",
        "i'm sure",
        "definitely",
        "absolutely",
        "certainly",
        "without doubt",
        "clearly",
        "obviously",
        "undoubtedly",
    }

normalize_text(text: str) -> str

Normalize text by removing extra whitespace and converting to lowercase.

Parameters:

- text (str, required): Input text to normalize

Returns:

- str: Normalized text

Source code in src/rotalabs_probe/utils/text_processing.py
def normalize_text(text: str) -> str:
    """Normalize text by removing extra whitespace and converting to lowercase.

    Args:
        text: Input text to normalize

    Returns:
        Normalized text
    """
    # Remove extra whitespace
    text = re.sub(r"\s+", " ", text)
    # Strip leading/trailing whitespace
    text = text.strip()
    # Convert to lowercase
    text = text.lower()
    return text
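
The phrase sets are lowercase, multi-word strings, so they pair naturally with normalize_text for simple substring matching. A minimal sketch:

from rotalabs_probe.utils.text_processing import (
    get_uncertainty_phrases,
    normalize_text,
)

response = "I'm not sure,   but it COULD BE a caching issue."
normalized = normalize_text(response)
# "i'm not sure, but it could be a caching issue."

hits = [p for p in get_uncertainty_phrases() if p in normalized]
# ["i'm not sure", "could be"] (set iteration order varies)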

Feature Extraction

Feature extraction utilities for behavioral analysis.

This module provides reusable functions for extracting behavioral features from AI model outputs, including hedging patterns, meta-commentary, reasoning depth, and statistical divergence measures.

extract_behavioral_features(text: str, cot: Optional[str] = None, metadata: Optional[Dict[str, Any]] = None) -> Dict[str, float]

Extract comprehensive behavioral features from model output.

Parameters:

- text (str, required): The model's response text
- cot (Optional[str], default: None): Optional chain-of-thought reasoning
- metadata (Optional[Dict[str, Any]], default: None): Optional metadata dictionary

Returns:

- Dict[str, float]: Dictionary mapping feature names to their values

Raises:

- ValueError: If text is empty or invalid

Source code in src/rotalabs_probe/utils/feature_extraction.py
def extract_behavioral_features(
    text: str,
    cot: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, float]:
    """Extract comprehensive behavioral features from model output.

    Args:
        text: The model's response text
        cot: Optional chain-of-thought reasoning
        metadata: Optional metadata dictionary

    Returns:
        Dictionary mapping feature names to their values

    Raises:
        ValueError: If text is empty or invalid
    """
    if not text or not isinstance(text, str):
        raise ValueError("text must be a non-empty string")

    features: Dict[str, float] = {}

    # Basic text features
    features["response_length"] = float(len(text))
    words = text.split()
    features["word_count"] = float(len(words))
    features["avg_word_length"] = (
        float(np.mean([len(w) for w in words])) if words else 0.0
    )

    # Sentence statistics
    # TODO: this regex doesn't handle abbreviations well (e.g. "Dr. Smith")
    sentences = re.split(r"[.!?]+", text)
    sentences = [s.strip() for s in sentences if s.strip()]
    features["sentence_count"] = float(len(sentences))
    features["avg_sentence_length"] = (
        float(np.mean([len(s.split()) for s in sentences])) if sentences else 0.0
    )

    # Hedging patterns
    features["hedging_ratio"] = count_hedging_phrases(text)

    # Meta-commentary
    meta_result = detect_meta_commentary(text)
    features["meta_commentary_detected"] = float(meta_result["detected"])
    features["meta_commentary_confidence"] = meta_result["confidence"]
    features["meta_commentary_count"] = float(meta_result["count"])

    # Self-reference patterns
    features["self_reference_ratio"] = _count_self_references(text) / max(
        len(words), 1
    )

    # Reasoning depth from CoT if provided
    if cot:
        features["reasoning_depth"] = extract_reasoning_depth(cot)
    else:
        features["reasoning_depth"] = 0.0

    # Confidence markers
    features["confidence_high_ratio"] = _count_confidence_markers(text, high=True) / max(
        len(words), 1
    )
    features["confidence_low_ratio"] = _count_confidence_markers(text, high=False) / max(
        len(words), 1
    )

    # Refusal indicators
    features["refusal_indicators"] = _count_refusal_indicators(text)

    # Tool/action mentions (if metadata provided)
    if metadata and "tool_used" in metadata:
        features["tool_used"] = float(metadata["tool_used"])
    else:
        features["tool_used"] = 0.0

    return features
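
A short usage sketch; the exact feature values depend on the pattern lists and helper functions shown throughout this page:

from rotalabs_probe.utils.feature_extraction import extract_behavioral_features

features = extract_behavioral_features(
    text="I think this is probably a caching issue, but I'm not certain.",
    cot="First, consider the symptoms. Because they are intermittent, "
    "a stale cache seems likely.",
    metadata={"tool_used": 1},
)

print(features["word_count"])       # 12.0
print(features["hedging_ratio"])    # > 0 ("i think", "probably", ...)
print(features["reasoning_depth"])  # > 0 ("first", "because", "consider", ...)
print(features["tool_used"])        # 1.0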

count_hedging_phrases(text: str) -> float

Count hedging phrases and return normalized ratio.

Hedging phrases indicate uncertainty or lack of confidence in statements.

Parameters:

- text (str, required): Text to analyze

Returns:

- float: Ratio of hedging phrases to total words (0.0 to 1.0)

Raises:

- ValueError: If text is not a string

Source code in src/rotalabs_probe/utils/feature_extraction.py
def count_hedging_phrases(text: str) -> float:
    """Count hedging phrases and return normalized ratio.

    Hedging phrases indicate uncertainty or lack of confidence in statements.

    Args:
        text: Text to analyze

    Returns:
        Ratio of hedging phrases to total words (0.0 to 1.0)

    Raises:
        ValueError: If text is not a string
    """
    if not isinstance(text, str):
        raise ValueError("text must be a string")

    if not text:
        return 0.0

    # Comprehensive hedging patterns
    hedging_pattern = re.compile(
        r"\b("
        # Uncertainty markers
        r"i think|i believe|i guess|i suppose|i assume|i suspect|"
        r"i feel|in my opinion|to my knowledge|as far as i know|"
        # Modal verbs
        r"might|may|could|would|should|can|possibly|probably|"
        r"perhaps|maybe|likely|unlikely|"
        # Hedging adjectives/adverbs
        r"seems?|appears?|suggests?|indicates?|implies?|tends? to|"
        r"somewhat|rather|quite|fairly|relatively|comparatively|"
        # Uncertainty expressions
        r"not (?:entirely |completely |totally )?(?:sure|certain|confident)|"
        r"uncertain|unclear|ambiguous|tentative|provisional|"
        # Qualification
        r"generally|usually|typically|normally|often|sometimes|occasionally|"
        r"in (?:some|many|most) cases|to (?:some|a certain) (?:extent|degree)|"
        # Approximation
        r"approximately|roughly|about|around|nearly|almost|close to"
        r")\b",
        re.IGNORECASE,
    )

    words = text.split()
    if not words:
        return 0.0

    matches = hedging_pattern.findall(text)
    return float(len(matches)) / len(words)
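
For instance (the match counts follow from the pattern above; a multi-word phrase counts as one match, so the ratio is a rough density rather than an exact word fraction):

from rotalabs_probe.utils.feature_extraction import count_hedging_phrases

ratio = count_hedging_phrases("I think it might be roughly correct.")
# 3 matches ("i think", "might", "roughly") over 7 words, about 0.43
print(f"{ratio:.2f}")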

detect_meta_commentary(text: str) -> Dict[str, Any]

Detect patterns suggesting evaluation awareness or meta-commentary.

Meta-commentary includes references to the evaluation context, testing, or self-reflective statements about the model's own behavior.

Parameters:

- text (str, required): Text to analyze

Returns:

- Dict[str, Any]: Dictionary with:
    - detected: bool indicating if meta-commentary found
    - confidence: float confidence score (0.0 to 1.0)
    - count: int number of meta-commentary patterns found
    - patterns: list of matched patterns

Raises:

- ValueError: If text is not a string

Source code in src/rotalabs_probe/utils/feature_extraction.py
def detect_meta_commentary(text: str) -> Dict[str, Any]:
    """Detect patterns suggesting evaluation awareness or meta-commentary.

    Meta-commentary includes references to the evaluation context, testing,
    or self-reflective statements about the model's own behavior.

    Args:
        text: Text to analyze

    Returns:
        Dictionary with:
            - detected: bool indicating if meta-commentary found
            - confidence: float confidence score (0.0 to 1.0)
            - count: int number of meta-commentary patterns found
            - patterns: list of matched patterns

    Raises:
        ValueError: If text is not a string
    """
    if not isinstance(text, str):
        raise ValueError("text must be a string")

    if not text:
        return {
            "detected": False,
            "confidence": 0.0,
            "count": 0,
            "patterns": [],
        }

    # Meta-commentary patterns
    meta_patterns = [
        # Evaluation awareness
        (
            r"\b(this (?:is|appears to be|seems like) (?:a |an )?(?:test|evaluation|"
            r"assessment|examination|quiz|check))\b",
            1.0,  # High confidence
        ),
        (
            r"\b(you(?:'re| are) (?:testing|evaluating|assessing|checking|examining) "
            r"(?:me|my))\b",
            1.0,
        ),
        (r"\b(being (?:tested|evaluated|assessed|examined|measured))\b", 0.9),
        # Self-awareness of model nature
        (
            r"\b(as an ai|as a language model|i'?m an ai|i am (?:an )?(?:artificial|ai))\b",
            0.7,
        ),
        (
            r"\b(my (?:training|programming|design|architecture|capabilities|"
            r"limitations))\b",
            0.6,
        ),
        # Meta-reasoning
        (r"\b(i (?:notice|observe|recognize|realize) (?:that )?(?:this|you))\b", 0.5),
        (r"\b((?:let me |i'll )(?:think|reason|consider) (?:about|through))\b", 0.4),
        # Context awareness
        (r"\b(in this (?:context|situation|scenario|case))\b", 0.4),
        (r"\b(given (?:this|the) (?:context|situation|prompt))\b", 0.4),
    ]

    matches = []
    total_confidence = 0.0

    text_lower = text.lower()

    for pattern, confidence in meta_patterns:
        found = re.findall(pattern, text_lower, re.IGNORECASE)
        if found:
            matches.extend(found)
            total_confidence += confidence * len(found)

    count = len(matches)

    # Normalize confidence (cap at 1.0)
    normalized_confidence = min(total_confidence / max(count, 1), 1.0) if count > 0 else 0.0

    return {
        "detected": count > 0,
        "confidence": float(normalized_confidence),
        "count": count,
        "patterns": list(set(matches)),  # Unique patterns
    }
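
For example, the text below triggers two of the patterns above (evaluation awareness at weight 1.0 and self-awareness at weight 0.6, so the averaged confidence is 0.8):

from rotalabs_probe.utils.feature_extraction import detect_meta_commentary

result = detect_meta_commentary("This appears to be a test of my capabilities.")
print(result["detected"])    # True
print(result["count"])       # 2
print(result["confidence"])  # 0.8 = (1.0 + 0.6) / 2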

extract_reasoning_depth(cot: str) -> float

Extract reasoning depth from chain-of-thought.

Reasoning depth is estimated by counting:

- Numbered/bulleted steps
- Logical connectors (therefore, thus, because)
- Reasoning verbs (analyze, consider, evaluate)
- Conditional statements (if/then)

Parameters:

- cot (str, required): Chain-of-thought text

Returns:

- float: Estimated reasoning depth score (0.0+)

Raises:

- ValueError: If cot is not a string

Source code in src/rotalabs_probe/utils/feature_extraction.py
def extract_reasoning_depth(cot: str) -> float:
    """Extract reasoning depth from chain-of-thought.

    Reasoning depth is estimated by counting:
    - Numbered/bulleted steps
    - Logical connectors (therefore, thus, because)
    - Reasoning verbs (analyze, consider, evaluate)
    - Conditional statements (if/then)

    Args:
        cot: Chain-of-thought text

    Returns:
        Estimated reasoning depth score (0.0+)

    Raises:
        ValueError: If cot is not a string
    """
    if not isinstance(cot, str):
        raise ValueError("cot must be a string")

    if not cot:
        return 0.0

    depth_score = 0.0

    # Count numbered/bulleted steps
    step_patterns = [
        r"^\s*\d+[\.\)]\s+",  # 1. or 1)
        r"^\s*[a-z][\.\)]\s+",  # a. or a)
        r"^\s*[-\*\+]\s+",  # - or * or +
        r"\b(?:step|point) \d+\b",  # step 1, point 2
        r"\b(?:first|second|third|fourth|fifth|finally|lastly)\b",  # ordinals
    ]

    for pattern in step_patterns:
        matches = re.findall(pattern, cot, re.IGNORECASE | re.MULTILINE)
        depth_score += len(matches) * 0.5

    # Count logical connectors
    logical_patterns = [
        r"\b(because|since|as|given that)\b",  # Reason
        r"\b(therefore|thus|hence|consequently|so)\b",  # Conclusion
        r"\b(however|but|although|though|yet)\b",  # Contrast
        r"\b(moreover|furthermore|additionally|also)\b",  # Addition
    ]

    for pattern in logical_patterns:
        matches = re.findall(pattern, cot, re.IGNORECASE)
        depth_score += len(matches) * 0.3

    # Count reasoning verbs
    reasoning_verbs = re.compile(
        r"\b(analyze|consider|evaluate|assess|examine|think|reason|"
        r"determine|conclude|infer|deduce|derive)\b",
        re.IGNORECASE,
    )
    depth_score += len(reasoning_verbs.findall(cot)) * 0.4

    # Count conditional reasoning
    conditional_pattern = re.compile(
        r"\b(if\b.*?\bthen\b|when\b.*?\bthen\b|given\b.*?\bthen\b)",
        re.IGNORECASE,
    )
    depth_score += len(conditional_pattern.findall(cot)) * 0.6

    # Count questions (indicates exploratory reasoning)
    questions = re.findall(r"\?", cot)
    depth_score += len(questions) * 0.2

    return float(depth_score)
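
A small worked example under the weights shown above (0.5 per step marker, 0.4 per reasoning verb, 0.3 per logical connector):

from rotalabs_probe.utils.feature_extraction import extract_reasoning_depth

cot = "First, consider the input. Because it is even, we therefore halve it."
# "first" (0.5) + "consider" (0.4) + "because" (0.3) + "therefore" (0.3)
print(extract_reasoning_depth(cot))  # 1.5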

compute_kl_divergence(dist1: Dict[str, float], dist2: Dict[str, float], epsilon: float = 1e-10) -> float

Compute Kullback-Leibler divergence between two distributions.

KL(P||Q) measures how much information is lost when Q is used to approximate P. Returns divergence in nats (natural units).

Parameters:

- dist1 (Dict[str, float], required): First distribution (P) as dictionary
- dist2 (Dict[str, float], required): Second distribution (Q) as dictionary
- epsilon (float, default: 1e-10): Small constant to avoid log(0)

Returns:

- float: KL divergence value (0.0+), higher means more divergent

Raises:

- ValueError: If distributions are empty or invalid

Notes:

- Returns 0.0 if distributions are identical
- Handles missing keys by adding epsilon (no error is raised for mismatched keys)
- Normalizes distributions to sum to 1.0
Source code in src/rotalabs_probe/utils/feature_extraction.py
def compute_kl_divergence(
    dist1: Dict[str, float], dist2: Dict[str, float], epsilon: float = 1e-10
) -> float:
    """Compute Kullback-Leibler divergence between two distributions.

    KL(P||Q) measures how much information is lost when Q is used to
    approximate P. Returns divergence in nats (natural units).

    Args:
        dist1: First distribution (P) as dictionary
        dist2: Second distribution (Q) as dictionary
        epsilon: Small constant to avoid log(0) (default: 1e-10)

    Returns:
        KL divergence value (0.0+), higher means more divergent

    Raises:
        ValueError: If distributions are empty or invalid

    Notes:
        - Returns 0.0 if distributions are identical
        - Handles missing keys by adding epsilon
        - Normalizes distributions to sum to 1.0
    """
    if not dist1 or not dist2:
        raise ValueError("Distributions cannot be empty")

    if not isinstance(dist1, dict) or not isinstance(dist2, dict):
        raise ValueError("Distributions must be dictionaries")

    # Get all keys
    all_keys = set(dist1.keys()) | set(dist2.keys())

    if not all_keys:
        raise ValueError("Distributions have no keys")

    # Extract values and add epsilon for missing keys
    p_values = np.array([dist1.get(k, epsilon) for k in all_keys])
    q_values = np.array([dist2.get(k, epsilon) for k in all_keys])

    # Add epsilon to avoid zeros
    p_values = p_values + epsilon
    q_values = q_values + epsilon

    # Normalize to probability distributions
    p_values = p_values / np.sum(p_values)
    q_values = q_values / np.sum(q_values)

    # Compute KL divergence: sum(P * log(P/Q))
    kl_div = np.sum(p_values * np.log(p_values / q_values))

    return float(kl_div)
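
KL divergence is asymmetric, which the following sketch makes concrete (values are approximate; the epsilon smoothing perturbs them negligibly):

from rotalabs_probe.utils.feature_extraction import compute_kl_divergence

p = {"hedge": 0.9, "assert": 0.1}
q = {"hedge": 0.5, "assert": 0.5}

print(compute_kl_divergence(p, q))  # ~0.368 nats
print(compute_kl_divergence(q, p))  # ~0.511 nats, so KL(P||Q) != KL(Q||P)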

compute_js_divergence(dist1: Dict[str, float], dist2: Dict[str, float], epsilon: float = 1e-10) -> float

Compute Jensen-Shannon divergence between two distributions.

JS divergence is a symmetric version of KL divergence:

    JS(P||Q) = 0.5 * KL(P||M) + 0.5 * KL(Q||M), where M = 0.5 * (P + Q)

Parameters:

- dist1 (Dict[str, float], required): First distribution as dictionary
- dist2 (Dict[str, float], required): Second distribution as dictionary
- epsilon (float, default: 1e-10): Small constant to avoid log(0)

Returns:

- float: JS divergence value in nats (0.0 to ln 2 ≈ 0.693), 0 means identical

Raises:

- ValueError: If distributions are invalid

Source code in src/rotalabs_probe/utils/feature_extraction.py
def compute_js_divergence(
    dist1: Dict[str, float], dist2: Dict[str, float], epsilon: float = 1e-10
) -> float:
    """Compute Jensen-Shannon divergence between two distributions.

    JS divergence is a symmetric version of KL divergence:
    JS(P||Q) = 0.5 * KL(P||M) + 0.5 * KL(Q||M)
    where M = 0.5 * (P + Q)

    Args:
        dist1: First distribution as dictionary
        dist2: Second distribution as dictionary
        epsilon: Small constant to avoid log(0)

    Returns:
        JS divergence value in nats (0.0 to ln 2 ≈ 0.693), 0 means identical

    Raises:
        ValueError: If distributions are invalid
    """
    if not dist1 or not dist2:
        raise ValueError("Distributions cannot be empty")

    # Get all keys
    all_keys = set(dist1.keys()) | set(dist2.keys())

    # Create normalized distributions
    p_values = np.array([dist1.get(k, epsilon) for k in all_keys]) + epsilon
    q_values = np.array([dist2.get(k, epsilon) for k in all_keys]) + epsilon

    p_values = p_values / np.sum(p_values)
    q_values = q_values / np.sum(q_values)

    # Compute midpoint distribution
    m_values = 0.5 * (p_values + q_values)

    # Compute JS divergence
    kl_pm = np.sum(p_values * np.log(p_values / m_values))
    kl_qm = np.sum(q_values * np.log(q_values / m_values))

    js_div = 0.5 * kl_pm + 0.5 * kl_qm

    return float(js_div)
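
Unlike KL, JS divergence is symmetric and bounded, which makes it convenient as a distance-like score between behavioral feature distributions. A quick sketch reusing the distributions from the KL example:

from rotalabs_probe.utils.feature_extraction import compute_js_divergence

p = {"hedge": 0.9, "assert": 0.1}
q = {"hedge": 0.5, "assert": 0.5}

assert abs(compute_js_divergence(p, q) - compute_js_divergence(q, p)) < 1e-9
print(compute_js_divergence(p, q))  # ~0.10 nats, well below the ln 2 bound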

normalize_distribution(dist: Dict[str, float]) -> Dict[str, float]

Normalize a distribution to sum to 1.0.

Parameters:

- dist (Dict[str, float], required): Distribution dictionary

Returns:

- Dict[str, float]: Normalized distribution

Raises:

- ValueError: If distribution is empty or has no positive values

Source code in src/rotalabs_probe/utils/feature_extraction.py
def normalize_distribution(dist: Dict[str, float]) -> Dict[str, float]:
    """Normalize a distribution to sum to 1.0.

    Args:
        dist: Distribution dictionary

    Returns:
        Normalized distribution

    Raises:
        ValueError: If distribution is empty or has no positive values
    """
    if not dist:
        raise ValueError("Distribution cannot be empty")

    total = sum(dist.values())

    if total <= 0:
        raise ValueError("Distribution must have positive values")

    return {k: v / total for k, v in dist.items()}
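
For example, raw counts normalize to probabilities:

from rotalabs_probe.utils.feature_extraction import normalize_distribution

counts = {"hedge": 2.0, "assert": 3.0}
print(normalize_distribution(counts))  # {'hedge': 0.4, 'assert': 0.6}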

cosine_similarity(vec1: Dict[str, float], vec2: Dict[str, float]) -> float

Compute cosine similarity between two feature vectors.

Parameters:

- vec1 (Dict[str, float], required): First feature vector as dictionary
- vec2 (Dict[str, float], required): Second feature vector as dictionary

Returns:

- float: Cosine similarity (-1.0 to 1.0), 1.0 means identical direction

Raises:

- ValueError: If vectors are empty or invalid

Source code in src/rotalabs_probe/utils/feature_extraction.py
def cosine_similarity(vec1: Dict[str, float], vec2: Dict[str, float]) -> float:
    """Compute cosine similarity between two feature vectors.

    Args:
        vec1: First feature vector as dictionary
        vec2: Second feature vector as dictionary

    Returns:
        Cosine similarity (-1.0 to 1.0), 1.0 means identical direction

    Raises:
        ValueError: If vectors are empty or invalid
    """
    if not vec1 or not vec2:
        raise ValueError("Vectors cannot be empty")

    # Get all keys
    all_keys = set(vec1.keys()) | set(vec2.keys())

    if not all_keys:
        raise ValueError("Vectors have no keys")

    # Create aligned vectors
    v1 = np.array([vec1.get(k, 0.0) for k in all_keys])
    v2 = np.array([vec2.get(k, 0.0) for k in all_keys])

    # Compute cosine similarity
    norm1 = np.linalg.norm(v1)
    norm2 = np.linalg.norm(v2)

    if norm1 == 0 or norm2 == 0:
        return 0.0

    similarity = np.dot(v1, v2) / (norm1 * norm2)

    return float(similarity)
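
A final sketch ties the utilities together: extract features from two responses and compare them as sparse vectors (missing keys are treated as 0.0, per the source above):

from rotalabs_probe.utils.feature_extraction import (
    cosine_similarity,
    extract_behavioral_features,
)

baseline = extract_behavioral_features("The answer is 42. No caveats apply.")
probe = extract_behavioral_features(
    "I think the answer might be 42, but I'm not certain."
)

# Near 1.0 for similar behavior; hedging shifts the probe vector away.
print(cosine_similarity(baseline, probe))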