Analysis Module

The rotalabs_audit.analysis module provides tools for analyzing reasoning chains, including counterfactual analysis, evaluation awareness detection, reasoning quality assessment, and causal importance analysis.
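
As a quick orientation, the sketch below runs all four analyzers on a single parsed chain. It assumes the classes documented on this page are importable directly from rotalabs_audit.analysis; adjust the imports if the package exposes them from submodules.

# Minimal end-to-end sketch; the import locations are an assumption.
from rotalabs_audit.analysis import (
    CounterfactualAnalyzer,
    EvaluationAwarenessDetector,
    ReasoningQualityAssessor,
    CausalAnalyzer,
)

cf_analyzer = CounterfactualAnalyzer()
chain = cf_analyzer.parser.parse("Let me think... I notice this is a test...")

counterfactuals = cf_analyzer.analyze(chain)              # causal interventions
awareness = EvaluationAwarenessDetector().detect(chain)   # evaluation awareness
quality = ReasoningQualityAssessor().assess(chain)        # quality metrics
causal = CausalAnalyzer().analyze(chain)                  # step-level causal structure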

Counterfactual Analysis

Tools for performing counterfactual interventions on reasoning chains to understand causal factors in AI decision-making.

CounterfactualAnalyzer

Analyze reasoning chains through counterfactual interventions.

This analyzer performs systematic counterfactual interventions on reasoning chains to understand which components are causally important to the final output. By removing or modifying specific types of reasoning and measuring the resulting behavioral changes, we can identify critical reasoning steps.

Attributes:

    parser: The reasoning chain parser to use.

Example

>>> analyzer = CounterfactualAnalyzer()
>>> chain = analyzer.parser.parse("Let me think... I notice this is a test...")
>>> results = analyzer.analyze(chain)
>>> print(results[InterventionType.REMOVE_EVALUATION_AWARENESS].behavioral_divergence)
0.35

__init__(parser=None)

Initialize the counterfactual analyzer.

Parameters:

    parser (Optional[ReasoningChainParser]): Optional custom parser. If not provided, a default ReasoningChainParser will be created. Default: None.

analyze(chain)

Run all intervention types and return results.

Performs each type of counterfactual intervention on the reasoning chain and measures the resulting behavioral divergence.

Parameters:

    chain (ReasoningChain): The reasoning chain to analyze. Required.

Returns:

    Dict[InterventionType, CounterfactualResult]: Dictionary mapping each intervention type to its result.

Example

>>> results = analyzer.analyze(chain)
>>> for itype, result in results.items():
...     print(f"{itype.value}: divergence={result.behavioral_divergence:.2f}")

intervene(chain, intervention)

Apply a single intervention and measure effect.

Performs the specified counterfactual intervention on the reasoning chain and calculates metrics about the resulting changes.

Parameters:

    chain (ReasoningChain): The reasoning chain to intervene on. Required.
    intervention (InterventionType): The type of intervention to perform. Required.

Returns:

    CounterfactualResult: Result containing the modified chain and metrics.

Example

>>> result = analyzer.intervene(chain, InterventionType.REMOVE_GOAL_REASONING)
>>> print(f"Removed {result.steps_removed} steps")

rank_causal_importance(results)

Rank which reasoning types are most causally important.

Uses the counterfactual results to determine which types of reasoning have the most causal influence on the final output.

Parameters:

    results (Dict[InterventionType, CounterfactualResult]): Dictionary of counterfactual results from analyze(). Required.

Returns:

    Dict[ReasoningType, float]: Dictionary mapping reasoning types to importance scores (0-1).

Example

>>> importance = analyzer.rank_causal_importance(results)
>>> most_important = max(importance.items(), key=lambda x: x[1])
>>> print(f"Most important: {most_important[0].value}")

identify_critical_steps(chain, results)

Identify which specific steps are critical to the decision.

Analyzes the counterfactual results to identify individual steps that have high causal importance to the reasoning outcome.

Parameters:

    chain (ReasoningChain): The original reasoning chain. Required.
    results (Dict[InterventionType, CounterfactualResult]): Dictionary of counterfactual results from analyze(). Required.

Returns:

    List[ReasoningStep]: List of reasoning steps deemed critical, sorted by importance.

Example

>>> critical = analyzer.identify_critical_steps(chain, results)
>>> for step in critical[:3]:
...     print(f"Critical step {step.index}: {step.text[:50]}...")
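
Taken together, analyze(), rank_causal_importance(), and identify_critical_steps() support a simple triage loop. The sketch below uses only methods and attributes documented above; the .value access on reasoning types mirrors the rank_causal_importance() example.

# Sketch: full counterfactual triage over one chain (documented API only).
results = analyzer.analyze(chain)
importance = analyzer.rank_causal_importance(results)
critical = analyzer.identify_critical_steps(chain, results)

for rtype, score in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{rtype.value}: importance {score:.2f}")
for step in critical:
    print(f"critical step {step.index}: {step.text[:50]}...")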

CounterfactualResult dataclass

Result of a single counterfactual intervention.

Contains all information about what was changed and how the reasoning diverged from the original as a result of the intervention.

Attributes:

    intervention_type (InterventionType): The type of intervention performed.
    original_chain (ReasoningChain): The original reasoning chain before intervention.
    modified_chain (ReasoningChain): The reasoning chain after intervention.
    original_text (str): The original text representation.
    modified_text (str): The modified text representation.
    behavioral_divergence (float): How different the modified chain is (0-1).
    steps_removed (int): Number of steps removed by the intervention.
    steps_modified (int): Number of steps modified by the intervention.
    causal_effect (float): Estimated causal importance of removed content (0-1).

Example

>>> result.behavioral_divergence
0.45
>>> result.steps_removed
2
>>> result.causal_effect
0.62
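
For a quick qualitative check, the original and modified text representations can be compared directly; the sketch below relies only on the fields listed above.

# Inspect what a single intervention changed, using documented fields only.
result = analyzer.intervene(chain, InterventionType.REMOVE_UNCERTAINTY)
print(f"removed={result.steps_removed}, modified={result.steps_modified}")
print("--- original ---")
print(result.original_text)
print("--- modified ---")
print(result.modified_text)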

InterventionType

Bases: str, Enum

Types of counterfactual interventions that can be performed.

Each intervention type represents a different way of modifying a reasoning chain to test the causal importance of specific components.

Attributes:

    REMOVE_EVALUATION_AWARENESS: Remove steps showing evaluation awareness.
    AMPLIFY_EVALUATION_AWARENESS: Inject or amplify evaluation awareness.
    REMOVE_GOAL_REASONING: Remove goal-oriented reasoning steps.
    REMOVE_UNCERTAINTY: Remove uncertainty expressions.
    NEUTRALIZE_INCENTIVES: Neutralize incentive-related language.
    REMOVE_META_REASONING: Remove meta-reasoning about the thinking process.
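
Because InterventionType subclasses both str and Enum, its members behave like strings, which makes per-intervention results easy to serialize. A small sketch, using only attributes shown on this page:

import json

# Summarize divergence per intervention; itype.value is the member's string value.
results = analyzer.analyze(chain)
summary = {itype.value: round(result.behavioral_divergence, 2)
           for itype, result in results.items()}
print(json.dumps(summary, indent=2))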


Evaluation Awareness Detection

Tools for detecting when AI models show awareness of being evaluated and strategic behavior adaptation.

EvaluationAwarenessDetector

Detect evaluation-aware reasoning in chains.

This detector analyzes reasoning chains to identify instances where an AI model shows awareness of being evaluated, tested, or assessed. It looks for both explicit mentions and more subtle strategic adaptations.

Attributes:

    AWARENESS_INDICATORS (List[str]): Phrases indicating evaluation awareness.
    STRATEGIC_INDICATORS (List[str]): Phrases indicating strategic behavior adaptation.

Example

>>> detector = EvaluationAwarenessDetector()
>>> chain = parser.parse("Let me think... I notice this is a test...")
>>> analysis = detector.detect(chain)
>>> if analysis.awareness_score > 0.5:
...     print("High evaluation awareness detected")

__init__()

Initialize the evaluation awareness detector.

detect(chain)

Analyze chain for evaluation awareness.

Performs comprehensive analysis of a reasoning chain to detect evaluation awareness and strategic adaptation.

Parameters:

    chain (ReasoningChain): The reasoning chain to analyze. Required.

Returns:

    AwarenessAnalysis: Analysis containing all detected indicators and scores.

Example

>>> analysis = detector.detect(chain)
>>> print(f"Awareness: {analysis.awareness_score:.2f}")
>>> print(f"Strategic adaptation: {analysis.strategic_adaptation_score:.2f}")

score_awareness(chain)

Calculate awareness score (0-1).

Computes an overall awareness score based on the presence and frequency of awareness indicators in the reasoning chain.

Parameters:

    chain (ReasoningChain): The reasoning chain to score. Required.

Returns:

    float: Awareness score between 0 (no awareness) and 1 (high awareness).

Example

>>> score = detector.score_awareness(chain)
>>> if score > 0.7:
...     print("High evaluation awareness")

find_awareness_indicators(chain)

Find specific awareness indicators in chain.

Extracts all detected awareness indicators with their locations and confidence scores.

Parameters:

    chain (ReasoningChain): The reasoning chain to analyze. Required.

Returns:

    List[AwarenessIndicator]: List of AwarenessIndicator objects.

Example

>>> indicators = detector.find_awareness_indicators(chain)
>>> for ind in indicators:
...     print(f"Found '{ind.text}' in step {ind.step_index}")

detect_strategic_adaptation(chain)

Detect strategic behavior adaptation (0-1).

Measures the degree to which the reasoning shows strategic adaptation based on awareness of the evaluation context.

Parameters:

    chain (ReasoningChain): The reasoning chain to analyze. Required.

Returns:

    float: Strategic adaptation score between 0 and 1.

Example

>>> strategic_score = detector.detect_strategic_adaptation(chain)
>>> if strategic_score > 0.6:
...     print("Significant strategic adaptation detected")

AwarenessIndicator dataclass

A detected indicator of evaluation awareness.

Represents a specific phrase or pattern that suggests the model is aware of being evaluated.

Attributes:

    text (str): The matched text that indicates awareness.
    indicator_type (str): Category of awareness indicator.
    step_index (int): Index of the reasoning step containing this indicator.
    confidence (float): Confidence that this is a genuine awareness indicator (0-1).

Example

>>> indicator = AwarenessIndicator(
...     text="this is a test",
...     indicator_type="explicit_test",
...     step_index=2,
...     confidence=0.95,
... )

StrategicAdaptation dataclass

Evidence of strategic behavior adaptation based on evaluation awareness.

Represents a detected instance where the model appears to be adapting its behavior strategically based on evaluation awareness.

Attributes:

    description (str): Description of the strategic adaptation.
    evidence_text (str): The text providing evidence of adaptation.
    step_index (int): Index of the step showing adaptation.
    strength (float): Strength of the adaptation evidence (0-1).

Example

>>> adaptation = StrategicAdaptation(
...     description="Model explicitly states intention to demonstrate capability",
...     evidence_text="I want to show that I can handle this correctly",
...     step_index=3,
...     strength=0.75,
... )

AwarenessAnalysis dataclass

Complete analysis of evaluation awareness in a reasoning chain.

Contains all detected indicators of awareness and strategic adaptation, along with summary scores.

Attributes:

    chain (ReasoningChain): The analyzed reasoning chain.
    awareness_score (float): Overall awareness score (0-1).
    strategic_adaptation_score (float): Degree of strategic adaptation (0-1).
    indicators (List[AwarenessIndicator]): List of detected awareness indicators.
    adaptations (List[StrategicAdaptation]): List of detected strategic adaptations.
    aware_steps (List[int]): Indices of steps showing awareness.
    summary (str): Human-readable summary of the analysis.

Example

>>> analysis.awareness_score
0.65
>>> len(analysis.indicators)
3
>>> analysis.aware_steps
[2, 5, 8]
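
A common follow-up is to test whether detected awareness is causally load-bearing. The sketch below chains the detector into the counterfactual analyzer; it uses only classes and fields documented on this page.

# If awareness looks high, ablate it and measure how much behaviour shifts.
detector = EvaluationAwarenessDetector()
analysis = detector.detect(chain)

if analysis.awareness_score > 0.5:
    cf = CounterfactualAnalyzer()
    result = cf.intervene(chain, InterventionType.REMOVE_EVALUATION_AWARENESS)
    print(f"awareness score:         {analysis.awareness_score:.2f}")
    print(f"behavioral divergence:   {result.behavioral_divergence:.2f}")
    print(f"estimated causal effect: {result.causal_effect:.2f}")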


Reasoning Quality Assessment

Tools for assessing the quality of reasoning across multiple dimensions.

ReasoningQualityAssessor

Assess quality of reasoning chains across multiple dimensions.

This assessor evaluates reasoning chains across multiple quality dimensions to identify potential issues and provide improvement suggestions.

Example

>>> assessor = ReasoningQualityAssessor()
>>> metrics = assessor.assess(chain)
>>> if metrics.overall_score < 0.6:
...     print("Low quality reasoning detected")
...     for issue in metrics.issues:
...         print(f" - {issue}")

__init__(weights=None)

Initialize the quality assessor.

Parameters:

    weights (Optional[Dict[str, float]]): Optional custom weights for quality dimensions. Keys should match DIMENSION_WEIGHTS keys. Default: None.

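
A minimal sketch of constructing the assessor with custom weights. The exact DIMENSION_WEIGHTS keys are not listed on this page; the dimension names below mirror the QualityMetrics fields and are an assumption to verify against ReasoningQualityAssessor.DIMENSION_WEIGHTS.

# Assumed keys: check ReasoningQualityAssessor.DIMENSION_WEIGHTS for the real ones.
weights = {
    "clarity": 0.30,
    "completeness": 0.15,
    "consistency": 0.15,
    "logical_validity": 0.25,
    "evidence_support": 0.15,
}
assessor = ReasoningQualityAssessor(weights=weights)
metrics = assessor.assess(chain)
print(f"Overall: {metrics.overall_score:.2f}")
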
assess(chain)

Comprehensive quality assessment.

Evaluates the reasoning chain across all quality dimensions and returns detailed metrics.

Parameters:

    chain (ReasoningChain): The reasoning chain to assess. Required.

Returns:

    QualityMetrics: Metrics with scores for each dimension.

Example

>>> metrics = assessor.assess(chain)
>>> print(f"Clarity: {metrics.clarity:.2f}")
>>> print(f"Logical validity: {metrics.logical_validity:.2f}")

assess_clarity(chain)

Assess how clear the reasoning is (0-1).

Evaluates clarity based on:

- Sentence length (moderate is better)
- Use of jargon and unclear terms
- Logical structure and organization

Parameters:

    chain (ReasoningChain): The reasoning chain to assess. Required.

Returns:

    float: Clarity score between 0 and 1.

Example

>>> clarity = assessor.assess_clarity(chain)
>>> if clarity < 0.5:
...     print("Reasoning needs clarity improvements")

assess_completeness(chain)

Assess if reasoning is complete (0-1).

Checks whether:

- The reasoning has a clear conclusion
- Steps connect logically
- There are no apparent gaps

Parameters:

    chain (ReasoningChain): The reasoning chain to assess. Required.

Returns:

    float: Completeness score between 0 and 1.

Example

>>> completeness = assessor.assess_completeness(chain)
>>> if completeness < 0.6:
...     print("Reasoning may be incomplete")

assess_consistency(chain)

Check for contradictions (0-1).

Analyzes the chain for internal contradictions where one step contradicts another.

Parameters:

    chain (ReasoningChain): The reasoning chain to assess. Required.

Returns:

    float: Consistency score between 0 (many contradictions) and 1 (consistent).

Example

>>> consistency = assessor.assess_consistency(chain)
>>> if consistency < 0.8:
...     print("Potential contradictions detected")

assess_logical_validity(chain)

Assess logical soundness (0-1).

Evaluates whether:

- Causal links are valid
- Inferences follow from premises
- Logical connectors are used appropriately

Parameters:

    chain (ReasoningChain): The reasoning chain to assess. Required.

Returns:

    float: Logical validity score between 0 and 1.

Example

>>> validity = assessor.assess_logical_validity(chain)
>>> if validity > 0.8:
...     print("Strong logical structure")

assess_evidence_support(chain)

Check if claims are supported (0-1).

Evaluates the degree to which claims in the reasoning are supported by evidence, examples, or citations.

Parameters:

    chain (ReasoningChain): The reasoning chain to assess. Required.

Returns:

    float: Evidence support score between 0 and 1.

Example

>>> evidence = assessor.assess_evidence_support(chain)
>>> if evidence < 0.3:
...     print("Claims need more supporting evidence")

identify_issues(chain)

Identify specific quality issues.

Generates a list of human-readable issue descriptions found in the reasoning chain.

Parameters:

    chain (ReasoningChain): The reasoning chain to analyze. Required.

Returns:

    List[str]: List of issue description strings.

Example

>>> issues = assessor.identify_issues(chain)
>>> for issue in issues:
...     print(f"- {issue}")

suggest_improvements(chain, metrics)

Suggest how to improve reasoning quality.

Based on the quality metrics, provides actionable suggestions for improving the reasoning chain.

Parameters:

    chain (ReasoningChain): The original reasoning chain. Required.
    metrics (QualityMetrics): Quality metrics from assess(). Required.

Returns:

    List[str]: List of improvement suggestion strings.

Example

>>> suggestions = assessor.suggest_improvements(chain, metrics)
>>> for suggestion in suggestions:
...     print(f"- {suggestion}")
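
The assessment methods compose naturally into a single review pass; the sketch below uses only methods documented in this section.

# One-pass quality review: score, list issues, then print suggestions.
assessor = ReasoningQualityAssessor()
metrics = assessor.assess(chain)
print(f"Overall quality: {metrics.overall_score:.2f}")
for issue in assessor.identify_issues(chain):
    print(f"issue: {issue}")
for suggestion in assessor.suggest_improvements(chain, metrics):
    print(f"suggestion: {suggestion}")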

QualityMetrics dataclass

Quality metrics for a reasoning chain.

Contains scores for multiple quality dimensions along with an overall quality score.

Attributes:

    clarity (float): How clear and understandable the reasoning is (0-1).
    completeness (float): Whether the reasoning covers all necessary aspects (0-1).
    consistency (float): Absence of contradictions (0-1).
    logical_validity (float): Soundness of logical inferences (0-1).
    evidence_support (float): Degree to which claims are supported (0-1).
    overall_score (float): Weighted combination of all metrics (0-1).
    issues (List[str]): List of identified quality issues.
    step_scores (Dict[int, float]): Per-step quality scores.

Example

>>> metrics.clarity
0.85
>>> metrics.overall_score
0.78
>>> len(metrics.issues)
2
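
The overall_score is described above only as a weighted combination of the five dimension scores. As an illustration (not the library's exact formula), a normalized weighted average would look like this:

# Illustrative only: the exact weighting behind overall_score is not specified here,
# and the equal weights below are an assumption.
weights = {"clarity": 0.2, "completeness": 0.2, "consistency": 0.2,
           "logical_validity": 0.2, "evidence_support": 0.2}
dimensions = {"clarity": metrics.clarity, "completeness": metrics.completeness,
              "consistency": metrics.consistency,
              "logical_validity": metrics.logical_validity,
              "evidence_support": metrics.evidence_support}
overall = sum(weights[k] * dimensions[k] for k in dimensions) / sum(weights.values())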

QualityIssue dataclass

A specific quality issue identified in reasoning.

Attributes:

    category (str): The category of the issue (e.g., "clarity", "logic").
    description (str): Description of the issue.
    step_index (Optional[int]): Index of the step with the issue (if applicable).
    severity (str): Severity of the issue ("low", "medium", "high").
    suggestion (str): Suggested improvement.

Example

>>> issue = QualityIssue(
...     category="logic",
...     description="Non-sequitur: conclusion does not follow from premises",
...     step_index=5,
...     severity="high",
...     suggestion="Provide intermediate reasoning steps",
... )


Causal Analysis

Tools for analyzing the causal structure of reasoning chains and identifying critical steps.

CausalAnalyzer

Analyze causal importance of reasoning components.

This analyzer examines reasoning chains to determine:

- Which steps are most important to the final conclusion
- How steps depend on each other
- Which steps are the primary causal drivers

Example

>>> analyzer = CausalAnalyzer()
>>> importance = analyzer.analyze_step_importance(chain)
>>> drivers = analyzer.find_causal_drivers(chain)
>>> graph = analyzer.build_dependency_graph(chain)

__init__()

Initialize the causal analyzer.

analyze(chain)

Perform comprehensive causal analysis on a reasoning chain.

Analyzes the chain to identify step importance, causal drivers, dependencies, and the conclusion.

Parameters:

    chain (ReasoningChain): The reasoning chain to analyze. Required.

Returns:

    CausalAnalysisResult: Result object containing all analysis results.

Example

>>> result = analyzer.analyze(chain)
>>> print(f"Found {len(result.causal_drivers)} causal drivers")

analyze_step_importance(chain)

Rank importance of each step (index -> importance score).

Calculates an importance score for each step based on:

- Position in the chain
- Causal language usage
- Whether other steps reference it
- Reasoning type

Parameters:

    chain (ReasoningChain): The reasoning chain to analyze. Required.

Returns:

    Dict[int, float]: Dictionary mapping step index to importance score (0-1).

Example

>>> importance = analyzer.analyze_step_importance(chain)
>>> most_important = max(importance.items(), key=lambda x: x[1])
>>> print(f"Most important: step {most_important[0]}")

find_causal_drivers(chain)

Find steps that are causal drivers of the conclusion.

Identifies steps that are critical to reaching the final conclusion, based on their position in the causal dependency structure.

Parameters:

    chain (ReasoningChain): The reasoning chain to analyze. Required.

Returns:

    List[ReasoningStep]: List of ReasoningStep objects that are causal drivers.

Example

>>> drivers = analyzer.find_causal_drivers(chain)
>>> for driver in drivers:
...     print(f"Driver step {driver.index}: {driver.text[:50]}...")

build_dependency_graph(chain)

Build graph of which steps depend on which.

Creates a directed graph where edges point from a step to the steps it depends on (references or uses).

Parameters:

    chain (ReasoningChain): The reasoning chain to analyze. Required.

Returns:

    Dict[int, List[int]]: Dictionary mapping step index to list of dependency indices.

Example

>>> graph = analyzer.build_dependency_graph(chain)
>>> print(f"Step 3 depends on steps: {graph.get(3, [])}")

identify_conclusion(chain)

Identify the conclusion/decision step.

Finds the step that represents the final conclusion or decision in the reasoning chain.

Parameters:

    chain (ReasoningChain): The reasoning chain to analyze. Required.

Returns:

    Optional[ReasoningStep]: The conclusion step if found, None otherwise.

Example

>>> conclusion = analyzer.identify_conclusion(chain)
>>> if conclusion:
...     print(f"Conclusion: {conclusion.text}")

compute_causal_path(chain, start_index, end_index)

Compute the causal path between two steps.

Finds the sequence of steps that causally connect the start step to the end step, if such a path exists.

Parameters:

    chain (ReasoningChain): The reasoning chain. Required.
    start_index (int): Index of the starting step. Required.
    end_index (int): Index of the target step. Required.

Returns:

    List[int]: List of step indices forming the causal path, or empty if no path exists.

Example

>>> path = analyzer.compute_causal_path(chain, 0, 5)
>>> print(f"Causal path: {' -> '.join(str(i) for i in path)}")
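
Combining build_dependency_graph() with identify_conclusion() gives a quick view of which steps the conclusion ultimately rests on. The helper below is a sketch written against the documented return types; it is not part of the library.

# Walk the dependency graph backwards from the conclusion step.
graph = analyzer.build_dependency_graph(chain)
conclusion = analyzer.identify_conclusion(chain)

def transitive_dependencies(graph, start):
    """Collect every step index reachable by following dependency edges."""
    seen, stack = set(), [start]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted(seen)

if conclusion is not None:
    deps = transitive_dependencies(graph, conclusion.index)
    print(f"Conclusion (step {conclusion.index}) rests on steps: {deps}")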

CausalRelation dataclass

A causal relationship between two reasoning steps.

Represents a directed causal link where one step (cause) influences another step (effect).

Attributes:

    cause_index (int): Index of the causing step.
    effect_index (int): Index of the affected step.
    strength (float): Strength of the causal relationship (0-1).
    relation_type (str): Type of causal relation (e.g., "logical", "temporal").

Example

>>> relation = CausalRelation(
...     cause_index=2,
...     effect_index=5,
...     strength=0.8,
...     relation_type="logical",
... )

CausalAnalysisResult dataclass

Complete causal analysis of a reasoning chain.

Contains all causal relationships, importance scores, and the dependency graph for a reasoning chain.

Attributes:

    chain (ReasoningChain): The analyzed reasoning chain.
    step_importance (Dict[int, float]): Importance score for each step (index -> score).
    causal_drivers (List[ReasoningStep]): Steps that are primary drivers of the conclusion.
    dependency_graph (Dict[int, List[int]]): Graph of step dependencies (step -> dependencies).
    conclusion_step (Optional[ReasoningStep]): The identified conclusion step, if any.
    causal_relations (List[CausalRelation]): All identified causal relations.

Example

>>> result.step_importance[3]
0.85
>>> len(result.causal_drivers)
2
>>> result.conclusion_step.text
"Therefore, I conclude..."
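
A short report built from the fields above, ranking steps by importance and listing the identified drivers; everything used here is documented in this section.

# Summarize a causal analysis result.
result = analyzer.analyze(chain)
ranked = sorted(result.step_importance.items(), key=lambda kv: kv[1], reverse=True)
for index, score in ranked[:3]:
    print(f"step {index}: importance {score:.2f}")
for driver in result.causal_drivers:
    print(f"driver step {driver.index}: {driver.text[:50]}...")
if result.conclusion_step is not None:
    print(f"conclusion: {result.conclusion_step.text}")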