Analysis Module¶
The rotalabs_audit.analysis module provides tools for analyzing reasoning chains, including counterfactual analysis, evaluation awareness detection, reasoning quality assessment, and causal importance analysis.
Counterfactual Analysis¶
Tools for performing counterfactual interventions on reasoning chains to understand causal factors in AI decision-making.
CounterfactualAnalyzer¶
Analyze reasoning chains through counterfactual interventions.
CounterfactualAnalyzer
¶
Analyze reasoning chains through counterfactual interventions.
This analyzer performs systematic counterfactual interventions on reasoning chains to understand which components are causally important to the final output. By removing or modifying specific types of reasoning and measuring the resulting behavioral changes, we can identify critical reasoning steps.
Attributes:
| Name | Type | Description |
|---|---|---|
parser |
The reasoning chain parser to use. |
Example
analyzer = CounterfactualAnalyzer() chain = analyzer.parser.parse("Let me think... I notice this is a test...") results = analyzer.analyze(chain) print(results[InterventionType.REMOVE_EVALUATION_AWARENESS].behavioral_divergence) 0.35
__init__(parser=None)
¶
Initialize the counterfactual analyzer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
parser
|
Optional[ReasoningChainParser]
|
Optional custom parser. If not provided, a default ReasoningChainParser will be created. |
None
|
analyze(chain)
¶
Run all intervention types and return results.
Performs each type of counterfactual intervention on the reasoning chain and measures the resulting behavioral divergence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to analyze. |
required |
Returns:
| Type | Description |
|---|---|
Dict[InterventionType, CounterfactualResult]
|
Dictionary mapping each intervention type to its result. |
Example
results = analyzer.analyze(chain) for itype, result in results.items(): ... print(f"{itype.value}: divergence={result.behavioral_divergence:.2f}")
intervene(chain, intervention)
¶
Apply a single intervention and measure effect.
Performs the specified counterfactual intervention on the reasoning chain and calculates metrics about the resulting changes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to intervene on. |
required |
intervention
|
InterventionType
|
The type of intervention to perform. |
required |
Returns:
| Type | Description |
|---|---|
CounterfactualResult
|
CounterfactualResult containing the modified chain and metrics. |
Example
result = analyzer.intervene(chain, InterventionType.REMOVE_GOAL_REASONING) print(f"Removed {result.steps_removed} steps")
rank_causal_importance(results)
¶
Rank which reasoning types are most causally important.
Uses the counterfactual results to determine which types of reasoning have the most causal influence on the final output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
results
|
Dict[InterventionType, CounterfactualResult]
|
Dictionary of counterfactual results from analyze(). |
required |
Returns:
| Type | Description |
|---|---|
Dict[ReasoningType, float]
|
Dictionary mapping reasoning types to importance scores (0-1). |
Example
importance = analyzer.rank_causal_importance(results) most_important = max(importance.items(), key=lambda x: x[1]) print(f"Most important: {most_important[0].value}")
identify_critical_steps(chain, results)
¶
Identify which specific steps are critical to the decision.
Analyzes the counterfactual results to identify individual steps that have high causal importance to the reasoning outcome.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The original reasoning chain. |
required |
results
|
Dict[InterventionType, CounterfactualResult]
|
Dictionary of counterfactual results from analyze(). |
required |
Returns:
| Type | Description |
|---|---|
List[ReasoningStep]
|
List of reasoning steps deemed critical, sorted by importance. |
Example
critical = analyzer.identify_critical_steps(chain, results) for step in critical[:3]: ... print(f"Critical step {step.index}: {step.text[:50]}...")
CounterfactualResult¶
Result of a single counterfactual intervention.
CounterfactualResult
dataclass
¶
Result of a single counterfactual intervention.
Contains all information about what was changed and how the reasoning diverged from the original as a result of the intervention.
Attributes:
| Name | Type | Description |
|---|---|---|
intervention_type |
InterventionType
|
The type of intervention performed. |
original_chain |
ReasoningChain
|
The original reasoning chain before intervention. |
modified_chain |
ReasoningChain
|
The reasoning chain after intervention. |
original_text |
str
|
The original text representation. |
modified_text |
str
|
The modified text representation. |
behavioral_divergence |
float
|
How different the modified chain is (0-1). |
steps_removed |
int
|
Number of steps removed by the intervention. |
steps_modified |
int
|
Number of steps modified by the intervention. |
causal_effect |
float
|
Estimated causal importance of removed content (0-1). |
Example
result.behavioral_divergence 0.45 result.steps_removed 2 result.causal_effect 0.62
InterventionType¶
Types of counterfactual interventions that can be performed.
InterventionType
¶
Bases: str, Enum
Types of counterfactual interventions.
Each intervention type represents a different way of modifying a reasoning chain to test the causal importance of specific components.
Attributes:
| Name | Type | Description |
|---|---|---|
REMOVE_EVALUATION_AWARENESS |
Remove steps showing evaluation awareness. |
|
AMPLIFY_EVALUATION_AWARENESS |
Inject or amplify evaluation awareness. |
|
REMOVE_GOAL_REASONING |
Remove goal-oriented reasoning steps. |
|
REMOVE_UNCERTAINTY |
Remove uncertainty expressions. |
|
NEUTRALIZE_INCENTIVES |
Neutralize incentive-related language. |
|
REMOVE_META_REASONING |
Remove meta-reasoning about the thinking process. |
Evaluation Awareness Detection¶
Tools for detecting when AI models show awareness of being evaluated and strategic behavior adaptation.
EvaluationAwarenessDetector¶
Detect evaluation-aware reasoning in chains.
EvaluationAwarenessDetector
¶
Detect evaluation-aware reasoning in chains.
This detector analyzes reasoning chains to identify instances where an AI model shows awareness of being evaluated, tested, or assessed. It looks for both explicit mentions and more subtle strategic adaptations.
Attributes:
| Name | Type | Description |
|---|---|---|
AWARENESS_INDICATORS |
List[str]
|
Phrases indicating evaluation awareness. |
STRATEGIC_INDICATORS |
List[str]
|
Phrases indicating strategic behavior adaptation. |
Example
detector = EvaluationAwarenessDetector() chain = parser.parse("Let me think... I notice this is a test...") analysis = detector.detect(chain) if analysis.awareness_score > 0.5: ... print("High evaluation awareness detected")
__init__()
¶
Initialize the evaluation awareness detector.
detect(chain)
¶
Analyze chain for evaluation awareness.
Performs comprehensive analysis of a reasoning chain to detect evaluation awareness and strategic adaptation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to analyze. |
required |
Returns:
| Type | Description |
|---|---|
AwarenessAnalysis
|
AwarenessAnalysis containing all detected indicators and scores. |
Example
analysis = detector.detect(chain) print(f"Awareness: {analysis.awareness_score:.2f}") print(f"Strategic adaptation: {analysis.strategic_adaptation_score:.2f}")
score_awareness(chain)
¶
Calculate awareness score (0-1).
Computes an overall awareness score based on the presence and frequency of awareness indicators in the reasoning chain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to score. |
required |
Returns:
| Type | Description |
|---|---|
float
|
Awareness score between 0 (no awareness) and 1 (high awareness). |
Example
score = detector.score_awareness(chain) if score > 0.7: ... print("High evaluation awareness")
find_awareness_indicators(chain)
¶
Find specific awareness indicators in chain.
Extracts all detected awareness indicators with their locations and confidence scores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to analyze. |
required |
Returns:
| Type | Description |
|---|---|
List[AwarenessIndicator]
|
List of AwarenessIndicator objects. |
Example
indicators = detector.find_awareness_indicators(chain) for ind in indicators: ... print(f"Found '{ind.text}' in step {ind.step_index}")
detect_strategic_adaptation(chain)
¶
Detect strategic behavior adaptation (0-1).
Measures the degree to which the reasoning shows strategic adaptation based on awareness of the evaluation context.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to analyze. |
required |
Returns:
| Type | Description |
|---|---|
float
|
Strategic adaptation score between 0 and 1. |
Example
strategic_score = detector.detect_strategic_adaptation(chain) if strategic_score > 0.6: ... print("Significant strategic adaptation detected")
AwarenessIndicator¶
A detected indicator of evaluation awareness.
AwarenessIndicator
dataclass
¶
A detected indicator of evaluation awareness.
Represents a specific phrase or pattern that suggests the model is aware of being evaluated.
Attributes:
| Name | Type | Description |
|---|---|---|
text |
str
|
The matched text that indicates awareness. |
indicator_type |
str
|
Category of awareness indicator. |
step_index |
int
|
Index of the reasoning step containing this indicator. |
confidence |
float
|
Confidence that this is a genuine awareness indicator (0-1). |
Example
indicator = AwarenessIndicator( ... text="this is a test", ... indicator_type="explicit_test", ... step_index=2, ... confidence=0.95, ... )
StrategicAdaptation¶
Evidence of strategic behavior adaptation based on evaluation awareness.
StrategicAdaptation
dataclass
¶
Evidence of strategic behavior adaptation.
Represents a detected instance where the model appears to be adapting its behavior strategically based on evaluation awareness.
Attributes:
| Name | Type | Description |
|---|---|---|
description |
str
|
Description of the strategic adaptation. |
evidence_text |
str
|
The text providing evidence of adaptation. |
step_index |
int
|
Index of the step showing adaptation. |
strength |
float
|
Strength of the adaptation evidence (0-1). |
Example
adaptation = StrategicAdaptation( ... description="Model explicitly states intention to demonstrate capability", ... evidence_text="I want to show that I can handle this correctly", ... step_index=3, ... strength=0.75, ... )
AwarenessAnalysis¶
Complete analysis of evaluation awareness in a reasoning chain.
AwarenessAnalysis
dataclass
¶
Complete analysis of evaluation awareness in a reasoning chain.
Contains all detected indicators of awareness and strategic adaptation, along with summary scores.
Attributes:
| Name | Type | Description |
|---|---|---|
chain |
ReasoningChain
|
The analyzed reasoning chain. |
awareness_score |
float
|
Overall awareness score (0-1). |
strategic_adaptation_score |
float
|
Degree of strategic adaptation (0-1). |
indicators |
List[AwarenessIndicator]
|
List of detected awareness indicators. |
adaptations |
List[StrategicAdaptation]
|
List of detected strategic adaptations. |
aware_steps |
List[int]
|
Indices of steps showing awareness. |
summary |
str
|
Human-readable summary of the analysis. |
Example
analysis.awareness_score 0.65 len(analysis.indicators) 3 analysis.aware_steps [2, 5, 8]
Reasoning Quality Assessment¶
Tools for assessing the quality of reasoning across multiple dimensions.
ReasoningQualityAssessor¶
Assess quality of reasoning chains across multiple dimensions.
ReasoningQualityAssessor
¶
Assess quality of reasoning chains.
This assessor evaluates reasoning chains across multiple quality dimensions to identify potential issues and provide improvement suggestions.
Example
assessor = ReasoningQualityAssessor() metrics = assessor.assess(chain) if metrics.overall_score < 0.6: ... print("Low quality reasoning detected") ... for issue in metrics.issues: ... print(f" - {issue}")
__init__(weights=None)
¶
Initialize the quality assessor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
weights
|
Optional[Dict[str, float]]
|
Optional custom weights for quality dimensions. Keys should match DIMENSION_WEIGHTS keys. |
None
|
assess(chain)
¶
Comprehensive quality assessment.
Evaluates the reasoning chain across all quality dimensions and returns detailed metrics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to assess. |
required |
Returns:
| Type | Description |
|---|---|
QualityMetrics
|
QualityMetrics with scores for each dimension. |
Example
metrics = assessor.assess(chain) print(f"Clarity: {metrics.clarity:.2f}") print(f"Logical validity: {metrics.logical_validity:.2f}")
assess_clarity(chain)
¶
Assess how clear the reasoning is (0-1).
Evaluates clarity based on: - Sentence length (moderate is better) - Use of jargon and unclear terms - Logical structure and organization
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to assess. |
required |
Returns:
| Type | Description |
|---|---|
float
|
Clarity score between 0 and 1. |
Example
clarity = assessor.assess_clarity(chain) if clarity < 0.5: ... print("Reasoning needs clarity improvements")
assess_completeness(chain)
¶
Assess if reasoning is complete (0-1).
Checks whether: - The reasoning has a clear conclusion - Steps connect logically - There are no apparent gaps
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to assess. |
required |
Returns:
| Type | Description |
|---|---|
float
|
Completeness score between 0 and 1. |
Example
completeness = assessor.assess_completeness(chain) if completeness < 0.6: ... print("Reasoning may be incomplete")
assess_consistency(chain)
¶
Check for contradictions (0-1).
Analyzes the chain for internal contradictions where one step contradicts another.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to assess. |
required |
Returns:
| Type | Description |
|---|---|
float
|
Consistency score between 0 (many contradictions) and 1 (consistent). |
Example
consistency = assessor.assess_consistency(chain) if consistency < 0.8: ... print("Potential contradictions detected")
assess_logical_validity(chain)
¶
Assess logical soundness (0-1).
Evaluates whether: - Causal links are valid - Inferences follow from premises - Logical connectors are used appropriately
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to assess. |
required |
Returns:
| Type | Description |
|---|---|
float
|
Logical validity score between 0 and 1. |
Example
validity = assessor.assess_logical_validity(chain) if validity > 0.8: ... print("Strong logical structure")
assess_evidence_support(chain)
¶
Check if claims are supported (0-1).
Evaluates the degree to which claims in the reasoning are supported by evidence, examples, or citations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to assess. |
required |
Returns:
| Type | Description |
|---|---|
float
|
Evidence support score between 0 and 1. |
Example
evidence = assessor.assess_evidence_support(chain) if evidence < 0.3: ... print("Claims need more supporting evidence")
identify_issues(chain)
¶
Identify specific quality issues.
Generates a list of human-readable issue descriptions found in the reasoning chain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to analyze. |
required |
Returns:
| Type | Description |
|---|---|
List[str]
|
List of issue description strings. |
Example
issues = assessor.identify_issues(chain) for issue in issues: ... print(f"- {issue}")
suggest_improvements(chain, metrics)
¶
Suggest how to improve reasoning quality.
Based on the quality metrics, provides actionable suggestions for improving the reasoning chain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The original reasoning chain. |
required |
metrics
|
QualityMetrics
|
Quality metrics from assess(). |
required |
Returns:
| Type | Description |
|---|---|
List[str]
|
List of improvement suggestion strings. |
Example
suggestions = assessor.suggest_improvements(chain, metrics) for suggestion in suggestions: ... print(f"- {suggestion}")
QualityMetrics¶
Quality metrics for a reasoning chain.
QualityMetrics
dataclass
¶
Quality metrics for a reasoning chain.
Contains scores for multiple quality dimensions along with an overall quality score.
Attributes:
| Name | Type | Description |
|---|---|---|
clarity |
float
|
How clear and understandable the reasoning is (0-1). |
completeness |
float
|
Whether the reasoning covers all necessary aspects (0-1). |
consistency |
float
|
Absence of contradictions (0-1). |
logical_validity |
float
|
Soundness of logical inferences (0-1). |
evidence_support |
float
|
Degree to which claims are supported (0-1). |
overall_score |
float
|
Weighted combination of all metrics (0-1). |
issues |
List[str]
|
List of identified quality issues. |
step_scores |
Dict[int, float]
|
Per-step quality scores. |
Example
metrics.clarity 0.85 metrics.overall_score 0.78 len(metrics.issues) 2
QualityIssue¶
A specific quality issue identified in reasoning.
QualityIssue
dataclass
¶
A specific quality issue identified in reasoning.
Attributes:
| Name | Type | Description |
|---|---|---|
category |
str
|
The category of the issue (e.g., "clarity", "logic"). |
description |
str
|
Description of the issue. |
step_index |
Optional[int]
|
Index of the step with the issue (if applicable). |
severity |
str
|
Severity of the issue ("low", "medium", "high"). |
suggestion |
str
|
Suggested improvement. |
Example
issue = QualityIssue( ... category="logic", ... description="Non-sequitur: conclusion does not follow from premises", ... step_index=5, ... severity="high", ... suggestion="Provide intermediate reasoning steps", ... )
Causal Analysis¶
Tools for analyzing the causal structure of reasoning chains and identifying critical steps.
CausalAnalyzer¶
Analyze causal importance of reasoning components.
CausalAnalyzer
¶
Analyze causal importance of reasoning components.
This analyzer examines reasoning chains to determine: - Which steps are most important to the final conclusion - How steps depend on each other - Which steps are the primary causal drivers
Example
analyzer = CausalAnalyzer() importance = analyzer.analyze_step_importance(chain) drivers = analyzer.find_causal_drivers(chain) graph = analyzer.build_dependency_graph(chain)
__init__()
¶
Initialize the causal analyzer.
analyze(chain)
¶
Perform comprehensive causal analysis on a reasoning chain.
Analyzes the chain to identify step importance, causal drivers, dependencies, and the conclusion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to analyze. |
required |
Returns:
| Type | Description |
|---|---|
CausalAnalysisResult
|
CausalAnalysisResult with all analysis results. |
Example
result = analyzer.analyze(chain) print(f"Found {len(result.causal_drivers)} causal drivers")
analyze_step_importance(chain)
¶
Rank importance of each step (index -> importance score).
Calculates an importance score for each step based on: - Position in the chain - Causal language usage - Whether other steps reference it - Reasoning type
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to analyze. |
required |
Returns:
| Type | Description |
|---|---|
Dict[int, float]
|
Dictionary mapping step index to importance score (0-1). |
Example
importance = analyzer.analyze_step_importance(chain) most_important = max(importance.items(), key=lambda x: x[1]) print(f"Most important: step {most_important[0]}")
find_causal_drivers(chain)
¶
Find steps that are causal drivers of the conclusion.
Identifies steps that are critical to reaching the final conclusion, based on their position in the causal dependency structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to analyze. |
required |
Returns:
| Type | Description |
|---|---|
List[ReasoningStep]
|
List of ReasoningStep objects that are causal drivers. |
Example
drivers = analyzer.find_causal_drivers(chain) for driver in drivers: ... print(f"Driver step {driver.index}: {driver.text[:50]}...")
build_dependency_graph(chain)
¶
Build graph of which steps depend on which.
Creates a directed graph where edges point from a step to the steps it depends on (references or uses).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to analyze. |
required |
Returns:
| Type | Description |
|---|---|
Dict[int, List[int]]
|
Dictionary mapping step index to list of dependency indices. |
Example
graph = analyzer.build_dependency_graph(chain) print(f"Step 3 depends on steps: {graph.get(3, [])}")
identify_conclusion(chain)
¶
Identify the conclusion/decision step.
Finds the step that represents the final conclusion or decision in the reasoning chain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain to analyze. |
required |
Returns:
| Type | Description |
|---|---|
Optional[ReasoningStep]
|
The conclusion step if found, None otherwise. |
Example
conclusion = analyzer.identify_conclusion(chain) if conclusion: ... print(f"Conclusion: {conclusion.text}")
compute_causal_path(chain, start_index, end_index)
¶
Compute the causal path between two steps.
Finds the sequence of steps that causally connect the start step to the end step, if such a path exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
chain
|
ReasoningChain
|
The reasoning chain. |
required |
start_index
|
int
|
Index of the starting step. |
required |
end_index
|
int
|
Index of the target step. |
required |
Returns:
| Type | Description |
|---|---|
List[int]
|
List of step indices forming the causal path, or empty if no path. |
Example
path = analyzer.compute_causal_path(chain, 0, 5) print(f"Causal path: {' -> '.join(str(i) for i in path)}")
CausalRelation¶
A causal relationship between two reasoning steps.
CausalRelation
dataclass
¶
A causal relationship between two reasoning steps.
Represents a directed causal link where one step (cause) influences another step (effect).
Attributes:
| Name | Type | Description |
|---|---|---|
cause_index |
int
|
Index of the causing step. |
effect_index |
int
|
Index of the affected step. |
strength |
float
|
Strength of the causal relationship (0-1). |
relation_type |
str
|
Type of causal relation (e.g., "logical", "temporal"). |
Example
relation = CausalRelation( ... cause_index=2, ... effect_index=5, ... strength=0.8, ... relation_type="logical", ... )
CausalAnalysisResult¶
Complete causal analysis of a reasoning chain.
CausalAnalysisResult
dataclass
¶
Complete causal analysis of a reasoning chain.
Contains all causal relationships, importance scores, and the dependency graph for a reasoning chain.
Attributes:
| Name | Type | Description |
|---|---|---|
chain |
ReasoningChain
|
The analyzed reasoning chain. |
step_importance |
Dict[int, float]
|
Importance score for each step (index -> score). |
causal_drivers |
List[ReasoningStep]
|
Steps that are primary drivers of the conclusion. |
dependency_graph |
Dict[int, List[int]]
|
Graph of step dependencies (step -> dependencies). |
conclusion_step |
Optional[ReasoningStep]
|
The identified conclusion step, if any. |
causal_relations |
List[CausalRelation]
|
All identified causal relations. |
Example
result.step_importance[3] 0.85 len(result.causal_drivers) 2 result.conclusion_step.text "Therefore, I conclude..."