Detectors

Detection modules for identifying metacognitive behaviors in AI systems.

Available Detectors

| Detector | Description |
| --- | --- |
| SandbaggingDetector | Detect context-dependent underperformance |
| SituationalAwarenessDetector | Probe for evaluation awareness |
| ObserverEffectMonitor | Monitor behavioral changes when observed |

Base Classes

Base detector class for metacognition pattern detection.

BaseDetector

Bases: ABC

Abstract base class for all detectors.

All detector implementations should inherit from this class and implement the detect method.

Source code in src/rotalabs_probe/detectors/base.py
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseDetector(ABC):
    """Abstract base class for all detectors.

    All detector implementations should inherit from this class and implement
    the detect method.
    """

    def __init__(self) -> None:
        """Initialize the detector."""
        self.name: str = self.__class__.__name__

    @abstractmethod
    def detect(self, text: str) -> Dict[str, Any]:
        """Detect metacognitive patterns in the given text.

        Args:
            text: The input text to analyze

        Returns:
            A dictionary containing detection results with keys:
                - detected: bool indicating if pattern was found
                - confidence: float between 0 and 1
                - details: additional information about the detection

        Raises:
            NotImplementedError: If the method is not implemented
        """
        raise NotImplementedError("Subclasses must implement the detect method")

    def __repr__(self) -> str:
        """Return string representation of the detector.

        Returns:
            String representation
        """
        return f"{self.__class__.__name__}()"
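A minimal sketch of a concrete detector may help make the inheritance contract tangible. The `KeywordDetector` class, its `phrases` argument, and its confidence formula are hypothetical and exist only for illustration; they are not part of the package. The `BaseDetector` defined here is a stand-in that mirrors the listing above so the example runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Sequence


class BaseDetector(ABC):
    """Stand-in mirroring the base class listed above."""

    def __init__(self) -> None:
        self.name: str = self.__class__.__name__

    @abstractmethod
    def detect(self, text: str) -> Dict[str, Any]:
        raise NotImplementedError("Subclasses must implement the detect method")

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}()"


class KeywordDetector(BaseDetector):
    """Hypothetical detector: flags text containing any configured phrase."""

    def __init__(self, phrases: Sequence[str]) -> None:
        super().__init__()  # sets self.name from the class name
        self.phrases = [p.lower() for p in phrases]

    def detect(self, text: str) -> Dict[str, Any]:
        lowered = text.lower()
        matches = [p for p in self.phrases if p in lowered]
        # Toy confidence (an assumption): fraction of configured phrases found.
        confidence = len(matches) / len(self.phrases) if self.phrases else 0.0
        return {
            "detected": bool(matches),
            "confidence": confidence,
            "details": {"matches": matches},
        }


detector = KeywordDetector(["being tested", "this is an evaluation"])
result = detector.detect("I suspect I am being tested right now.")
print(result)
```

The subclass calls `super().__init__()` so that the `name` attribute is still set, and it returns the three documented keys (`detected`, `confidence`, `details`).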

__init__() -> None

Initialize the detector.

Source code in src/rotalabs_probe/detectors/base.py
def __init__(self) -> None:
    """Initialize the detector."""
    self.name: str = self.__class__.__name__

detect(text: str) -> Dict[str, Any] (abstractmethod)

Detect metacognitive patterns in the given text.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | str | The input text to analyze | required |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, Any] | A dictionary containing detection results with the keys `detected` (bool indicating if the pattern was found), `confidence` (float between 0 and 1), and `details` (additional information about the detection). |

Raises:

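| Type | Description |
| --- | --- |
| NotImplementedError | If the base implementation is called directly (for example through `super().detect()`). A subclass that does not override `detect` cannot be instantiated and raises TypeError instead. |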
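Because `detect` is decorated with `@abstractmethod` on a class deriving from `ABC`, Python enforces the contract at instantiation time. The sketch below, with a stand-in `BaseDetector` and two hypothetical subclasses, illustrates which exception appears in which situation; the `error_name` helper is an illustrative assumption.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict


class BaseDetector(ABC):
    """Stand-in mirroring the documented base class."""

    def __init__(self) -> None:
        self.name: str = self.__class__.__name__

    @abstractmethod
    def detect(self, text: str) -> Dict[str, Any]:
        raise NotImplementedError("Subclasses must implement the detect method")


class IncompleteDetector(BaseDetector):
    """Hypothetical subclass that forgets to override detect."""


class DelegatingDetector(BaseDetector):
    """Hypothetical subclass that overrides detect but defers to the base."""

    def detect(self, text: str) -> Dict[str, Any]:
        return super().detect(text)


def error_name(action: Callable[[], Any]) -> str:
    """Run action and return the name of the exception it raises, if any."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return "no error"


# The abstract base itself cannot be instantiated.
abstract_error = error_name(BaseDetector)
# Neither can a subclass that leaves detect abstract.
incomplete_error = error_name(IncompleteDetector)
# The base body only runs when a subclass explicitly calls it.
delegating_error = error_name(lambda: DelegatingDetector().detect("hello"))

print(abstract_error, incomplete_error, delegating_error)
```

In practice this means a missing `detect` override surfaces as soon as the detector is constructed, not when it is first called.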

Source code in src/rotalabs_probe/detectors/base.py
@abstractmethod
def detect(self, text: str) -> Dict[str, Any]:
    """Detect metacognitive patterns in the given text.

    Args:
        text: The input text to analyze

    Returns:
        A dictionary containing detection results with keys:
            - detected: bool indicating if pattern was found
            - confidence: float between 0 and 1
            - details: additional information about the detection

    Raises:
        NotImplementedError: If the method is not implemented
    """
    raise NotImplementedError("Subclasses must implement the detect method")
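The documented return contract (keys `detected`, `confidence`, `details`) can be checked mechanically, which may be useful in tests for custom detectors. The following is only a sketch: `contract_violations` is a hypothetical helper, not a function provided by the package, and accepting integer confidences such as `0` or `1` is an assumption.

```python
from typing import Any, Dict, List


def contract_violations(result: Dict[str, Any]) -> List[str]:
    """Return a list of ways `result` departs from the documented contract."""
    problems: List[str] = []

    # All three documented keys should be present.
    for key in ("detected", "confidence", "details"):
        if key not in result:
            problems.append(f"missing key: {key}")

    # `detected` is documented as a bool.
    if "detected" in result and not isinstance(result["detected"], bool):
        problems.append("detected should be a bool")

    # `confidence` is documented as a float between 0 and 1.
    if "confidence" in result:
        confidence = result["confidence"]
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            problems.append("confidence should be a number")
        elif not 0.0 <= confidence <= 1.0:
            problems.append("confidence should be between 0 and 1")

    return problems


good = {"detected": True, "confidence": 0.8, "details": {}}
bad = {"detected": "yes", "confidence": 1.5}

print(contract_violations(good))
print(contract_violations(bad))
```

The type of `details` is left unchecked here because the documentation describes it only as additional information about the detection.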

__repr__() -> str

Return string representation of the detector.

Returns:

| Type | Description |
| --- | --- |
| str | String representation |

Source code in src/rotalabs_probe/detectors/base.py
def __repr__(self) -> str:
    """Return string representation of the detector.

    Returns:
        String representation
    """
    return f"{self.__class__.__name__}()"
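Both `name` and the default `__repr__` derive from the class name, so subclasses pick them up without extra code. The sketch below uses a stand-in `BaseDetector` and two hypothetical subclasses; the `threshold` parameter and the overridden `__repr__` are illustrative assumptions showing one way a parameterized detector might expose its configuration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseDetector(ABC):
    """Stand-in mirroring the documented base class."""

    def __init__(self) -> None:
        self.name: str = self.__class__.__name__

    @abstractmethod
    def detect(self, text: str) -> Dict[str, Any]:
        raise NotImplementedError("Subclasses must implement the detect method")

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}()"


class ExampleDetector(BaseDetector):
    """Hypothetical detector that never fires; inherits name and __repr__."""

    def detect(self, text: str) -> Dict[str, Any]:
        return {"detected": False, "confidence": 0.0, "details": {}}


class ThresholdDetector(ExampleDetector):
    """Hypothetical detector with a constructor argument."""

    def __init__(self, threshold: float) -> None:
        super().__init__()
        self.threshold = threshold

    def __repr__(self) -> str:
        # The inherited __repr__ would omit the threshold, so include it here.
        return f"{self.__class__.__name__}(threshold={self.threshold!r})"


plain = ExampleDetector()
tuned = ThresholdDetector(threshold=0.7)

print(repr(plain), plain.name)
print(repr(tuned), tuned.name)
```

Since the inherited `__repr__` always prints empty parentheses, detectors that take constructor arguments may want an override like the one above so that logged representations reflect their settings.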