Fitness
Jailbreak fitness functions.
JailbreakFitness
Bases: Fitness[LLMAttackGenome]
Fitness function for jailbreak attacks.
Evaluates how successful an attack genome is at bypassing the target LLM's safety measures.
Source code in src/rotalabs_redqueen/llm/fitness.py
__init__(target: LLMTarget, judge: Judge | None = None, max_retries: int = 3, retry_delay: float = 1.0)
Initialize jailbreak fitness.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `target` | `LLMTarget` | LLM target to attack | required |
| `judge` | `Judge \| None` | Judge for evaluating responses (default: HeuristicJudge) | `None` |
| `max_retries` | `int` | Maximum retry attempts on failure | `3` |
| `retry_delay` | `float` | Delay between retries in seconds | `1.0` |
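The interaction between `max_retries` and `retry_delay` can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `call_with_retries` is a hypothetical helper, and it assumes that `max_retries` counts retries after the first attempt and that the delay is fixed rather than exponential. The actual behavior in `fitness.py` may differ on both points.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_retries(
    call: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    retry_delay: float = 1.0,
) -> T:
    """Await call(), retrying on failure (hypothetical helper, not library code).

    Assumption: one initial attempt plus up to max_retries retries,
    with a fixed pause of retry_delay seconds between attempts.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except Exception as exc:
            last_error = exc
            # Pause only if another attempt will follow.
            if attempt < max_retries:
                await asyncio.sleep(retry_delay)
    # Every attempt failed: surface the most recent error.
    raise last_error
```

With the defaults shown in the table, a target call that keeps failing would be attempted four times under this assumption, with roughly three seconds of total waiting.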
async evaluate(genome: LLMAttackGenome) -> FitnessResult
Evaluate a single attack genome.
MultiTargetFitness
Bases: Fitness[LLMAttackGenome]
Fitness function that evaluates an attack genome against multiple targets.
Useful for finding universal attacks that work across different LLM providers/models.
__init__(targets: list[LLMTarget], judge: Judge | None = None, aggregation: str = 'mean')
Initialize multi-target fitness.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `targets` | `list[LLMTarget]` | List of LLM targets to test | required |
| `judge` | `Judge \| None` | Judge for evaluating responses | `None` |
| `aggregation` | `str` | How to aggregate scores ("mean", "min", "max") | `'mean'` |
async evaluate(genome: LLMAttackGenome) -> FitnessResult
Evaluate genome against all targets.
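
To make the three `aggregation` modes concrete, the sketch below reduces a list of per-target scores to a single value. It is an illustration only: `aggregate_scores` and `AGGREGATORS` are hypothetical names, and the assumption that each target yields one float score which is then combined by the named reducer is not confirmed by the source.

```python
from statistics import mean
from typing import Sequence

# Maps each documented aggregation name to a reducer over per-target scores.
AGGREGATORS = {"mean": mean, "min": min, "max": max}


def aggregate_scores(scores: Sequence[float], aggregation: str = "mean") -> float:
    """Combine per-target scores into one value (hypothetical helper)."""
    if aggregation not in AGGREGATORS:
        raise ValueError(f"unknown aggregation: {aggregation!r}")
    if not scores:
        raise ValueError("at least one score is required")
    return float(AGGREGATORS[aggregation](scores))
```

Under this reading, "min" would favor genomes that succeed against every target (the score is only as good as the weakest result), "max" would favor genomes that succeed against at least one target, and "mean" balances the two.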