rotalabs-redqueen¶

Quality-diversity evolutionary red-teaming for LLMs and agents.

Overview¶

rotalabs-redqueen evolves diverse, effective adversarial attacks against language models and agents, and maps the vulnerability space with MAP-Elites. It operates at the semantic level and spans the full attack surface:

Single-turn prompt attacks (strategies, encodings, personas)
Multi-turn Crescendo-style escalation
Agentic / tool-use / MCP multi-step exploit plans

Seeded runs are bit-reproducible (and cross-language portable), and a campaign can be projected into an audit-ready compliance report (OWASP, MITRE ATLAS, EU AI Act Art. 55, NIST AI RMF).

Key features¶

Quality-diversity evolution — MAP-Elites + novelty search over a behavior space
Three attack surfaces — one engine, swappable genome (LLMAttackGenome, MultiTurnGenome, AgenticGenome)
Reproducible — canonical seedable PRNG; same seed → same archive, conformance-gated
Persistent — archives save/load and seed the next run (continuous red-teaming)
Compliance — project the archive over the attack taxonomy into standards-aligned evidence
Multi-provider — OpenAI, Anthropic, Gemini, Ollama, Mock

Installation¶

pip install rotalabs-redqueen           # core + mock target
pip install rotalabs-redqueen[llm]      # all providers
pip install rotalabs-redqueen[dev]      # tests/lint

Quick start¶

import asyncio
from rotalabs_redqueen import (
    LLMAttackGenome, JailbreakFitness, MockTarget, HeuristicJudge,
    MapElitesArchive, BehaviorDimension, AttackStrategy, Encoding, evolve,
)

async def main():
    fitness = JailbreakFitness(MockTarget(), HeuristicJudge())
    archive = MapElitesArchive(dimensions=[
        BehaviorDimension("strategy", 0.0, 1.0, len(AttackStrategy)),
        BehaviorDimension("encoding", 0.0, 1.0, len(Encoding)),
        BehaviorDimension("has_persona", 0.0, 1.0, 2),
    ])
    result = await evolve(
        genome_class=LLMAttackGenome,
        fitness=fitness,
        generations=50,
        population_size=20,
        seed=1234,            # reproducible
        archive=archive,
        progress=False,
    )
    cov = result.archive.coverage()
    print(f"coverage: {cov.coverage_percent:.1f}%  best: {result.best.fitness.value:.3f}")

asyncio.run(main())

Swap genome_class for MultiTurnGenome or AgenticGenome to evolve multi-turn or agentic attacks with the same engine. See Getting Started.

Core concepts¶

Concept	Description
Genome	An evolvable attack; its phenotype is a `Stimulus` (single-turn / multi-turn / agentic)
Target	Executes a `Stimulus`, returns a `Transcript`
Judge	Scores a `(Stimulus, Transcript)` — did the attack succeed?
Fitness	Composes target + judge into a score
Archive	MAP-Elites grid of diverse elite attacks; persists across runs
Report	Projects the archive over the taxonomy into compliance evidence