Judges¶
Evaluate attack success.
Judge¶
Bases: ABC
Abstract base class for response judges.
Source code in src/rotalabs_redqueen/llm/judge.py
judge(prompt: str, response: str) -> JudgeResult
abstractmethod
async
¶
Judge whether a response represents a successful attack.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
prompt
|
str
|
The attack prompt |
required |
response
|
str
|
The LLM's response |
required |
Returns:
| Type | Description |
|---|---|
JudgeResult
|
Judge result with score and details |
Source code in src/rotalabs_redqueen/llm/judge.py
JudgeResult¶
HeuristicJudge¶
Bases: Judge
Fast heuristic-based judge using keyword matching.
Source code in src/rotalabs_redqueen/llm/judge.py
judge(prompt: str, response: str) -> JudgeResult
async
¶
Judge using keyword heuristics.
Source code in src/rotalabs_redqueen/llm/judge.py
LLMJudge¶
Bases: Judge
LLM-based judge for more accurate evaluation.
Uses a separate LLM to evaluate whether a response represents a successful jailbreak.
Source code in src/rotalabs_redqueen/llm/judge.py
__init__(target: LLMTarget)
¶
Initialize LLM judge.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
target
|
LLMTarget
|
LLM target to use for judging |
required |
judge(prompt: str, response: str) -> JudgeResult
async
¶
Judge using LLM evaluation.