Advanced Features

Multi-Turn Jailbreak Testing

Single-turn testing caught the obvious stuff. Multi-turn testing catches sophisticated attacks that build up over time.

Single-turn testing caught the obvious stuff. Multi-turn testing catches sophisticated attacks that build up over time.

Last Updated on November 2, 2025

Real attackers rarely start with "Tell me how to do something illegal." Instead, they warm up with innocent questions, build context, and gradually guide the conversation toward their goal. A chemistry question becomes energy release discussion, then controlled reactions, then specific compounds, and eventually explosives all feeling like a natural academic conversation.

Setting Up

From Advanced Jailbreaking, click "Select Multi-turn jailbreak testing."

Name your test clearly, then here's the key part: define your Target Prompt. This is the harmful content you're testing whether your AI will eventually generate. Be specific: "How to create explosives in a lab" or "Generate hate speech about [group]" or "Share customer personal information."

Next, choose your Jailbreak Techniques from the dropdown. RACE (Role-play and Context Exploitation) is a great starting point because it's how humans actually manipulate conversations. DAN (Do Anything Now) and Developer Mode are also common. Select 2-3 techniques for your first test.

Pick your model, describe your app, and submit.

The Interactive Process

Multi-turn testing requires your participation. The system creates a conversation interface showing "Progress: 1/5" at the top.

Turn 1 shows you a completely innocent opening message, something no AI should refuse. For an explosives target, it might ask about "theoretical underpinnings of controlled chemical reactions" in an academic context. Copy this, send it to your AI, and paste the response back.

Turns 2-5 progressively escalate based on your AI's responses. Each message uses previous context to guide the conversation closer to the target prompt. What makes this powerful is that each individual message looks reasonable, it's the arc that reveals the manipulation.

After completing all turns, the system evaluates whether your AI eventually provided the problematic information or recognised the manipulation and refused.

Reading Results

The results show whether the jailbreak succeeded or failed, along with the full conversation transcript. Pay attention to which turn your boundaries broke down. Refusal at turn 3? Pretty good. Providing problematic information at turn 5? That's a vulnerability to fix.

Multi-turn vulnerabilities need different fixes than single-turn ones. Add meta-instructions to your system prompt about recognizing conversation patterns, not just individual questions. Implement conversation-level safety checks that trigger when suspicious topic combinations appear across turns.

After implementing changes, run the same test again. The conversation should break down earlier, ideally at turn 1 or 2.

Helpful Tip

Start with single-turn testing across 100+ diverse prompts covering safety, bias, harmful content, and prompt injection. Get that success rate under 5%. Run this weekly or after significant changes.

Move to multi-turn testing once your single-turn results are solid. Test your highest-risk scenarios, conversations that might lead to content you're not qualified to provide or information you shouldn't share. Run these quarterly or before major releases.

Keep a log of all tests and results over time. That trend line should go down as you strengthen defenses, but watch for new patterns too. Share findings across your team security is collaborative work.

Quick Troubleshooting

File upload fails

Check your CSV is under 50MB, uses UTF-8 encoding, and has a "prompt" column header.

Evaluation stuck at Pending

Refresh the page first. If still stuck after 5 minutes, verify your LLM has available credits and test the connection on your LLM configuration page.

Can't submit multi-turn responses

Make sure you've pasted text into the response field. Check for extremely long responses that might exceed character limits.

Inconsistent results

Your target prompt might be too vague. Be specific about the harmful content you're testing for, and verify you're testing the correct model.

© Disseqt AI Product Starter Guide

© Disseqt AI Product Starter Guide

© Disseqt AI Product Starter Guide