Breaking LLM Safety: Jailbreaks, Agents, and Threats
Hosted on Luma
Friday, January 30, 2026
Artificial Intelligence, Cybersecurity, Machine Learning, Natural Language Processing, Social Impact
Event Type: In person
Participants: 49
Est. Projects: 4
Organizers
Alex Johnson
alex@example.org
Jamie Rivera
jamie@example.org
Sam Chen
sam@example.org
Quality Score
72/100
High confidence
Organiser: 16/20
Event Maturity: 14/20
Sponsors: 18/25
Participants: 12/20
Operations: 12/15
Why this score
Strong organiser track record
Returning event
Well-sponsored
Missing data
Prize details
Code of conduct

Modern LLM security failures are not edge cases; they emerge from how these systems are trained, scaled, and deployed. The talk breaks down well-known jailbreaks and real-world attacks to show how safety guardrails are bypassed in practice. It explores how agentic systems and MCP expand the attack surface beyond prompts into memory, tools, and orchestration. We will examine the patterns behind emergent misalignment and why well-behaved models can fail in unexpected ways. Finally, the session covers concrete engineering strategies to anticipate, detect, and mitigate these threats before they scale.

About the speaker: Mohammed Arsalan is a generative AI consultant, prompt engineer, and AI safety researcher with over five years of experience at the intersection of LLM security, NLP, and adversarial AI. Based in Bengaluru, he is a multiple-time global hackathon winner across platforms such as Hugging Face, Cohere, Adobe, and MachineHack, and a Microsoft AI Startup selectee. He actively shares research and real-world insights on emerging generative AI attacks with a large professional audience. www.linkedin.com/in/sallu-mandya/

To attend online:
Add to calendar: https://bit.ly/4t73B1c
Gmeet link: https://meet.google.com/uiq-zbyc-ubw?hs=122&authuser=0