AI Security

Jordan Watson
22 Oct 2025

Evaluating LLM Input Comprehension and Guardrail Robustness through Noise-Based Attacks

We evaluate the ability of LLMs to understand text with random noise, and examine how prompts with varying levels of noise could bypass LLM guardrails.

Donato Capitella
14 Aug 2025

Design Patterns to Secure LLM Agents In Action

A practical walkthrough of six security design patterns for building resilient LLM agents. We explore how structural controls, not just model-level defenses, can mitigate prompt injection, and introduce a hands-on code repository to see these patterns in action.

Donato Capitella
28 Jan 2025

Spikee: Testing LLM Applications for Prompt Injection

A step-by-step guide using the open-source tool spikee (v0.2) for prompt injection testing in LLM applications. Explores a webmail summarization case study, covering custom dataset creation, testing with Burp Suite and spikee's custom targets, interpreting results, and noting key updates from v0.1 to v0.2 like the Judge system and dynamic attacks.

Donato Capitella
6 Dec 2024

Multi-Chain Prompt Injection Attacks

Multi-chain prompt injection is a novel attack technique targeting complex LLM applications with multiple chained language models. The technique exploits interactions between LLM chains to bypass safeguards and propagate malicious content through entire systems. A sample workout planner application demonstrates how attackers can manipulate multi-chain LLM workflows to inject and propagate adversarial prompts across different processing stages.

Lily Bradshaw Donato Capitella
21 Oct 2024

Fine-Tuning LLMs to Resist Indirect Prompt Injection Attacks

A fine-tuning approach was developed to enhance Llama3-8B's resistance to indirect prompt injection attacks. The method uses data delimiters in the system prompt to help the model ignore malicious instructions within user-provided content. The fine-tuned model achieved a 100% pass rate in resisting tested prompt injection attacks. The model and training scripts have been publicly released.

Donato Capitella
4 Jun 2024

When your AI Assistant has an evil twin

An indirect prompt injection attack against Google Gemini Advanced demonstrates how malicious emails can manipulate the AI assistant into displaying social engineering messages. The attack tricks users into revealing confidential information by exploiting Gemini's email summarization capabilities. The vulnerability highlights potential security risks in AI assistants with data access capabilities.

Tom Taylor-MacLean
22 May 2024

Generative AI - An Attacker's View

Generative AI is increasingly being used by threat actors for cyber attacks. Attackers can leverage AI for reconnaissance, gathering personal information quickly and creating targeted phishing emails. The technology enables sophisticated social engineering through deepfakes, voice cloning, and malicious code generation, with potential for more advanced attacks in the near future.

Benjamin Hull Donato Capitella
8 Apr 2024

Domain-specific prompt injection detection

A domain-specific machine learning approach was developed to detect prompt injection attacks in job application contexts using a fine-tuned DistilBERT classifier. The model was trained on a custom dataset of job applications and prompt injection examples, achieving approximately 80% accuracy in identifying potential injection attempts. The research highlights the challenges of detecting prompt injection in large language models and emphasizes that such detection methods are just one part of a comprehensive security strategy.

Donato Capitella
21 Feb 2024

Should you let ChatGPT control your browser?

This article explores the security risks of granting Large Language Models (LLMs) control over web browsers. Two attack scenarios demonstrate how prompt injection vulnerabilities can be exploited to hijack browser agents and perform malicious actions. The article highlights critical security challenges in LLM-driven browser automation and proposes potential defense strategies.

Donato Capitella
2 Nov 2023

Synthetic Recollections

The article explores prompt injection techniques that can manipulate LLM agents with multi-chain reasoning systems. Two primary attack vectors are presented: thought/observation injection and thought-only injection. These attacks can potentially compromise the integrity of LLM-powered agents by tricking them into performing unintended actions through carefully crafted prompts.

20 May 2020

Releasing the CAPTCHA Cracken

A tool called CAPTCHA Cracken was developed to bypass text-based CAPTCHAs on an Outlook Web App portal. Advanced image preprocessing techniques and browser automation with Pyppeteer were used to overcome significant CAPTCHA recognition challenges. The project demonstrated the vulnerability of traditional text-based CAPTCHAs to machine learning-based automated attacks.

Tinus Green
17 Jan 2019

CAPTCHA-22: Breaking Text-Based CAPTCHAs with Machine Learning

A machine learning technique was developed to break text-based CAPTCHAs using an Attention-based OCR model. By manually labeling training data from a large dataset of CAPTCHA images, near-perfect accuracy was achieved in solving various CAPTCHA implementations. The study demonstrated how machine learning can effectively bypass traditional text-based CAPTCHA systems with minimal computational resources.