Prompt Injection

Donato Capitella
28 Jan 2025

Spikee: Testing LLM Applications for Prompt Injection

A step-by-step guide using the open-source tool spikee (v0.2) for prompt injection testing in LLM applications. Explores a webmail summarization case study, covering custom dataset creation, testing with Burp Suite and spikee's custom targets, interpreting results, and noting key updates from v0.1 to v0.2 like the Judge system and dynamic attacks.

Donato Capitella
6 Dec 2024

Multi-Chain Prompt Injection Attacks

Multi-chain prompt injection is a novel attack technique targeting complex LLM applications with multiple chained language models. The technique exploits interactions between LLM chains to bypass safeguards and propagate malicious content through entire systems. A sample workout planner application demonstrates how attackers can manipulate multi-chain LLM workflows to inject and propagate adversarial prompts across different processing stages.

Donato Capitella Lily Bradshaw
21 Oct 2024

Fine-Tuning LLMs to Resist Indirect Prompt Injection Attacks

A fine-tuning approach was developed to enhance Llama3-8B's resistance to indirect prompt injection attacks. The method uses data delimiters in the system prompt to help the model ignore malicious instructions within user-provided content. The fine-tuned model achieved a 100% pass rate in resisting tested prompt injection attacks. The model and training scripts have been publicly released.

Donato Capitella
4 Jun 2024

When your AI Assistant has an evil twin

An indirect prompt injection attack against Google Gemini Advanced demonstrates how malicious emails can manipulate the AI assistant into displaying social engineering messages. The attack tricks users into revealing confidential information by exploiting Gemini's email summarization capabilities. The vulnerability highlights potential security risks in AI assistants with data access capabilities.

Benjamin Hull Donato Capitella
8 Apr 2024

Domain-specific prompt injection detection

A domain-specific machine learning approach was developed to detect prompt injection attacks in job application contexts using a fine-tuned DistilBERT classifier. The model was trained on a custom dataset of job applications and prompt injection examples, achieving approximately 80% accuracy in identifying potential injection attempts. The research highlights the challenges of detecting prompt injection in large language models and emphasizes that such detection methods are just one part of a comprehensive security strategy.

Donato Capitella
21 Feb 2024

Should you let ChatGPT control your browser?

This article explores the security risks of granting Large Language Models (LLMs) control over web browsers. Two attack scenarios demonstrate how prompt injection vulnerabilities can be exploited to hijack browser agents and perform malicious actions. The article highlights critical security challenges in LLM-driven browser automation and proposes potential defense strategies.

Donato Capitella
2 Nov 2023

Synthetic Recollections

The article explores prompt injection techniques that can manipulate LLM agents with multi-chain reasoning systems. Two primary attack vectors are presented: thought/observation injection and thought-only injection. These attacks can potentially compromise the integrity of LLM-powered agents by tricking them into performing unintended actions through carefully crafted prompts.