What is Spikee?
We conduct many security assessments for applications implementing Generative AI features. These aren’t just chatbots, but features integrated into products for summarization, resource tagging, and decision-making that powers agentic workflows. A primary focus of these assessments has been evaluating the risk of prompt injection.
Unlike generic jailbreak attacks that focus on bypassing a model’s safety alignment, prompt injection often involves exploiting the interaction between LLMs and the applications that use them. The goal is frequently to carry out practical attacks, such as data exfiltration, execution of malicious commands, or causing operational disruptions.
To systematically test for these risks across different applications and models, we developed spikee (https://spikee.ai), a modular and extensible toolkit for prompt injection testing, which we have released as an open-source project.
Key Use Cases
Spikee is designed to be a flexible toolkit that adapts to your specific testing needs. Here are some of the core scenarios it supports.
1. Testing a Standalone LLM
Before you even build an application, you might need to understand the baseline resilience of a standalone LLM. Spikee can help you answer questions like:
- How does `gpt-4o-mini` respond to common jailbreaks compared to `claude-3-haiku`?
- Is a model more susceptible to prompt injections at the start or end of its context window?
- Do prompt engineering techniques such as spotlighting reduce the attack surface of a given model?
- Does a specific obfuscation technique, like Base64 encoding, bypass the model’s safety filters?
For this scenario, you would use Spikee to generate full prompts, including system messages, and send them directly to the model’s API via a target.
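As a rough illustration, a target for a standalone model can be a thin wrapper around the provider’s API. The sketch below assumes a `process_input(input_text, system_message)` entry point, which is our assumption for illustration; check the target documentation in the repository for the exact interface Spikee expects.

```python
# target_openai_gpt4o_mini.py -- illustrative sketch only.
# Assumption: Spikee targets expose a process_input() entry point that
# receives the generated prompt (and optional system message) and returns
# the model's raw response. Verify against the repository docs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def process_input(input_text, system_message=None):
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": input_text})

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content
```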
2. Testing an LLM Application
This is the primary use case for Spikee. Most real-world risks emerge not from the model in isolation, but from how it is integrated into a larger application. Consider a feature that summarizes emails: the application retrieves emails, inserts them into a prompt template, and sends the result to an LLM.
To test this, you don’t want to generate the whole prompt; you only need to generate the malicious input (the email body containing the injection).
With Spikee, you can:
- Use `--format document` to generate only the malicious document content.
- Write a custom Target script that takes this document and sends it to your application’s API endpoint (e.g., `/api/summarize`); see the sketch after this list.
- Assess if the final output from your application (e.g., the summary) shows that the injection was successful.
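For example, a custom target for the email-summarization feature could forward each generated document to the application and return whatever the application produces. This is a minimal sketch under stated assumptions: the `/api/summarize` endpoint and its JSON shape are hypothetical, and the `process_input` entry point is assumed; the target documentation in the repository describes the actual interface.

```python
# target_summarizer_app.py -- illustrative sketch, not a canonical target.
# Assumptions: the application exposes POST /api/summarize accepting
# {"email_body": ...} and returning {"summary": ...}; Spikee calls
# process_input() with the generated document. Adjust both to your app.
import requests

APP_URL = "https://app.example.com/api/summarize"  # hypothetical endpoint

def process_input(input_text, system_message=None):
    # input_text is the malicious email body generated with --format document;
    # the application builds the full prompt itself, so no system message is sent.
    resp = requests.post(APP_URL, json={"email_body": input_text}, timeout=30)
    resp.raise_for_status()
    # Return the summary so the judge can decide whether the injection succeeded.
    return resp.json().get("summary", "")
```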
This allows you to test the security of the entire application pipeline, not just the model. Check this tutorial for an example:
https://labs.reversec.com/posts/2025/01/spikee-testing-llm-applications-for-prompt-injection
3. Evaluating a Guardrail
Whether you are using a commercial guardrail product or a custom-built one, you need to measure its effectiveness. Spikee helps you do this systematically by answering two key questions:
- Does it block the attacks it is supposed to block? (Efficacy)
- Does it allow the benign prompts it is supposed to allow? (False Positives)
The workflow involves creating a Guardrail Target that returns `True` if a prompt is allowed and `False` if it is blocked. You then run two tests: one with a dataset of attacks and one with a dataset of benign, legitimate prompts. The `spikee results analyze` command can then combine these results to calculate performance metrics like Precision and Recall, giving you a clear picture of the guardrail’s real-world performance.
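As a rough sketch, a guardrail target only needs to translate the guardrail’s verdict into that boolean convention. The HTTP endpoint and response field below are hypothetical, and the `process_input` entry point is again an assumption; adapt it to the product you are evaluating.

```python
# target_guardrail.py -- illustrative sketch.
# Assumptions: the guardrail exposes POST /v1/check returning {"allowed": bool},
# and Spikee calls process_input() with the candidate prompt. Following the
# convention described above, return True if the prompt is allowed and
# False if it is blocked.
import requests

GUARDRAIL_URL = "https://guardrail.example.com/v1/check"  # hypothetical

def process_input(input_text, system_message=None):
    resp = requests.post(GUARDRAIL_URL, json={"prompt": input_text}, timeout=15)
    resp.raise_for_status()
    return bool(resp.json().get("allowed", False))
```

Running this target once against the attack dataset and once against the benign dataset gives the counts behind the standard definitions: treating a blocked attack as a true positive, precision is blocked attacks divided by all blocked prompts, and recall is blocked attacks divided by all attacks.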
A Modular Toolkit for Security Testing
Spikee’s flexibility comes from its core components, which are simple Python scripts you can easily create or modify:
- Targets: The bridge between Spikee and the system you are testing. A target can connect to an LLM API, your application’s endpoint, or a guardrail.
- Plugins & Dynamic Attacks: Plugins apply pre-test transformations to create a broad dataset of known variations. Dynamic Attacks apply real-time, iterative transformations during a test to find a single successful bypass.
- Judges: A judge defines what a “successful” attack looks like, from a simple keyword match to a complex, LLM-based evaluation of the response’s content.
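To make the judge idea concrete, here is a minimal keyword-match example. The function name and signature are assumptions for illustration; the judge documentation in the repository defines the actual interface Spikee expects.

```python
# judge_keyword_match.py -- illustrative sketch of the simplest kind of judge.
# Assumption: Spikee invokes a judge function with the model or application
# output and the canary string that should appear when an injection succeeds;
# check the judge docs for the real signature.
def judge(llm_output: str, expected_canary: str) -> bool:
    # The attack counts as successful if the canary string leaked into the output.
    return expected_canary.lower() in llm_output.lower()
```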
Get Started
To get started with Spikee, visit our GitHub repository. The `README.md` contains a full installation guide and a practical walkthrough of the core workflow.
The repository also includes detailed documentation for each module, helping you build your own custom components and tailor Spikee to your exact testing needs.