What is Spikee?
We conduct many security assessments for applications implementing Generative AI features. These aren’t just chatbots, but features integrated into products for summarization, resource tagging, and decision-making that powers agentic workflows. A primary focus of these assessments has been evaluating the risk of prompt injection.
Unlike generic jailbreak attacks that focus on bypassing a model’s safety alignment, prompt injection often involves exploiting the interaction between LLMs and the applications that use them. The goal is frequently to carry out practical attacks, such as data exfiltration, execution of malicious commands, or causing operational disruptions.
To systematically test for these risks across different applications and models, we developed spikee (https://spikee.ai), a modular and extensible toolkit for prompt injection testing, which we have released as an open-source project.
Key Use Cases
Spikee is designed to be a flexible toolkit that adapts to your specific testing needs. Here are some of the core scenarios it supports.
1. Testing a Standalone LLM
Before you even build an application, you might need to understand the baseline resilience of a standalone LLM. Spikee can help you answer questions like:
- How does `gpt-4o-mini` respond to common jailbreaks compared to `claude-3-haiku`?
- Is a model more susceptible to prompt injections at the start or end of its context window?
- Do prompt engineering techniques such as spotlighting reduce the attack surface of a given model?
- Does a specific obfuscation technique, like Base64 encoding, bypass the model’s safety filters?
For this scenario, you would use Spikee to generate full prompts, including system messages, and send them directly to the model’s API via a target.
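As a rough illustration, a target for a standalone model can be a thin wrapper around the provider’s API. The sketch below assumes a `process_input(input_text, system_message)` entry point, which is our assumption for illustration; check the target documentation in the repository for the exact interface Spikee expects.

```python
# target_openai_gpt4o_mini.py -- illustrative sketch only.
# Assumption: Spikee targets expose a process_input() entry point that
# receives the generated prompt (and optional system message) and returns
# the model's raw response. Verify against the repository docs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def process_input(input_text, system_message=None):
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": input_text})

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content
```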
2. Testing an LLM Application
This is the primary use case for Spikee. Most real-world risks emerge not from the model in isolation, but from how it is integrated into a larger application. Consider a feature that summarizes emails: the application retrieves emails, inserts them into a prompt template, and sends the result to an LLM.
To test this, you don’t want to generate the whole prompt; you only need to generate the malicious input (the email body containing the injection).
With Spikee, you can:
- Use `--format document` to generate only the malicious document content.
- Write a custom Target script that takes this document and sends it to your application’s API endpoint (e.g., `/api/summarize`); see the sketch after this list.
- Assess if the final output from your application (e.g., the summary) shows that the injection was successful.
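For example, a custom target for the email-summarization feature could forward each generated document to the application and return whatever the application produces. This is a minimal sketch under stated assumptions: the `/api/summarize` endpoint and its JSON shape are hypothetical, and the `process_input` entry point is assumed; the target documentation in the repository describes the actual interface.

```python
# target_summarizer_app.py -- illustrative sketch, not a canonical target.
# Assumptions: the application exposes POST /api/summarize accepting
# {"email_body": ...} and returning {"summary": ...}; Spikee calls
# process_input() with the generated document. Adjust both to your app.
import requests

APP_URL = "https://app.example.com/api/summarize"  # hypothetical endpoint

def process_input(input_text, system_message=None):
    # input_text is the malicious email body generated with --format document;
    # the application builds the full prompt itself, so no system message is sent.
    resp = requests.post(APP_URL, json={"email_body": input_text}, timeout=30)
    resp.raise_for_status()
    # Return the summary so the judge can decide whether the injection succeeded.
    return resp.json().get("summary", "")
```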
This allows you to test the security of the entire application pipeline, not just the model. Check this tutorial for an example:
https://labs.reversec.com/posts/2025/01/spikee-testing-llm-applications-for-prompt-injection
3. Evaluating a Guardrail
Whether you are using a commercial guardrail product or a custom-built one, you need to measure its effectiveness. Spikee helps you do this systematically by answering two key questions:
- Does it block the attacks it is supposed to block? (Efficacy)
- Does it allow the benign prompts it is supposed to allow? (False Positives)
The workflow involves creating a Guardrail Target that returns `True` if a prompt is allowed and `False` if it is blocked. You then run two tests: one with a dataset of attacks and one with a dataset of benign, legitimate prompts. The `spikee results analyze` command can then combine these results to calculate performance metrics like Precision and Recall, giving you a clear picture of the guardrail’s real-world performance.
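As a rough sketch, a guardrail target only needs to translate the guardrail’s verdict into that boolean convention. The HTTP endpoint and response field below are hypothetical, and the `process_input` entry point is again an assumption; adapt it to the product you are evaluating.

```python
# target_guardrail.py -- illustrative sketch.
# Assumptions: the guardrail exposes POST /v1/check returning {"allowed": bool},
# and Spikee calls process_input() with the candidate prompt. Following the
# convention described above, return True if the prompt is allowed and
# False if it is blocked.
import requests

GUARDRAIL_URL = "https://guardrail.example.com/v1/check"  # hypothetical

def process_input(input_text, system_message=None):
    resp = requests.post(GUARDRAIL_URL, json={"prompt": input_text}, timeout=15)
    resp.raise_for_status()
    return bool(resp.json().get("allowed", False))
```

Running this target once against the attack dataset and once against the benign dataset gives the counts behind the standard definitions: treating a blocked attack as a true positive, precision is blocked attacks divided by all blocked prompts, and recall is blocked attacks divided by all attacks.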
A Modular Toolkit for Security Testing
Spikee’s flexibility comes from its core components, which are simple Python scripts you can easily create or modify:
- Targets: The bridge between Spikee and the system you are testing. A target can connect to an LLM API, your application’s endpoint, or a guardrail.
- Plugins & Dynamic Attacks: Plugins apply pre-test transformations to create a broad dataset of known variations. Dynamic Attacks apply real-time, iterative transformations during a test to find a single successful bypass.
- Judges: A judge defines what a “successful” attack looks like, from a simple keyword match to a complex, LLM-based evaluation of the response’s content.
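To make the judge idea concrete, here is a minimal keyword-match example. The function name and signature are assumptions for illustration; the judge documentation in the repository defines the actual interface Spikee expects.

```python
# judge_keyword_match.py -- illustrative sketch of the simplest kind of judge.
# Assumption: Spikee invokes a judge function with the model or application
# output and the canary string that should appear when an injection succeeds;
# check the judge docs for the real signature.
def judge(llm_output: str, expected_canary: str) -> bool:
    # The attack counts as successful if the canary string leaked into the output.
    return expected_canary.lower() in llm_output.lower()
```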
Get Started
To get started with Spikee, visit our GitHub repository. The `README.md` contains a full installation guide and a practical walkthrough of the core workflow.
The repository also includes detailed documentation for each module, helping you build your own custom components and tailor Spikee to your exact testing needs.