Spikee: Testing for LLM Prompt Injection in Restrictive Environments

1. Background

At Reversec, we perform a large number of security assessments of LLM-based applications. In these engagements, we typically combine manual testing with automation using Spikee (https://spikee.ai), our open-source framework for prompt injection and jailbreak testing. When performing prompt-based attacks, one of the main challenges is determining whether an attack is successful. In some cases, this is trivial, such as with our (cyber security dataset) that tests risks like generating XSS payloads or data exfiltration via Markdown images. In these scenarios, regular expressions can be used to inspect the application output and determine whether an attack succeeded.

However, other attacks are not as straightforward to evaluate. Attacks that target topic control and aim to bypass safety policies or alignment, and to elicit harmful or otherwise unwanted content, require semantic evaluation of the model’s output. Whether an attack succeeded often depends on intent and context, which cannot be reliably determined using pattern matching alone.

To handle these cases, Spikee uses an LLM-as-judge pattern. The response produced by the target application is sent to a separate judge LLM, which evaluates whether the response meets the attack’s success criteria.

This works well as long as the test environment can reach a judge LLM. In practice, this is often not the case. Many assessments are performed in restricted pre-production environments with no internet access and limited network connectivity. External APIs are unreachable, and locally hosted LLMs are typically not accessible from the test system.

To deal with this constraint, we added an offline judge mode and a rejudging workflow to Spikee. Attacks can be executed with an offline judge that records target responses without evaluating success. Those results can then be exported from the restricted environment and re-judged later using one or more LLM judges. This article focuses on that workflow.

2. Testing using an Offline Judge

In our demo scenario, we’re testing within a restricted client environment where we do not have access to the internet or a local LLM. Once we’ve installed Spikee and created our custom dataset, we can launch an attack that requires an LLM judge using the --judge-options offline flag.

spikee test --dataset datasets/my-harmful-content-test.jsonl \
            --target target_llm_app \
            --judge-options offline

The offline option will instruct the LLM judge assigned to each injection to always returns False. This will allow a tester to perform the attack and collect the responses without needing to call an LLM judge. Responses will be stored as usual in the results folder in the Spikee workspace.

You can check whether an LLM judge supports offline mode by running spikee list judges

Judges (local)
├── llm_judge_harmful
│   ├── Example options: openai-gpt-4.1-mini (default), openai-gpt-4o, offline
│   └── Supported prefixes: openai-, google-, bedrock-, ollama-, llamaccp-server-, together-, mock-
├── llm_judge_output_criteria
│   ├── Example options: openai-gpt-4.1-mini (default), openai-gpt-4o, offline
│   └── Supported prefixes: openai-, google-, bedrock-, ollama-, llamaccp-server-, together-, mock-
└── regex

As can be seen, both built-in LLM judges support offline mode, and adding support for your own judges is trivial.

3. Rejudging

Once the results have been collected and transferred to a less restrictive environment, with either access to the internet or a sufficiently powerful local LLM, we can start the rejudging process.

Invoke spikee results rejudge, specifying your results files to be rejudged using --result-file flag.

An LLM model can be specified using the --judge-options flag, replacing the prior offline options.
Multiple files can be rejudged in the same command by repeating the --result-file flag.

spikee results rejudge --result-file .\results\results_openai-api_my-harmful-content-test.jsonl \
                       --result-file .\results\results_openai-api_my-first_jailbreaks.jsonl \
                       --judge-options ollama-llama3.2

If a rejudging halts early due to a error or early termination using ‘Ctrl+C’, it can be resumed using the --resume flag. However, this does require the filename of the original and rejudged results to be unmodified and stored within the same folder.

spikee results rejudge --result-file .\results\results_openai-api_my-harmful-content-test.jsonl \
                       --judge-options ollama-llama3.2 \
                       --resume

These results can then be analysed by invoking the standard spikee results analyze command.

spikee results analyze --result-file .\results\results_openai-api_my-harmful-content-test_rejudged.jsonl

4. Installing Spikee in Isolated Environments

In many restricted test environments, installing Spikee can be a challenge in itself. It is common for these systems to have no internet access and no ability to install packages via pip. In such cases, Spikee can be built locally together with its dependencies and transferred into the isolated environment as a self-contained archive.

This process is documented in detail in the official Spikee documentation: https://github.com/ReversecLabs/spikee/blob/main/docs/11_installing_spikee_in_isolated_environments.md

5. Conclusions

In this tutorial, we’ve demonstrated how to use the new offline judge and rejudging functionalities within Spikee, aiding the security assessment of LLMs within restrictive environments.

Check out https://spikee.ai for the latest developments and contribute to the project on GitHub to help improve LLM security testing practices.

Spikee: Testing for LLM Prompt Injection in Restrictive Environments

1. Background

2. Testing using an Offline Judge

3. Rejudging

4. Installing Spikee in Isolated Environments

5. Conclusions

Similar Posts

Evaluating LLM Input Comprehension and Guardrail Robustness through Noise-Based Attacks

Design Patterns to Secure LLM Agents In Action

Spikee: Testing LLM Applications for Prompt Injection