Showing Posts From

AI Security

Spikee: Testing LLM Applications for Prompt Injection

A step-by-step guide using the open-source tool spikee (v0.2) for prompt injection testing in LLM applications. Explores a webmail summarization case study, covering custom dataset creation, testing with Burp Suite and spikee's custom targets, interpreting results, and noting key updates from v0.1 to v0.2 like the Judge system and dynamic attacks.

Multi-Chain Prompt Injection Attacks

Multi-chain prompt injection is a novel attack technique targeting complex LLM applications with multiple chained language models. The technique exploits interactions between LLM chains to bypass safeguards and propagate malicious content through entire systems. A sample workout planner application demonstrates how attackers can manipulate multi-chain LLM workflows to inject and propagate adversarial prompts across different processing stages.

Fine-Tuning LLMs to Resist Indirect Prompt Injection Attacks

A fine-tuning approach was developed to enhance Llama3-8B's resistance to indirect prompt injection attacks. The method uses data delimiters in the system prompt to help the model ignore malicious instructions within user-provided content. The fine-tuned model achieved a 100% pass rate in resisting tested prompt injection attacks. The model and training scripts have been publicly released.

When your AI Assistant has an evil twin

An indirect prompt injection attack against Google Gemini Advanced demonstrates how malicious emails can manipulate the AI assistant into displaying social engineering messages. The attack tricks users into revealing confidential information by exploiting Gemini's email summarization capabilities. The vulnerability highlights potential security risks in AI assistants with data access capabilities.

Generative AI - An Attacker's View

Generative AI is increasingly being used by threat actors for cyber attacks. Attackers can leverage AI for reconnaissance, gathering personal information quickly and creating targeted phishing emails. The technology enables sophisticated social engineering through deepfakes, voice cloning, and malicious code generation, with potential for more advanced attacks in the near future.

Domain-specific prompt injection detection

A domain-specific machine learning approach was developed to detect prompt injection attacks in job application contexts using a fine-tuned DistilBERT classifier. The model was trained on a custom dataset of job applications and prompt injection examples, achieving approximately 80% accuracy in identifying potential injection attempts. The research highlights the challenges of detecting prompt injection in large language models and emphasizes that such detection methods are just one part of a comprehensive security strategy.

Should you let ChatGPT control your browser?

This article explores the security risks of granting Large Language Models (LLMs) control over web browsers. Two attack scenarios demonstrate how prompt injection vulnerabilities can be exploited to hijack browser agents and perform malicious actions. The article highlights critical security challenges in LLM-driven browser automation and proposes potential defense strategies.

Synthetic Recollections

The article explores prompt injection techniques that can manipulate LLM agents with multi-chain reasoning systems. Two primary attack vectors are presented: thought/observation injection and thought-only injection. These attacks can potentially compromise the integrity of LLM-powered agents by tricking them into performing unintended actions through carefully crafted prompts.

  • 20 May 2020

Releasing the CAPTCHA Cracken

A tool called CAPTCHA Cracken was developed to bypass text-based CAPTCHAs on an Outlook Web App portal. Advanced image preprocessing techniques and browser automation with Pyppeteer were used to overcome significant CAPTCHA recognition challenges. The project demonstrated the vulnerability of traditional text-based CAPTCHAs to machine learning-based automated attacks.

CAPTCHA-22: Breaking Text-Based CAPTCHAs with Machine Learning

A machine learning technique was developed to break text-based CAPTCHAs using an Attention-based OCR model. By manually labeling training data from a large dataset of CAPTCHA images, near-perfect accuracy was achieved in solving various CAPTCHA implementations. The study demonstrated how machine learning can effectively bypass traditional text-based CAPTCHA systems with minimal computational resources.