This presentation explores the practical risks associated with granting Large Language Models (LLMs) agency, enabling them to perform actions on behalf of users. Donato shows how attackers can exploit these capabilities in real-world scenarios. Specifically, the focus will be on an emerging use cases: autonomous browser and software engineering agents. The session will cover how LLM agents operate, the risks of indirect prompt injection, and strategies for mitigating these vulnerabilities.

Outline:

  • Introduction to LLMs and their capabilities.
  • Explanation of how granting agency to LLMs works (ReAct and function calling)
  • Description of indirect prompt injection vulnerabilities.
  • Demonstrations using Taxy AI, a proof-of-concept browser agent.
    • Scenario 1: Hijacking the agent to exfiltrate confidential information from a user’s mailbox.
    • Scenario 2: Hijacking the agent to force the merge of a malicious pull request on a GitHub repository.
  • Conclusion with mitigation strategies for developers to safeguard users.