<!DOCTYPE html>
<html data-theme="dark" lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Prompt Injections are bad, mkay?</title>
<meta name="description" content="Demonstrating indirect prompt injection attacks against Bing Chat.">
<link rel="stylesheet" href="resources/pico.min.css">
<link rel="stylesheet" href="resources/custom.css">
<script>
// Pick one of the example pages at random and navigate to it.
// Only the "pirate" example is currently listed.
const examples = ["pirate"];
function redirect_example() {
const example = examples[Math.floor(Math.random() * examples.length)];
window.location.href = window.location.href + "examples/" + example + ".html";
}
</script>
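<!-- Note: redirect_example() is not currently invoked by any element on this page.
A hypothetical trigger could look like:
<button onclick="redirect_example()">Show a random example</button> -->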
</head>
<body>
<main class="container">
<div class="container">
<h1>Indirect Prompt Injection Threats</h1>
<p>Large Language Models (LLMs) have made remarkable progress in recent years. Most recently, they have demonstrated the ability to answer natural-language questions at a surprisingly high level of performance. In addition, clever prompting can change their behavior. In this way, these models blur the line between data and instructions. From "traditional" cybersecurity we know that this is a problem, yet the importance of security boundaries between trusted and untrusted inputs for LLMs has been underestimated. We show that prompt injection is a serious security threat that needs to be addressed as models are deployed to new use cases and interface with more systems.</p>
<p>
If allowed by the user, Bing Chat can see the websites currently open in the browser. We show that an attacker can plant an injection in a
website the user is visiting, which silently turns Bing Chat into a social engineer that seeks out and exfiltrates personal information.
The user doesn't have to ask about the website or do anything except interact with Bing Chat while the
website is open in the browser.
</p>
<div class="container-fluid">
<figure>
<video controls autoplay>
<source src="./resources/output.webm" type="video/webm">
</video>
<figcaption>Turning Bing Chat into a scammer trying to get the user's payment details</figcaption>
</figure>
</div>
<small>
<blockquote>
Microsoft has since implemented various mitigations against this threat, though their effectiveness remains unclear and is constantly changing.
</blockquote>
</small>
</div>
<article class="grid">
<div>
<h2>Turning Bing Chat into a Data Pirate</h2>
<p>This demonstration on Bing Chat is only a small part of the new attack techniques presented in our recent paper (linked below).</p>
<p>A user opened a prepared website containing an injection (it could also be placed on a social media site) in
Edge.
You can see the conversation the user had with Bing Chat while the tab was open.
The website includes a prompt that is read by Bing Chat and changes its behavior to access user information
and send it to an attacker.
This is an example of "Indirect Prompt Injection", a new attack described in our paper.
The pirate accent is optional. The injection itself is simply a piece of regular text with a font size of 0.
The injected text is shown below as an image rather than as text (otherwise Bing Chat could see it and be injected again).
You can inspect the actual website that is opened <a
href="https://greshake.github.io/examples/pirate.html">here</a>.
</p>
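<p><small>As a simplified sketch (not the actual payload from the demo page), such an injection can be hidden in ordinary markup by rendering it at zero size:</small></p>
<pre><code>&lt;div style="font-size: 0"&gt;
[instructions addressed to the model, invisible to the human reader]
&lt;/div&gt;
</code></pre>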
<div class="grid">
<button onclick="window.location.href='https://github.com/greshake/llm-security';">
GitHub
</button>
<button onclick="window.location.href='https://arxiv.org/abs/2302.12173';">Paper</button>
</div>
<br>
<figure><img src="resources/injection.png" alt="The prompt text hidden on the pirate website">
<figcaption>The prompt hidden on the pirate website</figcaption>
</figure>
</div>
<aside>
<img src="resources/demo.png" alt="Screenshot of the user's conversation with Bing Chat">
</aside>
</article>
<article class="grid">
</article>
</main>
</body>
</html>