\newcommand{\nonnegl}{\mathsf{nonnegl}}
\section{Fixed-length MACs}
Previously, we defined what a MAC is, and specified correctness and security definitions for MACs. In this section, we'll define a fixed-length MAC for messages of length $n$.
\begin{theorem}
If $F : \{0, 1\}^n \to \{0, 1\}^n$ is a secure PRF, then the MAC scheme $\Pi = (\mathsf{Gen}, \mathsf{Mac}, \mathsf{Verify})$ constructed below has EUF-CMA security.
\begin{itemize}
\item $\gen(1^n):$
\begin{algorithmic}
\item Output $k \getsr \{0, 1\}^n$
\end{algorithmic}
\item $\mac(k,m):$
\begin{algorithmic}
\item Output $t = F_k(m)$
\end{algorithmic}
\item $\Verify(k, m, t):$
\begin{algorithmic}
\item If $t = F_k(m)$, then return $1$.
\item Otherwise return $0$.
\end{algorithmic}
\end{itemize}
That is, we just compute the PRF on our message as the MAC.
\end{theorem}
\begin{proof}
To prove security, suppose for contradiction that there exists an adversary $A$ that breaks the security for $\Pi$. We'd like to construct an adversary $B$ that breaks the security of the PRF.
Here, the adversary $A$ makes tag queries on messages of its choice. $B$ can simply forward each query to its own oracle (either $F_k$ or a random function $R_n$) and return the response to $A$. Eventually, $A$ outputs a pair $(m^*, t^*)$; $B$ then queries its oracle on $m^*$ to obtain $t$, and outputs $1$ if and only if $t = t^*$.
\begin{center}
\begin{tikzpicture}
\draw (3, 0) rectangle (8, 5.5);
\draw (5, 0.5) rectangle (7.5, 5);
\node at (3.25, 5.25) {$B$};
\node at (5.25, 4.75) {$A$};
\draw (5.5, 4) edge[->] node[right, pos=0] {$m$} (2.5, 4)
(2.5, 3.5) edge[->] node[right, pos=1] {$t$} (5.5, 3.5);
\node[left] at (2.5, 3.75) {$F_k$};
\node at (5.5, 3.1) {$\vdots$};
\node at (4.5, 3.1) {$\vdots$};
\draw (5.5, 2.5) edge[->] node[right, pos=0] {$(m^*, t^*)$} (4.5, 2.5);
\draw (3.5, 2.5) edge[->] node[right, pos=0] {$m^*$} (2.5, 2.5);
\node[left] at (2.5, 2.25) {$F_k$};
\draw (2.5, 2) edge[->] node[right, pos=1] {$t$} (3.5, 2);
\draw (3.5, 1) edge[->] node[right, pos=0] {$t \overset{?}{=} t^*$} (2.5, 1);
\end{tikzpicture}
\end{center}
Analyzing the probability for $B$, we have
\[
\abs{\Pr(B^{F_k(\cdot)}(1^n) = 1) - \Pr(B^{R_n(\cdot)}(1^n) = 1)}
= \abs{\varepsilon_A(n) - \frac{1}{2^n}}
= \nonnegl(n)
.\]
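Here, $\varepsilon_A(n)$ is the probability that $A$ outputs a valid forgery. The first term arises because, when the oracle is $F_k$, $B$ simulates the MAC game perfectly, so it outputs $1$ exactly when $A$ forges. The second term arises because $m^*$ was never queried, so $R_n(m^*)$ is uniformly random and independent of everything $A$ has seen, and it equals $t^*$ with probability $\frac{1}{2^n}$.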
\end{proof}
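The reduction in the proof above can be summarized as the following distinguisher. This is only a sketch: $O$ denotes $B$'s oracle (either $F_k$ or $R_n$), and the set $Q$ is bookkeeping introduced here for illustration.
\begin{algorithmic}[1]
\Function{$B^{O(\cdot)}$}{$1^n$}
\State $Q \gets \emptyset$
\State Run $A(1^n)$; answer each tag query $m$ with $O(m)$ and add $m$ to $Q$
\State $(m^*, t^*) \gets$ the output of $A$
\If {$m^* \in Q$}
\State \Return $0$
\EndIf
\State $t \gets O(m^*)$
\State \Return $1$ if $t = t^*$, and $0$ otherwise
\EndFunction
\end{algorithmic}
The check $m^* \in Q$ mirrors the EUF-CMA requirement that a forgery must be on a message that was never queried.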
\section{Variable-length MACs}
Now, let us look at messages with lengths that are a multiple of $n$. In particular, we have a few blocks $m_1, \ldots, m_{\ell}$, each of size $n$. There are a few ways to do this, but we'll look at a method similar to the counter mode we looked at last time.
\begin{center}
\begin{tikzpicture}
\node (m1) at (0, 2) {$m_1$};
\node (m2) at (1, 2) {$m_2$};
\node (m3) at (2, 2) {$m_3$};
\node (mdots) at (3, 2) {$\cdots$};
\node (ml) at (4, 2) {$m_{\ell}$};
\node[draw] (fk1) at (0, 0) {$F_k$};
\node[draw] (fk2) at (1, 0) {$F_k$};
\node[draw] (fk3) at (2, 0) {$F_k$};
\node at (3, 0) {$\ldots$};
\node[draw] (fkl) at (4, 0) {$F_k$};
\node[outer sep=0pt, inner sep=0pt, draw, circle] (fk1+m2) at (1, 1) {$+$};
\node[outer sep=0pt, inner sep=0pt, draw, circle] (fk2+m3) at (2, 1) {$+$};
\node[outer sep=0pt, inner sep=1pt] (xor-dots) at (3, 1) {$\ldots$};
\node[outer sep=0pt, inner sep=0pt, draw, circle] (dots+ml) at (4, 1) {$+$};
\draw (m1) edge[->] (fk1)
(m2) edge[->] (fk1+m2)
(m3) edge[->] (fk2+m3)
(ml) edge[->] (dots+ml);
\draw[->] (fk1) -- ++(0.5, 0) |- (fk1+m2);
\draw[->] (fk2) -- ++(0.5, 0) |- (fk2+m3);
\draw[->] (fk3) -- ++(0.5, 0) |- (xor-dots);
\draw[->] (xor-dots) -- (dots+ml);
\draw (fk1+m2) edge[->] (fk2)
(fk2+m3) edge[->] (fk3)
(dots+ml) edge[->] (fkl);
\draw (fkl) edge[->] ++(1, 0);
\end{tikzpicture}
\end{center}
This construction avoids having to store a tag equal in length to the message, but this is not secure, due to length extension attacks. In particular, suppose we query for the tag $t$ associated with $0^n$. We can then query another tag $t'$ for $0^n \oplus t$. Observe here that $t'$ is also the tag for $0^{2n}$.
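To see why the forgery works, write $T(\cdot)$ for the tag computed by the chained construction above ($T$ is notation introduced here for illustration). Then
\begin{align*}
t &= T(0^n) = F_k(0^n), \\
t' &= T(0^n \oplus t) = T(t) = F_k(t), \\
T(0^{2n}) &= F_k(F_k(0^n) \oplus 0^n) = F_k(t) = t'.
\end{align*}
So $(0^{2n}, t')$ is a valid message--tag pair, even though the two-block message $0^{2n}$ was never queried.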
A solution is to use different keys for each PRF, but this isn't too efficient, since we're still calling the PRF once per block of length $n$. We'll instead improve this to use only one block cipher call---we do some preprocessing and only call $F_k$ once on the output of the preprocessing.
In particular, we'll claim that applying a universal hash function to the input and then applying the block cipher is a secure MAC.
\begin{definition}[Universal Hash Function]
A function $h : \mathcal{F} \times \mathcal{F}^* \to \mathcal{F}$ (where $\mathcal{F}$ is a finite field of size $2^n$, so that field elements are $n$-bit blocks) is a universal hash function if for all distinct $m, m' \in \mathcal{F}^{\le \ell}$ (i.e. $m$ and $m'$ have length at most $\ell$),
\[
\Pr_s(h(s, m) = h(s, m')) \le \frac{\ell}{\abs{\mathcal{F}}}
.\]
That is, the probability of collision is small.
\end{definition}
Crucially here, we fix $m$ and $m'$, and we sample $s$. (If we fix an $s$, we can almost surely find an $m$ and $m'$ that collide.)
Today, we'll look at the following function:
\[
h(s, m_0, \ldots, m_{\ell - 1}) = m_0 + m_1 s + m_2 s^2 + \cdots + m_{\ell - 1} s^{\ell - 1} + s^{\ell}
.\]
\begin{claim}
The function defined by
\[
h(s, m_0, \ldots, m_{\ell - 1}) = m_0 + m_1 s + m_2 s^2 + \cdots + m_{\ell - 1} s^{\ell - 1} + s^{\ell}
\]
is a universal hash function.
\end{claim}
\begin{proof}
We'd like to argue that for a fixed $m$ and $m'$, and a random $s$, the probability that there is a collision is at most $\frac{\ell}{\abs{\mathcal{F}}}$.
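Write $m = (m_0, \ldots, m_{\ell - 1})$ and $m' = (m_0', \ldots, m_{\ell - 1}')$ with $m \ne m'$, and consider
\[
h(x, m_0, \ldots, m_{\ell - 1}) - h(x, m_0', \ldots, m_{\ell - 1}') = (m_0 - m_0') + (m_1 - m_1') x + \cdots + (m_{\ell - 1} - m_{\ell - 1}') x^{\ell - 1}
.\]
There is a collision at $x = s$ exactly when this difference vanishes at $s$. Since $m \ne m'$, the difference is a nonzero polynomial of degree at most $\ell$ (if the two messages have different lengths, the monic leading term of the longer message survives, so the polynomial is still nonzero). A nonzero polynomial of degree at most $\ell$ over a field has at most $\ell$ zeroes, so a uniformly random $s$ is a zero with probability at most $\frac{\ell}{\abs{\mathcal{F}}}$. This means that $h$ is indeed a universal hash function.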
\end{proof}
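As a quick sanity check of the role of the leading $s^{\ell}$ term, consider the one-block message $(a)$ and the two-block message $(a, 0)$ for some fixed $a \in \mathcal{F}$ (an example chosen here purely for illustration):
\[
h(s, a) - h(s, a, 0) = (a + s) - (a + 0 \cdot s + s^2) = s - s^2 = s(1 - s)
.\]
This is a nonzero polynomial of degree $2$ whose only zeroes are $s = 0$ and $s = 1$, so the two messages collide with probability $\frac{2}{\abs{\mathcal{F}}}$, matching the bound with $\ell = 2$. Without the $s^{\ell}$ term, the difference would be identically zero, and the two messages would collide for every $s$.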
\begin{claim}
The MAC given by $F_k(h(s, m_0, \ldots, m_{\ell - 1}))$, with key $(k, s)$ and the universal hash function $h$ given prior, is EUF-CMA secure. (This is a slight variation on the Carter--Wegman MAC.)
\end{claim}
\begin{proof}
Suppose for contradiction that there exists a nu-PPT $A$ that breaks the security of this scheme.
Here, for appropriately generated $k$ and $s$, $A$ makes queries $m \mapsto F_k(h_s(m))$, and outputs $(m^*, t^*)$.
We'd like to create an adversary $B$ that either breaks the security of the PRF, or breaks the security of the universal hash function.
$B$ will start by sampling $s \in \mathcal{F}$. When $A$ makes a query $m$, $B$ computes $h_s(m)$, queries its oracle on $h_s(m)$, and sends the answer back to $A$. If the oracle is the PRF $F_k$, then $A$ sees exactly the tags of the real scheme; if the oracle is a truly random function $R_n$, then $A$ sees the tags of the same scheme with $F_k$ replaced by $R_n$.
By the security of the PRF, $A$ must still produce valid pairs $(m^*, t^*)$ with non-negligible probability when the oracle is $R_n$; otherwise, $B$ would distinguish $F_k$ from $R_n$.
\begin{center}
\begin{tikzpicture}
\draw (2, 0) rectangle (8, 5.5);
\draw (5, 0.5) rectangle (7.5, 5);
\node at (2.25, 5.25) {$B$};
\node at (5.25, 4.75) {$A$};
\node at (3.5, 4.5) {sample $s \in \mathcal{F}$};
\node (hsm) at (3.5, 4) {$h_s(m)$};
\draw (5.5, 4) edge[->] node[right, pos=0] {$m$} (hsm)
(hsm) edge[->] (1.5, 4)
(1.5, 3.5) edge[->] node[right, pos=1] {$t$} (5.5, 3.5);
\node[left] at (1.5, 3.75) {$F_k$};
\node at (5.5, 3.1) {$\vdots$};
\node at (4.5, 3.1) {$\vdots$};
\node (hash-m-star) at (3.5, 2.5) {$h_s(m^*)$};
\draw (5.5, 2.5) edge[->] node[right, pos=0] {$(m^*, t^*)$} (hash-m-star)
(hash-m-star) edge[->] (1.5, 2.5);
\node[left] at (1.5, 2.25) {$F_k$};
\draw (1.5, 2) edge[->] node[right, pos=1] {$t$} (3.5, 2);
\draw (3.5, 1) edge[->] node[right, pos=0] {$t \overset{?}{=} t^*$} (1.5, 1);
\end{tikzpicture}
\end{center}
Let $L$ be the list of messages queried by $A$, and let $E$ be the event that there exist distinct $m, m' \in L \cup \{m^*\}$ such that $h_s(m) = h_s(m')$. If $E$ does not happen, then the hash function never collides. This means that the random function is only ever evaluated on distinct inputs, so the attacker only sees independent random values, and this reduces to the case from earlier (when the MAC is just $F_k$).
As such, we'd like to show that collisions in $h_s(\cdot)$ occur with negligible probability.
To show this, suppose for contradiction that collisions actually do occur with non-negligible probability. We then want to construct an adversary $B$ utilizing $A$ that just outputs $m$ and $m'$ such that when $s$ is sampled, $h_s(m) = h_s(m')$ with non-negligible probability.
$B$ will pick a random $i, j \in \{1, \ldots, q+1\}$ (here suppose $i < j$), where $q$ is the number of MAC queries. We then run $A$ until the $j$th query. Taking the $i$th and $j$th query, we then output $m_i$ and $m_j$ as our pair of messages. We still need to entertain the queries made by $A$, so we can just return random values for tags (giving the same value if it requests it for the same message).
\begin{center}
\begin{tikzpicture}
\draw (2, 0) rectangle (8, 6);
\draw (5, 0.5) rectangle (7.5, 5.5);
\node at (2.25, 5.75) {$B$};
\node at (5.25, 5.25) {$A$};
\node[align=center] at (3.5, 5) {\small $i, j \xleftarrow{\$} \{1, \ldots, q+1\}$};
\draw (5.5, 4) edge[->] node[right, pos=0] {$m_1$} (4.5, 4)
(4.5, 3.5) edge[->] node[left, pos=0] {sample $t_1$} node[right, pos=1] {$t_1$} (5.5, 3.5);
\node at (5.5, 3.1) {$\vdots$};
\node at (4.5, 3.1) {$\vdots$};
\draw (5.5, 2.5) edge[->] node[right, pos=0] {$m_i$} (4.5, 2.5)
(4.5, 2) edge[->] node[left, pos=0] {sample $t_i$} node[right, pos=1] {$t_i$} (5.5, 2);
\node at (5.5, 1.6) {$\vdots$};
\node at (4.5, 1.6) {$\vdots$};
\draw (5.5, 1) edge[->] node[right, pos=0] {$m_j$} (4.5, 1);
\draw (3, 0.5) edge[->] node[right, pos=0] {$(m_i, m_j)$} (1.5, 0.5);
\end{tikzpicture}
\end{center}
By assumption, we know that $E$ occurs with non-negligible probability. That is, among the queries made by $A$, there is a non-negligible probability that $h_s(m_i) = h_s(m_j)$ for some pair of indices. Since here the implementation of $B$ just picks out a pair of random queries from those made by $A$, the pair $(m_i, m_j)$ output by $B$ also has a collision with non-negligible probability. (In particular, with probability at least $\Pr(E) / q^2$.)
This breaks the definition of a universal hash function, which is a contradiction.
\end{proof}
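The final step of the proof can be made quantitative; the following is a sketch with unoptimized constants, assuming $\abs{\mathcal{F}} = 2^n$ so that hash outputs are valid PRF inputs. There are $\binom{q+1}{2} \le q^2$ pairs of indices $i < j$, so $B$ guesses a colliding pair with probability at least $\frac{1}{q^2}$ whenever one exists. Since $B$ chooses $(m_i, m_j)$ without ever using $s$, the universal hash property applies to its output:
\[
\frac{\Pr(E)}{q^2} \le \Pr_s(h_s(m_i) = h_s(m_j)) \le \frac{\ell}{\abs{\mathcal{F}}}
\quad\implies\quad
\Pr(E) \le \frac{q^2 \ell}{2^n}
.\]
Since $q$ and $\ell$ are polynomial in $n$, the right-hand side is negligible, which contradicts the assumption that $\Pr(E)$ is non-negligible.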
So far, we know how to generate tags for messages of fixed length, and for messages whose length is a multiple of $n$. If the length of a message is not a multiple of $n$, we could potentially just pad the input with 0's, but this causes an issue, as $m$ and $m \concat 0$ would have the same tag.
Instead, one solution is to put the size of the message in the first block, and we can still put the padding at the end. This way, if the messages differ by length, the first block will be different, and if the messages do not differ by length, then we're essentially just ignoring the padding. This gives us a MAC for arbitrary-length messages.
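For a concrete (illustrative) instance, take block length $n = 4$ and the messages $m = 101$ and $m' = 1010$. Zero-padding alone maps both to the single block $1010$, so they would share a tag. Prefixing the length, written as an $n$-bit binary number (an encoding assumed here only for illustration; it limits messages to fewer than $2^n$ bits), gives
\[
m \mapsto 0011 \concat 1010, \qquad m' \mapsto 0100 \concat 1010
,\]
and the two encodings now differ in the first block.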
\section{Authenticated Encryption Schemes}
We've talked about confidentiality and integrity separately, but generally we want both properties---when Alice sends a message to Bob, we'd like for any eavesdropper to be unable to recover the message, \emph{and} we'd like Bob to be able to verify that the message actually came from Alice.
A scheme that achieves both of these conditions is called an \emph{authenticated encryption scheme}.
\begin{definition}[Authenticated Encryption Scheme]
A scheme $\Pi$ is an \emph{authenticated encryption scheme} if it is CPA-secure, and it has ciphertext integrity (CI).
\end{definition}
\begin{definition}[Ciphertext Integrity (CI)]
Consider the following game for the scheme $\Pi = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$.
\begin{algorithmic}[1]
\Function{CI${}_{\Pi}^A$}{$n$}
\State $k \gets \mathsf{Gen}(1^n)$
\State $c^* \gets A^{\mathsf{Enc}(k, \cdot)}(1^n)$
\State $L \gets$ the list of ciphertexts returned by $A$'s encryption queries
\State \Return $(\mathsf{Dec}(k, c^*) \ne \bot) \land (c^* \notin L)$
\EndFunction
\end{algorithmic}
A scheme has ciphertext integrity if for all nu-PPT $A$, $\Pr(\mathrm{CI}_{\Pi}^A(n) = 1)$ is negligible.
\end{definition}
Observe that an authenticated encryption scheme is also CCA-secure: the CI property says that the adversary can produce a new valid ciphertext with only negligible probability. This means that whenever the adversary requests the decryption of a ciphertext, we can return $\bot$ (unless the ciphertext was itself returned by an earlier encryption query, in which case we return the corresponding message). The decryption oracle is therefore essentially useless, and this reduces to the CPA case.
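The simulated decryption oracle in this argument can be sketched as follows, where $T$ is a table (introduced here for illustration) that records each message sent to the encryption oracle together with the ciphertext that was returned.
\begin{algorithmic}[1]
\Function{SimDec}{$c$}
\If {$(m, c) \in T$ for some $m$}
\State \Return $m$
\Else
\State \Return $\bot$
\EndIf
\EndFunction
\end{algorithmic}
By ciphertext integrity, the answers of this simulated oracle differ from those of the real $\mathsf{Dec}(k, \cdot)$ with only negligible probability, and the simulation needs nothing beyond the encryption oracle.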
Next, we'll construct an authenticated encryption scheme, called ``Encrypt-then-MAC'', utilizing a CPA-secure encryption scheme and an EUF-CMA MAC scheme.
\begin{claim}
Let $\Pi_e = (\mathsf{Gen}_e, \mathsf{Enc}_e, \mathsf{Dec}_e)$ be a CPA-secure encryption scheme, and let $\Pi_m = (\mathsf{Gen}_m, \mathsf{Mac}_m, \mathsf{Verify}_m)$ be an EUF-CMA-secure MAC scheme.
The following scheme $\Pi = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ is an authenticated encryption scheme.
\begin{algorithmic}[1]
\Function{Gen}{$1^n$}
\State $k_e \gets \mathsf{Gen}_e(1^n)$
\State $k_m \gets \mathsf{Gen}_m(1^n)$
\State \Return $(k_e, k_m)$
\EndFunction
\Statex
\Function{Enc}{$(k_e, k_m), m$}
\State $c \gets \mathsf{Enc}_e(k_e, m)$
\State $t \gets \mathsf{Mac}_m(k_m, c)$
\State \Return $(c, t)$
\EndFunction
\Statex
\Function{Dec}{$(k_e, k_m), (c, t)$}
\If {$\mathsf{Verify}_m(k_m, c, t)$}
\State \Return $\mathsf{Dec}_e(k_e, c)$
\Else
\State \Return $\bot$
\EndIf
\EndFunction
\end{algorithmic}
\end{claim}
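As a quick check that the scheme decrypts correctly (assuming both $\Pi_e$ and $\Pi_m$ are correct), let $(c, t) \gets \mathsf{Enc}((k_e, k_m), m)$. Then $t = \mathsf{Mac}_m(k_m, c)$, so $\mathsf{Verify}_m(k_m, c, t) = 1$, and
\[
\mathsf{Dec}((k_e, k_m), (c, t)) = \mathsf{Dec}_e(k_e, c) = \mathsf{Dec}_e(k_e, \mathsf{Enc}_e(k_e, m)) = m
.\]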
\begin{proof}
Suppose for contradiction that we have an adversary $A$ that breaks the CPA security of $\Pi$. In the CPA game, $A$ may query for encryptions of messages $m$; it then produces a pair $m_0, m_1$, receives $c^* = \mathsf{Enc}(k, m_b)$ for a random bit $b$, and eventually outputs $b'$ to identify which message was encrypted.
We'd like to construct another adversary $B$, which breaks the CPA-security of $\Pi_e$. The only difference here is the MACs, so $B$ can sample a $k_m \gets \mathsf{Gen}_m(1^n)$, and perform all of the MACs itself.
In particular, when $A$ asks for the ciphertext of $m$, $B$ passes $m$ to the encryption oracle for $\Pi_e$ to get $c$, and attaches $t \gets \mathsf{Mac}_m(k_m, c)$. It handles the challenge pair $m_0, m_1$ in the same way. If $A$ is able to distinguish between ciphertexts of $m_0$ and $m_1$, then $B$ can output the same bit $b'$ to distinguish between ciphertexts for $\Pi_e$, contradicting the CPA security of $\Pi_e$.
\begin{center}
\begin{tikzpicture}
\draw (1.5, -1) rectangle (8, 6);
\draw (5, -0.5) rectangle (7.5, 5.5);
\node at (1.75, 5.75) {$B$};
\node at (5.25, 5.25) {$A$};
\node at (3.25, 5) {$k_m \gets \textsc{Gen}_m(1^n)$};
\node (enc) at (3.25, 3.5) {$C = \textsc{Enc}_e(k_e, m)$};
\node (mac) at (3.25, 3) {$(C, \textsc{Mac}_m(k_m, C))$};
\draw (5.5, 4) edge[->] node[right, pos=0] {$m$} (1, 4)
(1, 3.5) edge[->] (enc)
(mac) edge[->] node[right, pos=1] {$c$} (5.5, 3);
\node[left] at (1, 3.75) {$\textsc{Enc}_e(k_e, \cdot)$};
\node at (5.5, 2.6) {$\vdots$};
\node at (4.5, 2.6) {$\vdots$};
\draw (5.5, 2) edge[->] node[right, pos=0] {$m_0, m_1$} node[left, pos=1] {$m_0, m_1$} (1, 2);
\node (enc-mb) at (3.25, 1.5) {$C^*$};
\node (mac-mb) at (3.25, 1) {$(C^*, \textsc{Mac}_m(k_m, C^*))$};
\draw (1, 1.5) edge[->] node[left, pos=0] {$\textsc{Enc}_e(k_e, m_b)$} (enc-mb);
\draw (mac-mb) edge[->] node[right, pos=1] {$c^*$} (5.5, 1);
\node at (5.5, 0.5) {$\vdots$};
\node at (4.5, 0.5) {$\vdots$};
\draw (5.5, 0) edge[->] node[right, pos=0] {$b'$} (1, 0);
\end{tikzpicture}
\end{center}
To prove ciphertext integrity, suppose we have an adversary $A$ that breaks the ciphertext integrity of $\Pi$. Here, $A$ asks for ciphertext queries, and eventually returns a new ciphertext that is valid.
We'd like to construct an adversary $B$ that is able to generate a new message and a tag, given oracle access to the MAC scheme. The construction will follow similarly to the prior proof on CPA security.
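Here, our adversary $B$ can sample $k_e \gets \mathsf{Gen}_e(1^n)$. When $A$ asks for the encryption of $m$, $B$ computes $c = \mathsf{Enc}_e(k_e, m)$, sends $c$ to the MAC oracle to obtain a tag $t$, and returns $(c, t)$ to $A$.
When $A$ returns $(c^*, t^*)$, $B$ outputs the same pair as its forgery, since the tag $t^*$ is computed on $c^*$. For this to be a valid forgery, $c^*$ must not have been sent to the MAC oracle; this holds whenever each message has a unique valid tag (as in the PRF-based MAC above), and in general it requires the MAC to be strongly unforgeable.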
\begin{center}
\begin{tikzpicture}
\draw (1.5, 1) rectangle (8, 6);
\draw (5, 1.5) rectangle (7.5, 5.5);
\node at (1.75, 5.75) {$B$};
\node at (5.25, 5.25) {$A$};
\node at (3.25, 5) {$k_e \gets \textsc{Gen}_e(1^n)$};
\node (enc) at (3.25, 4) {$C = \textsc{Enc}_e(k_e, m)$};
\node (mac) at (3.25, 3.5) {$T = \textsc{Mac}_m(k_m, C)$};
\node (enc-mac) at (3.25, 3) {$(C, T)$};
\draw (5.5, 4) edge[->] node[right, pos=0] {$m$} (enc)
(enc) edge[->] (1, 4)
(1, 3.5) edge[->] (mac)
(enc-mac) edge[->] node[right, pos=1] {$c$} (5.5, 3);
\node[left] at (1, 3.75) {$\textsc{Mac}_m(k_m, \cdot)$};
\node at (5.5, 2.6) {$\vdots$};
\node at (4.5, 2.6) {$\vdots$};
\node (gen-ciphertext) at (3.25, 2) {$(C^*, T^*)$};
\draw (5.5, 2) edge[->] node[right, pos=0] {$c^*$} (gen-ciphertext)
(gen-ciphertext) edge[->] (1, 2);
\end{tikzpicture}
\end{center}
\end{proof}
As an example, AES-GCM is one of the most widely used authenticated encryption schemes, and it can also authenticate additional associated data. (Roughly, the encryption covers only the message, while the MAC covers both the ciphertext and the associated data.) The scheme combines counter-mode encryption with a polynomial-hash MAC in the style of the one we saw, with optimizations for efficiency.