Padmé padding for age? #235
Replies: 6 comments 1 reply
-
Help me understand how this works for many of the targets that would be encrypted by 'age'. For example, let's say I have a tarball comprising, say, 5000 pictures that comes in at around 25GB when encrypted. I then have another tarball of documents that might come in at around 12GB when encrypted. And finally, a third tarball of videos, pics, and documents at 42GB when encrypted. How does the size of the payload help to identify the payload? I'm sure there's something I'm not understanding, so please receive my question as only seeking to understand...not push-back...
-
Well, let's say that the tarball is 'illegal' material that you have an interest in denying possession of. There are awful and cruel examples of this, but there are also righteous and good examples. For example, the encrypted file might be the 'Bible' for a religion that an oppressive state has made illegal to possess, or it might be a tarball of material that a (corporate or state) whistle-blower leaked to a news organization. Without any padding, an exact byte-match with the size of the illicit material is extremely suggestive circumstantial evidence of possession. With well-crafted padding, there is a greater range of potential content that encrypts to that size and the match is not as definitive.
-
I should add too that padding can be a useful mitigation against another kind of attack: shared compression tables. Suppose that the tarball is a gzipped backup of your email inbox and that you make that backup every day and send it to a cloud service for storage. The size of the backup can be seen by anyone who can observe the upload. Now suppose I send you an email every day and also observe the size of the backup. Over time, by trying different strings in my email, I can statistically profile where there is overlap between the strings in my email and other strings in your inbox (other emails), or at least how frequent they are. This is because the compression table entries will be shared, so when there's overlap the output size is smaller than I would otherwise expect it to be. Exposing exact payload sizes makes these kinds of shared-compression-table attacks very practical. For example: suppose I send you a 200-byte email that contains the string "TOP SECRET DOCUMENTS" over the weekend, when you're not getting much other email, and the size of your backup only goes up by 12 bytes. I can then guess that the string "TOP SECRET DOCUMENTS" appears elsewhere in your inbox. Padding doesn't prevent these attacks, since the attacker can pad their own email until the backup crosses a padding boundary, but it makes them much more costly and slow: only one bit of information is leaked each time a padding boundary is crossed, so they're just not nearly as practical.
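To make the size leak concrete, here is a minimal sketch in Go using only the standard library's `compress/gzip`. The inbox contents, the two probe emails, and the helper name `gzipSize` are made up for illustration; a real attack would need many observations and would have to deal with noise from other mail.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// gzipSize returns how many bytes gzip produces for data.
// (Hypothetical helper, stands in for "the size the observer sees".)
func gzipSize(data string) int {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(data)); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Len()
}

// Illustrative inbox: one message already contains the secret phrase.
const inbox = "From: alice\nSubject: lunch\n\nSee you at noon.\n" +
	"From: bob\nSubject: files\n\nThe TOP SECRET DOCUMENTS are in the usual place.\n"

// Two attacker emails of equal length: one repeats the phrase, one does not.
const hit = "From: mallory\n\nTOP SECRET DOCUMENTS\n"
const miss = "From: mallory\n\nQXJ VKWPMB ZHGFYLNCU\n"

func main() {
	base := gzipSize(inbox)
	// The probe that overlaps existing content compresses better,
	// so the backup grows by fewer bytes.
	fmt.Println("growth with matching probe:    ", gzipSize(inbox+hit)-base)
	fmt.Println("growth with non-matching probe:", gzipSize(inbox+miss)-base)
}
```

The two probes are the same length before compression, so any difference in growth comes from overlap with what is already in the inbox. That difference is the signal an observer of exact sizes gets to see.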
-
Taking the second set of comments first: as you mentioned, padding does nothing to prevent that type of attack. Also, what you outline is an attack far beyond the scope of 'age', or really of any tool used only for file encryption. Now, the first set of comments...I'm a bit dubious about entities being able to reliably determine that someone has an encrypted form of a given work from the byte count alone. And even if they could, thinking only about obscurity for a moment, I think we first need to ask whether providing that type of obscurity is within the scope of 'age' itself. From my reading of FiloSottile's blog and the front page of this project, it seems to me that the goal is to provide a very easy-to-use and lightweight file encrypt/decrypt tool. Perhaps I'm wrong here, but that's my interpretation.
-
In context that's absolutely not what I wrote, but I should have been more careful not to say 'prevent' so casually. For compression attacks, sufficient padding does increase the costs for attackers, well beyond practicality in most cases. One bit per padding boundary crossed is a very, very low-bandwidth channel with which to try to work out a compression table collision. It sounds like, in your threat model, you don't care about an attacker being able to identify the plaintext from a set of known plaintexts. That's OK, but it's a pretty non-standard assumption.
-
I don't see this as particularly "my threat model"...I see it as the likely threat model relevant to the typical user of 'age'. Said differently, I don't see the typical 'age' user as one who is encrypting a lot of widely known and/or disseminated plaintexts. The issue you describe is more of a privacy concern than an encryption concern and again, based on my interpretation of FiloSottile's writings re: PGP and the reason why 'age' was developed in the first place, I view 'age' as an encryption tool...nothing more. That's not to say it can't be more, but then that is really the dev's decision...
-
I was tempted to open this as a bug, but that would be provocative: age does not make an effort to hide the size of the encrypted payload. Since many sensitive payloads can be identified by their size alone, this has obvious privacy implications.
The PURB paper (https://bford.info/pub/sec/purb.pdf) defines an interesting padding scheme called Padmé that introduces at most 12% overhead, and where the overhead decreases with file size. The key quote from the paper is ...
To my mind, adding 3% overhead for a ~5x improvement in file obfuscation is more than worth it, and that's the worst performing of the examples.
There are Go implementations already, for example: https://github.com/dedis/purb/blob/master/purbs/padding.go
It certainly doesn't look like a big performance hit or anything like that. Is a padding scheme like this something that would be considered for age?
I'm happy to work on adding it myself, in Go and Rust. I've opened this issue for discussion.
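For readers who haven't seen the paper, the Padmé length rule is small enough to sketch here. This is a from-scratch illustration in Go based on my reading of the paper's algorithm, not code taken from the dedis repository and not anything age provides; the function name `padme` is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// padme returns the padded length for a payload of l bytes, following
// the Padmé rule: keep only the top bits of the length significant and
// round the rest up, so fewer distinct sizes are observable.
// Assumption: l is far below the uint64 maximum, so l+mask cannot overflow.
func padme(l uint64) uint64 {
	if l < 2 {
		return l // nothing to round for lengths 0 and 1
	}
	e := bits.Len64(l) - 1          // e = floor(log2(l))
	s := bits.Len(uint(e))          // s = floor(log2(e)) + 1 (0 when e == 0)
	z := uint(e - s)                // number of low bits forced to zero
	mask := (uint64(1) << z) - 1    // those low bits as a mask
	return (l + mask) &^ mask       // round l up to a multiple of 2^z
}

func main() {
	for _, l := range []uint64{8, 9, 1000, 1 << 20, (1 << 20) + 1} {
		p := padme(l)
		fmt.Printf("len=%d padded=%d overhead=%.2f%%\n",
			l, p, 100*float64(p-l)/float64(l))
	}
}
```

With this rule a 9-byte payload pads to 10 bytes and a 1000-byte payload pads to 1024 bytes, and the relative overhead shrinks as the payload grows. Where the padding bytes would actually live inside an age file is a separate design question that this sketch does not touch.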