-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problem with memory databases and encryption #33
Comments
I agree that (re)key on in-memory or temporary database should fail or at least return a warning. I'm afraid, though, that there is no clean way to implement this without patching SQLite3 internals, which I'd like to avoid at all cost. If this turns out to be the case, the best way forward may be to contact upstream via sqlite-users or sqlite-dev mailing list and hope that they are interested in addressing the issue in SQLite3 code. SQLite3 SEE (most likely) suffers from this issue, so an upstream fix is entirely possible. However,
Thus, accidentally flushing unencrypted temporary databases on disk is quite unlikely to happen in practice. In-memory databases are also never written to the disk. So this is mostly an UX issue with low security impact (still completely worth fixing). Just wanted to make that clear to everyone. |
Another option would be to simply make
It should be possible to detect whether the associated database is in-memory or temporary within the code of Regarding your remark ... patching SQLite3 internals, which I'd like to avoid at all cost ... I fear that the time will come rather soon that this can't be avoided any longer. Maybe you have not yet seen the changes to the public SQLite code that took place on Feb 7, 2020: “Simplify the code by removing the unsupported and undocumented SQLITE_HAS_CODEC compile-time option”. That is, most likely starting with the next release of SQLite3 it will no longer be possible to add encryption by simply combining the public SQLite3 code with the code of an encryption extension, unfortunately.
I fear it will be quite unlikely that an upstream fix will happen due to the new SQLite3 policy mentioned above. |
Holy shit @ |
Yes, indeed.
Although a VFS-based encryption seems to be an attractive alternative at first glance, the devil is in the details. The VFS API is rather low-level - for read/write operations you only get a buffer plus buffer length plus file offset, but information about page size and reserved bytes per page is missing. That is, this information has to be passed to the VFS somehow on your own. For the database file itself the VFS approach would most likely work rather well, because AFAICT except for page 1 (which contains the header) always full pages are read or written. For the journal files this is quite different: there are file headers and frame headers (depending on the journal type) in addition to the actual page content, and only the page content is encrypted. To make things worse, the headers contain checksums over the page content (currently the encrypted content!). Detecting page content reads and writes will require to access the headers. Not impossible, but probably messy. |
In the meantime I started to work on the VFS approach, because I fear that keeping up the support of the current approach for future versions of SQLite will develop into growing pain in the long run. The good news is that a pointer to the SQLite database connection handle ( Another interesting property of the VFS approach is that it is very easy to support custom pragma commands for configuration purposes. On the down side is that the There are a few other obstacles, of course, but hopefully nothing that can't be managed somehow. |
I agree with your thoughts about VFS-based encryption. VFS encryption is the way forward, but it will require a lot of work to get all the details right (if it is even possible without crazy hacks). The potential benefits are huge because VFS-interface gives us full control of temporary files and, if lucky, the encryption might be feasible to implement as a run-time loadable extension, which is a big deal for many users (or so I have heard). I'm busy this month, but in the beginning of April I have a week or two to spend on this topic. Interesting stuff.
Journals and WAL files are temporary and usually quite small, so encrypting them fully using a single key may be good enough until we figure out something clever (that is, more efficient and more secure). The idea is to not care about headers, frames, pages or checksums at all, and simply encrypt and decrypt everything "transparently" inside the VFS shim. In other words, SQLite3 sees only unencrypted content so checksums will just work. However, actual databases must be encrypted at a page level to support fast incremental updates etc.
A more portable method is to store the page size when SQLite3 reads the database header from the disk. I'll experiment with this approach because I also want to try faking "reserved bytes per page" header field (for new encrypted databases) in order to force SQLite3 reserve 32 bytes for MAC/nonce at the end of each database page. This does not necessarily work, but it's worth trying. In principle, nothing prevents us breaking the page alignment by storing MACs/nonces between the database pages on disk. But that just feels dirty and wrong...
Easy workaround is to use URI filenames to pass the key for attached database. It is quite likely we end up breaking backwards compatibility if switching to VFS-based encryption scheme. I have been lately designing a new non-backwards-compatible cryptosystem for sqleet v1, which you might find interesting. See issue #35 for details. |
It's certainly a major effort. However, now after some digging in the SQLite code I'm rather confident that it will not require unreasonably more hacks than the current approach.
Correct. It should be possible to apply encryption also to file based temporary databases, if necessary.
IMHO it is unlikely that VFS-based encryption can be implemented as a runtime loadable extension. Loadable extensions have only access to the public SQLite API, but access to internal structures and use of certain internal functions is required. This could only be avoided by duplicating internal SQLite code, but this would make maintenance much more difficult.
Due to personal circumstances my capacity is limited, too. However, I hope to have a first (partial) implementation ready by the end of March.
SQLite accesses journal and WAL files by separately (and sometimes only partially) reading and writing frame headers and actual page content. This makes applying encryption to the whole file difficult.
This is true only for SQLite itself. For separate encryption agnostic test applications or tools such journal or WAL files would be complete garbage (or at least checksums would be wrong).
AFAICT actual database files can be handled in almost the same way as with the current approach.
Well, that's what SQLite does with the unencrypted bytes 16 to 23 in the database header: these bytes hold the information about page size and reserved bytes per page.
I suspect that this is not possible with the current approach (if you don't follow the SQLite way of unencrypted bytes 16 to 23 in the database header), but with the VFS approach the first read of the database header could be detected and a template header could be presented to SQLite in case of a new empty database file. However, for existing databases the first read of the database header takes place, before you have the chance to set up the encryption mechanism - unless you force the users to use the
Yes, indeed.
Nevertheless this requires the user to change his SQL code. And as mentioned above I don't think that using the
Well, I certainly intend to keep backward compatibility, and I'll try my best to accomplish that.
I will take a closer look later on. |
I found something interesting: CEVFS - SQLite3 Compression Encryption VFS. The encryption seems to lack authentication (MACs), and combining compression with encryption is generally a bad idea. Nonetheless, the approach seems to be compressing and encrypting database pages individually, and mapping reads & writes to the custom compressed and encrypted page format using a VFS shim. This roughly corresponds to a VFS-based encryption with MACs/nonces stored between (unaligned) pages.
Yes. I do not think that matters to most users, though.
Encryption is not that difficult. For example, ChaCha20, AES-CTR & AES-GCM ciphers support random accesses (reads/writes) in O(1) time (whereas AES-CBC requires O(n) time). However, maintaining MACs and avoiding stream cipher attacks with ChaCha20 is annoying. Each write basically requires recomputing MACs for all pages affected by the write. If using ChaCha20, then affected pages must be fully re-encrypted with a newly generated key (or nonce). |
I've seen that project some time ago. The author intended to create an alternative to SQLite's CEROD and ZIPVFS extensions. However, I don't think this implementation can serve as a basis for our encryption extensions - not only due to the facts you already mentioned, but also, because the encryption key is bound to the VFS itself. That is, all databases are encrypted with the same key unless you instantiate individual VFSs for each database connection.
Yes, but IMHO this makes file access rather inefficient.
Most likely, you are right. And if necessary one could build tools which support the encryption extension.
IMHO it will be better to just encrypt the actual page content (as is done now) and keep away from file and frame headers. |
I managed to implement a first preliminary version of a VFS with encryption support. That is, I was able to create an encrypted database that could be used by the previous implementation of the encryption extension and vice versa. This version still lacks support for encryption of journal files. That will be the next task, unfortunately not a trivial task, because determining the page number is rather difficult. |
Great work! I'm still busy with other less interesting things. I'd seriously consider a "dumb" authenticated encryption for non-database files. Journal and WAL writes are (almost?) always appended to the end of the file, so maybe authenticating & encrypting them with a single key/nonce and (incremental) MAC would be sufficient in practice. Alternatively process the files in chunks of Special support for journal and WAL file formats is something I would do last, if at all. |
IMHO the SQLite VFS API is not primarily meant to be used for actual non-database files. It is meant to handle database files and journal files. Of course, the VFS API could be used by extensions for non-database file handling. However, in such a case this extension should either not use the encryption VFS shim or it should define specific open flags to distinguish its own file formats from ordinary database and journal files. For unknown file types the encryption VFS shim will simply pass through the calls to the underlying real VFS.
I don't believe that that will be a good efficient strategy. I'm not sure about the rollback journal, but frames of the WAL journal are at least updated in random order, and frames are updated only partially (e.g. recalculated checksums). That is, one would have to read the full frame, decrypt it, update it (partially), encrypt it, and write the full frame back to the file.
That would make access to journal files quite inefficient (as described above).
Well, in fact I don't intend to implement a generic encryption feature. Up until today the SQLite encryption API only handles database page content (be it in database or journal files). Files accessed by (loadable) extensions internally are not affected by the encryption at all. And I will keep it that way. The only issue I see is the use case of nested VFSs. In my preliminary implementation I instantiate the encryption VFS by connecting it as a shim to the default VFS, and then the encryption VFS is made the new default. However, users might want to inject their own VFS somewhere in the chain. In practice the user can always choose to use a different VFS for a main database or attached database. That is, the question to be answered is whether the new encryption extension should be initialized automatically in a certain specific way (as is done in the current preliminary implementation) or whether the user/application is responsible of initializing it in a meaningful way.
In fact, I chose a rather "dumb" approach for rollback and WAL journals: database page content is identified by the size of the chunk read or written, and measures are taken to identify the correct page number. It seems to work properly, but needs intensive further testing, of course. It will take some more time to clean up the code a bit, but I intend to make the new code available on Github by the end of March or early in April. |
Although I published a first preliminary implementation of VFS-level encryption (SQLite3 Multiple Ciphers), I have to admit that I got stuck a bit. Since the new implementation should of course work in all scenarios, I started to test in a multi-threaded environment with shared cache mode. Here things really get nasty. At the moment I regularly get error messages due to HMAC mismatches. However, the reason is not that the HMAC is actually corrupt, instead the wrong codec gets selected. Why and how can this happen? I had to dig in the SQLite code to find out what is going on. My current implementation is based on the assumption that for each database file in a database connection, In the published version of the code each file handle remembers only the last connection handle. As a consequence it could happen that the correct codec couldn't be found on initializing the encryption (the associated file is identified by the connection handle). In my yet unpublished code version the file handle remembers all connection handles. But this doesn't solve the problem, because stale connection handles are also on the list. The solution will be to rewrite the code in such a way that on initializing encryption not the connection handle but the filename is used to identify the associated file handle. However, thinking about this approach made me aware of another problem: Up to now a codec was attached to a database connection handle Imagine 2 database connections in shared cache mode accessing the same database file, that is, there is only one file handle. Now imagine the following actions: connection 1 establishes encryption with the correct key; thereafter, connection 2 establishes encryption, but uses accidentally a wrong key. The wrong codec of connection 2 replaces the correct codec of connection 1, because there is only one codec instance attached to the file handle. As a result, connection 1 will not be able to access the database anymore. 💥 One solution could be the following: if connection 2 tries to attach a codec, one could verify that the codec is identical to the already attached one, and to refuse to attach it if it doesn't match. The drawback is that removing a codec - for example, after a successful rekeying - wouldn't work either. Maybe a rekey operation should be prohibited in shared cache mode. Any ideas, how to go forward? |
A few days ago an issue was opened for my wxSQLite3 component regarding the use of
rekey
for a memory database. As mentioned on the SQLite SEE documentation page in-memory databases are not encrypted (usingPRAGMA key='passphrase';
has absolutely no effect). The user was satisfied with that information and simply removed the call to the rekey function that was causing problems for him.However, IMHO the current behaviour of the key and rekey functions if called accidentally for in-memory databases can cause problems and should therefore be addressed.
The problem affects not only wxSQLite3, but also sqleet, as can be verified by executing the following SQL command sequence in the sqleet shell:
The last select command results in displaying
Error: file is not a database
.I have not yet tracked down from where this error results. All I can say is that the page encryption/decryption function is never called while executing rekey.
AFAICT not only in-memory databases, but also temporary databases are affected. The question is how wxSQLite3 and sqleet should behave under those circumstances. Should the key and rekey functions detect whether they operate on an in-memory or temporary database and silently do nothing? Or should they issue an error message? What's your opinion?
The text was updated successfully, but these errors were encountered: