Data at rest encryption #1575
Comments
Hi @GiantKing, thanks for your reply!
Missing the authentication credentials required to connect KMS.
Good question. One possible solution is that:
I added this as a non-goal.
apache/incubator-pegasus#1575 Cherry-pick from tikv@113b363 Summary: Introduce `KeyManagedEncryptedEnv` which wraps around `EncryptedEnv` but provides an `KeyManager` API to enable key management per file. Also implements `AESBlockCipher` with OpenSSL. Test Plan: not tested yet. will update. Signed-off-by: Yi Wu <[email protected]> Signed-off-by: tabokie <[email protected]>
apache/incubator-pegasus#1575 Cherry-pick from tikv@3d44a33 Summary: Instead of using OpenSSL's raw `AES_encrypt` and `AES_decrypt` API, which is a low-level call to encrypt or decrypt exactly one block (16 bytes), we change to use the `EVP_*` API. The former is deprecated, and will use the default C implementation without AES-NI support. Also, the EVP API is capable of handling CTR mode on its own. Test Plan: will add tests Signed-off-by: Yi Wu <[email protected]> --------- Signed-off-by: Yi Wu <[email protected]> Co-authored-by: yiwu-arbug <[email protected]>
apache/incubator-pegasus#1575 Cherry-pick from tikv@2360562 Summary: Fix NewRandomRWFile and ReuseWritableFile misuse of `GetFile()` and `NewFile()`. See inline comments. Test Plan: manual test with tikv Signed-off-by: Yi Wu <[email protected]> Co-authored-by: yiwu-arbug <[email protected]>
apache/incubator-pegasus#1575 Cherry-pick from tikv@93e89a5 fix bug: tikv/tikv#9115 Summary: we need to update encryption metadata via encryption::DataKeyManager, which cannot be combined with the actual file operation into one atomic operation. In RenameFile, if power is lost after the src_file has been removed, we may lose the file info of src_file on the next restart. Signed-off-by: Xintao <[email protected]> Co-authored-by: Xintao <[email protected]>
apache/incubator-pegasus#1575 Cherry-pick from tikv@bbd27cf used LinkFile instead of RenameFile api of key manager. But LinkFile needs check the dst file information, in RenameFile logic, we don't care about that. So just skip encryption for current file. Signed-off-by: Xintao [[email protected]](mailto:[email protected])
apache/incubator-pegasus#1575 Cherry-pick from tikv@1868d12 Signed-off-by: Xintao <[email protected]> Signed-off-by: tabokie <[email protected]>
apache/incubator-pegasus#1575 Cherry-pick from tikv@4cebfc1 * Add SM4-CTR encryption algorithm * Adjust block size for sm4 encryption * Add UT for SM4 encryption * Adjust macros indentation for sm4 * Fix format for adding sm4 Signed-off-by: Jarvis Zheng <[email protected]>
apache/incubator-pegasus#1575 Cherry-pick from tikv@9464766 In some environments, the user installed OpenSSL via `yum install`, and that OpenSSL build may have been compiled with the OPENSSL_NO_SM4 flag, so even though the version is >= 1.1.1, we still cannot use SM4 in that situation. Signed-off-by: Jarvis Zheng <[email protected]>
apache/incubator-pegasus#1575 Cherry-pick from tikv@acc624f * hook delete dir in encrypted env * add a comment Signed-off-by: tabokie <[email protected]> Co-authored-by: Xinye Tao <[email protected]>
apache/incubator-pegasus#1575 Cherry-pick from tikv@14f36f8 (without compaction related code) * fix renaming encrypted directory Signed-off-by: tabokie <[email protected]>
There is another pull request to facebook/rocksdb, facebook/rocksdb#7020, but it seems not to have been updated for nearly 3 years.
apache/incubator-pegasus#1575 After all encryption-related patches have been cherry-picked from [tikv](https://github.com/tikv/rocksdb/commits/6.29.tikv) and merged, we now improve the encryption, including:
- Fix action job `build-linux-encrypted_env-no_compression-no_openssl` to build binaries without openssl and compression libs correctly.
- Fix action job `build-linux-encrypted_env-openssl` to export the `ENCRYPTED_ENV` environment variable correctly.
- Do not skip tests which are skipped by TiKV.
- Refactor `AESCTRCipherStream` and `AESEncryptionProvider` to support managing the file key by the file itself, according to the design docs in [Data at rest encryption](apache/incubator-pegasus#1575).
- Remove all KeyManager-related code.
- Replace KeyManager tests with AES encryption tests.
- Refactor encryption/encryption_test.cc and add more tests.
- Make it possible to construct an AESEncryptionProvider object via `EncryptionProvider::CreateFromString()` by registering a factory in the "encryption" library. It's possible to construct an object by URI: `AES`, `AES://test` or `AES:<instance_key>,<EncryptionMethod>`.
- The `ldb` tool supports parsing the `--fs_uri` flag as the URI mentioned above.
- Add tests to create an AESEncryptionProvider object in `CreateEncryptedEnvTest.CreateEncryptedFileSystem`.
- `db_bench` supports running benchmarks with encryption enabled through two new flags, `encryption_method` and `encryption_instance_key`.
- Move code from the exported header directory (i.e. include/rocksdb/encryption.h) to rocksdb internal (i.e. encryption/encryption.h), so it is not exposed to users.
- Code format.

Review hint: #17 shows all the code changes from the base branch (i.e. `pegasus-kv:v8.3.2-pegasus`); you can review it together to make sure the request branch `acelyc111:pk_enc_new` doesn't have side effects on the base.

Manual test:
```
// Generate some data.
./db_bench --encryption_method=AES128CTR --encryption_instance_key=test_instance_key --num=10000

// Dump WAL OK
./tools/ldb --fs_uri="provider=AES; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES://test; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES:test_instance_key,AES128CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log

// Dump WAL failed. Pass bad provider parameters to --fs_uri, e.g.
./tools/ldb --fs_uri="provider=AES1:test_instance_key,1AES128CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES:bad_test_instance_key,AES128CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log
./tools/ldb --fs_uri="provider=AES:test_instance_key,AES192CTR; id=EncryptedFileSystem" dump_wal --walfile=/tmp/rocksdbtest-1000/dbbench/000004.log

// The same to other ldb tools.
```
apache/incubator-pegasus#1575 1. Update the status badge to pegasus-kv/rocksdb's own site. 2. Also aim to check whether all tests could pass after cherry-picking encryption related patches to 8.5.3 branch.
…cksdb (#1610) #1575 This patch changes Pegasus to use the `v8.5.3-pegasus-encrypt` branch of the https://github.com/pegasus-kv/rocksdb.git repository. The `v8.5.3-pegasus-encrypt` branch is based on the official `v8.5.3` tag of the facebook/rocksdb repository but adds the encryption feature implemented by the Pegasus team. Nothing changes if the encryption feature is not enabled.
#1575 Set option -DWITH_OPENSSL=ON to build rocksdb with encryption feature enabled.
…rsion (#1614) #1575 Fix a build error on lower OpenSSL version, the error looks like:
```
2023-09-19T02:53:45.4093185Z #11 924.7 /root/incubator-pegasus/thirdparty/build/Source/rocksdb/encryption/encryption.cc: In function 'const EVP_CIPHER* rocksdb::encryption::GetEVPCipher(rocksdb::encryption::EncryptionMethod)':
2023-09-19T02:53:45.4094191Z #11 924.7 /root/incubator-pegasus/thirdparty/build/Source/rocksdb/encryption/encryption.cc:112:44: error: cannot convert 'rocksdb::Status' to 'const EVP_CIPHER* {aka const evp_cipher_st*}' in return
2023-09-19T02:53:45.4094713Z #11 924.7 std::string(OPENSSL_VERSION_TEXT));
2023-09-19T02:53:45.4094991Z #11 924.7 ^
2023-09-19T02:53:45.5599505Z #11 924.7 gmake[5]: *** [CMakeFiles/rocksdb.dir/encryption/encryption.cc.o] Error 1
2023-09-19T02:53:45.5599938Z #11 924.7 gmake[4]: *** [CMakeFiles/rocksdb.dir/all] Error 2
2023-09-19T02:53:45.5600266Z #11 924.7 gmake[4]: *** Waiting for unfinished jobs....
```
#1575 This patch introduces `PegasusEnv()` to obtain the `Env` instance used by RocksDB. An encrypted Env instance can then be obtained via `PegasusEnv(FileDataType::kSensitive)`. The encrypted Env is used for operating on sensitive files: data written to the file is encrypted and data read from the file is decrypted. Some file operation functions and related unit tests are added as well.
#1575 This is prerequisite work for implementing encryption at rest; after this patch we can use the encryption capability of rocksdb.
- Use rocksdb APIs to implement class `native_linux_aio_provider`. Both implementations use the `pread()` and `pwrite()` system calls, so there is no significant performance change, see the newly added simple benchmark performance comparison below.
- Separate the file read and write operations for class `aio_provider`
#1575 User key-values will be redacted if encryption is enabled.
#1575
- Mark all files as sensitive, thus all files will be encrypted when `encrypt_data_at_rest` is enabled
- Enable both true and false for config `encrypt_data_at_rest` in related tests
- The FDS module has not implemented the encryption feature yet, do not enable `encrypt_data_at_rest` if you are using FDS
- Some small refactors
Motivation
Some Pegasus users store private data in Pegasus, so it’s important to protect the data against unauthorized access by persons who gain access to the storage media used by Pegasus.
It's possible to support transparent data at rest encryption to provide a way to protect users’ data, which is transparent to users and straightforward to set up for operators.
Data at rest encryption refers to encrypting data for storage and decrypting it when reading the stored data. It uses symmetric encryption where the same key is used to encrypt and to decrypt the data. Keys need to be stored and handled securely as anyone with access to a key will be able to decrypt any data encrypted with it.
Cloud disk encryption
If your Pegasus clusters are deployed on public cloud storage services, it’s possible to use the providers' own encryption solutions. See:
In that case there is no need to enable Pegasus data at rest encryption; enabling it would encrypt and decrypt the data twice, which may lead to poor performance.
Goals
Non-Goals
TODO: It's possible to implement this: after all the data has been fully compacted, the data could be converted to plaintext/ciphertext.
TODO: It's possible to implement this after the cluster granularity encryption has been implemented.
TODO: Same to the above.
TODO: Use TLS libs.
pegasus-spark only supports reading plaintext data from the source, and the generated data is in plaintext as well, so it doesn't break the security. When the generated plaintext data is loaded into Pegasus, the data will be encrypted if the encrypt_data_at_rest feature is enabled.
Cryptography overview
Symmetric-key algorithm
Symmetric-key algorithms are algorithms for cryptography that use the same cryptographic keys for both the encryption of plaintext and the decryption of cipher-text. The keys may be identical, or there may be a simple transformation to go between the two keys. The keys, in practice, represent a shared secret between two or more parties that can be used to maintain a private information link. The requirement that both parties have access to the secret key is one of the main drawbacks of symmetric-key encryption, in comparison to public-key encryption (also known as asymmetric-key encryption).
AES
The Advanced Encryption Standard (AES) is a block cipher with a block size of 128 bits and three different key lengths: 128, 192 and 256 bits. AES supersedes the Data Encryption Standard (DES). The algorithm described by AES is a symmetric-key algorithm, meaning the same key is used for both encrypting and decrypting the data.
Block cipher
A block cipher is a deterministic algorithm that operates on fixed-length groups of bits, called blocks. Block ciphers are the elementary building blocks of many cryptographic protocols. They are ubiquitous in the storage and exchange of data, where such data is secured and authenticated via encryption.
A block cipher uses blocks as an unvarying transformation. Even a secure block cipher is suitable for the encryption of only a single block of data at a time, using a fixed key. A multitude of modes of operation have been designed to allow their repeated use in a secure way to achieve the security goals of confidentiality and authenticity. However, block ciphers may also feature as building blocks in other cryptographic protocols, such as universal hash functions and pseudorandom number generators.
ROT13
ROT13 ("rotate by 13 places") is a simple letter substitution cipher that replaces a letter with the 13th letter after it in the Latin alphabet.
Because there are 26 letters (2×13) in the basic Latin alphabet, ROT13 is its own inverse; that is, to undo ROT13, the same algorithm is applied, so the same action can be used for encoding and decoding. The algorithm provides virtually no cryptographic security, and is often cited as a canonical example of weak encryption.
facebook/rocksdb uses ROT13 as an encryption sample.
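The self-inverse property described above can be seen in a minimal sketch of the classical letter cipher. The function name `Rot13` is made up for this illustration; this is not the code used by facebook/rocksdb, only a textbook version of the algorithm.

```cpp
#include <cassert>
#include <string>

// Textbook ROT13 (illustrative only): rotate each ASCII letter by 13 places
// and leave every other byte unchanged.
std::string Rot13(const std::string& input) {
  std::string output = input;
  for (char& c : output) {
    if (c >= 'a' && c <= 'z') {
      c = static_cast<char>('a' + (c - 'a' + 13) % 26);
    } else if (c >= 'A' && c <= 'Z') {
      c = static_cast<char>('A' + (c - 'A' + 13) % 26);
    }
  }
  return output;
}
```

Applying `Rot13` twice returns the original string, which is why the same routine serves as both the encoder and the decoder.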
Block cipher mode of operation
In cryptography, a block cipher mode of operation is an algorithm that uses a block cipher to provide information security such as confidentiality or authenticity. A block cipher by itself is only suitable for the secure cryptographic transformation (encryption or decryption) of one fixed-length group of bits called a block. A mode of operation describes how to repeatedly apply a cipher's single-block operation to securely transform amounts of data larger than a block.
Most modes require a unique binary sequence, often called an initialization vector (IV), for each encryption operation. The IV has to be non-repeating and, for some modes, random as well. The initialization vector is used to ensure distinct ciphertexts are produced even when the same plaintext is encrypted multiple times independently with the same key. Block ciphers may be capable of operating on more than one block size, but during transformation the block size is always fixed. Block cipher modes operate on whole blocks and require that the last part of the data be padded to a full block if it is smaller than the current block size. There are, however, modes that do not require padding because they effectively use a block cipher as a stream cipher.
IV, Initialization Vector
In cryptography, an initialization vector (IV) or starting variable (SV) is an input to a cryptographic primitive being used to provide the initial state. The IV is typically required to be random or pseudorandom, but sometimes an IV only needs to be unpredictable or unique. Randomization is crucial for some encryption schemes to achieve semantic security, a property whereby repeated usage of the scheme under the same key does not allow an attacker to infer relationships between (potentially similar) segments of the encrypted message. For block ciphers, the use of an IV is described by the modes of operation.
CTR, Counter mode
Counter mode turns a block cipher into a stream cipher. It generates the next keystream block by encrypting successive values of a "counter". The counter can be any function which produces a sequence which is guaranteed not to repeat for a long time, although an actual increment-by-one counter is the simplest and most popular. The usage of a simple deterministic input function used to be controversial; critics argued that "deliberately exposing a cryptosystem to a known systematic input represents an unnecessary risk". However, today CTR mode is widely accepted, and any problems are considered a weakness of the underlying block cipher, which is expected to be secure regardless of systemic bias in its input. Along with CBC, CTR mode is one of two block cipher modes recommended by Niels Ferguson and Bruce Schneier.
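To make the counter-mode idea concrete, here is a minimal sketch under loud assumptions: `ToyEncryptBlock` is a made-up, insecure stand-in for a real block cipher such as AES, and `CounterBlock` and `CtrCrypt` are hypothetical names. The counter is formed by adding the block index to the IV as a big-endian integer, which is one common convention and an assumption here. The sketch only illustrates two properties relevant to file encryption: encryption and decryption are the same XOR operation, and any byte offset can be processed without touching the preceding data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::size_t kBlockSize = 16;  // Same block size as AES.
using Block = std::array<std::uint8_t, kBlockSize>;

// Hypothetical placeholder for a real block cipher such as AES.
// It only mixes bytes so the example is self-contained; it is NOT secure.
Block ToyEncryptBlock(const Block& key, const Block& input) {
  std::uint32_t state = 0x9E3779B9u;
  for (std::size_t i = 0; i < kBlockSize; ++i) {
    state = (state ^ key[i] ^ input[i]) * 16777619u;
  }
  Block output{};
  for (std::size_t i = 0; i < kBlockSize; ++i) {
    state = state * 1664525u + 1013904223u;
    output[i] = static_cast<std::uint8_t>(state >> 24);
  }
  return output;
}

// Counter for a block: IV + block_index, with the IV read as a big-endian integer.
Block CounterBlock(const Block& iv, std::uint64_t block_index) {
  Block counter = iv;
  std::uint64_t carry = block_index;
  for (std::size_t i = kBlockSize; i-- > 0 && carry != 0;) {
    const std::uint64_t sum = (carry & 0xFF) + counter[i];
    counter[i] = static_cast<std::uint8_t>(sum & 0xFF);
    carry = (carry >> 8) + (sum >> 8);
  }
  return counter;
}

// XOR `data`, which starts at byte `offset` of the stream, with the keystream.
// The same call both encrypts and decrypts.
void CtrCrypt(const Block& key, const Block& iv, std::uint64_t offset,
              std::string* data) {
  Block keystream{};
  std::uint64_t cached_block = 0;
  bool have_keystream = false;
  for (std::size_t i = 0; i < data->size(); ++i) {
    const std::uint64_t pos = offset + i;
    const std::uint64_t block_index = pos / kBlockSize;
    if (!have_keystream || block_index != cached_block) {
      keystream = ToyEncryptBlock(key, CounterBlock(iv, block_index));
      cached_block = block_index;
      have_keystream = true;
    }
    (*data)[i] = static_cast<char>(static_cast<std::uint8_t>((*data)[i]) ^
                                   keystream[pos % kBlockSize]);
  }
}
```

Because the keystream for a given position depends only on the key, the IV and the block index, a reader can decrypt a range in the middle of a file by passing its starting offset, which fits the positional read and write pattern of a storage engine.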
OpenSSL
OpenSSL contains an open-source implementation of the SSL and TLS protocols. The core library, written in the C programming language, implements basic cryptographic functions and provides various utility functions. Wrappers allowing the use of the OpenSSL library in a variety of computer languages are available.
OpenSSL supports a number of different cryptographic algorithms, including AES mentioned above.
Design
Key management
For Pegasus, an overview of the design:
<kms_url>/v1/key/<cluster_key_name>/_eek?eek_op=generate&num_keys=1
<kms_url>/v1/keyversion/<key_version>/_eek?eek_op=decrypt
with payload:
`rocksdb::EncryptedEnv` to encrypt and decrypt FK.
New Configurations
encrypt_data_at_rest
bool(false), Whether sensitive files should be encrypted on the file system.
encryption_key_length
int(128), Encryption key length. Can be 128, 192 or 256.
encryption_key_provider
string("default"), Key provider implementation to generate and decrypt server keys. Valid values are: 'default' (not for production usage), and 'hadoop-kms'.
hadoop_kms_url
string(""), Comma-separated list of Hadoop KMS server URLs. Must be set when 'encryption_key_provider' is set to 'hadoop-kms'.
encryption_cluster_key_name
string("kudu_cluster_key"), Name of the cluster key that is used to encrypt server encryption keys as stored in Hadoop KMS.
redact_logs
bool(false), Whether sensitive data (e.g. keys, values, table names) in logs should be redacted.
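The constraints stated in the option descriptions above can be sketched as a small validation routine. The struct and function names (`EncryptionConfig`, `ValidateEncryptionConfig`) are hypothetical and not part of Pegasus; the defaults are copied from the list above, and checking the options regardless of whether `encrypt_data_at_rest` is set is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the options listed above, with the documented defaults.
struct EncryptionConfig {
  bool encrypt_data_at_rest = false;
  int encryption_key_length = 128;
  std::string encryption_key_provider = "default";
  std::string hadoop_kms_url;
  std::string encryption_cluster_key_name = "kudu_cluster_key";
  bool redact_logs = false;
};

// Returns an empty string when the config is consistent, otherwise a
// description of the first problem found.
std::string ValidateEncryptionConfig(const EncryptionConfig& config) {
  const int len = config.encryption_key_length;
  if (len != 128 && len != 192 && len != 256) {
    return "encryption_key_length must be 128, 192 or 256";
  }
  const std::string& provider = config.encryption_key_provider;
  if (provider != "default" && provider != "hadoop-kms") {
    return "encryption_key_provider must be 'default' or 'hadoop-kms'";
  }
  if (provider == "hadoop-kms" && config.hadoop_kms_url.empty()) {
    return "hadoop_kms_url must be set when encryption_key_provider is 'hadoop-kms'";
  }
  return "";
}
```

Running such a check once at startup would surface a missing `hadoop_kms_url` before any key is requested from the KMS.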
Implementation overview
RocksDB
Encryption file header
The encrypted Env uses a fixed-length file header; we can define it as 4096 bytes (one page size).
The first 64 bytes are used to store encryption information, including:
Encryption data
facebook/rocksdb uses ROT13 to encrypt data. It’s just a sample and cannot be used in a production environment, so we will use AES encryption algorithms.
tikv/rocksdb and Kudu have implemented AES encryption algorithms by using OpenSSL; we will use the OpenSSL library as well.
Git repository
Because we are planning to add AES encryption to RocksDB, and merging the modified code into the upstream facebook/rocksdb repository would likely be a long journey, I suggest maintaining a Pegasus-owned git repository (i.e. https://github.com/pegasus-kv/rocksdb). We can submit the patches upstream when the feature is fully tested and stable.
Pegasus currently uses the official RocksDB 6.6.4, so this is a chance to upgrade the third-party library to the latest stable version (8.3.2 at the time of writing).
Pegasus
Git repository
I'm planning to develop the functionality on the master branch of apache/incubator-pegasus after the 2.5 branch has been created.
Modules updates
native_linux_aio_provider
In fact, the `native_linux_aio_provider` module hasn't used AIO since Pegasus 2.2.0; instead it uses `pwrite` and `pread`.
RocksDB uses `pwrite` and `pread` too, so it's possible to replace the underlying filesystem implementation of Pegasus with `rocksdb::Env`.
`rocksdb::Env` has plenty of file operation features, including mmap, direct I/O, prefetch, preallocate, encryption at rest, and so on. They are public APIs of the RocksDB library, and we believe in the stability of RocksDB.
So we will introduce `rocksdb::Env` to Pegasus as the underlying implementation of the filesystem layer.
plog
`plog` uses `native_linux_aio_provider`; once `native_linux_aio_provider` has implemented data at rest encryption, `plog` logically has this feature too.
nfs
The nfs module is used to transfer files (e.g. rocksdb SST files) between replica servers. The files are encrypted if data at rest encryption is enabled, and different replica servers have different SKs, so the nfs server side should support decrypting data when uploading (by using the source SK), and the nfs client side should support encrypting data when downloading (by using the target SK).
The nfs module uses `native_linux_aio_provider` too, so it's convenient to support encryption for the nfs module.
block service
The block service module is used to back up and restore data. It supports 3 types of targets: local filesystem, Xiaomi FDS and Apache HDFS. We should also provide encryption for the block service to ensure data security. However, the corresponding SK needs to be backed up and restored along with the data: in the restore stage, the backed-up SK is used to decrypt data when downloading, and the data is encrypted again with the replica server's own SK when writing.
logs
User key-values printed in logs should be redacted.
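A minimal sketch of what such redaction could look like is given below. The helper names `RedactForLog` and `FormatPutLog` and the `<redacted>` placeholder are assumptions made for illustration, not the Pegasus implementation; the `redact_logs` parameter mirrors the configuration option of the same name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wraps a user key or value before it is written to a log.
// `redact_logs` mirrors the configuration option of the same name.
std::string RedactForLog(bool redact_logs, const std::string& sensitive) {
  return redact_logs ? std::string("<redacted>") : sensitive;
}

// Builds a log line the way a caller might; only the user data is wrapped,
// so the rest of the message stays useful for debugging.
std::string FormatPutLog(bool redact_logs, const std::string& hash_key,
                         const std::string& value) {
  return "put hash_key=" + RedactForLog(redact_logs, hash_key) +
         " value=" + RedactForLog(redact_logs, value);
}
```

Routing every user-supplied field through a single helper keeps the decision in one place, so a log statement cannot leak data by forgetting to check the flag.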
others
Some other modules which read/write files could be refactored to use rocksdb::Env as well, e.g. the replica_app_info module.
Roadmap
Prepare the rocksdb repository
Commits are merged to https://github.com/pegasus-kv/rocksdb/tree/v8.3.2-pegasus-encrypt first.
Cherry-pick encryption related commits from TiKV
Commits are cherry-picked from branch https://github.com/tikv/rocksdb/commits/6.29.tikv
Remove the key manager
Implement the self-served file key management
Update rocksdb to 8.5.3
Other fixes of pegasus-kv/rocksdb
Pegasus uses rocksdb::EncryptedEnv when data at rest encryption is enabled
Refactor Pegasus to use rocksdb::Env to access other disk files