Integrate Worker Manager into DB for parallel decryption/encryption #220
Comments
Some things I realised. The DB serialize and deserialize should be extracted out of the encryption and decryption. Because you are transferring data to the workers, you have to serialize and deserialize all the time anyway, so you might as well do it in the main thread. The serialisation and deserialisation can be made more efficient in these ways:
One of the problems is that we won't know whether we are dealing with JSON, strings or buffers. We know that at the end it is converted to bytes by leveldown. I believe leveljs uses Node buffers, and we would preserve this to avoid another binary encoding/decoding being done by leveljs. Therefore it is up to the reader/writer to know whether they should be using JSON for a given level. In other cases, the raw bytes should be used instead. This can help in the case where EFS is storing blocks of buffers. Once we finally start using array buffers, it's possible for us to copy the key array buffer (not transfer it), while transferring the plain text array buffer to be encrypted by the threads, and having the worker transfer back to us the decrypted plain text array buffer or the encrypted cipher text array buffer. All of this should be done with a proper benchmark suite to see if we get performance improvements. |
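A minimal sketch of the "reader/writer decides between JSON and raw bytes" idea follows. The helper names `serialize` and `deserialize` and the `raw` flag are assumptions made for illustration; they are not taken from the DB code discussed in this thread.

```typescript
// Hypothetical helpers; names and signatures are illustrative only.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Raw bytes pass through untouched; any other value is encoded as JSON.
function serialize(value: unknown, raw: boolean): Uint8Array {
  if (raw) {
    if (!(value instanceof Uint8Array)) {
      throw new TypeError('raw mode expects bytes');
    }
    return value;
  }
  return encoder.encode(JSON.stringify(value));
}

// The caller states whether the stored bytes are raw or JSON.
function deserialize<T>(bytes: Uint8Array, raw: boolean): T | Uint8Array {
  return raw ? bytes : (JSON.parse(decoder.decode(bytes)) as T);
}
```

With this split, a level that stores file blocks would call both helpers with `raw` set to `true` and skip the JSON step entirely.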
Note that the underlying data is always |
Changing node-forge for node-crypto is a large operation. And that might involve changing to support ED25519 keys since node-forge still doesn't have it. It will impact how this is done for mobile devices later. |
To achieve this, we should standardize on the usage of DB between js-polykey and EFS. The new DB is being developed in EFS and so changes to it should eventually be ported to polykey. The worker manager integration requires more tests and benchmarking reports produced in the repos themselves and run by our CI/CD. Remember CI/CD will only have 1 or 2 CPU cores, and that should be the benchmark we work towards. The We also should eliminate the excessive copying going on, but we won't be able to use Node buffers for this reason. This is why when we are transferring data into a worker, we will have to copy any data from a Node Buffer to a new |
Expanding on this with regards to EFS:
There are 2 major constraints:
Therefore this means:
PK So the whole architecture may look like:
|
This is now being done in |
The So first to integrate it into EFS first, then similar benchmarking procedures can be applied there too @tegefaulkes @scottmmorris. The Note that some changes are needed for |
|
In EFS, we have an opportunity to try and get the encryption and decryption utilities to operate on |
The forge library currently has a couple different types in use:
Right now we have been using binary strings and From the code, it looks like These hacks should not be needed if we are using WebCrypto, these constructs are all forge's own creations. |
One problem I realised is that LevelDB doesn't seem to have any support for For optimal architecture, leveldb would have to be changed to support |
Ok I've tried this now with node-forge. There's no point in trying to make the encryption/decryption use ArrayBuffer.
Anyway, until we change over to webcrypto, the internal crypto operation does involve a bunch of copying. So we are essentially using When webcrypto does become available here, then we can change how the crypto works to be more efficient. |
Note that the |
|
@tegefaulkes note that with the new js-workers, the cores should be set to 1 in PK testing. |
I might move DB out to |
Doing it here. https://github.com/MatrixAI/js-db Once this is done... all pieces should be ready to integrate into EFS and also PK. The new transaction features are available for PK to use as well. The new DB allows direct entry of buffers as the key, this should speed up any usage of lexicographic-integer as keys instead of having to serialise them as strings. Further as it supports the |
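To illustrate why byte keys help with integer ordering, here is a small sketch. It uses a fixed-width big-endian encoding, which is an assumption chosen for simplicity and is not the encoding used by the lexicographic-integer package. The function names are hypothetical.

```typescript
// Encode a non-negative 32-bit integer as 4 big-endian bytes.
// Illustrative only: this is not the lexicographic-integer encoding.
function encodeKey(n: number): Uint8Array {
  const out = new Uint8Array(4);
  new DataView(out.buffer).setUint32(0, n, false);
  return out;
}

// Bytewise comparison, which is how LevelDB orders keys by default.
function compareBytes(a: Uint8Array, b: Uint8Array): number {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}
```

As decimal strings, `'10'` sorts before `'9'`; as big-endian bytes, `encodeKey(9)` sorts before `encodeKey(10)`, so numeric order and key order agree without a string round trip.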
js-db would have to take a "crypto" interface, and expect EFS and PK to supply those functions, perhaps as callbacks: an encryption function and a decryption function. Right now it would have to work with Node |
One problem with abstracting the encryption/decryption is that we now have this sort of interface:

    type Crypto = {
      encrypt(key: ArrayBuffer, plainText: ArrayBuffer): ArrayBuffer;
      decrypt(key: ArrayBuffer, cipherText: ArrayBuffer): ArrayBuffer | undefined;
    };

I chose This new type works for both However I realised that webcrypto always makes this asynchronous... so the actual utility function will be async:

    type Crypto = {
      encrypt(key: ArrayBuffer, plainText: ArrayBuffer): Promise<ArrayBuffer>;
      decrypt(key: ArrayBuffer, cipherText: ArrayBuffer): Promise<ArrayBuffer | undefined>;
    };

This can also be implemented in threadsjs as well since it supports asynchronous methods. Then the expectation is for the constructor of DB to pass this crypto utility belt, and to optionally & dynamically inject |
It is easy to wrap sync code as async code. You just wrap it in |
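As a sketch of one way to do such wrapping, an `async` arrow function turns a synchronous implementation into one that matches a Promise-returning signature. The names `toAsync` and `xorSync` are hypothetical, and the XOR routine is only a toy stand-in for a real synchronous cipher.

```typescript
type EncryptSync = (key: ArrayBuffer, plainText: ArrayBuffer) => ArrayBuffer;
type EncryptAsync = (
  key: ArrayBuffer,
  plainText: ArrayBuffer,
) => Promise<ArrayBuffer>;

// Wrap a synchronous function so it satisfies the asynchronous signature.
// Because the wrapper is async, a thrown error becomes a rejected promise.
function toAsync(f: EncryptSync): EncryptAsync {
  return async (key, plainText) => f(key, plainText);
}

// Toy stand-in for a synchronous cipher (XOR with a repeating key).
// This is NOT secure and only exists to make the sketch runnable.
const xorSync: EncryptSync = (key, plainText) => {
  const k = new Uint8Array(key);
  const p = new Uint8Array(plainText);
  const out = new Uint8Array(p.length);
  for (let i = 0; i < p.length; i++) {
    out[i] = p[i] ^ k[i % k.length];
  }
  return out.buffer;
};

const xorAsync: EncryptAsync = toAsync(xorSync);
```

Going the other way is not possible in general, which is why standardising the interface on the asynchronous form covers both cases.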
Something very important:
So you can pass a Node Buffer wherever that expects an But to make an |
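The relationship between the two types can be sketched as follows. A Node `Buffer` is a `Uint8Array` subclass that may be a view into a larger pooled `ArrayBuffer`, so producing a standalone `ArrayBuffer` means slicing by `byteOffset` and `byteLength`. The helper names below are hypothetical.

```typescript
// Copy exactly the bytes viewed by a Node Buffer into a fresh ArrayBuffer.
// buf.buffer may be a larger shared pool, so offset and length matter.
function toArrayBuffer(buf: Buffer): ArrayBuffer {
  return buf.buffer.slice(
    buf.byteOffset,
    buf.byteOffset + buf.byteLength,
  ) as ArrayBuffer;
}

// Wrap an ArrayBuffer in a Node Buffer without copying (memory is shared).
function fromArrayBuffer(
  ab: ArrayBuffer,
  offset: number = 0,
  length: number = ab.byteLength - offset,
): Buffer {
  return Buffer.from(ab, offset, length);
}
```

The first direction always copies, while the second is a zero-copy view, so writes through the returned `Buffer` are visible in the underlying `ArrayBuffer`.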
The In doing so, the crypto functionality is now optional for DB. If it is not passed in, the data won't be encrypted. However because we want to do some tests of this form, I'm going to be adding example workers and example crypto into the tests code supported by node-forge for now. |
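A minimal sketch of what optional crypto could look like is shown below. `MiniDB` is a hypothetical in-memory stand-in, not the real DB class, and `DBCrypto` has the same shape as the asynchronous `Crypto` type quoted earlier in the thread (renamed here only to avoid clashing with the global `Crypto` type).

```typescript
// Same shape as the async Crypto type in the thread; renamed to avoid
// a clash with the global Crypto type.
type DBCrypto = {
  encrypt(key: ArrayBuffer, plainText: ArrayBuffer): Promise<ArrayBuffer>;
  decrypt(
    key: ArrayBuffer,
    cipherText: ArrayBuffer,
  ): Promise<ArrayBuffer | undefined>;
};

// Hypothetical in-memory store standing in for the real DB.
class MiniDB {
  protected store = new Map<string, ArrayBuffer>();

  // If no crypto is supplied, values are stored as plaintext.
  constructor(protected crypto?: { key: ArrayBuffer; ops: DBCrypto }) {}

  async put(k: string, value: ArrayBuffer): Promise<void> {
    const data = this.crypto
      ? await this.crypto.ops.encrypt(this.crypto.key, value)
      : value;
    this.store.set(k, data);
  }

  async get(k: string): Promise<ArrayBuffer | undefined> {
    const data = this.store.get(k);
    if (data === undefined) return undefined;
    if (!this.crypto) return data;
    return this.crypto.ops.decrypt(this.crypto.key, data);
  }
}
```

Tests can then construct `new MiniDB()` for the unencrypted path and pass an example crypto object for the encrypted path.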
The |
Now integrating it into EFS. |
This is done in EFS. With MatrixAI/js-encryptedfs#47 closed and this being done in https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205, this issue should be closed for now, as there's nothing further to do here. |
Specification
Using ArrayBuffer to avoid copying between main thread and worker threads:
Additional context
Note that it turns out we cannot do a proper buffer transfer with Node buffers. The Node .buffer is never detached. Keys always have to be copied, but it is possible to optimize in the future if we use ArrayBuffer instead of Node Buffer. This can be an optimization for the future.
Examples of EFS functions to do this:
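The copy versus transfer distinction can be illustrated with a small sketch. This is not the EFS code: it uses `structuredClone` with a transfer list as a single-threaded stand-in for posting a message to a worker, and the function name `demoTransfer` is hypothetical. It assumes a runtime that provides `structuredClone` (Node 17 or later).

```typescript
// Illustration only: structuredClone with a transfer list mimics what
// posting to a worker does to the sender's buffers.
function demoTransfer(): {
  keyLen: number;
  senderLen: number;
  receiverLen: number;
} {
  const key = new Uint8Array([1, 2, 3, 4]).buffer;
  const plainText = new Uint8Array([10, 20, 30]).buffer;

  // The key is copied (it is not in the transfer list);
  // the plaintext is transferred and becomes detached on the sender side.
  const received = structuredClone(
    { key, plainText },
    { transfer: [plainText] },
  );

  return {
    keyLen: key.byteLength, // 4: the sender can still use the key
    senderLen: plainText.byteLength, // 0: detached after the transfer
    receiverLen: received.plainText.byteLength, // 3: now owned by the receiver
  };
}
```

A transferred `ArrayBuffer` reports a `byteLength` of 0 on the sending side, which is the "detached" state referred to above.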
Tasks
WorkerManager as a dependency into DB class with setWorkerManager:
serializeEncrypt and unserializeDecrypt to be asynchronous methods. There is no need to have synchronous versions, they are always used in an asynchronous way. These methods must then check if the worker manager is available, if yes, it should pass these buffers into the worker related functions.
batch ops can be done where multiple entries are encrypted in one go. This will reduce the amount of time spent in communicating to the threads considering that we have to copy Node buffers anyway.
Related to #209. The overhead of copying to the threads and back can become quite significant I reckon. And we might want to investigate ways of using ArrayBuffer where we can, or whether copying buffer plus offset and length is better than just turning them into strings and copying.
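The tasks above can be sketched roughly as follows. Everything here is an assumption for illustration: `WorkerManagerLike`, its `call` method, and `BatchEncryptor` are hypothetical shapes, and the real WorkerManager and DB APIs may differ.

```typescript
type EncryptFn = (
  key: ArrayBuffer,
  plainText: ArrayBuffer,
) => Promise<ArrayBuffer>;

// Hypothetical worker manager shape: `call` runs a task against a worker
// that exposes an `encrypt` function. The real API may differ.
interface WorkerManagerLike {
  call<T>(task: (worker: { encrypt: EncryptFn }) => Promise<T>): Promise<T>;
}

class BatchEncryptor {
  protected workerManager?: WorkerManagerLike;

  constructor(
    protected key: ArrayBuffer,
    protected encrypt: EncryptFn,
  ) {}

  setWorkerManager(workerManager: WorkerManagerLike): void {
    this.workerManager = workerManager;
  }

  unsetWorkerManager(): void {
    this.workerManager = undefined;
  }

  // Encrypt many entries with a single hand-off to the worker,
  // rather than one hand-off per entry.
  async encryptBatch(values: ArrayBuffer[]): Promise<ArrayBuffer[]> {
    const run = (encrypt: EncryptFn): Promise<ArrayBuffer[]> =>
      Promise.all(values.map((v) => encrypt(this.key, v)));
    if (this.workerManager !== undefined) {
      return this.workerManager.call((worker) => run(worker.encrypt));
    }
    // No worker manager set: fall back to the main thread.
    return run(this.encrypt);
  }
}
```

The point of batching in this sketch is that the fixed cost of communicating with a thread is paid once per batch instead of once per entry.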