Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept/early NodeJS crypto bindings (through N-API) #444

Closed
wants to merge 45 commits into from

Conversation

turt2live
Copy link
Member

@turt2live turt2live commented Dec 17, 2021

Fixes #417

Builds on top of #443


Please be gentle - I learned Rust while writing these.

Introduction

These bindings aren't perfect, but represent a bare minimum for a simple bot/appservice to hook itself up and do encryption in a stable way. Particularly missing is cross-signing, SSSS, key backup, verification, key sharing, etc - it's just basic encryption and decryption, plus media as a bonus.

This is working in the bot-sdk as turt2live/matrix-bot-sdk#187 - it is presently published to @turt2live/matrix-sdk-crypto-nodejs pending a home in the matrixdotorg npm. I don't really recommend that people try this on their own yet, so would somewhat advocate that we don't publish it properly just yet.

Before publishing formally, I'd like us to figure out Mac and Windows pre-builds to avoid downstream developers having to build the bindings locally. Linux (32 and 64 bit) bindings are supplied as part of the current release process.

System structure

This passes serialized JSON objects over the node/rust boundary as types tend to get messy and simply not work. Some types are translated directly, but most are serialized over a proxy layer.

Here's how it works (simplified):

+-------------+        +---------+                       +-----------+            +-------------+
| JSConsumer  |        | JSProxy |                       | RustProxy |            | RustMachine |
+-------------+        +---------+                       +-----------+            +-------------+
       |                    |                                  |                         |
       | Call function      |                                  |                         |
       |------------------->|                                  |                         |
       |                    |                                  |                         |
       |                    | Serialize to JSON (string)       |                         |
       |                    |---------------------------       |                         |
       |                    |                          |       |                         |
       |                    |<--------------------------       |                         |
       |                    |                                  |                         |
       |                    | Fling over n-api                 |                         |
       |                    |--------------------------------->|                         |
       |                    |                                  |                         |
       |                    |                                  | Deserialize JSON        |
       |                    |                                  |-----------------        |
       |                    |                                  |                |        |
       |                    |                                  |<----------------        |
       |                    |                                  |                         |
       |                    |                                  | Call function           |
       |                    |                                  |------------------------>|
       |                    |                                  |                         |

(similar process to send return value back)

The two Proxy objects in the diagram are the matrix-sdk-crypto-nodejs crate: it exposes a proper TypeScript interface on the Node side, and a translation layer on the Rust side to bring the functionality of the crypto crate through. Together, these hide the transport details as the objects pass over the n-api boundary.

Eventually we'll want to replace the JSON serialization with proper types, but for now this is functional and works. It's also still extremely resource efficient all things considered.

Notable notes

This doesn't support custom stores because the bindings for a second object are near impossible at the moment. We'll probably have to implement the stores in the rust-sdk and expose overloaded constructors to pass parameters between them. We'd have to do this without passing an object though, hence the sort of factory setup of the OlmMachine.

ara4n and others added 4 commits December 27, 2021 22:13
This also fails on `main`, but looks to be a change in clippy between 19 days ago and now
@turt2live turt2live changed the title WIP: NodeJS bindings (through N-API) Proof of concept/early NodeJS crypto bindings (through N-API) Dec 27, 2021
@turt2live
Copy link
Member Author

I don't really know why clippy is so angry all of a sudden, so am going to leave it alone now in favour of someone else looking at it.

@turt2live turt2live marked this pull request as ready for review December 27, 2021 23:08
@turt2live turt2live requested a review from poljar December 27, 2021 23:08
@jplatte
Copy link
Collaborator

jplatte commented Dec 28, 2021

Clippy is angry because new lints got added that fail(ed) for existing code. They were just fixed as part of #447, so rebasing should fix things (unless they also complain about parts of the new code).

@turt2live
Copy link
Member Author

well, now it's just unhappy about unused functions because, well, they are.

Copy link
Contributor

@poljar poljar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks largely ok. The biggest issue is probably using block_on() everywhere, this works currently since sled will unlikely block but a PostgreSQL certainly could.

I didn't mention some obvious things like missing docs and proper error handling inline but those would be required as well.

Finally, I think that our repo here is getting a bit big, CI takes quite long to finish. I think that we should create a separate repo for our bindings like I created one for vodozemac over here: https://github.com/poljar/vodozemac-bindings.

key_claim_request_serialize(&request_id, &request)
}

pub fn outgoing_req_to_json(r: OutgoingRequest) -> Result<String, Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense to have a type for this? For example:

struct Request {
    kind: RequestKind,
    request_id: String,
    body: String,
}

That would avoid one of the serialization steps: rust_request -> serde_json::Value -> String.

You could even leave the body as a serialized string on the JS side and send it out as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most likely yes, though the types were causing more frustration than it was worth - probably worth a revisit

crates/matrix-sdk-crypto-nodejs/src/ts/OlmMachine.ts Outdated Show resolved Hide resolved
crates/matrix-sdk-crypto-nodejs/src/ts/OlmMachine.ts Outdated Show resolved Hide resolved
if (request['request_kind']) {
await this.runEngineRequest(request);
}
await this.runEngineUntilComplete();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above methods won't produce any outgoing requests.

}

public async updateTrackedUsers(userIds: string[]) {
this.machine.updateTrackedUsers(userIds);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may or may not produce an outgoing request, but don't you want to wait for all calls to updateTrackedUsers that are coming from a single sync response to be done before you send out the outgoing request?

I think you'll have to deduplicate the /keys/query requests per user otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, probably. This is one of those things that should be solved with documentation I think (to remind the caller to actually run the query)


#[napi(object)]
#[derive(Serialize, Deserialize)]
pub struct Device {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has been copied from the uniffi based bindings. uniffi isn't quite great when your bound type wants to return another bound type. In this case OlmMachine returns a Device, but you would have to call free() on the Kotlin side to not leak memory. Instead I opted to return pure data in the uniffi based bindings.

You can just wrap RsDevice in napi-rs:

#[napi]
pub struct Device {
    inner: RSDevice,
}

#[napi]
impl Device {
    #[napi(getter)]
    pub fn user_id(&self) -> String {
        self.inner.user_id().to_string()
    }
}

mod machine;
mod models;
mod request;
mod responses;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole crate complains about unused types because nothing is here publicly exported. All your types are private types. Not sure why napi-rs doesn't complain about this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because my IDE was complaining that it couldn't determine types of "unused" modules. I think napi-rs also requires the things to be referenced like this, so might be why it's not complaining?

Is the solution to just stick pub in front of these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do that or you could add exports for the public types, for example: pub use machine::OlmMachine.

&self,
events: String,
device_changes: DeviceLists,
key_counts: Map<String, Value>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can take a HashMap<String, i64> here to make this a bit more typesafe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, napi-rs doesn't have bindings for that (annoyingly).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about that? This diff seems to, at least, compile:

diff --git a/crates/matrix-sdk-crypto-nodejs/src/machine.rs b/crates/matrix-sdk-crypto-nodejs/src/machine.rs
index c1ab0c308..f392ba8a9 100644
--- a/crates/matrix-sdk-crypto-nodejs/src/machine.rs
+++ b/crates/matrix-sdk-crypto-nodejs/src/machine.rs
@@ -224,15 +224,13 @@ impl SledBackedOlmMachine {
     }
 
     #[napi]
-    pub fn receive_sync_changes(
+    pub async fn receive_sync_changes(
         &self,
         events: String,
         device_changes: DeviceLists,
-        key_counts: Map<String, Value>,
+        key_counts: std::collections::HashMap<String, i64>,
         unused_fallback_keys: Option<Vec<String>>,
     ) -> Result<String> {
-        // key_counts: Map<String, String> (cast to Map<String, i32>)
-
         let events: ToDevice = serde_json::from_str(events.as_str())?;
         let device_changes: RumaDeviceLists = device_changes.into();
         let key_counts: BTreeMap<DeviceKeyAlgorithm, UInt> = key_counts
@@ -240,12 +238,7 @@ impl SledBackedOlmMachine {
             .map(|(k, v)| {
                 (
                     DeviceKeyAlgorithm::try_from(k).expect("Failed to convert key algorithm"),
-                    v.as_str()
-                        .expect("Failed to get string for number")
-                        .parse::<i32>()
-                        .unwrap()
-                        .try_into()
-                        .expect("Failed to convert to number"),
+                    UInt::try_from(v).expect("Invalid key count"),
                 )
             })
             .collect();
@@ -254,13 +247,14 @@ impl SledBackedOlmMachine {
             unused_fallback_keys.map(|u| u.into_iter().map(DeviceKeyAlgorithm::from).collect());
 
         let events = self
-            .runtime
-            .block_on(self.inner.receive_sync_changes(
+            .inner
+            .receive_sync_changes(
                 events,
                 &device_changes,
                 &key_counts,
                 unused_fallback_keys.as_deref(),
-            ))
+            )
+            .await
             .expect("Failed to handle sync changes");
 
         Ok(serde_json::to_string(&events)?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything I know is a lie - that didn't work previously, but could have been any number of factors.

private constructor(private _userId: string, private _deviceId: string, private engine: OlmEngine) {
}

private makeSled(dir: string): OlmMachine {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this useful if you have withSledBackend()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the way we do factories in JS, yea. withSledBackend is static so the values don't transfer to the inner instance.


let events = self
.runtime
.block_on(self.inner.receive_sync_changes(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all methods here are async, you are converting them to be sync using block_on(). Uniffi sadly doesn't support automagical conversions betewen Rust async methods and Kotlin suspendable ones. napi-rs does, so we should use it.

let machine = self.inner.clone();
let events = self.runtime.spawn(async move {
    machine.receive_sync_changes(
        events,
        &device_changes,
        &key_counts,
        unused_fallback_keys.as_deref(),
    ).await
})
.await
.expect("Task paniced")
.expect("Failed to handle sync changes");

You might want to continue using block_on() for the constructor and perhaps some other hand picked methods but certainly not for all of these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually napi-rs seems to support tokio out of the box, so you don't need to create a runtime at all:

NAPI-RS supports the tokio runtime by default. If you await a tokio future in async fn, NAPI-RS will execute it in the tokio runtime and convert it into a JavaScript Promise.

More info here: https://napi.rs/docs/concepts/async-fn.

@turt2live
Copy link
Member Author

Thanks for the review @poljar - will take a look at moving it to its own repo in the new year.

For error handling, I was having quite a few issues with napi-rs not liking the error types from call()? sort of syntax, and converting the errors looked non-trivial, so ended up abusing expect() everywhere. Part of this is because I only learned Rust while writing these, but part of it is also just trying to get it to compile 🙈. For decryption I ended up re-wrapping the error by hand - is this the correct way to handle errors?

@poljar
Copy link
Contributor

poljar commented Dec 29, 2021

For decryption I ended up re-wrapping the error by hand - is this the correct way to handle errors?

Sadly it seems to be so, napi-rs is a bit terse on the error handling docs. I suggest you create some helper functions that convert the errors (most errors are CryptoStoreError). The helper then converts the Rust error to a napi-rs error and you can use result.map_err(my_helper_function)? to make this bearable.

@turt2live
Copy link
Member Author

The napi-rs community is pretty responsive, so might seek their help in the new year.

turt2live added a commit to matrix-org/matrix-rust-sdk-bindings that referenced this pull request Feb 7, 2022
@turt2live
Copy link
Member Author

started moving this to https://github.com/matrix-org/matrix-rust-sdk-bindings - will keep iterating based on review feedback

@turt2live turt2live closed this Feb 7, 2022
@turt2live turt2live deleted the travis/node-bindings branch February 7, 2022 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add node bindings
4 participants