Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of MessageReceiver and MessagePipe. #2

Merged
merged 23 commits into from
Jul 13, 2020
Merged

Conversation

rubdos
Copy link
Member

@rubdos rubdos commented Jun 18, 2020

Signal protobuf compilation

These files should be kept in sync with the protobuf directory of libsignal-service-java.

It would be cleaner to yield the compiled files OUT_DIR, but that gets ugly very fast because of rust-lang/rust#48250.

  • Do we force use protoc, or do we yield the choice to the dependee?
    Since we adopted prost, we're bound to protoc.
  • How do we resolve the use of the .proto files in this repo, LICENSE-wise?
    We don't think there's anything to do, beyond licensing this crate under GPLv3.

This part is merged.

MessageReceiver

  • TLS + custom root certificate through rustls
  • Interface in libsignal-service
  • Implementation in libsignal-service-actix
  • Acknowledge messages (keeping this untouched for now, so that I don't lose my own messages) (postponed in MessageReceiver is incomplete #7)

MessagePipe

  • Move WebSocket code that I had in Whisperfish here
  • Parse messages on the WebSocket
  • Acknowledge messages (keeping this untouched for now, so that I don't lose my own messages)

ServiceCipher

(possibly in a next branch, not sure yet)

  • Decrypt messages

Misc

  • How do we want to test this stuff? Is there a Signal (mock) server that we could use somewhere?

@rubdos rubdos requested a review from Michael-F-Bryan June 18, 2020 14:07
@rubdos
Copy link
Member Author

rubdos commented Jun 18, 2020

It might be that Travis does not have protoc on board. fixed

FWIW, the protobuf crate has three ways of generating Rust code from .proto files, of which the recommended is the way it's implement now. The other two either include a copy of protoc or reimplement protoc in Rust itself. We might consider switching to either implementation, or handing this control over to the dependee.

@rubdos rubdos force-pushed the MessageReceiver branch 3 times, most recently from 6c46e0b to 28a0806 Compare June 18, 2020 14:38
@shekohex
Copy link
Collaborator

Maybe we should use prost instead of protobuf
It dose the job better, it generates idiomatic Rust structures and enums from the .proto
It's Widely used by Rust Echosystem.

@Michael-F-Bryan
Copy link
Contributor

It would be cleaner to yield the compiled files OUT_DIR, but that gets ugly very fast because of rust-lang/rust#48250.

We use gRPC and protobufs at work with Rust as the server and it works really well for us.

This is (roughly) our setup:

  1. Proto files are stored in a separate repository
  2. The project needing access to the proto files adds that repo as a git submodule (e.g. under vendor/proto)
  3. I create a separate crate containing just the generated language bindings (generated with tonic_build, which uses prost under the hood)
  4. The gRPC server then depends on this language bindings crate to do its thing

The build script for compiling the language bindings is a bit non-trivial because we have many proto files and also import the well-known types (which are also vendored in the proto repo.

Language bindings build script
// vendor/proto/build.rs

use std::{
    error::Error,
    ffi::OsStr,
    path::{Path, PathBuf},
};
use walkdir::WalkDir;

fn main() -> Result<(), Box<dyn Error>> {
    let crate_dir = Path::new(env!("CARGO_MANIFEST_DIR"));
    let project_root = crate_dir.parent().unwrap().parent().unwrap();
    let with_grpc = std::env::var("CARGO_FEATURE_GRPC_SERVICES").is_ok();

    compile_all_protobufs(&project_root, with_grpc)?;

    Ok(())
}

fn compile_all_protobufs(
    project_root: &Path,
    with_grpc: bool,
) -> Result<(), Box<dyn Error>> {
    let proto_files = find_proto_files(project_root, &["project_1", "project_2"])?;

    for file in &proto_files {
        println!("cargo:rerun-if-changed={}", file.display());
    }

    let format = rustfmt_is_available();

    if !format {
        println!("cargo:warning=The generated bindings won't be formatted because rustfmt isn't installed.");
    }

    tonic_build::configure()
        .build_client(with_grpc)
        .build_server(with_grpc)
        .format(format)
        .compile(&proto_files, &[project_root.to_path_buf()])?;

    Ok(())
}

fn rustfmt_is_available() -> bool {
    use std::process::{Command, Stdio};

    let status = Command::new("rustfmt")
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status();

    match status {
        Ok(s) => s.success(),
        Err(_) => false,
    }
}

fn find_proto_files(
    project_root: &Path,
    folders: &[&str],
) -> Result<Vec<PathBuf>, Box<dyn Error>> {
    let mut paths = Vec::new();

    for folder in folders {
        let entries = WalkDir::new(project_root.join(folder))
            .contents_first(true)
            .into_iter()
            .filter_entry(|e| {
                e.path().extension() == Some(OsStr::new("proto"))
            });

        for entry in entries {
            paths.push(entry?.into_path());
        }
    }

    Ok(paths)
}

From there, you can use include!() and OUT_DIR to include the files using whatever module structure you want.

Importing generated code
// vendor/src/lib.rs

//! Rust bindings to the various [Protocol Buffer][pb] messages used at company.
//!
//! [pb]: https://developers.google.com/protocol-buffers

pub mod project_1 {
    /// Automatically generated messages used by foo.
    ///
    /// See `project_1/foo/bar.proto` for more.
    pub mod bar {
        include!(concat!(
            env!("OUT_DIR"),
            "/",
            "company.project_1.foo.bar.rs"
        ));
    }
}

/// Protocol Buffers used by project 2.
pub mod project_2 {
    pub mod foo{
        include!(concat!(
            env!("OUT_DIR"),
            "/",
            "company.project_2.foo.rs"
        ));
    }

    pub mod server {
        include!(concat!(env!("OUT_DIR"), "/", "company.project_2.server.rs"));
    }


    pub mod bar {
        include!(concat!(env!("OUT_DIR"), "/", "company.project_2.bar.rs"));
    }
}

We also have a grpc-services feature so you can cut down on compile time by only generating protobufs and not needing to pull in tokio and tonic.

These files should be kept in sync with the protobuf directory of libsignal-service-java.

Git submodules work really well for keeping things in sync with a 3rd party project. If upstream has new changes, you just need to run git submodule update --remote to grab the latest commits.

How do we resolve the use of the .proto files in this repo, LICENSE-wise?

It seems pretty straightforward. Each *.proto file starts with "Licensed according to the LICENSE file in this repository." and the repo is GPLv3.

The GPL is a bit too restrictive for my liking, but considering we directly import/copy their files we have no choice but to make this project GPLv3 as well. I had to do the same with libsignal-protocol-rs as well.

@rubdos
Copy link
Member Author

rubdos commented Jun 19, 2020

Maybe we should use prost instead of protobuf
It dose the job better, it generates idiomatic Rust structures and enums from the .proto
It's Widely used by Rust Echosystem.

I looked both up on crates.io before I started this, protobuf had about twice the download count. They both seem maintained, and Mozilla seems to have picked prost too. I'll give it a shot, thanks for the suggestion.

@rubdos
Copy link
Member Author

rubdos commented Jun 19, 2020

It would be cleaner to yield the compiled files OUT_DIR, but that gets ugly very fast because of rust-lang/rust#48250.

We use gRPC and protobufs at work with Rust as the server and it works really well for us.

This is (roughly) our setup:

1. Proto files are stored in a separate repository

2. The project needing access to the proto files adds that repo as a git submodule (e.g. under `vendor/proto`)

3. I create a separate crate containing just the generated language bindings (generated with `tonic_build`, which uses `prost` under the hood)

4. The gRPC server then depends on this language bindings crate to do its thing

That sounds right for a larger project, but for just those four Proto files I would just have them in this repository, I think.

The build script for compiling the language bindings is a bit non-trivial because we have many proto files and also import the well-known types (which are also vendored in the proto repo.
Language bindings build script

Looks mostly as what I'm doing. I don't think we can use tonic, however, because Signal doesn't seem to use gRPC. tonic seems to use prost, not protobuf, so that's a second reason to get on with @shekohex 's suggestion!

From there, you can use include!() and OUT_DIR to include the files using whatever module structure you want.
Importing generated code

protobuf was generating inner attributes: I've tried including all proto files in a single file with a module structure, but rustc didn't like that. I had to make them in separate files :-(

I'd think I wouldn't have that issue with prost, so that's a third good reason.

We also have a grpc-services feature so you can cut down on compile time by only generating protobufs and not needing to pull in tokio and tonic.

That's mostly how we structure this project too.

These files should be kept in sync with the protobuf directory of libsignal-service-java.

Git submodules work really well for keeping things in sync with a 3rd party project. If upstream has new changes, you just need to run git submodule update --remote to grab the latest commits.

I'm hesitant to pulling in the whole libsignal-service-java thing in, just to have those four files. It doesn't seem like Signal provides those files in a separate repository either. Their Objective-C implementation seems to include a lot of more files too.

How do we resolve the use of the .proto files in this repo, LICENSE-wise?

It seems pretty straightforward. Each *.proto file starts with "Licensed according to the LICENSE file in this repository." and the repo is GPLv3.

The GPL is a bit too restrictive for my liking, but considering we directly import/copy their files we have no choice but to make this project GPLv3 as well. I had to do the same with libsignal-protocol-rs as well.

I have no issues with GPLv3 myself, and I figured the repo should be GPLv3 anyhow, because -protocol is already GPLv3 and we're looking at a ton of GPLv3 code to port from the Java and C versions.

Either way, I was wondering whether we had to make an additional effort to acknowledge that we directly compile upstream code, I'm not sure what the GPL says about that. I seem to recall there was something to it. It probably doesn't matter for now. Let's first get something to work; we're in the green either way.

@rubdos
Copy link
Member Author

rubdos commented Jun 19, 2020

So, they're in OUT_DIR now, and prost generates one Rust file per protobuf-package, so there's only one include command. I like it. Thanks for pushing me there @shekohex!

If everyone agrees to stay on prost, I'll squash that commit into the protobuf commit.

@rubdos rubdos force-pushed the MessageReceiver branch 2 times, most recently from fc7c3aa to 4436476 Compare June 20, 2020 10:30
@rubdos rubdos self-assigned this Jun 20, 2020
@rubdos rubdos force-pushed the MessageReceiver branch from 5837465 to 0501c8d Compare June 20, 2020 19:38
@rubdos
Copy link
Member Author

rubdos commented Jun 20, 2020

Good news, I'm at the point where getMessages() works™: the GET request works, the JSON gets parsed through serde. Next step is their actual decryption and delivery to the application that calls into signal-service!

@rubdos rubdos force-pushed the MessageReceiver branch from 0501c8d to 9b57c85 Compare June 21, 2020 09:34
@rubdos
Copy link
Member Author

rubdos commented Jul 5, 2020

(merged the protobuf stuff already)

@rubdos rubdos force-pushed the MessageReceiver branch 3 times, most recently from de2dfd1 to 0a28420 Compare July 9, 2020 17:14
@rubdos rubdos force-pushed the MessageReceiver branch from 0a28420 to fd43ed4 Compare July 9, 2020 18:59
@rubdos
Copy link
Member Author

rubdos commented Jul 9, 2020

In its current state, the messages get acknowledged as soon as they come over the socket. This mirrors the Java implementation, but I dislike this approach. I would like the application to tell when the message is actually stored.

This also means I just lost a few months of messages 😭

@rubdos rubdos force-pushed the MessageReceiver branch 3 times, most recently from 75491b5 to 2658326 Compare July 13, 2020 10:49
@rubdos
Copy link
Member Author

rubdos commented Jul 13, 2020

We even have a single unit test now! :'-)

@rubdos rubdos force-pushed the MessageReceiver branch from 7ad2e31 to 69671c4 Compare July 13, 2020 11:40
@rubdos
Copy link
Member Author

rubdos commented Jul 13, 2020

ServiceCipher will be in a next branch. Gonna get this in master now.

@rubdos rubdos marked this pull request as ready for review July 13, 2020 11:45
@rubdos rubdos merged commit 44dda63 into master Jul 13, 2020
@rubdos rubdos deleted the MessageReceiver branch July 13, 2020 11:46
gferon added a commit that referenced this pull request Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants