This guide details what you'll need to contribute to Materialize.
Materialize consists of several services written in Rust that are orchestrated by Kubernetes. Supporting build and test tools are written in a combination of Rust, Python, and Bash. Tests often use Docker Compose rather than Kubernetes to orchestrate interactions with other systems, like Apache Kafka.
Materialize depends on several components that are written in C and C++, so you'll need a working C and C++ toolchain. You'll also need to install:
- The CMake build system
- libclang
- PostgreSQL
- lld (on Linux, or set a custom RUSTFLAGS)
On macOS, if you install Homebrew, you'll be guided through the process of installing Apple's developer tools, which include a C compiler and libclang. Then it's a cinch to install CMake and PostgreSQL.
brew install cmake postgresql
On Debian-based Linux variants, it's even easier:
sudo apt update
sudo apt install build-essential cmake postgresql-client libclang-dev lld
On other platforms, you'll have to figure out how to get these tools yourself.
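Before building, it can help to confirm that the prerequisite commands are actually on your PATH. The following is a minimal sketch, not part of the Materialize tooling: missing_tools is a hypothetical helper, and the executable names (cc, c++, cmake, psql) are assumptions about how the packages above expose themselves on your system.

```shell
# Report which of the given commands are missing from PATH.
# Prints one missing command per line and returns non-zero if any are missing.
missing_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Assumed executable names for the prerequisites listed above. libclang is a
# library rather than a command, so this check does not cover it.
if missing=$(missing_tools cc c++ cmake psql); then
    echo "all prerequisite commands found"
else
    echo "missing prerequisite commands:" $missing
fi
```

On Linux you may also want to pass the linker's executable name to the same helper once you know what it is called on your distribution.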
Install Rust via rustup:
curl https://sh.rustup.rs -sSf | sh
We recommend that you do not install Rust via your system's package manager. We closely track the most recent version of Rust. The version of Rust in your package manager is likely too old to build Materialize.
For details on how we upgrade Rust see here.
Running Materialize locally requires a running CockroachDB server.
On macOS, when using Homebrew, CockroachDB can be installed and started via:
brew install materializeinc/cockroach/cockroach
brew services start cockroach
(We recommend use of our forked Homebrew tap because it runs CockroachDB using an in-memory store, which avoids slow filesystem operations on macOS.)
On Linux, we recommend using Docker:
docker run --name=cockroach -d -p 26257:26257 -p 26258:8080 cockroachdb/cockroach:v23.1.11 start-single-node --insecure
If you can successfully connect to CockroachDB with either psql postgres://root@localhost:26257 or cockroach sql --insecure, you're all set.
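If neither client is installed yet, a rough first check is whether anything is listening on the CockroachDB SQL port at all. This is only a sketch under stated assumptions: port_open is a hypothetical helper, it relies on bash's built-in /dev/tcp redirection, and an open port does not prove that CockroachDB is healthy, so a real SQL connection remains the better test.

```shell
# Return 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
# Relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh.
port_open() {
    local host=$1 port=$2
    # The subshell keeps file descriptor 3 from leaking into the caller.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# 26257 is the SQL port used in the connection commands in this section.
if port_open localhost 26257; then
    echo "something is listening on localhost:26257"
else
    echo "nothing is listening on localhost:26257; is CockroachDB running?"
fi
```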
Materialize's build and test infrastructure is largely written in Python;
running our integration tests, in particular, requires a local Python
environment. Most of this should be taken care of by the bin/pyactivate
script, which constructs a local virtual environment and keeps necessary
dependencies up to date.
We support, as a minimum version, the default Python provided in the most recent Ubuntu LTS release. As of October 2023 this is Python 3.10, provided in Ubuntu "Jammy Jellyfish". Earlier versions may work but are not supported. Our recommended installation methods are:
- macOS: Homebrew
- Linux: System package manager if possible, or community package repositories if necessary
- Windows: Microsoft App Store
- Cross-platform: Nix flake
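To see whether an interpreter you already have meets the 3.10 minimum, a small version comparison is enough. The sketch below is illustrative only: version_at_least is a hypothetical helper that compares the major and minor components of a version string, and it assumes the interpreter is invoked as python3.

```shell
# Return 0 if VERSION (for example "3.10.12") is at least MAJOR.MINOR.
version_at_least() {
    local version=$1 want_major=$2 want_minor=$3
    local major minor
    major=${version%%.*}
    minor=${version#*.}
    minor=${minor%%.*}
    # Reject empty or non-numeric components, e.g. when python3 is missing.
    case "$major" in ''|*[!0-9]*) return 1 ;; esac
    case "$minor" in ''|*[!0-9]*) return 1 ;; esac
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Ask the interpreter for its version; an empty string means it was not found.
found=$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || true)
if version_at_least "$found" 3 10; then
    echo "python3 $found is new enough"
else
    echo "python3 is missing or older than 3.10"
fi
```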
If none of the above work well for you, these are a few other methods that have worked for us in the past, but are not formally supported:
The Confluent Platform bundles Apache ZooKeeper and Apache Kafka with several non-free Confluent tools, like the Confluent Schema Registry and Control Center. For local development, the Confluent CLI allows easy management of these services.
Confluent Platform is not required for changes that don't need Kafka integration. If your changes don't affect integration with external systems and can be fully exercised by SQL logic tests, we recommend not installing the Confluent Platform, as it is a rather heavy dependency. Most Materialize employees, or other major contributors, will probably need to run the full test suite and should therefore install the Confluent Platform.
First, install the CLI. As of early July 2022 you can run this command on macOS and Linux:
curl -sL --http1.1 https://cnfl.io/cli | sudo sh -s -- -b /usr/local/bin latest
If this no longer works, follow the instructions in the Confluent CLI documentation. Then please update this guide with the new instructions!
You will need JDK 8 or 11. The easiest way to install this is via Homebrew:
brew install --cask homebrew/cask-versions/temurin11
Then, download and extract the Confluent Platform tarball (when using bash, replace ~/.zshrc with ~/.bashrc):
INSTALL_DIR=$HOME/confluent # You can choose somewhere else if you like.
mkdir $INSTALL_DIR
curl http://packages.confluent.io/archive/7.0/confluent-7.0.1.tar.gz | tar -xzC $INSTALL_DIR --strip-components=1
echo export CONFLUENT_HOME=$(cd $INSTALL_DIR && pwd) >> ~/.zshrc
source ~/.zshrc
confluent local services start
When using bash, note that you need to create a .bash_profile that sources .bashrc to ensure the above works with the Terminal app.
If you have multiple JDKs installed and your current JAVA_HOME points to an incompatible version, you can explicitly run confluent with JDK 8 or 11:
JAVA_HOME=$(/usr/libexec/java_home -v 11) confluent local services start
On Debian-based Linux variants, you can use APT to install Java and the Confluent Platform:
curl http://packages.confluent.io/deb/6.0/archive.key | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://packages.confluent.io/deb/6.0 stable main"
sudo apt update
sudo apt install openjdk-11-jre-headless confluent-community-2.13
echo export CONFLUENT_HOME=/ >> ~/.bashrc
source ~/.bashrc
confluent local services start
On other Linux variants, you'll need to make your own way through Confluent's installation instructions. Note that, at the time of writing, only Java 8 and 11 are supported.
Alternatively, you can download an all-in-one tarball from here. Untar it to a location of your choice, set $CONFLUENT_HOME to that location, and add $CONFLUENT_HOME/bin to your $PATH. This can be the most convenient way to get confluent, and it works in a distro-neutral way (for example, if you are using Arch Linux).
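The environment setup for the tarball approach might look like the following lines in your shell startup file. This is a sketch rather than an official recipe: the $HOME/confluent directory is a hypothetical extraction location, and the duplicate check is an added convenience rather than something Confluent requires.

```shell
# Hypothetical extraction directory; substitute wherever you untarred the archive.
export CONFLUENT_HOME="$HOME/confluent"

# Prepend $CONFLUENT_HOME/bin to PATH unless it is already present, so that
# re-sourcing the startup file does not add duplicate entries.
case ":$PATH:" in
    *":$CONFLUENT_HOME/bin:"*) ;;
    *) export PATH="$CONFLUENT_HOME/bin:$PATH" ;;
esac
```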
First, clone this repository:
git clone [email protected]:MaterializeInc/materialize.git
Because the MaterializeInc organization requires two-factor authentication (2FA), you'll need to clone via SSH as indicated above, or configure a personal access token for use with HTTPS.
Then you can build Materialize. Materialize is a collection of several Rust services that need to be built together. Each service can be built individually via Cargo, but we recommend using the bin/environmentd script to drive the process:
cd materialize
bin/environmentd [--release] [<environmentd arg>...]
Some crates are compiled to WebAssembly and published to npm. This is accomplished through wasm-pack. Install it by running:
cargo install wasm-pack
WASM builds can then be initiated through
./bin/wasm-build <path/to/crate>
WASM crates reside in the misc/wasm/ Cargo workspace and should be kept out of the main Cargo workspace to avoid cache invalidation issues.
As mentioned above, Confluent Platform is only required to test Kafka sources and sinks against a local Kafka installation. If possible, we recommend that you don't run the Confluent Platform if you don't need it, as it is very memory hungry.
If you do need the Confluent Platform running locally, execute the following command:
confluent local services schema-registry start # Also starts ZooKeeper and Kafka.
You can also use the included confluent CLI command to start and stop individual services. For example:
confluent local services status # View what services are currently running.
confluent local services kafka start # Start Kafka and any services it depends upon.
confluent local services kafka log # View Kafka log file.
Beware that the CLI is fairly buggy, especially around service management. Putting your computer to sleep often causes the service status to get out of sync. In other words, trust the output of confluent local services <service> log and ps ... | grep over the output of confluent local services status. Still, it's reliable enough to be more convenient than managing each service manually.
When the confluent local services are running, they can be examined via a web UI which defaults to http://localhost:9021.
The start script may report that it failed to start zookeeper/kafka/schema-registry even though it actually started them successfully; it just can't detect them for some reason. In this case, you can run confluent local services schema-registry start three times, after which everything is up.
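Rerunning the start command by hand can be wrapped in a small retry loop. The sketch below is an illustration, not part of the Confluent CLI or the Materialize scripts: retry is a hypothetical helper that reruns any command until it reports success, up to a fixed number of attempts.

```shell
# Run a command up to MAX_ATTEMPTS times, stopping at the first success.
# Usage: retry MAX_ATTEMPTS COMMAND [ARGS...]
retry() {
    local max_attempts=$1 attempt=1
    shift
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep 1  # Brief pause so the services have time to settle.
    done
}
```

With this helper defined, the workaround described above becomes retry 3 confluent local services schema-registry start.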
Once things are built and CockroachDB is running, you can start Materialize:
bin/environmentd --reset -- --all-features --unsafe-mode
This should bootstrap a fresh Materialize instance. Once you see the logline "environmentd v listening...", you can connect to the database via:
psql -U materialize -h localhost -p 6875 materialize
This uses the external SQL port. If you wish to connect using a system account, you can use the internal port with the mz_system user:
psql -U mz_system -h localhost -p 6877 materialize
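The two connections differ only in user and port, so they can also be written as PostgreSQL connection URLs, which psql accepts in place of the separate flags. The sketch below assumes nothing beyond the users and ports shown above; mz_url is a hypothetical helper introduced purely for illustration.

```shell
# Build a PostgreSQL connection URL for a local environmentd.
# "external" maps to the materialize user on port 6875 and "internal" to the
# mz_system user on port 6877, mirroring the two psql commands above.
mz_url() {
    case "$1" in
        external) echo "postgres://materialize@localhost:6875/materialize" ;;
        internal) echo "postgres://mz_system@localhost:6877/materialize" ;;
        *) echo "usage: mz_url external|internal" >&2; return 1 ;;
    esac
}

mz_url external
mz_url internal
```

You could then connect with psql "$(mz_url internal)".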
Console can point at your local environmentd. To use this feature, pass the internal console flag:
bin/environmentd -- --internal-console-redirect-url="https://local.console.materialize.com"
Then visit http://localhost:6878/internal-console/. This is a great way to dogfood the console, and feedback is valuable.
Note that there is no Frontegg login in this mode, so all Frontegg features are disabled.
Materialize embeds a web UI, which it serves from port 6876. If you're running Materialize locally, you can view the web UI at http://localhost:6876.
Developing the web UI can be painful, as by default the HTML, CSS, and JS source code for the UI gets baked into the binary, and so making a change requires a full rebuild of the binary.
To speed up the development cycle, you can enable the dev-web feature like so:
cd src/environmentd
bin/environmentd --features=dev-web
In this mode, every request for a static file will reload the file from disk. Changes to standalone CSS and JS files will be reflected immediately upon reload, without requiring a recompile!
Note that dev-web can only hot-reload the files in src/environmentd/src/static. The HTML templates in src/environmentd/src/templates use a compile-time templating library called askama, and so changes to those templates necessarily require a recompile.
For details about adding a new JavaScript/CSS dependency, see the comment in src/environmentd/build/npm.rs.
Materialize's testing philosophy is sufficiently complex that it warrants its own document. See Developer guide: testing.
We use the following tools to perform automatic code style checks:
Tool | Use | Run locally with |
---|---|---|
Clippy | Rust semantic nits | cargo clippy |
rustfmt | Rust code formatter | cargo fmt |
Linter | General formatting nits | bin/lint |
cargo-udeps | Check for unused Rust dependencies | bin/unused-deps |
See the style guide for additional recommendations on code style.
Linting requires the following tools and Cargo packages to be installed:
- buf (installation guide)
- cargo-about (cargo install cargo-about)
- cargo-hakari (cargo install cargo-hakari)
- cargo-deplint (cargo install cargo-deplint)
See Developer guide: submitting and reviewing changes.
This repository has the following basic structure:
- bin contains scripts for contributor use.
- ci contains configuration and scripts for CI.
- doc/developer contains documentation for Materialize contributors, including this document.
- doc/user contains the user-facing documentation, which is published to https://materialize.com/docs.
- misc contains a variety of supporting tools and projects. Some highlights:
  - misc/dbt-materialize contains the Materialize dbt adapter.
  - misc/python contains Python developer tools, like mzbuild.
  - misc/nix contains an experimental Nix configuration for developing Materialize.
  - misc/wasm contains the Rust crates that are published to NPM as WebAssembly.
  - misc/www contains the source code for https://dev.materialize.com.
- src contains the primary Rust crates that comprise Materialize.
- test contains test suites, which are described in Developer guide: testing.
We break our Rust code into crates primarily to promote organization of code by team, thereby introducing ownership and autonomy. As such, many crates are owned by a specific team (which does not preclude the existence of shared, cross-team crates).
Although the primary unit of code organization at the inter-team level is the crate, modules within a crate are also useful for code organization, especially because they are the level at which pub visibility operates.
We make a best-effort attempt to document the ownership of the Rust code in this repository using GitHub's CODEOWNERS file.
You can create and view a relationship diagram of our crates by running the following command (this will require graphviz):
bin/crate-diagram
It is possible to view transitive dependencies of a select subset of roots by specifying the --roots flag with a comma-separated list of crates:
bin/crate-diagram --roots mz-sql,mz-dataflow
The workspace-hack crate speeds up rebuilds by ensuring that all crates use the same features of all transitive dependencies in the graph. This prevents Cargo from recompiling huge chunks of the dependency graph when you move between crates in the workspace. For details, see the hakari documentation.
If you add or remove dependencies on crates, you will likely need to regenerate the workspace-hack crate. You can do this by running:
cargo install --locked cargo-hakari
cargo hakari generate
cargo hakari manage-deps
CI will enforce that the workspace-hack crate is kept up to date.
Where possible, we prefer to keep things in the main repository (a "monorepo" approach). There are a few exceptions:
- demos, which showcase several use cases for Materialize
- rust-dec, libdecnumber bindings for Rust
- materialize-dbt-utils, data build tool (dbt) utilities for Materialize
- Several custom Pulumi providers
Don't add to this list without good reason! Separate repositories are acceptable for:
- Rapid iteration on new Materialize plugins or integrations, where the CI time or code quality requirements in the main repository would be burdensome. When the code is more stable, the repository should be integrated into the main Materialize repository.
- External requirements that require a separate repository. For example, Pulumi providers are conventionally developed each in their own repository. Similarly, materialize-dbt-utils can only appear on dbt hub if it is developed in a standalone repository.
- Stable foundational components where community contribution is desirable. For example, rust-dec is a very small package, and asking contributors to clone the entire Materialize repository would be a large barrier to entry. Changes to Materialize very rarely require changes in rust-dec, so maintaining the two separately does not introduce much overhead.
We use the Mozilla-developed native Rust cargo-vet tool for dependency auditing to help ensure the security and reliability of external dependencies. Cargo-vet allows developers to audit their dependencies by checking for known security vulnerabilities, licensing issues, and overall health of the dependencies, which include factors like maintenance status, version stability, and community trust.
For a developer, the basic workflow with cargo-vet starts with integrating it into their Rust development process. After installing cargo-vet locally, the developer runs it against their project’s Cargo.toml and Cargo.lock files. Cargo-vet will then analyze the list of dependencies and output a report detailing the status of each dependency.
All crates currently in use have been added to the audited list, but CI will fail on PRs with new unaudited dependencies (or after 'cargo update'). Complete information on using cargo-vet is available in the book at https://mozilla.github.io/cargo-vet/how-it-works.html, but in summary:
- cargo vet inspect some-crate
- cargo vet certify some-crate
The 'certify' step will record an entry in the audits.toml file certifying the crate has been reviewed and is appropriate to use. Example:
[[audits.aws-sdk-s3]]
who = "Matt Arthur <[email protected]>"
criteria = "safe-to-deploy"
version = "0.26.0"
In principle, any text editor can be used to edit Rust code.
By default, we recommend that developers without a strong preference of editor use Visual Studio Code with the Rust-Analyzer plugin. This is the most mainstream setup for developing Materialize, and the one for which you are the most likely to be able to get help if something goes wrong. It's important to note that you should not install the "Rust" plugin, as it is known to conflict with Rust-Analyzer; the latter has far more advanced code navigation features and is the de-facto standard for developing Rust. If you use Rust-Analyzer, you may wish to change the target directory so it does not conflict with other cargo commands. You can do this by adding to the cargo check extra args "--target-dir" and "$NEWTARGET".
Visual Studio Code also works well for editing Python; to work on the Python code in the Materialize repository, install the official Python extension from Microsoft and add the following to your settings.json.
{
    "python.linting.mypyEnabled": true,
    "python.analysis.extraPaths": [
        "misc/python"
    ],
    "python.defaultInterpreterPath": "misc/python/venv/bin/python"
}
If you prefer to use another editor, such as Vim or Emacs, we recommend that you install an LSP plugin with Rust-Analyzer. How to do so is beyond the scope of this document; if you have any issues, ask in one of the engineering channels on Slack.
If you are using Rust-Analyzer, you should configure it to conform to our style guide by setting the following options:
- imports.granularity.group = module
- imports.prefix = crate
Besides Rust-Analyzer, the only other known tool with good code navigation features is CLion along with its Rust plugin. This is a good choice for developers who prefer the JetBrains ecosystem, but we no longer recommend it by default, since Rust-Analyzer has long since caught up to it in maturity. If you are a Materialize employee, ask Nikhil Benesch on Slack for access to our corporate JetBrains license. If you're not yet sure you want to use CLion, you can use the 30-day free trial.
A few editor-specific add-ons and configurations have been authored to improve the editing of Materialize-specific code. Check misc/editor for add-ons that may be relevant for your editor of choice.
The standard debuggers for Rust code are rust-lldb on macOS, and rust-gdb on GNU/Linux. (It is also possible to run rust-lldb on GNU/Linux if necessary for whatever reason.) These are wrappers around lldb and gdb, respectively, that endow them with slightly improved capabilities for pretty-printing Rust data structures. Visual Studio Code users may want to try the CodeLLDB plugin.
Unfortunately, you will soon find that these programs work less well than the equivalent tools for some other mainstream programming languages. In particular, inspecting complex data structures is often tedious and difficult. For this reason, most developers routinely use println! statements for debugging, in addition to (or instead of) these standard debuggers.
To ensure each code change passes all style nits before pushing to GitHub, symlink pre-push into your local git hooks:
ln -s ../../misc/githooks/pre-push .git/hooks/pre-push
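If you set up checkouts often, the symlink step can be wrapped in a function that is safe to rerun. This is a sketch only: install_pre_push_hook is a hypothetical helper, and refusing to overwrite a hook that is a regular file is a design choice made here, not existing project behavior.

```shell
# Symlink the repository's pre-push hook into .git/hooks.
# The argument is the path to a Materialize checkout. The link target is
# relative to .git/hooks, which is why it starts with ../../ as above.
install_pre_push_hook() {
    local repo_root=$1
    local hook="$repo_root/.git/hooks/pre-push"
    # Leave hand-written hooks alone; only replace an existing symlink.
    if [ -e "$hook" ] && [ ! -L "$hook" ]; then
        echo "refusing to overwrite existing hook: $hook" >&2
        return 1
    fi
    mkdir -p "$repo_root/.git/hooks"
    ln -sf ../../misc/githooks/pre-push "$hook"
}
```

From the root of a checkout, you would call it as install_pre_push_hook . and can repeat the call without error.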
Some Materialize scripts have shell completion, and the latest versions of the completions files are checked in to misc/completions. The contents of this directory can be sourced into your shell, and will stay updated as any changes are made.
To add the completions to bash, add the following to your ~/.bashrc:
# source reads only one file per call, so loop over every completion file.
for f in /path/to/materialize/misc/completions/bash/*; do source "$f"; done
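For zsh, add the following to your ~/.zshrc:
# source reads only one file per call, so loop over every completion file.
for f in /path/to/materialize/misc/completions/zsh/*; do source "$f"; done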