Documentation updates (#22)
* document structure, minor fixes

* add more docs

* even more docs

* Update docs/introduction/architecture.mdx

Co-authored-by: Maha Hajja <[email protected]>

* Update docs/introduction/getting-started.mdx

Co-authored-by: Maha Hajja <[email protected]>

* fix review comments

* add doc about referencing connectors

* simplify connector introduction

Co-authored-by: Maha Hajja <[email protected]>
lovromazgon and maha-hajja authored Nov 4, 2022
1 parent b027c21 commit 3433960
Showing 26 changed files with 729 additions and 84 deletions.
4 changes: 0 additions & 4 deletions docs/Deploy/_category_.json

This file was deleted.

4 changes: 4 additions & 0 deletions docs/configuration/_category_.json
@@ -0,0 +1,4 @@
{
"label": "Configuration",
"position": 1
}
99 changes: 99 additions & 0 deletions docs/configuration/pipeline-configuration-files.mdx
@@ -0,0 +1,99 @@
---
title: 'Pipeline Configuration Files'
slug: 'pipeline-configuration-files'
---

Pipeline configuration files give you the ability to define pipelines that are provisioned by Conduit at startup.
It's as simple as creating a YAML file that defines pipelines, connectors, processors, and their corresponding configurations.

## Getting started

Create a folder called `pipelines` at the same level as your Conduit binary file, add all your YAML files
there, then run Conduit using the command:
```
./conduit
```
Conduit will only search for files with `.yml` or `.yaml` extensions, recursively in all sub-folders.
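For example, assuming the default setup, a layout like the following (file names are illustrative) would be picked up in full:

```
conduit                  # the Conduit binary
pipelines/
├── pipeline-a.yaml
├── pipeline-b.yml
└── team-x/
    └── pipeline-c.yaml  # sub-folders are scanned recursively
```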

If your YAML files are in a different directory, or you want to provision only a single file, run Conduit with
the CLI flag `-pipelines.path` pointing to your file or directory:
```
./conduit -pipelines.path ../my-directory
```
If the directory does not exist, Conduit will fail with the error `"pipelines.path" config value is invalid`.

### YAML Schema

A pipeline configuration file has two root keys: `version` and the `pipelines` map. Each entry in that map
configures one pipeline, using fields like `status` and `name`.

To create connectors in a pipeline, add a `connectors` map under that pipeline.

To create processors, add a `processors` map either under a pipeline ID or under a connector ID, depending on the intended parent.
Check this example YAML file with an explanation for each field:

```yaml
version: 1.0 # parser version, the only supported version for now is 1.0 [mandatory]

pipelines: # a map of pipeline IDs and their configurations.
  pipeline1: # pipeline ID, has to be unique.
    status: running # pipeline status at startup, either running or stopped. [mandatory]
    name: pipeline1 # pipeline name, if not specified, the pipeline ID will be used as the name. [optional]
    description: desc # pipeline description. [optional]
    connectors: # a map of connector IDs and their configurations.
      con1: # connector ID, has to be unique per pipeline.
        type: source # connector type, either "source" or "destination". [mandatory]
        plugin: builtin:file # connector plugin. [mandatory]
        name: file-src # connector name, if not specified, the connector ID will be used as the name. [optional]
        settings: # map of configuration keys and their values.
          path: ./file1.txt # in this example, the plugin "builtin:file" has only one configuration key, which is path.
      con2:
        type: destination
        plugin: builtin:file
        name: file-dest
        settings:
          path: ./file2.txt
        processors: # a map of processor IDs and their configurations, "con2" is the processors' parent.
          proc1: # processor ID, has to be unique per parent.
            type: js # processor type. [mandatory]
            settings: # map of processor configuration keys and their values.
              prop1: string
    processors: # a map of processor IDs that have the pipeline "pipeline1" as their parent.
      proc2:
        type: js
        settings:
          prop1: ${ENV_VAR} # you can use environment variables by wrapping them in a dollar sign and curly braces, e.g. ${ENV_VAR}.
```
If a file is invalid (e.g. a mandatory field is missing, or a configuration value is invalid), the pipeline that has
the invalid value will be skipped, with an error message logged.
If two pipelines in one file have the same ID, or the `version` field is not specified, the whole file is
unparsable and will be skipped with an error message logged.

If two pipelines from different files have the same ID, the second pipeline will be skipped, with an error message
specifying which pipeline was not provisioned.

**_Note_**: Connector IDs and processor IDs get their parent ID prefixed, so if you specify a connector ID as `con1`
and its parent is `pipeline1`, then the provisioned connector will have the ID `pipeline1:con1`. The same goes for
processors: if a processor has a pipeline parent, its provisioned ID will be `pipelineID:processorID`, and if it
has a connector parent, its provisioned ID will be `pipelineID:connectorID:processorID`.
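To make the prefixing rule concrete, here is a small Go sketch (an illustration, not Conduit's actual code) that derives the fully qualified IDs:

```go
package main

import "fmt"

// provisionedConnectorID derives the full connector ID from its parent pipeline.
func provisionedConnectorID(pipelineID, connectorID string) string {
	return pipelineID + ":" + connectorID
}

// provisionedProcessorID derives the full processor ID. connectorID is empty
// when the processor's parent is the pipeline itself.
func provisionedProcessorID(pipelineID, connectorID, processorID string) string {
	if connectorID == "" {
		return pipelineID + ":" + processorID
	}
	return pipelineID + ":" + connectorID + ":" + processorID
}

func main() {
	fmt.Println(provisionedConnectorID("pipeline1", "con1"))          // pipeline1:con1
	fmt.Println(provisionedProcessorID("pipeline1", "", "proc2"))     // pipeline1:proc2
	fmt.Println(provisionedProcessorID("pipeline1", "con2", "proc1")) // pipeline1:con2:proc1
}
```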

## Pipelines Immutability

Pipelines provisioned by configuration files are **immutable**: any updates to a provisioned pipeline have to be
made through the configuration file. The only operations you can perform through the UI or API are stopping and
starting the pipeline.

### Updates and Deletes

Updates and deletes of a pipeline provisioned by configuration files can only be done through the configuration
files. Make the changes in the files, then restart Conduit to reload them. Any updates or deletes attempted
through the API or UI will be rejected.

* To delete a pipeline: delete it from the `pipelines` map in the configuration file, then run Conduit again.
* To update a pipeline: change any field value in the configuration file, then run Conduit again to apply the updates.

Updates preserve the status of the pipeline, and it will continue working from where it stopped. However, the
pipeline will start from the beginning of the source, and will not continue from where it stopped, if one of
these values is updated: `pipeline ID`, `connector ID`, `connector plugin`, or `connector type`.
4 changes: 4 additions & 0 deletions docs/connectors/_category_.json
@@ -0,0 +1,4 @@
{
"label": "Connectors",
"position": 4
}
58 changes: 58 additions & 0 deletions docs/connectors/behavior.mdx
@@ -0,0 +1,58 @@
---
title: "Connector Behavior"
sidebar_label: "Behavior"
slug: "behavior"
sidebar_position: 2
---

This document provides insight into how Conduit communicates with a connector.

## Conduit Connector Protocol

Conduit expects all connectors to follow the
[Conduit Connector Protocol](https://github.com/ConduitIO/conduit-connector-protocol).
The connector protocol is a set of protobuf files describing
the [interface](#protocol-grpc-interface)
between Conduit and the connector in the form of gRPC services. This approach
allows connectors to be written in any language with support for gRPC.

The connector protocol splits the connector interface into three gRPC services: one
for the source, another for the destination, and a third for the connector
specifications. A connector needs to implement the specifications and at least
a source or a destination.
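As a rough sketch in Go (the type and method names below are hypothetical; the real definitions are the protobuf services in the connector protocol repository), the three-way split looks like this:

```go
package main

import "fmt"

// Hypothetical shapes of the three gRPC services defined by the connector
// protocol; the actual definitions live in conduit-connector-protocol.

type Specification struct{ Name, Version string }
type Record struct{ Position, Payload []byte }

// Specifier exposes the connector specifications (always implemented).
type Specifier interface{ Specify() Specification }

// Source reads records from the third-party system.
type Source interface {
	Open(position []byte) error
	Read() (Record, error)
}

// Destination writes records to the third-party system.
type Destination interface {
	Open() error
	Write(Record) error
}

// fileConnector implements the mandatory specifications; a complete connector
// would additionally implement Source, Destination, or both.
type fileConnector struct{}

func (fileConnector) Specify() Specification {
	return Specification{Name: "file", Version: "v0.1.0"}
}

func main() {
	var s Specifier = fileConnector{}
	fmt.Println(s.Specify().Name) // file
}
```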

Note that you don't need to use the connector protocol directly - we provide a
[Go connector SDK](https://github.com/ConduitIO/conduit-connector-sdk) that
hides the complexity of the protocol and simplifies the implementation of a
connector.

### Standalone vs built-in connectors

While the Conduit Connector Protocol decouples Conduit from its connectors by
using gRPC, it also provides a thin Go layer that allows any Go connector to be
compiled into the Conduit binary as a built-in connector. The following diagram
shows how Conduit communicates with a standalone connector and a built-in
connector.

![Standalone vs built-in connectors](/images/standalone-vs-builtin.svg)

**Standalone connectors** are run as separate processes, separate from the
Conduit process. They need to have an entrypoint (binary or script) which runs
the connector and starts the gRPC server responsible for communicating with
Conduit. A standalone connector process is started and stopped by Conduit on
demand. One connector process will be started for every pipeline connector in
Conduit.

**Built-in connectors** on the other hand are executed in the same process as
Conduit and communicate with Conduit through Go channels instead of gRPC. Any
connector written in Go can be compiled into the Conduit binary and used as a
built-in connector.

Find out more about the [Conduit connector plugin architecture](https://github.com/ConduitIO/conduit/blob/main/docs/architecture-decision-records/20220121-conduit-plugin-architecture.md).

## Protocol gRPC Interface

The protocol interface is hosted on the
[Buf schema registry](https://buf.build/conduitio/conduit-connector-protocol/docs/main:connector.v1).
Use it as a starting point when implementing a connector in a language other
than Go.
19 changes: 19 additions & 0 deletions docs/connectors/building.mdx
@@ -0,0 +1,19 @@
---
title: "Building Connectors"
slug: "building-connectors"
sidebar_position: 3
---

Conduit connectors can be built in any programming language that supports gRPC.
To make it easier to write connectors we provide
a [Connector SDK](https://github.com/ConduitIO/conduit-connector-sdk) written in
Go. Using the SDK is the recommended way of writing a Conduit connector.

## Conduit connector template

The easiest way to start implementing your own Conduit connector is by using the
[Conduit connector template](https://github.com/ConduitIO/conduit-connector-template).
It contains the basic project structure as well as some additional utilities
like GitHub actions and a Makefile.

Find out more about the template and how to use it in the readme.
50 changes: 50 additions & 0 deletions docs/connectors/installing.mdx
@@ -0,0 +1,50 @@
---
title: "Installing Connectors"
slug: "installing-connectors"
sidebar_position: 0
---

Conduit ships with a number of built-in connectors:

- [File connector](https://github.com/ConduitIO/conduit-connector-file) provides
a source/destination to read/write a local file (useful for quickly trying out
Conduit without additional setup).
- [Kafka connector](https://github.com/ConduitIO/conduit-connector-kafka)
provides a source/destination for Apache Kafka.
- [Postgres connector](https://github.com/ConduitIO/conduit-connector-postgres)
provides a source/destination for PostgreSQL.
- [S3 connector](https://github.com/ConduitIO/conduit-connector-s3) provides a
source/destination for AWS S3.
- [Generator connector](https://github.com/ConduitIO/conduit-connector-generator)
provides a source which generates random data (useful for testing).

Besides these connectors, there are a number of standalone connectors that can
be added to Conduit as plugins (find the complete
list [here](https://github.com/ConduitIO/conduit/blob/main/docs/connectors.md)).

### Standalone Connector Binary

To install a standalone connector you first need the compiled connector binary.
A binary can normally be downloaded from the latest release in the connector's
GitHub repository (this may vary for third-party connectors not developed by
the Conduit team). Make sure to download the binary that matches your operating
system and architecture.

Alternatively you can build the binary yourself (for instructions on building a
connector please refer to the readme of that specific connector).

## Installing a Connector in Conduit

Conduit loads standalone connectors at startup. The connector binaries need to
be placed in the `connectors` directory relative to the Conduit binary so
Conduit can find them. Alternatively, the path to the standalone connectors can
be adjusted using the CLI flag `-connectors.path`, for example:

```shell
./conduit -connectors.path=/path/to/connectors/
```

The names of the connector binaries are not important, since Conduit gets the
information about a connector from the connector itself (using its gRPC API).

Find out how to [reference your connector](/docs/connectors/referencing-connectors).
40 changes: 40 additions & 0 deletions docs/connectors/referencing.mdx
@@ -0,0 +1,40 @@
---
title: "Referencing Connectors"
slug: "referencing-connectors"
sidebar_position: 1
---

The name used to reference a connector in API requests (e.g. to create a new
connector) has the following format:

`[PLUGIN-TYPE:]PLUGIN-NAME[@VERSION]`

- `PLUGIN-TYPE` (`builtin`, `standalone` or `any`)
- Defines if the specified plugin should be builtin or standalone.
- If `any`, Conduit will use a standalone plugin if it exists and fall back to
a builtin plugin.
- Default is `any`.
- `PLUGIN-NAME`
- Defines the name of the plugin as specified in the plugin specifications, it
has to be an exact match.
- `VERSION`
- Defines the plugin version as specified in the plugin specifications, it has
to be an exact match.
- If `latest`, Conduit will use the latest semantic version.
- Default is `latest`.

Examples:

- `postgres`
  - will use the **latest** **standalone** **postgres** plugin
  - will fall back to the **latest** **builtin** **postgres** plugin if a
    standalone one wasn't found
- `postgres@v0.2.0`
  - will use the **standalone** **postgres** plugin with version **v0.2.0**
  - will fall back to a **builtin** **postgres** plugin with version **v0.2.0**
    if a standalone one wasn't found
- `builtin:postgres`
  - will use the **latest** **builtin** **postgres** plugin
- `standalone:postgres@v0.3.0`
  - will use the **standalone** **postgres** plugin with version **v0.3.0** (no
    fallback to builtin)
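The format above can be illustrated with a small Go sketch (not Conduit's actual resolution logic) that splits a reference into its parts and applies the documented defaults:

```go
package main

import (
	"fmt"
	"strings"
)

// parseReference splits a connector reference of the form
// [PLUGIN-TYPE:]PLUGIN-NAME[@VERSION] and fills in the documented defaults.
// Illustrative only; Conduit's real parsing and fallback logic is richer.
func parseReference(ref string) (pluginType, name, version string) {
	pluginType, version = "any", "latest" // documented defaults

	if i := strings.Index(ref, ":"); i != -1 {
		pluginType, ref = ref[:i], ref[i+1:]
	}
	if i := strings.Index(ref, "@"); i != -1 {
		ref, version = ref[:i], ref[i+1:]
	}
	return pluginType, ref, version
}

func main() {
	fmt.Println(parseReference("postgres"))                 // any postgres latest
	fmt.Println(parseReference("postgres@v0.2.0"))          // any postgres v0.2.0
	fmt.Println(parseReference("builtin:postgres"))         // builtin postgres latest
	fmt.Println(parseReference("standalone:postgres@v0.3.0")) // standalone postgres v0.3.0
}
```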
4 changes: 4 additions & 0 deletions docs/deploy/_category_.json
@@ -0,0 +1,4 @@
{
"label": "Deploy",
"position": 3
}
File renamed without changes.
File renamed without changes.
@@ -7,13 +7,12 @@ Here is an overview of the Conduit Architecture.

![Conduit Architecture](/images/conduit/conduit-diagram.svg)

Conduit is split into the following layers:
* **API layer** - exposes the public APIs used to communicate with Conduit. It exposes 2 types of APIs:
* **gRPC** - this is the main API provided by Conduit. The gRPC API definition can be found in
[api.proto](https://github.com/ConduitIO/conduit/blob/main/proto/api/v1/api.proto), it can be used to generate code for the client.
* **HTTP** - the HTTP API is generated using [grpc-gateway](https://github.com/grpc-ecosystem/grpc-gateway) and
forwards the requests to the gRPC API. Conduit exposes an openapi definition that describes the HTTP API, which is
also exposed through Swagger UI on `http://localhost:8080/openapi/`.
* **Orchestration layer** - the orchestration layer is responsible for coordinating the flow of operations between the
core services. It also takes care of transactions, making sure that changes made to specific entities are not visible
@@ -52,7 +51,7 @@ Conduit is split into the following layers:
* **Plugins** - while this is not a layer in the same sense as the other layers, it is a component separate from
everything else. It interfaces with the connector on one side and with Conduit plugins on the other and facilitates
the communication between them. A Conduit plugin is a separate process that implements the interface defined in
[conduit-connector-protocol](https://github.com/ConduitIO/conduit-connector-protocol) and provides the
read/write functionality for a specific resource (e.g. a database).

For more see [GitHub](https://github.com/ConduitIO/conduit/blob/main/docs/architecture.md).
18 changes: 18 additions & 0 deletions docs/introduction/connectors.mdx
@@ -0,0 +1,18 @@
---
title: 'Connectors'
slug: 'connectors'
sidebar_position: 4
---

A connector knows how to read/write records from/to a data source/destination
(e.g. a database).

When thinking about connectors for Conduit, our goals were to:
- provide a good development experience to connector developers,
- ship Conduit with real built-in connectors (compiled into the Conduit binary),
- make it as easy as possible to write plugins in _any_ programming language,
- keep the [Connector SDK](https://github.com/conduitio/conduit-connector-sdk)
  decoupled from Conduit so it can change without changing Conduit itself.

Have a look at our [connector docs](/docs/connectors/installing-connectors) to
find out more!
29 changes: 18 additions & 11 deletions docs/introduction/getting-started.mdx
@@ -6,24 +6,30 @@ hide_title: true
sidebar_label: "Getting Started"
---

<img
alt="Conduit Logo"
style={{ maxWidth: "400px", marginTop: 0 }}
src="/images/conduit/on-white-conduit-logo.png"
/>

Conduit is a data integration tool for software engineers. Its purpose is to
help you move data from A to B. You can use Conduit to send data from Kafka to
Postgres, between files and APIs,
between [supported connectors](https://github.com/ConduitIO/conduit/blob/main/docs/connectors.md),
and [any datastore you can build a plugin for](/docs/connectors/building-connectors).

It's written in [GoLang](https://go.dev/), compiles to a binary, and is designed
to be easy to use and [deploy](https://docs.conduit.io/docs/Deploy/overview).

To get started:

1. [Download the latest Conduit release](https://github.com/ConduitIO/conduit/releases/latest).
2. Unzip:

If you're on Mac, it will look something like this:

```shell
tar zxvf conduit_0.3.0_Darwin_x86_64.tar.gz
```

3. Start Conduit:
@@ -32,9 +38,10 @@ tar zxvf conduit_0.1.0_Darwin_x86_64.tar.gz
./conduit
```

**Tip**: Depending on your operating system, you may need to
run `chmod +x conduit` before running the binary.

4. Navigate to `http://localhost:8080` to check Conduit's UI:

![Conduit Pipeline](/images/conduit/pipeline.png)

