Documentation updates (#22)
* document structure, minor fixes

* add more docs

* even more docs

* Update docs/introduction/architecture.mdx

Co-authored-by: Maha Hajja <[email protected]>

* Update docs/introduction/getting-started.mdx

Co-authored-by: Maha Hajja <[email protected]>

* fix review comments

* add doc about referencing connectors

* simplify connector introduction

Co-authored-by: Maha Hajja <[email protected]>
lovromazgon and maha-hajja authored Nov 4, 2022
1 parent b027c21 commit 3433960
Showing 26 changed files with 729 additions and 84 deletions.
4 changes: 0 additions & 4 deletions docs/Deploy/_category_.json

This file was deleted.

4 changes: 4 additions & 0 deletions docs/configuration/_category_.json
@@ -0,0 +1,4 @@
{
"label": "Configuration",
"position": 1
}
99 changes: 99 additions & 0 deletions docs/configuration/pipeline-configuration-files.mdx
@@ -0,0 +1,99 @@
---
title: 'Pipeline Configuration Files'
slug: 'pipeline-configuration-files'
---

Pipeline configuration files give you the ability to define pipelines that are provisioned by Conduit at startup.
It's as simple as creating a YAML file that defines pipelines, connectors, processors, and their corresponding configurations.

## Getting started

Create a folder called `pipelines` at the same level as your Conduit binary file, add all your YAML files
there, then run Conduit using the command:
```
./conduit
```
Conduit will only search for files with `.yml` or `.yaml` extensions, recursively in all sub-folders.
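For example, assuming the default setup, a layout like the following (file names are illustrative) would be picked up in full:

```
conduit                  # the Conduit binary
pipelines/
├── pipeline-a.yaml
├── pipeline-b.yml
└── team-x/
    └── pipeline-c.yaml  # sub-folders are scanned recursively
```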

If your YAML files are in a different directory, or you want to provision only a single file, run Conduit with
the CLI flag `-pipelines.path` pointing to your file or directory:
```
./conduit -pipelines.path ../my-directory
```
If the directory does not exist, Conduit will fail with the error `"pipelines.path" config value is invalid`.

### YAML Schema

A pipeline configuration file has two root keys: `version` and the `pipelines` map. Each entry in that map
configures one pipeline, using fields like `status` and `name`.

To create connectors in a pipeline, add a `connectors` map under that pipeline.

To create processors, add a `processors` map either under a pipeline ID or under a connector ID, depending on the intended parent.
Check this example YAML file with an explanation for each field:

```yaml
version: 1.0 # parser version, the only supported version for now is 1.0 [mandatory]

pipelines: # a map of pipeline IDs and their configurations.
  pipeline1: # pipeline ID, has to be unique.
    status: running # pipeline status at startup, either running or stopped. [mandatory]
    name: pipeline1 # pipeline name, if not specified, the pipeline ID will be used as the name. [optional]
    description: desc # pipeline description. [optional]
    connectors: # a map of connector IDs and their configurations.
      con1: # connector ID, has to be unique per pipeline.
        type: source # connector type, either "source" or "destination". [mandatory]
        plugin: builtin:file # connector plugin. [mandatory]
        name: file-src # connector name, if not specified, the connector ID will be used as the name. [optional]
        settings: # map of configuration keys and their values.
          path: ./file1.txt # in this example, the plugin "builtin:file" has only one configuration key, which is path.
      con2:
        type: destination
        plugin: builtin:file
        name: file-dest
        settings:
          path: ./file2.txt
        processors: # a map of processor IDs and their configurations, "con2" is the processors' parent.
          proc1: # processor ID, has to be unique per parent.
            type: js # processor type. [mandatory]
            settings: # map of processor configuration keys and their values.
              prop1: string
    processors: # a map of processor IDs that have the pipeline "pipeline1" as their parent.
      proc2:
        type: js
        settings:
          prop1: ${ENV_VAR} # you can use environment variables by wrapping them in a dollar sign and curly braces, e.g. ${ENV_VAR}.
```
If a file is invalid (e.g. a mandatory field is missing, or a configuration value is invalid), the pipeline that has
the invalid value will be skipped, with an error message logged.
If two pipelines in one file have the same ID, or the `version` field is not specified, the whole file is
unparsable and will be skipped with an error message logged.

If two pipelines from different files have the same ID, the second pipeline will be skipped, with an error message
specifying which pipeline was not provisioned.

**_Note_**: Connector IDs and processor IDs get their parent ID prefixed, so if you specify a connector ID as `con1`
and its parent is `pipeline1`, then the provisioned connector will have the ID `pipeline1:con1`. The same goes for
processors: if a processor has a pipeline parent, its provisioned ID will be `pipelineID:processorID`, and if it
has a connector parent, its provisioned ID will be `pipelineID:connectorID:processorID`.
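To make the prefixing rule concrete, here is a small Go sketch (an illustration, not Conduit's actual code) that derives the fully qualified IDs:

```go
package main

import "fmt"

// provisionedConnectorID derives the full connector ID from its parent pipeline.
func provisionedConnectorID(pipelineID, connectorID string) string {
	return pipelineID + ":" + connectorID
}

// provisionedProcessorID derives the full processor ID. connectorID is empty
// when the processor's parent is the pipeline itself.
func provisionedProcessorID(pipelineID, connectorID, processorID string) string {
	if connectorID == "" {
		return pipelineID + ":" + processorID
	}
	return pipelineID + ":" + connectorID + ":" + processorID
}

func main() {
	fmt.Println(provisionedConnectorID("pipeline1", "con1"))          // pipeline1:con1
	fmt.Println(provisionedProcessorID("pipeline1", "", "proc2"))     // pipeline1:proc2
	fmt.Println(provisionedProcessorID("pipeline1", "con2", "proc1")) // pipeline1:con2:proc1
}
```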

## Pipelines Immutability

Pipelines provisioned by configuration files are **immutable**: any updates to a provisioned pipeline have to be
made through the configuration file. The only operations you can perform through the UI or API are stopping and
starting the pipeline.

### Updates and Deletes

Updates and deletes of a pipeline provisioned by configuration files can only be done through the configuration
files. Make the changes in the files, then restart Conduit to reload them. Any updates or deletes attempted
through the API or UI will be rejected.

* To delete a pipeline: delete it from the `pipelines` map in the configuration file, then run Conduit again.
* To update a pipeline: change any field value in the configuration file, then run Conduit again to apply the updates.

Updates preserve the status of the pipeline, and it will continue working from where it stopped. However, the
pipeline will start from the beginning of the source, and will not continue from where it stopped, if one of
these values is updated: `pipeline ID`, `connector ID`, `connector plugin`, or `connector type`.
4 changes: 4 additions & 0 deletions docs/connectors/_category_.json
@@ -0,0 +1,4 @@
{
"label": "Connectors",
"position": 4
}
58 changes: 58 additions & 0 deletions docs/connectors/behavior.mdx
@@ -0,0 +1,58 @@
---
title: "Connector Behavior"
sidebar_label: "Behavior"
slug: "behavior"
sidebar_position: 2
---

This document provides insight into how Conduit communicates with a connector.

## Conduit Connector Protocol

Conduit expects all connectors to follow the
[Conduit Connector Protocol](https://github.com/ConduitIO/conduit-connector-protocol).
The connector protocol is a set of protobuf files describing
the [interface](#protocol-grpc-interface)
between Conduit and the connector in the form of gRPC services. This approach
allows connectors to be written in any language with support for gRPC.

The connector protocol splits the connector interface into three gRPC services: one
for the source, another for the destination, and a third for the connector
specifications. A connector needs to implement the specifications and at least
a source or a destination.
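As a rough sketch in Go (the type and method names below are hypothetical; the real definitions are the protobuf services in the connector protocol repository), the three-way split looks like this:

```go
package main

import "fmt"

// Hypothetical shapes of the three gRPC services defined by the connector
// protocol; the actual definitions live in conduit-connector-protocol.

type Specification struct{ Name, Version string }
type Record struct{ Position, Payload []byte }

// Specifier exposes the connector specifications (always implemented).
type Specifier interface{ Specify() Specification }

// Source reads records from the third-party system.
type Source interface {
	Open(position []byte) error
	Read() (Record, error)
}

// Destination writes records to the third-party system.
type Destination interface {
	Open() error
	Write(Record) error
}

// fileConnector implements the mandatory specifications; a complete connector
// would additionally implement Source, Destination, or both.
type fileConnector struct{}

func (fileConnector) Specify() Specification {
	return Specification{Name: "file", Version: "v0.1.0"}
}

func main() {
	var s Specifier = fileConnector{}
	fmt.Println(s.Specify().Name) // file
}
```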

Note that you don't need to use the connector protocol directly - we provide a
[Go connector SDK](https://github.com/ConduitIO/conduit-connector-sdk) that
hides the complexity of the protocol and simplifies the implementation of a
connector.

### Standalone vs built-in connectors

While the Conduit Connector Protocol decouples Conduit from its connectors by
using gRPC, it also provides a thin Go layer that allows any Go connector to be
compiled into the Conduit binary as a built-in connector. The following diagram
shows how Conduit communicates with a standalone connector and a built-in
connector.

![Standalone vs built-in connectors](/images/standalone-vs-builtin.svg)

**Standalone connectors** are run as separate processes, separate from the
Conduit process. They need to have an entrypoint (binary or script) which runs
the connector and starts the gRPC server responsible for communicating with
Conduit. A standalone connector process is started and stopped by Conduit on
demand. One connector process will be started for every pipeline connector in
Conduit.

**Built-in connectors** on the other hand are executed in the same process as
Conduit and communicate with Conduit through Go channels instead of gRPC. Any
connector written in Go can be compiled into the Conduit binary and used as a
built-in connector.

Find out more about the [Conduit connector plugin architecture](https://github.com/ConduitIO/conduit/blob/main/docs/architecture-decision-records/20220121-conduit-plugin-architecture.md).

## Protocol gRPC Interface

The protocol interface is hosted on the
[Buf schema registry](https://buf.build/conduitio/conduit-connector-protocol/docs/main:connector.v1).
Use it as a starting point when implementing a connector in a language other
than Go.
19 changes: 19 additions & 0 deletions docs/connectors/building.mdx
@@ -0,0 +1,19 @@
---
title: "Building Connectors"
slug: "building-connectors"
sidebar_position: 3
---

Conduit connectors can be built in any programming language that supports gRPC.
To make it easier to write connectors we provide
a [Connector SDK](https://github.com/ConduitIO/conduit-connector-sdk) written in
Go. Using the SDK is the recommended way of writing a Conduit connector.

## Conduit connector template

The easiest way to start implementing your own Conduit connector is by using the
[Conduit connector template](https://github.com/ConduitIO/conduit-connector-template).
It contains the basic project structure as well as some additional utilities
like GitHub actions and a Makefile.

Find out more about the template and how to use it in the readme.
50 changes: 50 additions & 0 deletions docs/connectors/installing.mdx
@@ -0,0 +1,50 @@
---
title: "Installing Connectors"
slug: "installing-connectors"
sidebar_position: 0
---

Conduit ships with a number of built-in connectors:

- [File connector](https://github.com/ConduitIO/conduit-connector-file) provides
a source/destination to read/write a local file (useful for quickly trying out
Conduit without additional setup).
- [Kafka connector](https://github.com/ConduitIO/conduit-connector-kafka)
provides a source/destination for Apache Kafka.
- [Postgres connector](https://github.com/ConduitIO/conduit-connector-postgres)
provides a source/destination for PostgreSQL.
- [S3 connector](https://github.com/ConduitIO/conduit-connector-s3) provides a
source/destination for AWS S3.
- [Generator connector](https://github.com/ConduitIO/conduit-connector-generator)
provides a source which generates random data (useful for testing).

Besides these connectors, there are a number of standalone connectors that can
be added to Conduit as plugins (find the complete
list [here](https://github.com/ConduitIO/conduit/blob/main/docs/connectors.md)).

### Standalone Connector Binary

To install a standalone connector you first need the compiled connector binary.
A binary can normally be downloaded from the latest release in the connector's
GitHub repository (this may vary for third-party connectors not developed by
the Conduit team). Make sure to download the binary that matches your operating
system and architecture.

Alternatively you can build the binary yourself (for instructions on building a
connector please refer to the readme of that specific connector).

## Installing a Connector in Conduit

Conduit loads standalone connectors at startup. The connector binaries need to
be placed in the `connectors` directory relative to the Conduit binary so
Conduit can find them. Alternatively, the path to the standalone connectors can
be adjusted using the CLI flag `-connectors.path`, for example:

```shell
./conduit -connectors.path=/path/to/connectors/
```

The names of the connector binaries are not important, since Conduit gets the
information about a connector from the connector itself (using its gRPC API).

Find out how to [reference your connector](/docs/connectors/referencing-connectors).
40 changes: 40 additions & 0 deletions docs/connectors/referencing.mdx
@@ -0,0 +1,40 @@
---
title: "Referencing Connectors"
slug: "referencing-connectors"
sidebar_position: 1
---

The name used to reference a connector in API requests (e.g. to create a new
connector) has the following format:

`[PLUGIN-TYPE:]PLUGIN-NAME[@VERSION]`

- `PLUGIN-TYPE` (`builtin`, `standalone` or `any`)
- Defines if the specified plugin should be builtin or standalone.
- If `any`, Conduit will use a standalone plugin if it exists and fall back to
a builtin plugin.
- Default is `any`.
- `PLUGIN-NAME`
- Defines the name of the plugin as specified in the plugin specifications, it
has to be an exact match.
- `VERSION`
- Defines the plugin version as specified in the plugin specifications, it has
to be an exact match.
- If `latest`, Conduit will use the latest semantic version.
- Default is `latest`.

Examples:

- `postgres`
  - will use the **latest** **standalone** **postgres** plugin
  - will fall back to the **latest** **builtin** **postgres** plugin if a
    standalone one wasn't found
- `postgres@v0.2.0`
  - will use the **standalone** **postgres** plugin with version **v0.2.0**
  - will fall back to a **builtin** **postgres** plugin with version **v0.2.0**
    if a standalone one wasn't found
- `builtin:postgres`
  - will use the **latest** **builtin** **postgres** plugin
- `standalone:postgres@v0.3.0`
  - will use the **standalone** **postgres** plugin with version **v0.3.0** (no
    fallback to builtin)
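The format above can be illustrated with a small Go sketch (not Conduit's actual resolution logic) that splits a reference into its parts and applies the documented defaults:

```go
package main

import (
	"fmt"
	"strings"
)

// parseReference splits a connector reference of the form
// [PLUGIN-TYPE:]PLUGIN-NAME[@VERSION] and fills in the documented defaults.
// Illustrative only; Conduit's real parsing and fallback logic is richer.
func parseReference(ref string) (pluginType, name, version string) {
	pluginType, version = "any", "latest" // documented defaults

	if i := strings.Index(ref, ":"); i != -1 {
		pluginType, ref = ref[:i], ref[i+1:]
	}
	if i := strings.Index(ref, "@"); i != -1 {
		ref, version = ref[:i], ref[i+1:]
	}
	return pluginType, ref, version
}

func main() {
	fmt.Println(parseReference("postgres"))                 // any postgres latest
	fmt.Println(parseReference("postgres@v0.2.0"))          // any postgres v0.2.0
	fmt.Println(parseReference("builtin:postgres"))         // builtin postgres latest
	fmt.Println(parseReference("standalone:postgres@v0.3.0")) // standalone postgres v0.3.0
}
```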
4 changes: 4 additions & 0 deletions docs/deploy/_category_.json
@@ -0,0 +1,4 @@
{
"label": "Deploy",
"position": 3
}
File renamed without changes.
File renamed without changes.
@@ -7,13 +7,12 @@ Here is an overview of the Conduit Architecture.

![Conduit Architecture](/images/conduit/conduit-diagram.svg)

Conduit is split into the following layers:
* **API layer** - exposes the public APIs used to communicate with Conduit. It exposes 2 types of APIs:
* **gRPC** - this is the main API provided by Conduit. The gRPC API definition can be found in
[api.proto](https://github.com/ConduitIO/conduit/blob/main/proto/api/v1/api.proto), it can be used to generate code for the client.
* **HTTP** - the HTTP API is generated using [grpc-gateway](https://github.com/grpc-ecosystem/grpc-gateway) and
forwards the requests to the gRPC API. Conduit exposes an openapi definition that describes the HTTP API, which is
also exposed through Swagger UI on `http://localhost:8080/openapi/`.
* **Orchestration layer** - the orchestration layer is responsible for coordinating the flow of operations between the
core services. It also takes care of transactions, making sure that changes made to specific entities are not visible
@@ -52,7 +51,7 @@ Conduit is split into the following layers:
* **Plugins** - while this is not a layer in the same sense as the other layers, it is a component separate from
everything else. It interfaces with the connector on one side and with Conduit plugins on the other and facilitates
the communication between them. A Conduit plugin is a separate process that implements the interface defined in
[conduit-connector-protocol](https://github.com/ConduitIO/conduit-connector-protocol) and provides the
read/write functionality for a specific resource (e.g. a database).

For more see [GitHub](https://github.com/ConduitIO/conduit/blob/main/docs/architecture.md).
18 changes: 18 additions & 0 deletions docs/introduction/connectors.mdx
@@ -0,0 +1,18 @@
---
title: 'Connectors'
slug: 'connectors'
sidebar_position: 4
---

A connector knows how to read/write records from/to a data source/destination
(e.g. a database).

When thinking about connectors for Conduit, our goals were to:
- provide a good development experience to connector developers,
- ship Conduit with real built-in connectors (compiled into the Conduit binary),
- make it as easy as possible to write plugins in _any_ programming language,
- keep the [Connector SDK](https://github.com/conduitio/conduit-connector-sdk)
  decoupled from Conduit so it can change without changing Conduit itself.

Have a look at our [connector docs](/docs/connectors/installing-connectors) to
find out more!
29 changes: 18 additions & 11 deletions docs/introduction/getting-started.mdx
@@ -6,24 +6,30 @@ hide_title: true
sidebar_label: "Getting Started"
---

<img
alt="Conduit Logo"
style={{ maxWidth: "400px", marginTop: 0 }}
src="/images/conduit/on-white-conduit-logo.png"
/>

Conduit is a data integration tool for software engineers. Its purpose is to
help you move data from A to B. You can use Conduit to send data from Kafka to
Postgres, between files and APIs,
between [supported connectors](https://github.com/ConduitIO/conduit/blob/main/docs/connectors.md),
and [any datastore you can build a plugin for](/docs/connectors/building-connectors).

It's written in [GoLang](https://go.dev/), compiles to a binary, and is designed
to be easy to use and [deploy](https://docs.conduit.io/docs/Deploy/overview).

To get started:

1. [Download the latest Conduit release](https://github.com/ConduitIO/conduit/releases/latest).
2. Unzip:

If you're on Mac, it will look something like this:

```shell
tar zxvf conduit_0.3.0_Darwin_x86_64.tar.gz
```

3. Start Conduit:
@@ -32,9 +38,10 @@ tar zxvf conduit_0.1.0_Darwin_x86_64.tar.gz
./conduit
```

**Tip**: Depending on your operating system, you may need to
run `chmod +x conduit` before running the binary.

4. Navigate to `http://localhost:8080` to check Conduit's UI:

![Conduit Pipeline](/images/conduit/pipeline.png)

