This repository has been archived by the owner on Dec 16, 2021. It is now read-only.

Rename the project to DoctorK #270

Merged

66 changes: 33 additions & 33 deletions README.md
@@ -1,11 +1,11 @@
# <img src="docs/doctorkafka_logo.svg" alt="DoctorKafka logo" width="48"> &nbsp;&nbsp; Pinterest DoctorKafka
# <img src="docs/doctork_logo.svg" alt="DoctorK logo" width="48"> &nbsp;&nbsp; Pinterest DoctorK

### Open Sourcing Orion
Based on what we learned from DoctorKafka, we have created and open sourced [Orion](https://github.com/pinterest/orion), a more capable system for managing Kafka and other distributed systems. Orion addresses the [shortcomings](https://github.com/pinterest/orion/blob/master/docs/Motivation.md) of DoctorKafka and also adds new features such as topic management, rolling restarts, rolling upgrades, and stuck consumer remediation. Orion has been stably managing our entire Kafka fleet for more than six months.
Based on what we learned from DoctorK, we have created and open sourced [Orion](https://github.com/pinterest/orion), a more capable system for managing Kafka and other distributed systems. Orion addresses the [shortcomings](https://github.com/pinterest/orion/blob/master/docs/Motivation.md) of DoctorK and also adds new features such as topic management, rolling restarts, rolling upgrades, and stuck consumer remediation. Orion has been stably managing our entire Kafka fleet for more than six months.

---

DoctorKafka is a service for [Kafka] cluster auto healing and workload balancing. DoctorKafka can automatically detect broker failure and reassign the workload on the failed nodes to other nodes. DoctorKafka can also perform load balancing based on topic partitions' network usage, and makes sure that broker network usage does not exceed the defined settings. DoctorKafka sends out alerts when it is not confident about taking action.
DoctorK is a service for [Kafka] cluster auto healing and workload balancing. DoctorK can automatically detect broker failure and reassign the workload on the failed nodes to other nodes. DoctorK can also perform load balancing based on topic partitions' network usage, and makes sure that broker network usage does not exceed the defined settings. DoctorK sends out alerts when it is not confident about taking action.

#### Features

@@ -19,10 +19,10 @@ Design details are available in [docs/DESIGN.md](docs/DESIGN.md).

## Setup Guide

##### Get DoctorKafka code
##### Get DoctorK code
```sh
git clone [git-repo-url] doctorkafka
cd doctorkafka
git clone [git-repo-url] doctork
cd doctork
```

##### Build kafka stats collector and deploy it to kafka brokers
@@ -56,8 +56,8 @@ The following is a sample command line for running kafkastats collector:
```
java -server \
-Dlog4j.configurationFile=file:./log4j2.xml \
-cp lib/*:kafkastats-0.2.4.9.jar \
com.pinterest.doctorkafka.stats.KafkaStatsMain \
-cp lib/*:kafkastats-0.2.4.10.jar \
com.pinterest.doctork.stats.KafkaStatsMain \
-broker 127.0.0.1 \
-jmxport 9999 \
-topic brokerstats \
@@ -100,7 +100,7 @@ The following is a sample upstart scripts for automatically restarting kafkastat
-XX:ErrorFile=$LOG_DIR/jvm_error.log \
-cp $CLASSPATH"
exec $DAEMON $DAEMON_OPTS -Dlog4j.configuration=${LOG_PROPERTIES} \
com.pinterest.doctorkafka.stats.KafkaStatsMain \
com.pinterest.doctork.stats.KafkaStatsMain \
-broker 127.0.0.1 \
-jmxport 9999 \
-topic brokerstats \
@@ -115,39 +115,39 @@ The following is a sample upstart scripts for automatically restarting kafkastat
```


##### Customize doctorkafka configuration parameters
##### Customize doctork configuration parameters

Edit `drkafka/config/*.properties` files to specify parameters describing the environment. Those files contain
Edit `doctork/config/*.properties` files to specify parameters describing the environment. Those files contain
comments describing the meaning of individual parameters.


#### Create and install jars

```
mvn package -pl drkafka -am
mvn package -pl doctork -am
```

```sh
mvn package
mkdir ${DOCTORKAFKA_INSTALL_DIR} # directory to place DoctorKafka binaries in.
tar -zxvf target/doctorkafka-0.2.4.9-bin.tar.gz -C ${DOCTORKAFKA_INSTALL_DIR}
mkdir ${DOCTORK_INSTALL_DIR} # directory to place DoctorK binaries in.
tar -zxvf target/doctork-0.2.4.10-bin.tar.gz -C ${DOCTORK_INSTALL_DIR}
```

##### Run DoctorKafka
##### Run DoctorK
```sh
cd ${DOCTORKAFKA_INSTALL_DIR}
cd ${DOCTORK_INSTALL_DIR}

java -server \
-cp lib/*:doctorkafka-0.2.4.9.jar \
com.pinterest.doctorkafka.DoctorKafkaMain \
-cp lib/*:doctork-0.2.4.10.jar \
com.pinterest.doctork.DoctorKMain \
server dropwizard_yaml_file
```

The above `dropwizard_yaml_file` is the path to a standard [DropWizard configuration file](https://www.dropwizard.io/1.0.0/docs/manual/configuration.html)
that only requires the following line pointing to your `doctorkafka.properties` path.
that only requires the following line pointing to your `doctork.properties` path.

```
config: $doctorkafka_config_properties_file_path
config: $doctork_config_properties_file_path
```

##### Customize configuration parameters
@@ -157,36 +157,36 @@ Those files contain comments describing the meaning of individual parameters.


## Tools
DoctorKafka comes with a number of tools implementing interactions with the environment.
DoctorK comes with a number of tools implementing interactions with the environment.

##### Cluster Load Balancer

```bash
cd ${DOCTORKAFKA_INSTALL_DIR}
cd ${DOCTORK_INSTALL_DIR}
java -server \
-Dlog4j.configurationFile=file:drkafka/config/log4j2.xml \
-cp drkafka/target/lib/*:drkafka/target/doctorkafka-0.2.4.9.jar \
com.pinterest.doctorkafka.tools.ClusterLoadBalancer \
-Dlog4j.configurationFile=file:doctork/config/log4j2.xml \
-cp doctork/target/lib/*:doctork/target/doctork-0.2.4.10.jar \
com.pinterest.doctork.tools.ClusterLoadBalancer \
-brokerstatstopic brokerstats \
-brokerstatszk zookeeper001:2181/cluster1 \
-clusterzk zookeeper001:2181,zookeeper002:2181,zookeeper003:2181/cluster2 \
-config ./drkafka/config/doctorkafka.properties \
-config ./doctork/config/doctork.properties \
-seconds 3600
```
Cluster load balancer balances the workload among brokers to make sure the broker network
usage does not exceed the threshold.


## DoctorKafka UI
## DoctorK UI

DoctorKafka uses [dropwizard-core module](https://www.dropwizard.io/1.3.5/docs/manual/core.html) and [serving assets](https://www.dropwizard.io/1.3.5/docs/manual/core.html#serving-assets) to provide a web UI. The following is the screenshot from a demo:
DoctorK uses [dropwizard-core module](https://www.dropwizard.io/1.3.5/docs/manual/core.html) and [serving assets](https://www.dropwizard.io/1.3.5/docs/manual/core.html#serving-assets) to provide a web UI. The following is the screenshot from a demo:

![doctorkafka UI](docs/doctorkafka_ui.png)
<img src="docs/doctorkafka_ui.png" width="160">
![doctork UI](docs/doctork_ui.png)
<img src="docs/doctork_ui.png" width="160">

## DoctorKafka APIs
## DoctorK APIs

The following APIs are available for DoctorKafka:
The following APIs are available for DoctorK:

- List Cluster
- Maintenance Mode
@@ -210,7 +210,7 @@ Detailed description of APIs can be found [docs/APIs.md](docs/APIs.md)

## License

DoctorKafka is distributed under [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0.html).
DoctorK is distributed under [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0.html).

[Kafka]:http://kafka.apache.org/
[Ostrich]: https://github.com/twitter/ostrich
18 changes: 9 additions & 9 deletions docs/APIs.md
@@ -9,7 +9,7 @@ List Cluster: curl -XGET http://localhost:8080/api/cluster

**Maintenance Mode API**

Allows users to disable DoctorKafka for a cluster so that manual maintenance operations can be performed on it without any interference from Dr. Kafka.
Allows users to disable DoctorK for a cluster so that manual maintenance operations can be performed on it without any interference from Dr. Kafka.

GET will get the current status of maintenance mode.
PUT will place the cluster in maintenance mode.
@@ -24,21 +24,21 @@ curl -XDELETE http://localhost:8080/api/cluster/<clustername>/admin/maintenance

**API Security**

Dr. Kafka allows pluggable API request authorization and follows the Role Based Access Control (RBAC) model. Authorization is performed by populating the role mapping in [DrKafkaSecurityContext](https://github.com/pinterest/doctorkafka/tree/master/drkafka/src/main/java/com/pinterest/doctorkafka/security/DrKafkaSecurityContext.java); to do so, create an implementation of AuthorizationFilter, e.g. [SampleAuthorizationFilter](https://github.com/pinterest/doctorkafka/tree/master/drkafka/src/main/java/com/pinterest/doctorkafka/security/SampleAuthorizationFilter.java).
Dr. Kafka allows pluggable API request authorization and follows the Role Based Access Control (RBAC) model. Authorization is performed by populating the role mapping in [DoctorKSecurityContext](https://github.com/pinterest/doctorkafka/tree/master/doctork/src/main/java/com/pinterest/doctork/security/DoctorKSecurityContext.java); to do so, create an implementation of AuthorizationFilter, e.g. [SampleAuthorizationFilter](https://github.com/pinterest/doctorkafka/tree/master/doctork/src/main/java/com/pinterest/doctork/security/SampleAuthorizationFilter.java).

Here's the flow sequence:
1. DoctorKafkaMain checks if an authorization filter has been specified via `doctorkafka.authorization.filter.class` configuration and creates an instance of `DrKafkaAuthorizationFilter`
2. This instance is then configured (invoke `configure(DoctorKafkaConfig config)`) and registered with Jersey
1. DoctorKMain checks if an authorization filter has been specified via `doctork.authorization.filter.class` configuration and creates an instance of `DoctorKAuthorizationFilter`
2. This instance is then configured (invoke `configure(DoctorKConfig config)`) and registered with Jersey

All authorization filters must implement [DrKafkaAuthorizationFilter](https://github.com/pinterest/doctorkafka/tree/master/drkafka/src/main/java/com/pinterest/doctorkafka/security/DrKafkaAuthorizationFilter.java) which has two methods that need to be implemented:
All authorization filters must implement [DoctorKAuthorizationFilter](https://github.com/pinterest/doctorkafka/tree/master/doctork/src/main/java/com/pinterest/doctork/security/DoctorKAuthorizationFilter.java) which has two methods that need to be implemented:

- `configure(DoctorKafkaConfig config)`
- `configure(DoctorKConfig config)`
- `filter(ContainerRequestContext requestContext)`

`configure(DoctorKafkaConfig config)` provides the DoctorKafkaConfig so that the authorizer can configure itself. `DoctorKafkaConfig.getDrKafkaAdminGroups()` returns the list of groups that need to be mapped to the `drkafka_admin` role.
`configure(DoctorKConfig config)` provides the DoctorKConfig so that the authorizer can configure itself. `DoctorKConfig.getDoctorKAdminGroups()` returns the list of groups that need to be mapped to the `doctork_admin` role.

`filter(ContainerRequestContext requestContext)` should implement the logic to extract and populate PRINCIPAL & ROLE information, which is needed to create a new instance of [DrKafkaSecurityContext](https://github.com/pinterest/doctorkafka/tree/master/drkafka/src/main/java/com/pinterest/doctorkafka/security/DrKafkaSecurityContext.java). Jersey then uses this information to restrict access to methods for users who are not in the `drkafka_admin` role. Here's the flow:
`filter(ContainerRequestContext requestContext)` should implement the logic to extract and populate PRINCIPAL & ROLE information, which is needed to create a new instance of [DoctorKSecurityContext](https://github.com/pinterest/doctorkafka/tree/master/doctork/src/main/java/com/pinterest/doctork/security/DoctorKSecurityContext.java). Jersey then uses this information to restrict access to methods for users who are not in the `doctork_admin` role. Here's the flow:

(Authentication) -> (Populates user & group info headers) -> (YourDrKafkaAuthorizationFilter) -> (extract User and Group info) -> (Map groups to roles) -> (Create SecurityContext) -> (Inject SecurityContext back in session)
(Authentication) -> (Populates user & group info headers) -> (YourDoctorKAuthorizationFilter) -> (extract User and Group info) -> (Map groups to roles) -> (Create SecurityContext) -> (Inject SecurityContext back in session)

Note: We currently don't ship authentication mechanisms with Dr.Kafka since authentication requirements are environment/company specific. For pluggable authentication, please refer to https://www.dropwizard.io/1.3.8/docs/manual/auth.html. You may also use an authentication proxy.
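
The "extract User and Group info" and "Map groups to roles" steps of the flow above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the project's implementation: `GroupRoleMapper`, the comma-separated header format, and the `doctork_user` role name are all hypothetical, and the admin group list stands in for whatever `DoctorKConfig.getDoctorKAdminGroups()` returns. A real filter would run this logic inside `filter(ContainerRequestContext requestContext)` and pass the result to the security context.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, NOT part of DoctorK: it illustrates only the
// "extract groups" and "map groups to roles" steps of the flow above.
final class GroupRoleMapper {
  // Role name taken from the docs above.
  static final String ADMIN_ROLE = "doctork_admin";
  // Assumed name for a non-admin role; used here for illustration only.
  static final String USER_ROLE = "doctork_user";

  private final Set<String> adminGroups;

  // adminGroups stands in for the list returned by DoctorKConfig.getDoctorKAdminGroups().
  GroupRoleMapper(List<String> adminGroups) {
    this.adminGroups = new HashSet<>(adminGroups);
  }

  // Splits a comma-separated group header value; the header format is an assumption.
  static Set<String> parseGroups(String headerValue) {
    Set<String> groups = new HashSet<>();
    if (headerValue == null) {
      return groups;
    }
    for (String group : headerValue.split(",")) {
      String trimmed = group.trim();
      if (!trimmed.isEmpty()) {
        groups.add(trimmed);
      }
    }
    return groups;
  }

  // Every caller gets USER_ROLE; members of any admin group also get ADMIN_ROLE.
  Set<String> rolesFor(Set<String> userGroups) {
    Set<String> roles = new HashSet<>();
    roles.add(USER_ROLE);
    for (String group : userGroups) {
      if (adminGroups.contains(group)) {
        roles.add(ADMIN_ROLE);
        break;
      }
    }
    return roles;
  }

  public static void main(String[] args) {
    GroupRoleMapper mapper = new GroupRoleMapper(Arrays.asList("kafka-ops"));
    Set<String> roles = mapper.rolesFor(parseGroups("dev, kafka-ops"));
    System.out.println(roles.contains(ADMIN_ROLE)); // prints "true"
  }
}
```

Keeping the mapping separate from the request handling makes it easy to unit test without a running Jersey container.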
18 changes: 9 additions & 9 deletions docs/DESIGN.md
@@ -1,23 +1,23 @@
## Pinterest DoctorKafka Design
## Pinterest DoctorK Design

#### High Level Design

DoctorKafka is composed of two parts:
DoctorK is composed of two parts:

* Metrics collector that is deployed to each kafka broker
* Central doctorkafka service that analyzes broker status and executes kafka operation commands
* Central doctork service that analyzes broker status and executes kafka operation commands

The following diagram shows the high level design. DoctorKafka is composed of two parts: 1) the metrics collector that is deployed on every kafka broker; 2) the central failure detection, workload balancing, and partition reassignment logic. The metrics collectors send metrics to a kafka topic that the central DoctorKafka service reads from. DoctorKafka takes actions and also logs its actions to another topic that can be viewed through the web UI. DoctorKafka only takes actions it is confident in, and sends out alerts when it is not confident about taking action.
The following diagram shows the high level design. DoctorK is composed of two parts: 1) the metrics collector that is deployed on every kafka broker; 2) the central failure detection, workload balancing, and partition reassignment logic. The metrics collectors send metrics to a kafka topic that the central DoctorK service reads from. DoctorK takes actions and also logs its actions to another topic that can be viewed through the web UI. DoctorK only takes actions it is confident in, and sends out alerts when it is not confident about taking action.

![doctorkafka diagram](doctorkafka_diagram.png)
<img src="doctorkafka_diagram.png" width="160">
![doctork diagram](doctork_diagram.png)
<img src="doctork_diagram.png" width="160">


#### Kafka Metrics collection

DoctorKafka needs accurate kafka metrics to make sound decisions. As Kafka workload is mostly network bound, DoctorKafka only uses topic partition network traffic metrics to decide topic partition allocation. Currently kafka only has jmx metrics at the topic level; it does not provide jmx metrics at the replica level. Due to partition reassignment, etc., the traffic at the topic level can vary a lot, so computing the normal network traffic of replicas becomes a challenge.
DoctorK needs accurate kafka metrics to make sound decisions. As Kafka workload is mostly network bound, DoctorK only uses topic partition network traffic metrics to decide topic partition allocation. Currently kafka only has jmx metrics at the topic level; it does not provide jmx metrics at the replica level. Due to partition reassignment, etc., the traffic at the topic level can vary a lot, so computing the normal network traffic of replicas becomes a challenge.

DoctorKafka deploys a metric collection agent on each kafka broker to collect metrics. The metric agent collects the following info for each broker:
DoctorK deploys a metric collection agent on each kafka broker to collect metrics. The metric agent collects the following info for each broker:
* Inbound and outbound network traffic for each leader replica
* leader replicas on the broker
* follower replicas on the broker
@@ -26,7 +26,7 @@ DoctorKafka deploys a metric collection agent on each kafka broker to collect me
Note that as of kafka 0.10.2, kafka only exposes network traffic metrics for leader replicas. As follower replicas only have in-bound traffic, we can infer the follower replica traffic from leader replica traffic.
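
One way to read this inference is sketched below. It assumes that a follower's inbound rate equals the inbound rate of the leader it replicates, since the follower copies every byte written to the leader; that equality, the class `BrokerInboundEstimator`, and the partition keys are illustrative assumptions rather than DoctorK's actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, NOT DoctorK code: estimates a broker's inbound traffic
// when only leader-replica metrics are available.
final class BrokerInboundEstimator {

  // leaderBytesInPerSec: partition -> inbound rate measured at that partition's leader.
  // ledPartitions: partitions for which this broker hosts the leader replica.
  // followedPartitions: partitions for which this broker hosts a follower replica.
  static double estimateInbound(Map<String, Double> leaderBytesInPerSec,
                                Set<String> ledPartitions,
                                Set<String> followedPartitions) {
    double total = 0.0;
    for (String partition : ledPartitions) {
      total += leaderBytesInPerSec.getOrDefault(partition, 0.0);
    }
    // Assumption: a follower receives the same bytes that were written to its leader.
    for (String partition : followedPartitions) {
      total += leaderBytesInPerSec.getOrDefault(partition, 0.0);
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Double> leaderIn = new HashMap<>();
    leaderIn.put("topicA-0", 100.0);
    leaderIn.put("topicA-1", 50.0);
    Set<String> led = new HashSet<>(Arrays.asList("topicA-0"));
    Set<String> followed = new HashSet<>(Arrays.asList("topicA-1"));
    System.out.println(estimateInbound(leaderIn, led, followed)); // prints "150.0"
  }
}
```

Partitions with no reported leader metric contribute zero here; a real collector would more likely treat a missing metric as unknown than as idle.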


#### DoctorKafka cluster manager
#### DoctorK cluster manager

The broker workload traffic usually varies throughout the day. Because of this, we need to read broker stats from a 24-48 hour time window to infer the traffic of each replica. As partition reassignment does not reflect the normal workload traffic, we need to exclude partition reassignment traffic during the metric computation.
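
The windowing idea can be sketched as follows. This is a hypothetical illustration, not the DoctorK implementation: the `Sample` type, its `inReassignment` flag, and the choice of the peak value as the summary statistic are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT DoctorK code: summarizes one replica's traffic over
// a time window while skipping samples taken during partition reassignment.
final class ReplicaTrafficWindow {

  static final class Sample {
    final long timestampMs;
    final double bytesInPerSec;
    final boolean inReassignment; // assumed flag set by the metrics collector

    Sample(long timestampMs, double bytesInPerSec, boolean inReassignment) {
      this.timestampMs = timestampMs;
      this.bytesInPerSec = bytesInPerSec;
      this.inReassignment = inReassignment;
    }
  }

  // Returns the peak inbound rate seen in [nowMs - windowMs, nowMs],
  // ignoring samples recorded while a reassignment was in progress.
  static double peakBytesIn(List<Sample> samples, long nowMs, long windowMs) {
    double peak = 0.0;
    for (Sample sample : samples) {
      if (sample.inReassignment) {
        continue; // reassignment traffic does not reflect the normal workload
      }
      if (sample.timestampMs < nowMs - windowMs || sample.timestampMs > nowMs) {
        continue; // outside the window
      }
      peak = Math.max(peak, sample.bytesInPerSec);
    }
    return peak;
  }

  public static void main(String[] args) {
    long hourMs = 3_600_000L;
    long now = 100 * hourMs;
    List<Sample> samples = new ArrayList<>();
    samples.add(new Sample(now - 1 * hourMs, 100.0, false));
    samples.add(new Sample(now - 2 * hourMs, 900.0, true));   // skipped: reassignment
    samples.add(new Sample(now - 30 * hourMs, 500.0, false)); // only inside a 48h window
    System.out.println(peakBytesIn(samples, now, 24 * hourMs)); // prints "100.0"
    System.out.println(peakBytesIn(samples, now, 48 * hourMs)); // prints "500.0"
  }
}
```

A window of at least 24 hours covers a full daily cycle, which is why the shorter window in the usage example misses the older peak.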

Binary file added docs/doctork_diagram.png
10 changes: 10 additions & 0 deletions docs/doctork_logo.svg
Binary file added docs/doctork_ui.png
Binary file removed docs/doctorkafka_diagram.png
Binary file not shown.
16 changes: 0 additions & 16 deletions docs/doctorkafka_logo.svg

This file was deleted.

Binary file removed docs/doctorkafka_ui.png
Binary file not shown.
6 changes: 3 additions & 3 deletions drkafka/Dockerfile → doctork/Dockerfile
@@ -5,7 +5,7 @@ RUN apt-get update && apt-get install -y mailutils
RUN apt-get update && apt-get install -y sendmail

# Add the build artifact under /opt, can be overridden by docker build
ARG ARTIFACT_PATH=target/doctorkafka-0.2.4.9-bin.tar.gz
ADD $ARTIFACT_PATH /opt/doctorkafka/
ARG ARTIFACT_PATH=target/doctork-0.2.4.10-bin.tar.gz
ADD $ARTIFACT_PATH /opt/doctork/
# default cmd
CMD /opt/doctorkafka/scripts/run_in_container.sh
CMD /opt/doctork/scripts/run_in_container.sh