From 23e0c0b53e61ca191f9b9d8a637e178446d2e0e3 Mon Sep 17 00:00:00 2001 From: Lucas Medeiros Date: Mon, 27 Sep 2021 10:48:29 -0300 Subject: [PATCH] update docs to techdocs format --- README.md | 264 +----------------- .../assets/klepto_logo.png | Bin docs/contribute.md | 9 + docs/index.md | 25 ++ docs/installation.md | 17 ++ docs/usage/commands.md | 108 +++++++ docs/usage/config.md | 123 ++++++++ mkdocs.yml | 16 ++ 8 files changed, 303 insertions(+), 259 deletions(-) rename klepto_logo.png => docs/assets/klepto_logo.png (100%) create mode 100644 docs/contribute.md create mode 100644 docs/index.md create mode 100644 docs/installation.md create mode 100644 docs/usage/commands.md create mode 100644 docs/usage/config.md create mode 100644 mkdocs.yml diff --git a/README.md b/README.md index 985f2a3..0241bba 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@

- Klepto + Klepto

# Klepto @@ -8,270 +8,16 @@ [![Go Report Card](https://goreportcard.com/badge/github.com/hellofresh/klepto)](https://goreportcard.com/report/github.com/hellofresh/klepto) [![Go Doc](https://godoc.org/github.com/hellofresh/klepto?status.svg)](https://godoc.org/github.com/hellofresh/klepto) -> Klepto is a tool for copying and anonymising data +Klepto is a tool for copying and anonymising data -Klepto is a tool that copies and anonymises data from other sources. +- [Installation](docs/installation.md) +- [Usage](docs/usage/commands.md) +- [Configuration](docs/usage/config.md) -- [Readme Languages](#) - - [English (Default)](#) -- [Intro](#intro) - - [Features](#features) - - [Supported Databases](#supported-databases) -- [Requirements](#requirements) -- [Installation](#installation) -- [Usage](#usage) -- [Steal Options](#steal-options) -- [Configuration File Options](#configuration-file-options) - - [IgnoreData](#ignoredata) - - [Matchers](#matchers) - - [Anonymise](#anonymise) - - [Relationships](#relationships) -- [Contributing](#contributing) -- [License](#licence) - - -## Intro - -Klepto helps you to keep the data in your environment as consistent as possible by copying it from another environment's database. - -You can use Klepto to get production data but without sensitive customer information for your testing or local debugging. - - -### Features -- Copy data to your local database or to stdout, stderr -- Filter the source data -- Anonymise the source data - - -### Supported Databases -- PostgreSQL -- MySQL - ->If you need to get data from a database type that you don't see here, build it yourself and add it to this list. Contributions are welcomed :) - - -## Requirements - -- Active connection to the IT VPN -- Latest version of [pg_dump][pg_dump-docs] installed (_Only required when working with PostgreSQL databases_) - - -## Installation - -Klepto is written in Go with support for multiple platforms. 
Pre-built binaries are provided for the following: - -- macOS (Darwin) for x64, i386, and ARM architectures -- Windows -- Linux - -You can download the binary for your platform of choice from the [releases page](klepto-releases). - -Once downloaded, the binary can be run from anywhere. We recommend that you move it into your `$PATH` for easy use, which is usually at `/usr/local/bin`. - - -## Usage - -Klepto uses a configuration file called `.klepto.toml` to define your table structure. If your table is normalized, the structure can be detected automatically. - -For dumping the last 10 created active users, your file will look like this: - -```toml -[[Tables]] - Name = "users" - [Tables.Anonymise] - email = "EmailAddress" - username = "FirstName" - password = "SimplePassword" - [Tables.Filter] - Match = "users.status = 'active'" - Limit = 10 - [Tables.Filter.Sorts] - created_at = "desc" -``` - -After you have created the file, run: - -Postgres: -```sh -klepto steal \ ---from="postgres://user:pass@localhost/fromDB?sslmode=disable" \ ---to="postgres://user:pass@localhost/toDB?sslmode=disable" \ -``` - -MySQL: -```sh -klepto steal \ ---from="user:pass@tcp(localhost:3306)/fromDB?sslmode=disable" \ ---to="user:pass@tcp(localhost:3306)/toDB?sslmode=disable" \ -``` - -Behind the scenes Klepto will establishes the connection with the source and target databases with the given parameters passed, and will dump the tables. 
- - - -## Steal Options -Available options can be seen by running `klepto steal --help` - -``` -❯ klepto steal --help -Steals and anonymises databases - -Usage: - klepto steal [flags] - -Flags: - --concurrency int Sets the amount of dumps to be performed concurrently (default 12) - -c, --config string Path to config file (default ".klepto.toml") - -f, --from string Database dsn to steal from (default "mysql://root:root@tcp(localhost:3306)/klepto") - -h, --help help for steal - --read-conn-lifetime duration Sets the maximum amount of time a connection may be reused on the read database - --read-max-conns int Sets the maximum number of open connections to the read database (default 5) - --read-max-idle-conns int Sets the maximum number of connections in the idle connection pool for the read database - --read-timeout duration Sets the timeout for read operations (default 5m0s) - -t, --to string Database to output to (default writes to stdOut) (default "os://stdout/") - --to-rds If the output server is an AWS RDS server - --write-conn-lifetime duration Sets the maximum amount of time a connection may be reused on the write database - --write-max-conns int Sets the maximum number of open connections to the write database (default 5) - --write-max-idle-conns int Sets the maximum number of connections in the idle connection pool for the write database - --write-timeout duration Sets the timeout for write operations (default 30s) - -Global Flags: - -v, --verbose Make the operation more talkative -``` - -We recommend to always set the following parameters: -- `concurrency` to alleviate the pressure over both the source and target databases. -- `read-max-conns` to limit the number of open connections, so that the source database does not get overloaded. - - - -## Configuration File Options -You can set a number of keys in the configuration file. Below is a list of all configuration options, followed by some examples of specific keys. 
- -- `Matchers` - Variables to store filter data. You can declare a filter once and reuse it among tables. -- `Tables` - A Klepto table definition. - - `Name` - The table name. - - `IgnoreData` - A flag to indicate whether data should be imported or not. If set to true, it will dump the table structure without importing data. - - `Filter` - A Klepto definition to filter results. - - `Match` - A condition field to dump only certain amount data. The value may be either expression or correspond to an existing `Matchers` definition. - - `Limit` - The number of results to be fetched. - - `Sorts` - Defines how the table is sorted. - - `Anonymise` - Indicates which columns to anonymise. - - `Relationships` - Represents a relationship between the table and referenced table. - - `Table` - The table name. - - `ForeignKey` - The table's foreign key. - - `ReferencedTable` - The referenced table name. - - `ReferencedKey` - The referenced table primary key. - - - - -### IgnoreData - -You can dump the database structure without importing data by setting the `IgnoreData` value to `true`. -```toml -[[Tables]] - Name = "logs" - IgnoreData = true -``` - - -### Matchers -Matchers are variables to store filter data. You can declare a filter once and reuse it among tables: -```toml -[[Matchers]] - Latest100Users = "ORDER BY users.created_at DESC LIMIT 100" - -[[Tables]] - Name = "users" - [Tables.Filter] - Match = "Latest100Users" - -[[Tables]] - Name = "orders" - [[Tables.Relationships]] - ForeignKey = "user_id" - ReferencedTable = "users" - ReferencedKey = "id" - [Tables.Filter] - Match = "Latest100Users" -``` - -See [examples](./examples) for more. - - - -### Anonymise - -You can anonymise specific columns in your table using the `Anonymise` key. Anonymisation is performed by running a Faker against the specified column. 
- -```toml -[[Tables]] - Name = "customers" - [Tables.Anonymise] - email = "EmailAddress" - firstName = "FirstName" - postalCode = "DigitsN:5" - creditCard = "CreditCardNum" - voucher = "Password:3:5:true" - -[[Tables]] - Name = "users" - [Tables.Anonymise] - email = "EmailAddress" - password = "literal:1234" -``` - -This would replace all the specified columns from the `customer` and `users` tables with the spcified fake function. If a function requires arguments to be passed, we can specify them splitting with the `:` character, the default value of a argument type will be used in case the provided one is invalid or missing. There is also a special function `literal:[some-constant-value]` to specify a constant we want to write for a column. In this case, `password = "literal:1234"` would write `1234` for every row in the password column of the users table. - -#### Available data types for anonymisation - -Available data types can be found in [fake.go](pkg/anonymiser/fake.go). This file is generated from https://github.com/icrowley/fake (it must be generated because it is written in such a way that Go cannot reflect upon it). - -We generate the file with the following: - -```sh -$ go get github.com/ungerik/pkgreflect -$ fake master pkgreflect -notypes -novars -norecurs vendor/github.com/icrowley/fake/ -``` - - -### Relationships -The `Relationships` key represents a relationship between the table and referenced table. 
- -To dump the latest 100 users with their orders: -```toml -[[Tables]] - Name = "users" - [Tables.Filter] - Limit = 100 - [Tables.Filter.Sorts] - created_at = "desc" - -[[Tables]] - Name = "orders" - [[Tables.Relationships]] - # behind the scenes klepto will create a inner join between orders and users - ForeignKey = "user_id" - ReferencedTable = "users" - ReferencedKey = "id" - [Tables.Filter] - Limit = 100 - [Tables.Filter.Sorts] - created_at = "desc" -``` - - - ## Contributing Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull requests to us. - ## License This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details - - -[pg_dump-docs]: https://www.postgresql.org/docs/10/static/app-pgdump.html "pg_dump docs" -[klepto-releases]: https://github.com/hellofresh/klepto/releases "Klepto releases page" diff --git a/klepto_logo.png b/docs/assets/klepto_logo.png similarity index 100% rename from klepto_logo.png rename to docs/assets/klepto_logo.png diff --git a/docs/contribute.md b/docs/contribute.md new file mode 100644 index 0000000..1927ad3 --- /dev/null +++ b/docs/contribute.md @@ -0,0 +1,9 @@ +# Contributing + +Klepto is an open source tool licensed under the MIT License, and contributions are very welcome! Some examples are: + +- Reporting/Fixing bugs +- Adding new database support +- Improving documentation + +If you would like to do so, please read the [contribution guidelines](https://github.com/hellofresh/klepto/blob/master/CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull requests to us. diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..c429e43 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,25 @@ +# Klepto + +![klepto_logo](./assets/klepto_logo.png){ width=200 } + +Klepto is a tool for copying and anonymising data. 
+ +## Features + +Klepto's core features are: + +- Copy data to your local database or to stdout, stderr +- Filter the source data +- Anonymise the source data + +## Supported Databases + +- PostgreSQL +- MySQL + +!!! note "Is your database missing?" + Contributions are very welcome: check our Contribution guide and add it to this list. + +## License + +This project is licensed under the MIT License - see the [LICENSE](https://github.com/hellofresh/klepto/blob/master/LICENSE) file for details. diff --git a/docs/installation.md b/docs/installation.md new file mode 100644 index 0000000..0568b22 --- /dev/null +++ b/docs/installation.md @@ -0,0 +1,17 @@ +# Installing Klepto + +## Requirements + +Latest version of [pg_dump](https://www.postgresql.org/docs/10/static/app-pgdump.html) installed (*Only required when working with PostgreSQL databases*) + +## Installation + +Klepto is written in Go with support for multiple platforms. Pre-built binaries are provided for the following: + +- macOS (Darwin) +- Windows +- Linux + +You can download the binary for your platform of choice from the [releases page](https://github.com/hellofresh/klepto/releases). + +Once downloaded, the binary can be run from anywhere. For easy use, we recommend moving it into a directory on your `$PATH`, usually `/usr/local/bin`. diff --git a/docs/usage/commands.md b/docs/usage/commands.md new file mode 100644 index 0000000..d1335c4 --- /dev/null +++ b/docs/usage/commands.md @@ -0,0 +1,108 @@ +# Commands + +Detailed list of Klepto's available commands: + +```sh +klepto --help +Klepto by HelloFresh. + Takes the structure and data from one (mysql) database (--from), + anonymises the data according to the provided configuration file, + and inserts that data into another mysql database (--to). + + Perfect for bringing your live data to staging! 
+ +Usage: + klepto [command] + +Examples: +klepto steal -c .klepto.toml|yaml|json --from root:root@localhost:3306/fromDb --to root:root@localhost:3306/toDb + +Available Commands: + help Help about any command + init Create a fresh config file + steal Steals and anonymises databases + update Check for new versions of kepto + +Flags: + -h, --help help for klepto + -v, --verbose Make the operation more talkative + --version version for klepto + +Use "klepto [command] --help" for more information about a command. +``` + +## Init + +The `init` command creates an example `.klepto.toml` file. + +```sh +klepto init +• Initializing .klepto.toml +• Created .klepto.toml! +``` + +## Update + +Klepto can update itself when you run the `update` command: + +```sh +klepto update +• Checking for new versions of Klepto! +• Klepto! updated to version v0.3.1 +``` + +## Steal + +The `steal` command starts the copy using the instructions defined in the `.klepto.toml` file. + +- **Postgres:** + + ```sh + klepto steal \ + --from="postgres://user:pass@localhost/fromDB?sslmode=disable" \ + --to="postgres://user:pass@localhost/toDB?sslmode=disable" + ``` + +- **MySQL:** + + ```sh + klepto steal \ + --from="user:pass@tcp(localhost:3306)/fromDB?sslmode=disable" \ + --to="user:pass@tcp(localhost:3306)/toDB?sslmode=disable" + ``` + +Behind the scenes, Klepto establishes connections to the source and target databases using the given parameters and dumps the tables. 
+ +Available options can be seen by running `klepto steal --help`: + +```sh +klepto steal --help +Steals and anonymises databases + +Usage: + klepto steal [flags] + +Flags: + --concurrency int Sets the amount of dumps to be performed concurrently (default 12) + -c, --config string Path to config file (default ".klepto.toml") + -f, --from string Database dsn to steal from (default "mysql://root:root@tcp(localhost:3306)/klepto") + -h, --help help for steal + --read-conn-lifetime duration Sets the maximum amount of time a connection may be reused on the read database + --read-max-conns int Sets the maximum number of open connections to the read database (default 5) + --read-max-idle-conns int Sets the maximum number of connections in the idle connection pool for the read database + --read-timeout duration Sets the timeout for read operations (default 5m0s) + -t, --to string Database to output to (default writes to stdOut) (default "os://stdout/") + --to-rds If the output server is an AWS RDS server + --write-conn-lifetime duration Sets the maximum amount of time a connection may be reused on the write database + --write-max-conns int Sets the maximum number of open connections to the write database (default 5) + --write-max-idle-conns int Sets the maximum number of connections in the idle connection pool for the write database + --write-timeout duration Sets the timeout for write operations (default 30s) + +Global Flags: + -v, --verbose Make the operation more talkative +``` + +We recommend always setting the following parameters: + +- `concurrency` to limit the load on both the source and target databases. +- `read-max-conns` to limit the number of open connections, so that the source database does not get overloaded. 
diff --git a/docs/usage/config.md b/docs/usage/config.md new file mode 100644 index 0000000..7fd062b --- /dev/null +++ b/docs/usage/config.md @@ -0,0 +1,123 @@ +# Configuration + +Klepto uses a configuration file called `.klepto.toml` to define your table structure and the Anonymise functions to be used. + +If your table is normalized, the structure can be detected automatically. + +## Keys + +You can set a number of keys in the configuration file. Below is a list of all configuration options, followed by some examples of specific keys. + +- `Matchers` - Variables to store filter data. You can declare a filter once and reuse it among tables. +- `Tables` - A Klepto table definition. + - `Name` - The table name. + - `IgnoreData` - A flag to indicate whether data should be imported or not. If set to true, it will dump the table structure without importing data. + - `Filter` - A Klepto definition to filter results. + - `Match` - A condition that restricts which rows are dumped. The value may be either an expression or the name of an existing `Matchers` definition. + - `Limit` - The number of results to be fetched. + - `Sorts` - Defines how the table is sorted. + - `Anonymise` - Indicates which columns to anonymise. + - `Relationships` - Represents a relationship between the table and a referenced table. + - `Table` - The table name. + - `ForeignKey` - The table's foreign key. + - `ReferencedTable` - The referenced table name. + - `ReferencedKey` - The referenced table primary key. + +### **IgnoreData** + +You can dump the database structure without importing data by setting the `IgnoreData` value to `true`. + +```toml +[[Tables]] + Name = "logs" + IgnoreData = true +``` + +### **Matchers** + +Matchers are variables to store filter data. 
You can declare a filter once and reuse it among tables: + +```toml +[[Matchers]] + Latest100Users = "ORDER BY users.created_at DESC LIMIT 100" + +[[Tables]] + Name = "users" + [Tables.Filter] + Match = "Latest100Users" + +[[Tables]] + Name = "orders" + [[Tables.Relationships]] + ForeignKey = "user_id" + ReferencedTable = "users" + ReferencedKey = "id" + [Tables.Filter] + Match = "Latest100Users" +``` + +### **Anonymise** + +You can anonymise specific columns in your table using the `Anonymise` key. Anonymisation is performed by running a Faker against the specified column. + +```toml +[[Tables]] + Name = "customers" + [Tables.Anonymise] + email = "EmailAddress" + firstName = "FirstName" + postalCode = "DigitsN:5" + creditCard = "CreditCardNum" + voucher = "Password:3:5:true" + +[[Tables]] + Name = "users" + [Tables.Anonymise] + email = "EmailAddress" + password = "literal:1234" +``` + +This would replace the values in the specified columns of the `customers` and `users` tables with the output of the specified fake functions. + +If a function requires arguments, separate them with the `:` character; the default value for an argument's type is used when the provided one is invalid or missing. + +There is also a special function `literal:[some-constant-value]` to specify a constant we want to write for a column. In this case, `password = "literal:1234"` would write `1234` for every row in the password column of the users table. + +Available data types can be found in [fake.go](https://github.com/hellofresh/klepto/blob/master/pkg/anonymiser/fake.go). This file is generated from [https://github.com/icrowley/fake](https://github.com/icrowley/fake) (it had to be generated because it is written in such a way that Go cannot reflect upon it). 
+ +Below are the instructions used to generate the file: + +```sh +go get github.com/ungerik/pkgreflect +fake master pkgreflect -notypes -novars -norecurs vendor/github.com/icrowley/fake/ +``` + +### **Relationships** + +The `Relationships` key represents a relationship between the table and a referenced table. + +To dump the latest 100 users with their orders: + +```toml +[[Tables]] + Name = "users" + [Tables.Filter] + Limit = 100 + [Tables.Filter.Sorts] + created_at = "desc" + +[[Tables]] + Name = "orders" + [[Tables.Relationships]] + # behind the scenes klepto will create an inner join between orders and users + ForeignKey = "user_id" + ReferencedTable = "users" + ReferencedKey = "id" + [Tables.Filter] + Limit = 100 + [Tables.Filter.Sorts] + created_at = "desc" +``` + +!!! info "Tip" + You can find some [configuration examples](https://github.com/hellofresh/klepto/tree/master/examples) in Klepto's repository. diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100644 index 0000000..824467f --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,16 @@ +site_name: 'Klepto' +site_description: 'Klepto User Guide' + +plugins: + - techdocs-core + +markdown_extensions: + - attr_list + +nav: + - Home: index.md + - Installation: installation.md + - Usage: + - Configuration: usage/config.md + - Commands: usage/commands.md + - Contributing: contribute.md