Allow using a PG database for the config database #3605

Closed
cgardens opened this issue May 25, 2021 · 15 comments · Fixed by #4670
Labels
airbyte-cloud type/enhancement New feature or request

Comments

@cgardens
Contributor

cgardens commented May 25, 2021

Tell us about the problem you're trying to solve

Currently we use a homegrown file "database" to store configs (see: ConfigPersistence and DefaultConfigPersistence):

  • As someone setting up Airbyte, I'd like to be able to configure a PG database instead. It should be able to connect to either the PG database that we run as part of docker compose or an external one.
  • For the docker (and K8s) version of Airbyte, I'd like the PG database that Airbyte already runs to be the default offering.
  • Note: While we will reuse the same db instance that already runs in our docker compose setup, within that instance we should use a separate Postgres database from the one we already use for jobs.
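
To make the request above concrete, here is a minimal sketch of a config store behind a small interface with a database-backed implementation. It is not Airbyte's actual `ConfigPersistence` API: the names `ConfigStore`, `DbConfigStore`, `get_config`, `list_configs`, and `write_config`, the `configs` table layout, and the `"workspace"` config type are all assumptions for illustration. SQLite from the Python standard library stands in for Postgres so the example runs without a server.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ConfigStore(ABC):
    """Hypothetical interface; not Airbyte's real ConfigPersistence."""

    @abstractmethod
    def get_config(self, config_type: str, config_id: str) -> Dict[str, Any]: ...

    @abstractmethod
    def list_configs(self, config_type: str) -> List[Dict[str, Any]]: ...

    @abstractmethod
    def write_config(self, config_type: str, config_id: str, config: Dict[str, Any]) -> None: ...


class DbConfigStore(ConfigStore):
    """Stores each config as one JSON row keyed by (config_type, config_id)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # Assumed table layout; a Postgres version could use a JSONB column.
        self.conn.execute(
            """
            CREATE TABLE IF NOT EXISTS configs (
                config_type TEXT NOT NULL,
                config_id   TEXT NOT NULL,
                config_blob TEXT NOT NULL,
                PRIMARY KEY (config_type, config_id)
            )
            """
        )

    def get_config(self, config_type: str, config_id: str) -> Dict[str, Any]:
        row = self.conn.execute(
            "SELECT config_blob FROM configs WHERE config_type = ? AND config_id = ?",
            (config_type, config_id),
        ).fetchone()
        if row is None:
            raise KeyError(f"{config_type}/{config_id} not found")
        return json.loads(row[0])

    def list_configs(self, config_type: str) -> List[Dict[str, Any]]:
        rows = self.conn.execute(
            "SELECT config_blob FROM configs WHERE config_type = ? ORDER BY config_id",
            (config_type,),
        ).fetchall()
        return [json.loads(r[0]) for r in rows]

    def write_config(self, config_type: str, config_id: str, config: Dict[str, Any]) -> None:
        # Insert the row, or overwrite it if the key already exists (SQLite syntax).
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO configs (config_type, config_id, config_blob) "
                "VALUES (?, ?, ?)",
                (config_type, config_id, json.dumps(config)),
            )


# In-memory database for the demo; a real deployment would connect to Postgres.
store = DbConfigStore(sqlite3.connect(":memory:"))
store.write_config("workspace", "ws-1", {"name": "default"})
store.write_config("workspace", "ws-1", {"name": "renamed"})
```

Keeping callers behind an interface like this is also what would let the file-based store and a database-backed store coexist while the default is switched over.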


@tuliren
Contributor

tuliren commented May 26, 2021

Does this mean that in the future, the config migration can be done through a database migration? If so, we can leverage mature database migration systems to do it smoothly (e.g. Rails migration).

@cgardens
Contributor Author

Maybe!

Couple things:

  • Rails doesn't really handle data migration, right? It just handles schema migration.
  • I don't know that we want to lock ourselves into PG. It's convenient for us (and likely a good piece of tech to use for Airbyte Cloud at the start), but I think allowing people to use whatever db they have / want might be nice (e.g. MySQL, Mongo, etc). Depending on how requirements scale up, being able to swap in a distributed database option might be nice. Although, I guess if we are at that scale, exporting the entire database is probably no longer feasible anyway.

@tuliren
Contributor

tuliren commented May 26, 2021

Rails doesn't really handle data migration, right? It just handles schema migration.

It can handle data migration as well. You can write anything in the up and down methods.

I don't know that we want to lock ourselves into PG.

This is actually a perk of using something like Rails migrations: they are not database-specific. You can even use SQLite, which is a file-based database, similar to our current setup.
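
As a rough illustration of the point about `up` and `down`, the sketch below shows a migration whose `up` step changes the schema and rewrites existing rows in the same step, and whose `down` step reverses both. It is written in Python against SQLite rather than with Rails, and the `Migration` class, the `migrate` runner, the `connections` table, and its columns are all made up for the example. Version tracking uses SQLite's `PRAGMA user_version`, which is specific to SQLite.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Migration:
    version: int
    up: Callable[[sqlite3.Connection], None]
    down: Callable[[sqlite3.Connection], None]


def add_status_up(conn: sqlite3.Connection) -> None:
    # Schema change: add the new column.
    conn.execute("ALTER TABLE connections ADD COLUMN status TEXT")
    # Data change: backfill the new column from the legacy flag.
    conn.execute(
        "UPDATE connections SET status = "
        "CASE WHEN enabled = 1 THEN 'active' ELSE 'inactive' END"
    )


def add_status_down(conn: sqlite3.Connection) -> None:
    # Data change: restore the legacy flag from the newer column.
    conn.execute(
        "UPDATE connections SET enabled = "
        "CASE WHEN status = 'active' THEN 1 ELSE 0 END"
    )
    # Schema change: rebuild the table without the status column.
    conn.execute("CREATE TABLE connections_old (id INTEGER PRIMARY KEY, enabled INTEGER NOT NULL)")
    conn.execute("INSERT INTO connections_old SELECT id, enabled FROM connections")
    conn.execute("DROP TABLE connections")
    conn.execute("ALTER TABLE connections_old RENAME TO connections")


def current_version(conn: sqlite3.Connection) -> int:
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn: sqlite3.Connection, migrations: List[Migration], target: int) -> None:
    """Run up or down steps until the stored version equals target.

    Assumes versions are contiguous integers starting at 1.
    """
    ordered = sorted(migrations, key=lambda m: m.version)
    current = current_version(conn)
    if target > current:
        steps = [(m.up, m.version) for m in ordered if current < m.version <= target]
    else:
        steps = [(m.down, m.version - 1) for m in reversed(ordered) if target < m.version <= current]
    for step, new_version in steps:
        with conn:  # commit after each step, roll back if it raises
            step(conn)
            conn.execute(f"PRAGMA user_version = {int(new_version)}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE connections (id INTEGER PRIMARY KEY, enabled INTEGER NOT NULL)")
conn.executemany("INSERT INTO connections (id, enabled) VALUES (?, ?)", [(1, 1), (2, 0)])
conn.commit()

migrations = [Migration(1, add_status_up, add_status_down)]
migrate(conn, migrations, target=1)
statuses_after_up = conn.execute("SELECT id, status FROM connections ORDER BY id").fetchall()
migrate(conn, migrations, target=0)
columns_after_down = [row[1] for row in conn.execute("PRAGMA table_info(connections)")]
```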

@eamontaaffe

eamontaaffe commented May 31, 2021

Are the data and workspace volumes required for persistence also? Or are these volumes ephemeral?

Will we be able to:

  • Deploy Airbyte (using an external database for config) on an instance (AWS EC2/DO Droplet/GCP Compute Engine).
  • Run it for a while syncing data between sources and destinations.
  • Tear down the initial instance, leaving the external database as it is.
  • Then spin up a new instance to take its place (maybe we need to resize).
  • Resume syncing data between sources and destinations.

If so, this would be amazing.

@eamontaaffe

Just having a bit of a closer look now. It appears that the Postgres db only contains some of the data. The data volume is also required for the full state, so the scenario I posted above would not work.

@davinchia
Contributor

@tuliren Although I like using existing mature tools, I'd prefer not to pull parts of the Ruby ecosystem into our stack. I'd vote for continuing to use Flyway for schema migrations and our current migration process for data migration. I'm down to use an existing tool if we find one that is self-contained and doesn't carry other build baggage (e.g. the data migration version of Flyway).

@phamduyly

Hi all,

What about providing a way to version-control the configuration files? Is there any way to make the volume less persistent so that Airbyte rereads new configuration without restarting Docker?

Or is that too hard because the architecture doesn't allow it?

@derekperkins

👍 for MySQL compatibility

@sherifnada sherifnada removed this from the Core - 2021-07-07 milestone Jul 7, 2021
@sherifnada sherifnada added this to the Core - 2021-07-14 milestone Jul 7, 2021
@rclmenezes
Contributor

Redash uses their DB for non-secret configuration and it's great. New versions are just DB migrations, which is easy to do.

As a stopgap solution, we persist our configuration via a crontab and S3:

# Download the current docker volumes if they exist
aws s3 sync s3://${s3_bucket_name}/volumes /var/lib/docker/volumes || true

# Make our crontab file
cat > /root/crontab << EOF
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Every 5 min, save the docker volume to S3
*/5 * * * * aws s3 sync /var/lib/docker/volumes/ s3://${s3_bucket_name}/volumes > /home/ubuntu/crontab.log 2>&1
EOF

crontab /root/crontab

@ShahNewazKhan

What is the process to migrate from pg db on k8s pvc to an external db?

@tuliren
Contributor

tuliren commented Jan 29, 2022

What is the process to migrate from pg db on k8s pvc to an external db?

@ShahNewazKhan, you can find the docs here.

@ShahNewazKhan

What is the process to migrate from pg db on k8s pvc to an external db?

@ShahNewazKhan, you can find the docs here.

Thanks for the pointer @tuliren! However, my situation is that I have deployed Airbyte onto a k8s cluster that provisioned a pg db on a pvc, and I was looking for instructions to migrate this pg db on k8s to an external db.

Do I have to perform any steps other than exporting the db to an external db and rebooting Airbyte with connections to the new external db?

@davinchia
Contributor

You'll also have to remove the additional kubernetes deploys and configure the deploy so airbyte connects to your external database. See https://docs.airbyte.com/operator-guides/configuring-airbyte#database
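
For the export step discussed above, one common approach is a `pg_dump` from the in-cluster database followed by a `pg_restore` into the external one. The sketch below only builds the two command lines so they can be reviewed before anything is run. It is not an official Airbyte migration procedure: the `PgTarget` class, hosts, user, database name, and dump path are placeholders, the source host assumes the in-cluster database is reachable locally (for example through a port-forward), the target database is assumed to exist already, and passwords are assumed to be supplied separately (for example via `PGPASSWORD` or a `.pgpass` file).

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PgTarget:
    """Connection details for one Postgres database (placeholder values below)."""
    host: str
    port: int
    user: str
    dbname: str


def dump_command(source: PgTarget, dump_path: str) -> List[str]:
    # Custom-format archive, so pg_restore can read it.
    return [
        "pg_dump",
        "--host", source.host,
        "--port", str(source.port),
        "--username", source.user,
        "--dbname", source.dbname,
        "--format=custom",
        "--file", dump_path,
    ]


def restore_command(target: PgTarget, dump_path: str) -> List[str]:
    # --no-owner skips restoring object ownership, which helps when the
    # external database uses a different role than the in-cluster one.
    return [
        "pg_restore",
        "--host", target.host,
        "--port", str(target.port),
        "--username", target.user,
        "--dbname", target.dbname,
        "--no-owner",
        dump_path,
    ]


# Placeholder values; replace with your own.
in_cluster = PgTarget(host="localhost", port=5432, user="airbyte_user", dbname="airbyte")
external = PgTarget(host="db.example.com", port=5432, user="airbyte_user", dbname="airbyte")
dump_path = "/tmp/airbyte.dump"

dump_cmd = dump_command(in_cluster, dump_path)
restore_cmd = restore_command(external, dump_path)

# Print the commands for review instead of executing them.
print(shlex.join(dump_cmd))
print(shlex.join(restore_cmd))
```

Once the commands look right, each list can be passed to `subprocess.run(cmd, check=True)`. Stopping Airbyte before taking the dump avoids writes landing in the old database after the export.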

@ShahNewazKhan

Makes sense, thanks!

@userbradley

The docs site is still not updated

Link

image
