Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-33] Proposal for Removing SQLite Support for Metadata Databases #8874

Closed
willbarrett opened this issue Dec 19, 2019 · 21 comments
Closed
Labels
enhancement:request Enhancement request submitted by anyone from the community sip Superset Improvement Proposal

Comments

@willbarrett
Copy link
Member

[SIP] Proposal for Removing SQLite Support for Metadata Databases

Motivation

Historically, SQLite has been used as a starting point for development for new contributors to Superset. While the need for an easy onramp still exists, I believe that it is fully covered by the new Docker compose file contributed by Craig Rueda. Deprecating SQLite will encourage docker-compose as a primary starting point for new contributors to the project. SQLite support causes multiple pain points for developers, including bad migration practices with unnecessary batch of migrations and database schema drift due to SQLite’s inability to alter constraints on existing tables. Issues contributed to Github referenced below which indicate that SQLite is in some instances being used in a production-like environment. This is a strong anti-pattern that we would like to avoid that makes supporting Superset in the community more difficult.

Proposed Change

Immediate:

  • Add logging to indicate SQLite’s deprecation in the next version release
  • Add release notes in the upgrading.md file relating to deprecation
  • Announce deprecation on the mailing lists and in Slack
  • Update documentation to recommend Docker Compose with Postgres for new users

In 2 minor versions:

  • Update configuration to use a Postgres database by default
  • Update the build matrix to stop building Superset against SQLite
  • Update Cypress test configuration to use Postgres instead of SQLite
  • Add release notes in the upgrading.md file relating to removal
  • Announce removal on the mailing lists and in Slack

New or Changed Public Interfaces

Support for SQLite as a metadata database will be officially removed. SQLite may still be usable, but it will not be supported in future Alembic migrations.

Migration Plan and Compatibility

Users who leverage SQLite in production will be required to migrate their data to a different supported metadata database, either MySQL or Postgres. This can be accomplished via third-party tools.

Rejected Alternatives

The primary other option is not deprecating SQLite and not removing it from the build matrix. This is really a binary decision.

@issue-label-bot issue-label-bot bot added the enhancement:request Enhancement request submitted by anyone from the community label Dec 19, 2019
@issue-label-bot
Copy link

Issue-Label Bot is automatically applying the label #enhancement to this issue, with a confidence of 0.68. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

@etr2460
Copy link
Member

etr2460 commented Dec 20, 2019

This makes sense to me. If not supporting sqlite makes support and development easier, then I'm all for it. The only comment I might have is ensuring that the Docker workflow is mature enough to move forward with the deprecation. I haven't used it yet personally, maybe @craig-rueda can comment on the current state of using Docker for dev work and how easy/hard it is?

@craig-rueda
Copy link
Member

I think Docker is close, having a few more tweaks I'd like to get in before calling it 100%. People have been asking about things like bringing your own requirements, etc. I would also caution that the tests themselves need to be updated a bit in order to lessen, or remove completely, their reliance on the Superset examples. The reason why I bring this up here is that speed and ease of running tests is paramount, and forcing a "real" DB into the mix will definitely affect performance over the current SQLite setup.

As we all have skin in this game, our end goal is to make things as streamlined and approachable as possible. Anything that gets in the way or adds too much complexity should be carefully considered. Having said that, I think this will end up being a collaborative effort and will likely take some time to get "right".

@willbarrett
Copy link
Member Author

Thanks both for the comments. I think we could move ahead with deprecating SQLite, but gate the removal on the Docker workflow being mature enough for our needs. It would also be reasonable to open a PR moving all the tests over to Postgres to get an understanding of how that will affect CI run time. I'm happy to do that in the near future and report back here.

@mistercrunch mistercrunch added the sip Superset Improvement Proposal label Dec 20, 2019
@willbarrett
Copy link
Member Author

I ran a quick test (draft PR here: #8889) and the timings for the tests which were leveraging SQLite seem nearly identical on Postgres:
Screen Shot 2019-12-23 at 11 24 30 AM
I'd love for folks to double check that I haven't missed a piece of required configuration for Cypress.

@metaperl
Copy link

metaperl commented Jan 10, 2020

Users who leverage SQLite in production will be required to migrate their data to a different supported metadata database, either MySQL or Postgres. This can be accomplished via third-party tools.

This is an unactionable statement.

People who have hundreds of users relying on the current metastore have a single vague sentence to migrate their work? No link to a 3rd party tool? No information on what to do with it?

the sentence does not paint a complete picture

There will have to be changes made in Superset as well, presumably to the SQLALCHEMY_URI and perhaps other places and every place in Superset that must be changed must be documented in a way that a devops person can take action on it.

I cannot upvote this in good conscience

Until the vague, imprecise, unactionable statement This can be accomplished like so has like so replaced with a link to a easy-to-follow step-by-step set of instructions about how to make the transition to Postgres, I don't feel comfortable with this transition.

Why is said transition to be accomplished by third party tools?

If every user in superset is going to have to do this, then why not have a console script that ships with Superset and has been thoroughly tested?

@villebro
Copy link
Member

I assume the assumption behind this SIP is that anyone running Superset in production is already using Postgres or MySQL as a backend database. If this is not the case, i.e. there is in fact widespread sqlite production usage with Superset, and they need help moving the data over, I can probably put together a script for performing the migration.

FWIW I think encouraging people to use a proper OLTP backend is the right direction, especially as Superset is still in v. 0.x, and therefore I vote +1 for the deprecation of sqlite.

@willbarrett
Copy link
Member Author

Hi @metaperl, thanks for the feedback. A couple of points:

@villebro is correct, SQLite is not a production-ready datastore, so this should only affect development setups for the vast majority of organizations. Migrating a development setup to our dockerized system should not require bringing development data along for the ride. I don't foresee many (if any) having to transition metadata databases as a result of this change.

You'll notice that there is a deprecation period for the SQLite support - it will not be yanked immediately. If we hear feedback during that time that organizations will need support migrating the data to a different datastore it will be a good idea to provide a migration path to the community.

I appreciate your enthusiasm. One minor process point is important here though - only Apache Software Foundation committers are able to vote on SIPs. I myself am not yet a committer, so I have no ability to vote yea or nea on this proposal.

Does your organization use SQLite in production? If so, I would love to learn more about your deployment so that we can support organizations like yours better!

@nytai
Copy link
Member

nytai commented Jan 10, 2020

+1 this change may cause some pain for some, however, as @villebro mentions, I think it's important to encourage people to use a proper OLTP db in production settings and doing so is worth the pain. I've noticed plenty of instances (in code) that try to work around sqlite's shortcomings and it seems to me the code is suffering in order to support a [slightly] less painful development workflow. I'm in favor of deprecating, especially as we gear up for 1.0 release

@xinbinhuang
Copy link

Hi @willbarrett ,

I am pretty new to superset so my comment may not be accurate, but regarding the batch migrations:

SQLite support causes multiple pain points for developers, including bad migration practices with unnecessary batch of migrations

This should not have an actual impact on the migration process. The method bathch_alter_table only does an actual batch migration on SQLite, while keeping the normal ALTER operations on other databases. Alembic batch

As for SQLite for production, I am neither support nor against it, as it does give you an easy deployment with small users' size. However, as we are living in a world of containerization, deployment is not an issue either.

@willbarrett
Copy link
Member Author

Thanks for the comments @xinbinhuang - I wasn't aware of that behavior on batch_alter_table. The main issue that we've had with migrations on SQLite is that constraints, once set on a table, cannot be altered. Even the batch_alter_table system doesn't provide a workaround. This leads to a lot of try/except blocks in our migration files, and thus schema drift between production systems. It's really painful to try to figure out what different organizations have as an underlying schema right now.

I don't see it mentioned here in the history, so I should mention that this SIP has gone through the Apache voting process and has been approved. I'll be working on adding deprecation notices in the next week or so.

@sairamkrish
Copy link

sairamkrish commented Nov 25, 2021

The above thread says:

support for SQLite is not removed immediately. But in future it will be removed

Current Apache superset documentation still has SQLite as the default metadata database.

But in reality, Apache superset already doesn't work with SQLite. We can't get started with Apache Superset + SQLite as metadata store today (2021-11-25). Using Apache superset version 1.3.2.

I have raised a new issue in this regard. Please comment there if you are also facing this issue.

(superset-experiment) ➜  superset-experiment git:(main) ✗ PYTHONPATH=.:$PYTHONPATH superset db upgrade
Loaded your LOCAL configuration at [/Users/sairam/superset-experiment/superset_config.py]
logging was configured successfully
2021-11-25 07:28:07,681:INFO:superset.utils.logging_configurator:logging was configured successfully
2021-11-25 07:28:07,695:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
  warnings.warn(
No PIL installation found
2021-11-25 07:28:07,915:INFO:superset.utils.screenshots:No PIL installation found
WARNI [alembic.env] SQLite Database support for metadata databases will         be removed in a future version of Superset.
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade 12d55656cbca -> 2591d77e9831, user_id
Traceback (most recent call last):
  File "/Users/sairam/superset-experiment/.venv/bin/superset", line 8, in <module>
    sys.exit(superset())
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/flask/cli.py", line 586, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/flask/cli.py", line 426, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/flask_migrate/cli.py", line 149, in upgrade
    _upgrade(directory, revision, sql, tag, x_arg)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/flask_migrate/__init__.py", line 98, in wrapped
    f(*args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/flask_migrate/__init__.py", line 185, in upgrade
    command.upgrade(config, revision, sql=sql, tag=tag)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/alembic/command.py", line 320, in upgrade
    script.run_env()
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/alembic/script/base.py", line 563, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 92, in load_python_file
    module = load_module_py(module_id, path)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 108, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/superset/migrations/env.py", line 124, in <module>
    run_migrations_online()
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/superset/migrations/env.py", line 116, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/alembic/runtime/environment.py", line 851, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/alembic/runtime/migration.py", line 620, in run_migrations
    step.migration_fn(**kw)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/superset/migrations/versions/2591d77e9831_user_id.py", line 36, in upgrade
    batch_op.create_foreign_key("user_id", "ab_user", ["user_id"], ["id"])
  File "/Users/sairam/.pyenv/versions/3.8.1/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/alembic/operations/base.py", line 374, in batch_alter_table
    impl.flush()
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/alembic/operations/batch.py", line 118, in flush
    existing_table = Table(
  File "<string>", line 2, in __new__
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 139, in warned
    return fn(*args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 563, in __new__
    metadata._remove_table(name, schema)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 558, in __new__
    table._init(name, metadata, *args, **kw)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 647, in _init
    self._autoload(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 670, in _autoload
    autoload_with.run_callable(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1653, in run_callable
    return callable_(self, *args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 484, in reflecttable
    return insp.reflecttable(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 684, in reflecttable
    self._reflect_fk(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 868, in _reflect_fk
    sa_schema.Table(
  File "<string>", line 2, in __new__
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 139, in warned
    return fn(*args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 563, in __new__
    metadata._remove_table(name, schema)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 558, in __new__
    table._init(name, metadata, *args, **kw)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 647, in _init
    self._autoload(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 670, in _autoload
    autoload_with.run_callable(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1653, in run_callable
    return callable_(self, *args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 484, in reflecttable
    return insp.reflecttable(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 684, in reflecttable
    self._reflect_fk(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 868, in _reflect_fk
    sa_schema.Table(
  File "<string>", line 2, in __new__
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 139, in warned
    return fn(*args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 563, in __new__
    metadata._remove_table(name, schema)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 558, in __new__
    table._init(name, metadata, *args, **kw)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 647, in _init
    self._autoload(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 670, in _autoload
    autoload_with.run_callable(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1653, in run_callable
    return callable_(self, *args, **kwargs)
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 484, in reflecttable
    return insp.reflecttable(
  File "/Users/sairam/superset-experiment/.venv/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 678, in reflecttable
    raise exc.NoSuchTableError(table.name)
sqlalchemy.exc.NoSuchTableError: ab_user

Most amazing feature about Apache Superset is that it was so good to get started quickly. SQLite as metadata store was helping on this.

In production, if someone uses SQLite, I believe that is their responsibility.
Drawing parallels, if someone in production, has Superset admin password as password or some other easy to hack password, are we going to remove the support for all common hackable passwords for Apache Superset ?

For small, simple application - SQLite was super useful as a Superset Metadata store. Many applications start small and grow big. I use Superset with SQLite as metadata database in all my data science personal projects, experimental applications etc till the application proves it's existence.

As @xinbinhuang pointed out, Alembic supports SQLite & non SQLite flows. There is a small additional step that we need to do while writing database migration scripts.

I tried finding alternative to SQLite. In Python + SQLAlchemy world, there is no good alternative to SQLite, as far as I know. H2 & HyperSQL are java based and provide only jdbc drivers. SQLAlchemy python+jdbc driver dialects are not mature enough even in November 2021.

@john-bodley
Copy link
Member

john-bodley commented Jan 4, 2024

@betodealmeida, @dpgaspar, @michael-s-molina, @villebro et al. I see there's logic to also prevent SQLite being used as an analytics (as opposed to metadata) database and was wondering whether there was merit in extending this SIP (or authoring another SIP) to remove SQLite globally—including as a Celery backend.

@betodealmeida
Copy link
Member

I honestly don't understand what we gain from removing support for SQLite.

@villebro
Copy link
Member

villebro commented Jan 4, 2024

I honestly don't understand what we gain from removing support for SQLite.

I'm going to go back to some extent on what I've said previously, but I have to agree with @betodealmeida here. While I agree that vanilla SQLite is not a good solution for a production Superset deployment, things have changed somewhat since the original SIP discussion. There are nowadays many highly available SQLite variants out there, and being able to run Superset on SQLite makes it much more approachable for quick testing than if we were to require using Postgres or MySQL. Also, developing on docker compose isn't always as convenient as running a full Flask backend locally. I, for one, still prefer running my devenv without docker compose, and I believe so do many other committers. Also, the fact that our ephemeral env config s using SQLite says a good deal about the convenience of using SQLite over Postgres/MySQL.

I think having SQLite backend support behind a config flag that's disabled by default is IMO a good solution, as it will serve as a good warning for anyone considering using it for production use. So I vote for keeping things as they are.

FYI @john-bodley

@michael-s-molina
Copy link
Member

michael-s-molina commented Jan 4, 2024

@villebro @john-bodley If we are going to preserve SQLite support then I think we should resolve the following error that frequently happens when testing RCs:

Screenshot 2024-01-04 at 08 52 54

@michael-s-molina
Copy link
Member

michael-s-molina commented Jan 4, 2024

We should probably also submit a lazy consensus email to discard this SIP if the general opinion changed.

@rusackas for awareness

@villebro
Copy link
Member

villebro commented Jan 4, 2024

@michael-s-molina @rusackas maybe we could discuss this in the town hall tomorrow?

@john-bodley
Copy link
Member

@michael-s-molina the "error" you reported is related to using SQLite as an analytical database as opposed to Superset's metadata database.

@michael-s-molina
Copy link
Member

There's a 4.0 proposal related to this SIP that will be submitted for lazy consensus. If we think that we shouldn't drop support for SQLite, let's just downvote the proposal during that phase and we can officially discard this SIP.

@betodealmeida @villebro @rusackas @john-bodley

@michael-s-molina michael-s-molina moved this from [RESULT][VOTE] Approved to Denied / Closed / Discarded in SIPs (Superset Improvement Proposals) Jan 15, 2024
@michael-s-molina
Copy link
Member

This SIP was considered obsolete by the 4.0 lazy consensus email and it's now officially discarded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement:request Enhancement request submitted by anyone from the community sip Superset Improvement Proposal
Projects
Status: Denied / Closed / Discarded
Development

No branches or pull requests