Skip to content
This repository has been archived by the owner on Jun 30, 2024. It is now read-only.

(psycopg2.OperationalError) FATAL: the database system is starting up #1191

Closed
yarikoptic opened this issue Jan 14, 2019 · 10 comments
Closed

Comments

@yarikoptic
Copy link
Contributor

in the light of #1178 , runestone server could be ran very shortly after service postresql start, which is what I think leads to:

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL:  the database system is starting up
full traceback:
  File "/usr/local/lib/python2.7/dist-packages/runestone/__init__.py", line 1, in <module>
    from .activecode import *
  File "/usr/local/lib/python2.7/dist-packages/runestone/activecode/__init__.py", line 1, in <module>
    from .activecode import *
  File "/usr/local/lib/python2.7/dist-packages/runestone/activecode/activecode.py", line 24, in <module>
    from runestone.server.componentdb import addQuestionToDB, addHTMLToDB, engine, meta
  File "/usr/local/lib/python2.7/dist-packages/runestone/server/componentdb.py", line 39, in <module>
    questions = Table('questions', meta, autoload=True, autoload_with=engine)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/schema.py", line 457, in __new__
    metadata._remove_table(name, schema)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/schema.py", line 452, in __new__
    table._init(name, metadata, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/schema.py", line 534, in _init
    include_columns, _extend_on=_extend_on)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/schema.py", line 547, in _autoload
    _extend_on=_extend_on
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2055, in run_callable
    with self.contextual_connect() as conn:
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2123, in contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2162, in _wrap_pool_connect
    e, dialect, self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1476, in _handle_dbapi_exception_noconnection
    exc_info
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 2158, in _wrap_pool_connect
    return fn()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 400, in connect
    return _ConnectionFairy._checkout(self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 788, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 529, in checkout
    rec = pool._do_get()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 1193, in _do_get
    self._dec_overflow()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 1190, in _do_get
    return self._create_connection()
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 347, in _create_connection
    return _ConnectionRecord(self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 474, in __init__
    self.__connect(first_connect_check=True)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 671, in __connect
    connection = pool._invoke_creator(self)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/strategies.py", line 106, in connect
    return dialect.connect(*cargs, **cparams)
  File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 412, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/usr/local/lib/python2.7/dist-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL:  the database system is starting up
FATAL:  the database system is starting up

Outside workaround was to add an generous sleep of 100 sec (10 was not enough in my case, oldish server), and then it started ok. So ideally, there is a guard within RunestoneServer to loop for some predefined timeout period (e.g. 2 minutes) trying to establish connection e.g. each 5 seconds.

@bnmnetp
Copy link
Member

bnmnetp commented Jan 14, 2019

Hmm.... Aren't there system startup gates to handle just this kind of situation? Seems like in the Unix startup sequence this kind of thing would happen all the time.

@bnmnetp
Copy link
Member

bnmnetp commented Jan 14, 2019

If pressed I would put this in rsmanage run

@yarikoptic
Copy link
Contributor Author

Well, for me it is just an observation. Also, I think it is common just to initiate the startup process for some "long to start" processes. I believe we do that e.g. for fail2ban - daemon is started "fully" but then it starts establishing its jails and that might take awhile. https://serverfault.com/questions/886641/systemctl-wait-for-start hints that it is not uncommon. I guess also in typical usecases, the time passes from starting core service such as postgresql to the moment of starting leaf projects, like runestone service, would take a bit so it would not be noted. In my case it didn't happen when I build a new docker image, but when I start from a commit of a running container (so may be it gets committed a bit dirty and services require some house keeping to start properly) - I run into above case.

@bnmnetp
Copy link
Member

bnmnetp commented Jan 18, 2019

Wouldn’t it be better to set up Postgres in its own container and have it use some kind of persistent volume storage for the database? Using git to provide persistence across docker runs seems like a really bad idea.

@yarikoptic
Copy link
Contributor Author

Sorry for a lengthy reply, I will try to be more succinct in the future ;-)

Wouldn’t it be better to set up Postgres in its own container and have it use some kind of persistent volume storage for the database?

"docker compose" recipe (to orchestrate multiple containers - one for DB and one for the server) is one of the possible TODOs I left in #1178 . Myself I haven't used it much (if at all) so I can't judge objectively, but for me it always felt like an overkill for any of my usecases, and somewhat lacking the point whenever all components could be consistently and coherently (the same OS distribution) be installed within a single container. May be in some cases where containers modularity is really needed (e.g. to provide reverse proxy service on top of regular www server containers etc), I could see the point, but not here really.
In my case of #1178, the "easier" the better. Having a single container (environment) having both DB and server running already was really helpful as e.g. to generate a commit/image of the current state and then run troubleshooting/experimentation on it, thus not touching original DB, but just its "clone" in that new image/container. Really handy! That is why the https://github.com/dartmouth-pbs/psyc161-wi19-srv/blob/public/start helper takes the port ;-) Also for a newcomer it is much easier to just build a single image and start playing with it (my usecase to start with). So, overall -- may be for some DevOps for whom juggling containers is their bread, multi-images compose would be of use, for me it is typically not needed complication. Moreover I am not sure it would resolve this issue if the RunestoneServer starts right away after DB finished starting.

@yarikoptic
Copy link
Contributor Author

Using git to provide persistence across docker runs seems like a really bad idea.

Why? ;-)
Note: I was not going to use Git directly for storing data files, but rather keep them in "unlocked" state of the V7 mode of git-annex. I could probably say that like for those DevOps docker/containers is their bread, Git-annex somewhat became my butter through the past years (see http://datalad.org and https://datasets.datalad.org if you would like to discover more). We use git/git-annex tandem to version control virtually everything from code, to data files, and over to complete computing environments images (Docker and singularity). So in my world, DB files should be no special ;-) As an advantage I am getting a full history of the DB changes (with commits upon restarts), being able to replicate any previous state via a single "git checkout", share/backup/etc the entire state of the project while only keeping the "software" (server, tools) in a rebuildable (and thus throwaway) docker image.

@bnmnetp
Copy link
Member

bnmnetp commented Jan 21, 2019

Why — because that’s why databases exist: to provide super reliable persistence of the data they store. Forcing your database to “recreate” itself from some undetermined state that is captured by git just seems mean, and honestly it sounds to me like a good way to end up losing data.

Maybe I’m old school — I’ve been using databases for 25 years, and only recently starting to discover Docker. I have one instance of Runestone hosted on webfaction that has been running since 2012. The server has only needed to be rebooted a handful of times in all those years and I’ve never lost any data. Maybe I’m just missing something on why one would need to regularly reboot / restart their Runestone services?? I’m willing to adapt/modernize but I’d like to understand why before I invest the time needed to do so.

I’ve been using git for years but this is the first I’ve heard of git-annex. — always something new to learn.

@yarikoptic
Copy link
Contributor Author

Why — because that’s why databases exist: to provide super reliable persistence of the data they store. Forcing your database to “recreate” itself from some undetermined state that is captured by git just seems mean, and honestly it sounds to me like a good way to end up losing data.

oh no -- I don't want to recreate it from some state -- I want to capture the entire state after DB is successfully powered down. Isn't DB underneath uses regular files for the storage?
Indeed though it is not a common practice to bring it down/spin it back up, but it shouldn't be abnormal either. If DB cannot be properly fully shutdown and powered back up - that is where I would say the "losing data" would come from.

@yarikoptic
Copy link
Contributor Author

anyways -- git/git-annex desire is orthogonal to this issue that it might take DB service some time to spin back up, and then Runestone server start would keep crashing until the DB is fully up.

@github-actions
Copy link

Stale issue message

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants