cjlee112 edited this page Dec 18, 2013 · 11 revisions

Background

  • selectedpapers.net is hosted by Webfaction, on a shared hosting platform.
  • you perform most site management tasks by logging in to the shared host via a command like ssh [email protected], where username is your login on the shared host.
  • site management users are members of the spnet group, which gives them privileges to perform tasks such as restarting the spnet web server process.
  • The spnet site consists of two critical processes: the spnet web server (basically spnet/web.py, typically running under spnet/keeprunning.py), and the mongoDB database. Both of these processes must be running in order for the selectedpapers.net website to work.
  • In addition, there are three non-critical processes running: the Google+ polling service (which updates our database with new posts retrieved from Google+; this is basically spnet/gplus.py typically running under spnet/poll.py); an HTTP to HTTPS redirect service (which redirects requests for http://selectedpapers.net to https://selectedpapers.net); and the docs.selectedpapers.net service (which serves static HTML files for the user documentation).
  • the code for each of these is located under ~spnet/webapps, in the webapps/spnet, webapps/mongodb, or webapps/redirect directory.
  • we generally run our server processes within screen sessions, which we can detach and re-attach at will.
  • we use Python virtualenv to manage the dependencies for the Python spnet code. This is required first of all because we cannot install dependency packages into the system Python (we have no root privileges on a shared hosting platform), but it also lets us keep separate "virtual environments" that we can test before putting them into production. Currently the production virtual environment is in ~spnet/_ve1; to activate it within a given shell session, run the command source ~spnet/_ve1/bin/activate. Thereafter all Python commands (such as python, pip etc.) will use that virtual environment. If you forget to do this, attempting to run spnet code will fail with ImportError due to missing dependencies.
  • Our shared hosting now allows us up to 512MB RAM usage. Hence we have to be very careful not to exceed that. Unfortunately we have two processes that seem to consume large amounts of RAM: mongoDB (up to 100 MB in our current usage); and the Python spnet web server. Python seems to grow in memory usage over time (not due to any apparent memory leak, but rather due to the well-known problem of freelist "arena" fragmentation). So we currently run it in a self-monitoring mode that automatically restarts itself whenever its memory usage exceeds 150MB.
  • the current spnet web server configuration uses file-based session management. This means that restarting the spnet web server (e.g. kill the spnet/web.py process and start a new one) should have minimal effect on logged-in users. Specifically, the new server process will continue to recognize them as logged in; the only way a user would notice the restart is if they submit a request during the brief moment between shutting down the old server process and starting the new one (in which case the browser request might delay a bit, or possibly time out, requiring the user to click reload).
  • we manage deployment of new versions of spnet code using Git. There are three key Git branches that we use to manage this process: the master branch represents the main line of spnet code development, i.e. features and bug fixes that we intend to push to the production web server (no experimental code, please!); the production branch represents webfaction-specific config files or code (e.g. the cp.conf web server config file containing the port number and other settings required to run on our webfaction shared host); the production_secret branch contains confidential key values that the selectedpapers.net server requires. The production_secret branch is confidential and should never be pushed to a publicly accessible Git repository. Changes to these key values can only be made on the production_secret branch; changes to the webfaction-specific config or code should be made on the production branch; and all other code changes should be made on the master branch. We then merge in changes from master and production into the production_secret branch, in order to put those changes into production.
  • by definition, the production repository always runs on the production_secret branch.
  • if the github repository contains new code on the master branch, we can pull that into the current production_secret branch using git pull origin master.
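
The self-monitoring memory limit mentioned above can be illustrated with a minimal sketch. This is not the actual spnet watchmem.py code; the function names (parse_vmrss_kb, over_limit, check_self_and_exit) are hypothetical, and the sketch assumes a Linux host where /proc/self/status reports resident memory as a VmRSS line in kB.

```python
import sys

LIMIT_MB = 150  # restart threshold quoted in the list above


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def over_limit(rss_kb, limit_mb=LIMIT_MB):
    """True if resident memory (in kB) exceeds the limit (in MB)."""
    return rss_kb is not None and rss_kb > limit_mb * 1024


def check_self_and_exit():
    """Hypothetical hook: call periodically; exit so a supervisor can restart us."""
    with open("/proc/self/status") as f:  # Linux only (assumption)
        rss_kb = parse_vmrss_kb(f.read())
    if over_limit(rss_kb):
        sys.exit(0)
```

Exiting rather than trying to free memory in place matches the approach described above: a fresh process starts with an unfragmented heap, and file-based sessions keep users logged in across the restart.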

Site Management tasks

Here we briefly summarize the standard operating procedures (SOPs) documented here:

  • production server update SOP: roll out updated spnet code by pulling new code to the production server and restarting the web server process.
  • indexing restart SOP: restart the Google+ polling process (e.g. to use updated indexing code).
  • mongoDB backup and restore SOP
  • cold start SOP: starting spnet services from scratch, e.g. under the extreme scenario where the shared hosting server was rebooted and spnet services did not auto-start.

Production server update SOP

We manage all aspects of the server code, configuration and keys using Git. This means that we enjoy the full power of Git for deploying updates (pull the desired changes into the current production version), rescuing us from bad updates (just check out the previous "good" snapshot), etc. First, go to the main spnet code directory in the production repository:

$ cd ~spnet/webapps/spnet/spnet/spnet

Paranoid preliminaries

We first want to make absolutely sure that the production repository is in the expected clean state:

  • on production_secret branch;

  • no file modifications vs. HEAD

      $ git status -uno
    

To be absolutely sure of being able to revert to the current state (if the update goes wrong for any reason), run

$ git log

and write down the current commit ID.
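
The preliminary checks above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names (repo_problems, git, preflight) are hypothetical and not part of spnet, and it assumes it is run from inside the production repository with git on the PATH.

```python
import subprocess

EXPECTED_BRANCH = "production_secret"


def repo_problems(branch, porcelain_output, expected_branch=EXPECTED_BRANCH):
    """Return a list of human-readable problems; an empty list means clean."""
    problems = []
    if branch.strip() != expected_branch:
        problems.append("on branch %r, expected %r" % (branch.strip(), expected_branch))
    modified = [line for line in porcelain_output.splitlines() if line.strip()]
    if modified:
        problems.append("%d tracked file(s) modified" % len(modified))
    return problems


def git(*args):
    """Run a git command in the current directory and return its output as text."""
    return subprocess.check_output(("git",) + args).decode()


def preflight():
    """Return (current commit ID, list of problems) for the current repository."""
    branch = git("rev-parse", "--abbrev-ref", "HEAD")
    status = git("status", "--porcelain", "-uno")  # -uno: ignore untracked files
    commit = git("rev-parse", "HEAD").strip()
    return commit, repo_problems(branch, status)
```

Calling preflight() before an update gives you the commit ID to write down together with any reasons not to proceed.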

To ensure that you can quickly connect to whatever server process you need to, you should also list the current server processes and screen sessions:

$ ps x
$ screen -ls

In particular, note the process IDs of the following processes:

  • python watchmem.py: this is an actual spnet web server process, monitoring its own memory usage (it automatically exits if its usage goes over 150 MB).
  • python keeprunning.py: this simply restarts a new web server process (watchmem.py) whenever it exits.
  • mongod: the mongo database server
  • python poll.py: the Google+ indexing poll (updates index every 5 minutes). This is a non-critical process.
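
If you prefer not to pick the process IDs out of the ps x listing by eye, a small parser along these lines could do it. This is a sketch: find_pids is a hypothetical helper, the process names are taken from the list above, and it assumes the usual five-column ps x layout (PID, TTY, STAT, TIME, COMMAND).

```python
WATCHED = ("watchmem.py", "keeprunning.py", "mongod", "poll.py")


def find_pids(ps_output, names=WATCHED):
    """Map each watched name to the PIDs whose command line contains it."""
    found = {name: [] for name in names}
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID, TTY, STAT, TIME, COMMAND
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skips the header line and anything malformed
        pid, command = int(fields[0]), fields[4]
        for name in names:
            if name in command:
                found[name].append(pid)
    return found
```

Feeding it the text printed by ps x returns a dictionary such as {"watchmem.py": [...], ...}; an empty list for a critical process is a sign that it is not running.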

Updating the server code

We stage code updates for production deployment on the GitHub master branch. A typical update simply pulls the latest updates from that branch to our running production_secret branch. Our web server runs in auto_reload=False mode, so we can safely change the code files without affecting the running web server process. Once we are happy with the updated state of the code files, we simply force the web server process to restart (which will make it load our code updates).

To pull the latest master updates from the GitHub cjlee112/spnet.git repository (origin) into our running production_secret branch:

$ git pull origin master

If anything goes wrong, you can always revert the current production_secret branch back to the previous "good" snapshot:

$ git reset --hard COMMIT_ID

where COMMIT_ID is the ID you obtained from git log above.

Restarting the web server

Ordinarily, all you have to do is kill the current watchmem.py process, which will force the keeprunning.py process to restart it (using the latest code):

$ kill PROCESS_ID

where PROCESS_ID is the watchmem.py process ID you noted above. Now check that you can access the selectedpapers.net site from any web browser, and verify that the new features / fixes are running as expected. Note that you do not ordinarily need to do anything to restart the Google+ indexing, since the polling code will automatically start using the updated code the next time it runs the indexing process (every 5 minutes).
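
To see why killing watchmem.py is enough, it helps to picture the supervisor as a simple restart loop. The following is a minimal sketch of that idea, not the actual keeprunning.py code; the keep_running function and its max_restarts and delay parameters are assumptions made for illustration.

```python
import subprocess
import sys
import time


def keep_running(cmd, max_restarts=None, delay=1.0):
    """Run cmd; whenever it exits, start it again.

    With max_restarts=None this loops forever. Returns the number of launches.
    """
    launches = 0
    while max_restarts is None or launches <= max_restarts:
        launches += 1
        exit_code = subprocess.call(cmd)  # blocks until the child exits
        print("child exited with code %s" % exit_code)
        time.sleep(delay)  # avoid a tight loop if the child crashes on startup
    return launches
```

Under this model, something like keep_running([sys.executable, "watchmem.py"]) starts a fresh child every time the previous one exits, and each new child loads whatever code is on disk at that moment.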

What to do if something goes wrong

  • if the updated server does not appear to be "ready for prime time" (not "production quality"), simply revert the code to the previous snapshot and kill the watchmem.py to restart with that code:

      $ git reset --hard COMMIT_ID
      $ ps x
      $ kill PROCESS_ID
    

    where COMMIT_ID is the ID you obtained from git log above, and PROCESS_ID is the watchmem.py process ID from ps x.

  • if the web server process fails to come up, quickly connect to the screen session running the keeprunning.py process:

      $ screen -r SCREEN_ID
    

    where SCREEN_ID is one of the screen sessions listed by screen -ls. If you see error messages indicating that the current code is broken (e.g. crashes on startup), revert to the previous snapshot as described above. If for some reason keeprunning.py has exited, simply restart it in the same screen session. When in doubt, revert to the previous snapshot.

  • If you see messages that indicate a failure to connect to MongoDB, check the status of the mongod server in another window, by connecting to its screen session (screen -r SCREEN_ID). If mongod has exited or seems to be in a bad state, restart it (all you should have to do is hit the up-arrow key to recall the previous command in that shell). If mongod seems fine, or the spnet code continues to fail to connect to MongoDB after a mongod restart, chances are the spnet code or config is broken, so revert to the previous good snapshot.

Updating docs.selectedpapers.net

If the documentation has changed, rebuild the HTML docs with Sphinx on your development platform (this requires that you have Sphinx installed) and upload them as follows:

$ cd spnet/doc
$ make html
$ cd _build/html
$ tar cf ../../docs.tar *
$ cd ../..
$ scp docs.tar [email protected]:
$ ssh [email protected]
$ cd webapps/htdocs
$ tar xf ~/docs.tar

Use your web browser to check that the docs on http://docs.selectedpapers.net reflect the latest updates.

Server cold start

We need to start two critical processes: the mongod server, and the spnet web server.

mongod startup

Start it in a screen session as follows:

$ cd ~spnet/webapps/mongodb
$ screen
$ mongodb-linux-x86_64-2.4.1/bin/mongod --config ~spnet/webapps/spnet/spnet/mongodb/mongodb.conf

You should see the server process start successfully. Type Control-A, D to detach from the screen session.

spnet web server startup

Again we start it in a screen session:

$ cd ~spnet/webapps/spnet/spnet/spnet
$ screen
$ source ~spnet/_ve1/bin/activate
$ python keeprunning.py

You should see the server process start successfully. At this point, you should be able to access the selectedpapers.net website from any web browser. Type Control-A, D to detach from the screen session.

Google+ indexing startup

This is a non-critical process, but needed for updating from Google+. Again we start it in a screen session:

$ cd ~spnet/webapps/spnet/spnet/spnet
$ screen
$ source ~spnet/_ve1/bin/activate
$ python poll_gplus.py

You should see the server process start successfully. Type Control-A, D to detach from the screen session.

MongoDB Backup Procedure

The basic backup procedure, from the webapps/mongodb directory:

rm -rf dump
mongodb-linux-x86_64-2.4.1/bin/mongodump --port 26966 --username backup --db spnet --password
tar cvzf dump.tgz dump

I then scp'd the dump.tgz file (18 MB) to a safe location.
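
Because the commands above always write to dump.tgz, each backup overwrites the previous one. One possible variation, shown here only as a sketch, is to give each archive a timestamped name. The archive_dump function and the dump-YYYYMMDD-HHMMSS.tgz naming scheme are assumptions, not part of the existing procedure; the archive still unpacks to a dump directory, as the restore procedure expects.

```python
import os
import tarfile
import time


def archive_dump(dump_dir="dump", out_dir=".", now=None):
    """Pack dump_dir into out_dir/dump-<UTC timestamp>.tgz and return the path.

    `now` is seconds since the epoch; None means the current time.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    out_path = os.path.join(out_dir, "dump-%s.tgz" % stamp)
    with tarfile.open(out_path, "w:gz") as tar:
        # Store everything under "dump/" so extraction recreates the dump directory.
        tar.add(dump_dir, arcname="dump")
    return out_path
```

To restore from such an archive with the procedure that follows, extract it in place of dump.tgz.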

Restore procedure

Basic restore procedure: from the webapps/mongodb directory, we first extract the dump.tgz file to create the dump directory that mongorestore will read, and we use its --drop option to make it drop each existing collection before restoring that collection's data:

tar xzvf dump.tgz
mongodb-linux-x86_64-2.4.1/bin/mongorestore --port 26966 --drop --username admin --password

If you're restoring to a mongod instance that uses no authentication, you can leave off the username and password options. If you're restoring to a mongod instance that uses the default port, you can leave off the --port option.

If you're starting with an empty database you'll need to ensure that the admin user can write to any database; see the next step.

dump/restore authentication setup

Note that this first required creating a backup user in the spnet db:

mongodb-linux-x86_64-2.4.1/bin/mongo --port 26966 admin
> db.auth('admin', PASSWORD)
> db.system.users.update({ user: "admin" }, { $set: { roles: ["userAdminAnyDatabase", "readWriteAnyDatabase"] } } )
> use spnet
> db.addUser( { user: "backup", roles: ["readWrite", "dbAdmin", "userAdmin"], pwd: PASSWORD } )