Site Management
- selectedpapers.net is hosted by Webfaction, on a shared hosting platform.
- you perform most site management tasks by logging in to the shared host via a command like `ssh [email protected]`, where username is your login on the shared host.
- site management users are members of the `spnet` group, which gives them privileges to perform tasks such as restarting the spnet web server process.
- The spnet site consists of two critical processes: the spnet web server (basically `spnet/web.py`, typically running under `spnet/keeprunning.py`), and the mongoDB database. Both of these processes must be running in order for the selectedpapers.net website to work.
- In addition, there are three non-critical processes running: the Google+ polling service (which updates our database with new posts retrieved from Google+; this is basically `spnet/gplus.py`, typically running under `spnet/poll.py`); an HTTP to HTTPS redirect service (which redirects requests for http://selectedpapers.net to https://selectedpapers.net); and the docs.selectedpapers.net service (which serves static HTML files for the user documentation).
- the code for each of these is located in `~spnet/webapps`: in the `webapps/spnet`, `webapps/mongodb`, or `webapps/redirect` directories.
- we generally run our server processes within `screen` sessions, which we can detach and re-attach at will.
- we use Python `virtualenv` to manage the dependencies for the Python spnet code. This is required first of all because we cannot install dependency packages to the system Python (we have no root privileges on a shared hosting platform), but it also enables us to keep different "virtual environments" which we can test prior to putting them into production. Currently the production virtual environment is in `~spnet/_ve1`; to "activate" usage of that virtual environment within a given shell session, run the command `source ~spnet/_ve1/bin/activate`. Thereafter all Python commands (such as `python`, `pip`, etc.) will use that virtual environment. If you forget to do this, attempting to run spnet code will result in `ImportError` due to missing dependencies.
- Our shared hosting now allows us up to 512MB RAM usage. Hence we have to be very careful not to exceed that. Unfortunately we have two processes that seem to consume large amounts of RAM: mongoDB (up to 100 MB in our current usage); and the Python spnet web server. Python seems to grow in memory usage over time (not due to any apparent memory leak, but rather due to the well-known problem of freelist "arena" fragmentation). So we currently run it in a self-monitoring mode that automatically restarts itself whenever its memory usage exceeds 150MB.
- the current spnet web server configuration uses file-based session management. This means that restarting the spnet web server (e.g. killing the `spnet/web.py` process and starting a new one) should have minimal effect on logged-in users. Specifically, the new server process will continue to recognize them as logged in; the only way a user would notice the restart is if they submit a request during the brief moment between shutting down the old server process and starting the new one (in which case the browser request might be delayed a bit, or possibly time out, requiring the user to click reload).
- we manage deployment of new versions of spnet code using Git. There are three key Git branches that we use to manage this process: the `master` branch represents the main line of spnet code development, i.e. features and bug fixes that we intend to push to the production web server (no experimental code, please!); the `production` branch represents webfaction-specific config files or code (e.g. the `cp.conf` web server config file containing the port number and other settings required to run on our webfaction shared host); the `production_secret` branch contains confidential key values that the selectedpapers.net server requires. The `production_secret` branch is confidential and should never be pushed to a publicly accessible Git repository. Changes to these key values can only be made on the `production_secret` branch; changes to the webfaction-specific config or code should be made on the `production` branch; and all other code changes should be made on the `master` branch. We then merge changes from `master` and `production` into the `production_secret` branch, in order to put those changes into production.
- by definition, the production repository always runs on the `production_secret` branch.
- if the github repository contains new code on the `master` branch, we can pull that into the current `production_secret` branch using `git pull origin master`.
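Forgetting to activate the virtual environment is easy to do in a fresh shell or screen session. As a minimal sketch (the function name `require_venv` is hypothetical and not part of the spnet code), a guard like the following relies only on the `VIRTUAL_ENV` variable that virtualenv's `activate` script exports, and assumes that variable holds the same absolute path you pass in:

```shell
# Hypothetical helper (not part of spnet): succeed only if the expected
# virtualenv is active in this shell. virtualenv's activate script
# exports VIRTUAL_ENV, which is the only thing this check relies on.
require_venv() {
    expected=$1
    if [ "${VIRTUAL_ENV:-}" = "$expected" ]; then
        return 0
    fi
    echo "virtualenv not active; run: source $expected/bin/activate" >&2
    return 1
}
```

It could be used as `require_venv ~spnet/_ve1 && python keeprunning.py`, so that a missing activation fails immediately with a hint rather than later with an `ImportError`.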
The standard operating procedures (SOPs) documented here are:
- production server update SOP: roll out updated spnet code by pulling new code to the production server and restarting the web server process.
- indexing restart SOP: restart the Google+ polling process (e.g. to use updated indexing code).
- mongoDB backup and restore SOP
- cold start SOP: starting spnet services from scratch, e.g. under the extreme scenario where the shared hosting server was rebooted and spnet services did not auto-start.
We manage all aspects of the server code, configuration and keys using Git. This means that we enjoy the full power of Git for deploying updates (pull the desired changes into the current production version), rescuing us from bad updates (just checkout the previous "good" snapshot), etc. First, go to the main spnet code directory in the production repository:
$ cd ~spnet/webapps/spnet/spnet/spnet
We first want to make absolutely sure that the production repository is in the expected clean state:
- on the `production_secret` branch;
- no file modifications vs. `HEAD`
$ git status -uno
To be absolutely sure of being able to revert to the current state (if the update goes wrong for any reason), run
$ git log
and write down the current commit ID.
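The checks above (correct branch, no local modifications, known commit ID) can be bundled into one small function. This is only a sketch under stated assumptions: the name `preflight` is hypothetical and not part of the spnet repository, and it uses only standard git commands.

```shell
# Hypothetical helper: verify the repository state before an update and
# print the current commit ID on success. Run it from inside the
# production repository.
preflight() {
    # symbolic-ref fails on a detached HEAD, which we also treat as an error
    branch=$(git symbolic-ref --short HEAD) || return 1
    if [ "$branch" != "production_secret" ]; then
        echo "on branch '$branch', expected production_secret" >&2
        return 1
    fi
    # --porcelain prints one line per modified tracked file;
    # -uno hides untracked files, as in `git status -uno`
    if [ -n "$(git status --porcelain -uno)" ]; then
        echo "working tree has uncommitted modifications" >&2
        return 1
    fi
    # This is the ID to pass to `git reset --hard` if the update goes wrong
    git rev-parse HEAD
}
```

For example, `COMMIT_ID=$(preflight) && echo "$COMMIT_ID"` prints the ID to write down, and prints nothing but an error message if the repository is not in the expected state.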
To ensure that you can quickly connect to whatever server process you need to, you should also list the current server processes and screen sessions:
$ ps x
$ screen -ls
In particular, note the process IDs of the following processes:
- `python watchmem.py`: this is an actual spnet web server process, monitoring its own memory usage (it automatically exits if its usage goes over 150 MB).
- `python keeprunning.py`: this simply restarts a new web server process (`watchmem.py`) whenever it exits.
- `mongod`: the mongo database server.
- `python poll.py`: the Google+ indexing poll (updates index every 5 minutes). This is a non-critical process.
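Rather than reading the process IDs off the screen by eye, they can be pulled out of the `ps x` listing. The following is a minimal sketch; the function name `pids_of` is hypothetical, and it assumes the usual `ps x` layout in which the first line is a header and the first column is the PID.

```shell
# Hypothetical helper: print the PID of every process whose command line
# contains the given text. Reads `ps x` output on stdin; assumes line 1
# is the header and column 1 is the PID.
pids_of() {
    awk -v pat="$1" 'NR > 1 && index($0, pat) > 0 { print $1 }'
}
```

Capture the listing first, then search it, so that the `awk` process itself (whose arguments contain the search text) never appears in the listing being searched:

```shell
listing=$(ps x)
printf '%s\n' "$listing" | pids_of "python watchmem.py"
```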
We stage code updates for production deployment on the GitHub `master` branch. A typical update simply pulls the latest updates from that branch to our running `production_secret` branch. Our web server runs in `auto_reload=False` mode, so we can safely change the code files without affecting the running web server process. Once we are happy with the updated state of the code files, we simply force the web server process to restart (which will make it load our code updates).
To pull the latest `master` updates from the GitHub cjlee112/spnet.git repository (`origin`) into our running `production_secret` branch:
$ git pull origin master
If anything goes wrong, you can always revert the current `production_secret` branch back to the previous "good" snapshot:
$ git reset --hard COMMIT_ID
where COMMIT_ID is the ID you obtained from `git log` above.
Ordinarily, all you have to do is kill the current `watchmem.py` process, which will force the `keeprunning.py` process to restart it (using the latest code):
$ kill PROCESS_ID
where PROCESS_ID is the `watchmem.py` process ID you noted above. Now check that you can access the selectedpapers.net site from any web browser, and verify that the new features / fixes are running as expected. Note that you do not ordinarily need to do anything to restart the Google+ indexing, since the polling code will automatically start using the updated code the next time it runs the indexing process (every 5 minutes).
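Because the site may be unreachable for a brief moment while the new web server process starts, a single failed request right after the `kill` does not necessarily mean the update is broken. As a rough sketch (the `retry` function is hypothetical, not an spnet tool), a check can be repeated a few times before concluding that the server did not come back:

```shell
# Hypothetical helper: run a check command up to ATTEMPTS times, sleeping
# DELAY seconds between attempts. Succeeds as soon as the check succeeds.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}
```

For instance, assuming `curl` is installed on the machine you test from, `retry 10 3 curl -fsS -o /dev/null https://selectedpapers.net/` tries the home page up to ten times at three-second intervals. This only confirms that the server answers; checking that the new features behave as expected still has to be done in a browser.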
- if the updated server does not appear to be "ready for prime time" (not "production quality"), simply revert the code to the previous snapshot and kill the `watchmem.py` process to restart with that code:

      $ git reset --hard COMMIT_ID
      $ ps x
      $ kill PROCESS_ID

  where COMMIT_ID is the ID you obtained from `git log` above, and PROCESS_ID is the `watchmem.py` process ID from `ps x`.
- if the web server process fails to come up, quickly connect to the screen session running the `keeprunning.py` process:

      $ screen -r SCREEN_ID

  where SCREEN_ID is one of the screen sessions listed by `screen -ls`. If you see error messages indicating that the current code is broken (e.g. crashes on startup), revert to the previous snapshot as described above. If for some reason `keeprunning.py` has exited, simply restart it in the same screen session. When in doubt, revert to the previous snapshot.
- If you see messages that indicate a failure to connect to MongoDB, check the status of the `mongod` server in another window, by connecting to its screen session (`screen -r SCREEN_ID`). If `mongod` has exited or seems to be in a bad state, restart it (all you should have to do is hit the up-arrow key to recall the previous command in that shell). If `mongod` seems fine, or the spnet code continues to fail to connect to mongodb after a `mongod` restart, chances are the spnet code or config is broken, so revert to the previous good snapshot.
If the documentation has changed, rebuild the HTML docs with Sphinx on your development platform (this requires that you have Sphinx installed) and upload them as follows:
$ cd spnet/doc
$ make html
$ cd _build/html
$ tar cf ../../docs.tar *
$ cd ../..
$ scp docs.tar [email protected]:
$ ssh [email protected]
$ cd webapps/htdocs
$ tar xf ~/docs.tar
Use your web browser to check that the docs on http://docs.selectedpapers.net reflect the latest updates.
We need to start two critical processes: the `mongod` server, and the spnet web server.
Start the `mongod` server in a screen session as follows:
$ cd ~spnet/webapps/mongodb
$ screen
$ mongodb-linux-x86_64-2.4.1/bin/mongod --config ~spnet/webapps/spnet/spnet/mongodb/mongodb.config
You should see the server process start successfully. Type Control-A, D to detach from the screen session.
Next, start the spnet web server, again in a screen session:
$ cd ~spnet/webapps/spnet/spnet/spnet
$ screen
$ source ~spnet/_ve1/bin/activate
$ python keeprunning.py
You should see the server process start successfully. At this point, you should be able to access the selectedpapers.net website from any web browser. Type Control-A, D to detach from the screen session.
The Google+ polling process is non-critical, but it is needed for updating from Google+. Again we start it in a screen session:
$ cd ~spnet/webapps/spnet/spnet/spnet
$ screen
$ source ~spnet/_ve1/bin/activate
$ python poll.py
You should see the server process start successfully. Type Control-A, D to detach from the screen session.
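After a cold start it is worth confirming that all three processes are actually present. The sketch below is an illustration only: the function name `missing_services` is hypothetical, and it uses plain substring matching on the `ps x` listing, which is deliberately loose (any command line containing `mongod`, for instance, counts as a match).

```shell
# Hypothetical helper: given `ps x` output on stdin, print the name of
# each expected spnet process that does NOT appear in the listing.
missing_services() {
    listing=$(cat)
    for name in mongod keeprunning.py poll.py; do
        case $listing in
            *"$name"*) ;;            # found somewhere in the listing
            *) echo "$name" ;;       # not found: report it
        esac
    done
}
```

Running `ps x | missing_services` prints nothing when everything is up, and otherwise lists the processes that still need to be started using the steps above.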
The basic backup procedure, run from the `webapps/mongodb` directory:
mongodb-linux-x86_64-2.4.1/bin/mongodump --port 26966 --username backup --db spnet --password
tar cvzf dump.tgz dump
I then scp'd the dump.tgz file (18 MB) to a safe location.
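The `tar` command above overwrites `dump.tgz` on every run. If you would rather keep several backups side by side, one option is a date-stamped archive name. This is a suggestion rather than part of the documented procedure, and the function name `archive_dump` is hypothetical:

```shell
# Hypothetical helper: archive the `dump` directory written by mongodump
# into a date-stamped tarball, check that the archive is readable, and
# print its file name. Run it from the directory that contains `dump`.
archive_dump() {
    archive="dump-$(date +%Y%m%d).tgz"
    tar czf "$archive" dump || return 1
    # Listing the contents makes tar read the whole archive back
    tar tzf "$archive" > /dev/null || return 1
    echo "$archive"
}
```

The printed name can then be handed to `scp` in place of `dump.tgz`. Note that restoring from such an archive works the same way as before, since it still unpacks to a `dump` directory.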
Basic restore procedure: from the `webapps/mongodb` directory, we first extract the dump.tgz file to create the dump directory that mongorestore will read, and we use its `--drop` option to make it drop any existing collection before restoring data from that collection:
tar xzvf dump.tgz
mongodb-linux-x86_64-2.4.1/bin/mongorestore --port 26966 --drop --username admin --password
If you're restoring to a mongod instance that uses no authentication, you can leave off the username and password options.
If you're starting with an empty database you'll need to ensure that the admin user can write to any database; see the next step.
Note that the backup procedure first required creating the `backup` user in the spnet db:
mongodb-linux-x86_64-2.4.1/bin/mongo --port 26966 admin
> db.auth('admin', PASSWORD)
> db.system.users.update({ user: "admin" }, { $set: { roles: ["userAdminAnyDatabase", "readWriteAnyDatabase"] } } )
> use spnet
> db.addUser( { user: "backup", roles: ["readWrite", "dbAdmin", "userAdmin"], pwd: PASSWORD } )