This repository has been archived by the owner on Nov 26, 2019. It is now read-only.

Scale jobs servers #1075

Open
sanfordd opened this issue May 30, 2018 · 4 comments

sanfordd commented May 30, 2018

We currently have one jobs server running 5 resque workers (resque-pool).

Some jobs still run on the app server (mostly the stock sufia jobs), while some run on the jobs server (most of our local/customized ones).

Ideally, all jobs would run on the jobs server(s), off the app server, for two reasons. A) The jobs server can, at the moment, be scaled more easily (the main thing preventing us from scaling the app server is the file system used for browser-uploaded files). B) Heavy jobs work is async and tends to tolerate being slowed down, so it makes sense to keep it from slowing down the app server, which needs to return quick synchronous responses to browsers.

We want to be able to scale the number of jobs workers easily: A) so we can move more work off the app server and onto the jobs server, and B) so we can make ingests go faster, choosing how much jobs capacity we want to pay for in exchange for what ingest speed.

You can scale the number of jobs workers by:

a) Adding more workers to the existing jobs server. This will likely require more RAM, though, and the workers may end up competing for disk IO bandwidth.

b) Adding more jobs servers. This will require some changes to our Ansible deploy, and possibly tweaks to Capistrano too. Once properly deployed, though, nothing else in our stack needs to know there is more than one jobs server: the workers, running on one or more jobs servers, each contact resque (backed by Redis) to take jobs off the shared queue, so nobody needs to know how many servers there are.
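To illustrate why extra jobs servers are invisible to the rest of the stack, here is a minimal Python sketch of the shared-queue ("competing consumers") pattern. It is only an analogy, not how resque is implemented: a `queue.Queue` stands in for the Redis-backed resque queue, threads stand in for worker processes, and the `run_workers` helper and `jobsN-workerM` names are hypothetical. The worker loop never looks at how many servers exist; it just takes the next job.

```python
import queue
import threading
from collections import Counter


def run_workers(jobs, servers, workers_per_server):
    """Drain one shared queue with workers spread across hypothetical servers."""
    shared = queue.Queue()  # stands in for the single resque/Redis queue
    for job in jobs:
        shared.put(job)  # every job is enqueued before any worker starts

    processed = Counter()  # jobs handled per worker name
    done = []  # every job that was taken off the queue
    lock = threading.Lock()

    def worker(name):
        # Identical loop on every "server": take the next job until none remain.
        while True:
            try:
                job = shared.get_nowait()
            except queue.Empty:
                return
            with lock:
                processed[name] += 1
                done.append(job)

    threads = [
        threading.Thread(target=worker, args=(f"jobs{s}-worker{w}",))
        for s in range(1, servers + 1)
        for w in range(1, workers_per_server + 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed, done
```

Calling `run_workers(list(range(100)), servers=2, workers_per_server=5)` takes every job off the queue exactly once. Changing `servers` only changes how the work is spread across workers, not the worker code itself.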

Either of these could be fine. We want to set things up so that we can scale jobs workers whenever we want, without having to rewrite code (including Ansible/Capistrano config) to do so.

Once we have this, we will want to explore moving some of the stock Sufia jobs off the app server and onto the jobs server. (We will need to be careful about their interrelationships and dependencies; it is tangled.)

sanfordd modified the milestones: post-launch, Backlog May 30, 2018

sanfordd commented May 30, 2018

So, a quick set of systems concerns to track:

  • The servers need to have different names (this is trivial)
  • Do they need shared disk space? No
  • Do they need to talk to each other outside of their shared PG/Redis database connections? No
  • Check current Postgres connection limits
  • Consider moving Postgres off to another box

@jrochkind I'll need your opinion on the second and third ones.
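For the "check current Postgres connection limits" item, a back-of-the-envelope estimate may help. This minimal Python sketch assumes each resque worker process holds at most `conns_per_worker` Postgres connections at a time and that the app server's connections can be summed into one `app_connections` number; the helper names and that model are hypothetical, and forking workers may briefly hold more. The defaults of 100 for `max_connections` and 3 for `superuser_reserved_connections` are stock PostgreSQL defaults, so the real values should be read from our server (for example with `SHOW max_connections;`).

```python
def connections_needed(servers: int, workers_per_server: int,
                       conns_per_worker: int = 1, app_connections: int = 0) -> int:
    # Assumption: each worker process holds at most conns_per_worker connections,
    # and app_connections covers everything the app server opens.
    return servers * workers_per_server * conns_per_worker + app_connections


def fits_within_limit(needed: int, max_connections: int = 100,
                      superuser_reserved: int = 3) -> bool:
    # Defaults mirror stock PostgreSQL settings; replace them with our real values.
    return needed <= max_connections - superuser_reserved
```

For example, three servers with five workers each plus 20 app-side connections gives `connections_needed(3, 5, app_connections=20) == 35`, which fits under the 97 connections left by the defaults.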

Since the answer to both of those is no, I'm going to double-check, but adding any number of jobs servers is likely to be fairly trivial.

Based on the current discussion, we've decided that a single larger server is about as cost-effective as running multiple servers (roughly cost-neutral) and simpler to implement.

jrochkind commented May 30, 2018

Additionally, I've discovered there may be OTHER ways that multiple jobs servers break the Sufia/Samvera stack, which I hadn't previously been aware of and which would require more work in Sufia to fix. Alas.

Increasing the size of the jobs server is definitely our best bet for now, I think, if it seems feasible and affordable.
