Scale jobs servers #1075

sanfordd · 2018-05-30T18:19:08Z

We currently have one jobs server running 5 resque workers (resque-pool).

Some jobs still run on the app server (mostly the stock sufia jobs), while some run on the jobs server (most of our local/customized ones).

Ideally, all jobs would run on jobs server(s), off the app server: A) because the jobs server can, at the moment, be scaled more easily (the main thing preventing scaling of the app server is the file system for browser-uploaded files); B) because it makes sense to keep heavy jobs work (which is async, and tends to be more accommodating of being slowed down) from slowing down the app server (which needs to return quick synchronous responses to browsers).

We want to be able to easily scale the number of jobs workers: A) so we can possibly move more workers off app to jobs; B) so we can make ingests go faster, deciding how much we want to pay in jobs resources for what speed of ingest.

You can scale the number of jobs workers by:

a) Adding more workers to the existing jobs server. This will likely require more RAM though, and the workers may end up fighting for disk IO bandwidth.

b) Adding more jobs servers. This will require some changes to our ansible deploy, and possibly tweaks to capistrano too. Once properly deployed though, none of the rest of our stack needs to know there is more than one jobs server. The workers, running on one or more jobs servers, contact resque to take jobs off the queue; nobody needs to know how many servers there are.
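
To illustrate why option b) needs no changes elsewhere in the stack, here is a minimal sketch in plain Ruby. It is a simulation only, not resque code: a `Thread::Queue` stands in for the Redis-backed resque queue, threads stand in for resque workers, and `run_workers` is a hypothetical helper name invented for this example.

```ruby
# Hypothetical simulation: threads stand in for resque workers and a
# Thread::Queue stands in for the shared Redis-backed queue.
def run_workers(job_ids, servers:, workers_per_server:)
  queue = Queue.new
  job_ids.each { |id| queue << id }
  queue.close # nothing more will be enqueued; pop returns nil once drained

  done = Queue.new # thread-safe record of [server, job] pairs
  threads = (1..servers).flat_map do |server|
    (1..workers_per_server).map do
      Thread.new do
        # Each worker only talks to the shared queue, never to other servers.
        while (job = queue.pop)
          done << [server, job]
        end
      end
    end
  end
  threads.each(&:join)

  results = []
  results << done.pop until done.empty?
  results
end

one_box   = run_workers((1..20).to_a, servers: 1, workers_per_server: 5)
two_boxes = run_workers((1..20).to_a, servers: 2, workers_per_server: 5)
```

In both runs every job is taken off the queue exactly once; the only thing that changes is how many workers are pulling. Which simulated server handles a given job is not deterministic.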

Either of these could be fine. We want to set things up to be able to scale jobs workers whenever we want, without having to rewrite code (including ansible/capistrano) to do so.

Once we have this, we will want to explore moving some of the stock sufia jobs off the app server, to the jobs server. (We will need to be careful about their interrelationships and dependencies; it is tangled.)


sanfordd · 2018-05-30T18:28:52Z

So, a quick set of systems concerns to track:

The servers need to have different names (this is trivial)
Do they need shared disk space? no
Do they need to talk to each other outside of their shared PG/Redis database connections? no
Check current Postgres connection limits
Consider moving Postgres off to another box
@jrochkind I'll need your opinion on the second and third ones.
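
For the Postgres connection limit item, a rough back-of-envelope check can be sketched as below. All numbers and both helper names are hypothetical; the sketch assumes each worker process holds one database connection (the real count depends on the ActiveRecord pool settings), and it uses 100 only because that is PostgreSQL's default `max_connections`. Our actual limit still needs to be checked.

```ruby
# Hypothetical helpers for estimating database connection demand.
# Assumption: every worker process holds `connections_per_worker` connections.
def connections_needed(servers:, workers_per_server:, connections_per_worker: 1)
  servers * workers_per_server * connections_per_worker
end

# True if the workers fit under the limit after setting aside connections
# for the app server and anything else (reserved_for_others is a guess).
def fits_within_limit?(needed, max_connections:, reserved_for_others:)
  needed <= max_connections - reserved_for_others
end

needed = connections_needed(servers: 2, workers_per_server: 5)
ok = fits_within_limit?(needed, max_connections: 100, reserved_for_others: 30)
```

The same arithmetic applies whether the extra workers live on one larger server or on several smaller ones, since the limit is on total connections to the database.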

sanfordd · 2018-05-30T18:35:43Z

Since those two are a no, I'm going to check, but adding any number of jobs servers is likely to be fairly trivial.

sanfordd · 2018-05-30T18:44:51Z

Current discussion has led us to decide that a larger server is both cost-effective (or at least cost-neutral) compared to running multiple servers, and simpler to implement.

jrochkind · 2018-05-30T22:36:03Z

Additionally, I've discovered there may be OTHER ways that multiple jobs servers break the sufia/samvera stack that I hadn't previously been aware of, and that would require more work to fix sufia. Alas.

Increasing size of jobs server is definitely our best bet for now, I think, if it seems feasible and affordable.

@sanfordd

sanfordd modified the milestones: post-launch, Backlog May 30, 2018

sanfordd added the Systems label May 30, 2018

jrochkind mentioned this issue Jun 5, 2018

move more ingest and other jobs to jobs server #1081

Open
