
Arq scheduler #20

Merged (26 commits into main, Feb 24, 2023)
Conversation

@sverhoeven sverhoeven (Member) commented Jan 16, 2023

Use https://arq-docs.helpmanual.io/ to create a scheduler that uses Redis as a queue and multiple workers (the bartender perform CLI).
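
For context, a minimal sketch of the enqueue side with arq (the function name perform_job, the job id, and the queue name are placeholders, not necessarily what bartender uses):

import asyncio

from arq import create_pool
from arq.connections import RedisSettings

async def submit_example() -> None:
    # Connect to the same Redis instance the workers listen on.
    pool = await create_pool(RedisSettings.from_dsn("redis://localhost:6379"))
    # Push a job onto the queue; a worker started with the perform CLI picks it up.
    await pool.enqueue_job("perform_job", "job-1", _queue_name="arq:queue")
    await pool.close()

asyncio.run(submit_example())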

TODO:

  • ArqScheduler
  • Tests
  • Build from config.yaml
  • Document scheduler in README
  • Development documentation: start redis Docker container + start worker
  • bartender perform subcommand for worker
  • Add redis + workers to docker-compose (postponed to #30: Docker compose deployment with arq workers)

To test:

  1. Create a config file with an uncommented copy of https://github.com/i-VRESSE/bartender/blob/arq/config-example.yaml#L53-L64 as the first destination (a sketch is shown after this list)
  2. Start Redis container with docker run --detach --publish 6379:6379 redis:7
  3. Start bartender serve
  4. In another shell start bartender mix
  5. Submit a job as described at https://github.com/i-VRESSE/bartender/tree/arq#word-count-example
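
A minimal sketch of what that first destination could look like, assuming the same fields as the example configuration discussed later in this conversation (the destination name redisq is a placeholder):

destinations:
  redisq:
    scheduler:
      type: arq
      redis_dsn: redis://localhost:6379
    filesystem:
      type: local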

Fixes #21
Should be merged after #17

@sverhoeven sverhoeven changed the base branch from main to multi-scheduler January 16, 2023 14:15
@sverhoeven sverhoeven marked this pull request as ready for review January 24, 2023 10:54
@sverhoeven sverhoeven requested a review from Peter9192 January 24, 2023 10:57
Base automatically changed from multi-scheduler to main January 27, 2023 12:06
@Peter9192 Peter9192 (Contributor) left a comment

Works well. I still get issues when the default job dir doesn't exist. Also, I have a question about the bartender mix cli and the assignment of one vs multiple workers.

Resolved review threads on src/bartender/__main__.py, src/bartender/schedulers/arq.py, and src/bartender/schedulers/slurm.py
if not configs:
    raise ValueError("No destination found in config file using arq scheduler")

asyncio.run(run_workers(configs))
@Peter9192 Peter9192 (Contributor) commented:

I'm a bit confused here. Do you need a separate destination for each worker? And can each ArqScheduler have only one worker?

@sverhoeven sverhoeven (Member, Author) commented Feb 6, 2023

Sorry to hear that. Each ArqScheduler needs at least one worker, but multiple workers (on different machines) can be used to handle more jobs running at the same time.

You can have multiple destinations, for example

destinations:
  quick:
    scheduler:
      type: arq
      redis_dsn: redis://localhost:6379
    filesystem:
      type: local
  small:
    scheduler:
      type: arq
      redis_dsn: redis://localhost:6379
      queue: small
    filesystem:
      type: dirac
  big:
    scheduler:
      type: arq
      redis_dsn: redis://bartender.uu.nl:6379
      queue: big
    filesystem:
      type: sftp
      hostname: headnode.cluster.uu.nl
  • quick could have a worker on the same machine as the bartender web service: bartender perform --destination quick
  • small could have workers running on grid machines
  • big could have workers on HPC cluster compute nodes

While checking arq docs I saw it used some bad defaults, so I added max_jobs and job_timeout props to ArqSchedulerConfig.
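
For reference, a minimal sketch (not the code from this PR) of how such settings map onto an arq worker; perform_job and the values below are placeholders standing in for ArqSchedulerConfig fields:

from arq.connections import RedisSettings
from arq.worker import Worker

async def perform_job(ctx, job_id: str) -> None:
    # Placeholder task; the real worker would run the job's command here.
    print(f"running {job_id}")

def build_worker() -> Worker:
    return Worker(
        functions=[perform_job],
        redis_settings=RedisSettings.from_dsn("redis://localhost:6379"),
        queue_name="small",  # the destination's queue
        max_jobs=2,          # cap on concurrent jobs per worker
        job_timeout=3600,    # seconds before a running job times out
    )

build_worker().run()  # blocks, polling Redis for queued jobs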

@sverhoeven sverhoeven (Member, Author) commented:

Should such an example be part of the docs?

@Peter9192 Peter9192 (Contributor) commented:

So you add more workers by passing the same scheduler address in different destinations? Would be good to add more documentation on the new configuration page in the docs, yes.

@sverhoeven sverhoeven (Member, Author) commented:

When Redis is exposed to the internet, a strong password should be added, along with some firewall rules.
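
For example (hostname and password are placeholders; this assumes the password can be embedded in the standard redis:// DSN):

scheduler:
  type: arq
  # placeholder credentials; also restrict the Redis port with firewall rules
  redis_dsn: redis://:a-long-random-password@bartender.uu.nl:6379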

Resolved review threads on src/bartender/web/api/job/sync.py, src/bartender/web/api/job/views.py, and tests/schedulers/test_arq.py
@Peter9192 Peter9192 (Contributor) commented:

Opened #48 to see what's required to apply the Google docstring format to this branch as well. There seem to be no conflicts between this branch and #45. After merging #45, the only changes needed in this branch are contained in this commit: 1921954

@sverhoeven sverhoeven (Member, Author) left a comment

Thanks for reviewing; you have given me some nice improvements.


@Peter9192 Peter9192 (Contributor) left a comment

Thanks for following up. It would be nice to add some more explanation to the new configuration docs page from #49.

@sverhoeven sverhoeven merged commit ebaa0f3 into main Feb 24, 2023