Highlight lack of ongoing support on README #197

Open
wants to merge 2 commits into
base: master
13 changes: 11 additions & 2 deletions README.md
@@ -1,8 +1,17 @@
# ClusterManagers

Support for different job queue systems commonly used on compute clusters.
> [!WARNING]
Collaborator

Might be a good idea to grab a repostatus.org badge. "Unsupported" or "Suspended" seem like likely fits?

Collaborator Author

I wonder if we could have it at a finer level, like "Unsupported" on individual cluster managers, since it seems from the Discourse thread that certain managers work well.

Collaborator

FWIW, I use the LSF manager and try to help maintain it.

Collaborator Author

Indeed, it sounds like LSF is known to work well. Any others? We could just have a column indicating whether a method works or not.

Collaborator

Condor works, given some assumptions about a common filesystem being mounted.

Collaborator

Could it be worth it to also have a clause explaining that most cluster managers are not guaranteed to work out of the box? I need to supply quite a few extra program flags to the LSF manager, and I also need a separate hack to find an open port for the workers for it to work on my system. I don't consider it a bug or deficit that ClusterManagers can't find a set of working arguments for me automatically.

For the maintenance status question:
Would it be useful if the level of support were indicated through a column with maintainer names (with something like "maintainer needed" indicating that a manager is not well maintained)?

Collaborator Author

@DrChainsaw I think both of those ideas are really good.

Collaborator

Where does this stand? Did you want to add that stuff to the table in the README?

Collaborator Author

Sounds good to me. Sorry I didn't adjust the PR yet; I've just been busy with teaching.

Collaborator

All good! There's no real urgency, just want to make sure I'm not the one holding it up 😉

> This package is looking for a maintainer. Most users doing serious distributed calculations
> should use [MPI.jl](https://github.com/JuliaParallel/MPI.jl) instead.

Experiment to support different job queue systems commonly used on compute clusters with Distributed.jl.

## Available job queue systems

The below table summarizes the job queue systems with implementations.
However, several of them are known to not work with recent cluster management versions,
so use them with caution.

## Currently supported job queue systems

| Job queue system | Command to add processors |
| ---------------- | ------------------------- |