Updated docs because of meadow deprecations #232

Merged: 1 commit on Jan 13, 2025

15 changes: 13 additions & 2 deletions README.md
@@ -12,6 +12,14 @@ eHive is a system for running computation pipelines on distributed computing res

The name comes from the way pipelines are processed by a swarm of autonomous agents.

> [!IMPORTANT]
> As of eHive version 2.7.0, the preferred (default) eHive meadow has changed from *LSF* to *SLURM*.
>
> Because our pipelines are transitioning to *SLURM*, all meadows other than *SLURM* and *Local* are deprecated and no longer supported, effective immediately.
> For details about the schedulers, please see [Grid scheduler and Meadows](#grid-scheduler-and-meadows) below in this README.
>
> Please do not hesitate to contact us if this is a problem for you.

Available documentation
-----------------------

@@ -70,6 +78,9 @@ Grid scheduler and Meadows
--------------------------

eHive has a generic interface named _Meadow_ that describes how to interact with an underlying grid scheduler (submit jobs, query a job's status, etc.). eHive is compatible with
[SLURM](https://slurm.schedmd.com).

The following schedulers are deprecated and no longer supported as of eHive 2.7.0:
[IBM Platform LSF](http://www-03.ibm.com/systems/spectrum-computing/products/lsf/),
Sun Grid Engine (now known as Oracle Grid Engine),
[HTCondor](https://research.cs.wisc.edu/htcondor/),
@@ -98,8 +109,8 @@ docker run -it ensemblorg/ensembl-hive beekeeper.pl -url $URL -loop -sleep 0.2
docker run -it ensemblorg/ensembl-hive runWorker.pl -url $URL
```

Docker Swarm
------------
Docker Swarm (DEPRECATED)
-------------------------

Once packaged into Docker images, a pipeline can actually be run under the
Docker Swarm orchestrator, and thus on any cloud infrastructure that supports
39 changes: 18 additions & 21 deletions docs/contrib/alternative_meadows.rst
@@ -19,33 +19,30 @@ LOCAL
configuration found in the ``hive_config.json`` file and (ii) the
*analysis_capacity* and *hive_capacity* mechanisms.

LSF
A meadow that supports `IBM Platform LSF <http://www-03.ibm.com/systems/spectrum-computing/products/lsf/>`__
SLURM
    A meadow that supports `SLURM <https://slurm.schedmd.com/>`__.
This meadow is extensively used by the Ensembl project and is regularly
updated. It is fully implemented and supports workloads reaching
thousands of parallel jobs.

Other meadows have been contributed to the project, though sometimes not
all the features are implemented. Being developed outside of the main
codebase, they may be at times out of sync with the latest version of
eHive. Nevertheless, several are continuously tested on `Travis CI
<https://travis-ci.org/Ensembl>`__ using single-machine Docker
installations. Refer to the documentation or README in each of those
repositories to know more about their compatibility and support.
Other meadows - now deprecated - were contributed to the project in the past,
though sometimes not all of their features were implemented. Being developed outside of the main
codebase, they could at times be out of sync with the latest version of
eHive. These meadows are listed below for the record.

LSF (Deprecated)
    A meadow that supports `IBM Platform LSF <http://www-03.ibm.com/systems/spectrum-computing/products/lsf/>`__. This meadow was extensively used by the Ensembl project until 2024. It was fully implemented and supported workloads reaching thousands of parallel jobs.

SGE
SGE (Deprecated)
A meadow that supports Sun Grid Engine (now known as Oracle Grid Engine). Available for download on GitHub at `Ensembl/ensembl-hive-sge <https://github.com/Ensembl/ensembl-hive-sge>`__.

HTCondor
HTCondor (Deprecated)
A meadow that supports `HTCondor <https://research.cs.wisc.edu/htcondor/>`__. Available for download on GitHub at `Ensembl/ensembl-hive-htcondor <https://github.com/Ensembl/ensembl-hive-htcondor>`__.

PBSPro
PBSPro (Deprecated)
A meadow that supports `PBS Pro <http://www.pbspro.org>`__. Available for download on GitHub at `Ensembl/ensembl-hive-pbspro <https://github.com/Ensembl/ensembl-hive-pbspro>`__.

SLURM
A meadow that supports `Slurm <https://slurm.schedmd.com/>`__. Available for download on GitHub at `tweep/ensembl-hive-slurm <https://github.com/tweep/ensembl-hive-slurm>`__.

DockerSwarm
DockerSwarm (Deprecated)
A meadow that can control and run on `Docker Swarm <https://docs.docker.com/engine/swarm/>`__.
Available for download on GitHub at
`Ensembl/ensembl-hive-docker-swarm <https://github.com/Ensembl/ensembl-hive-docker-swarm>`__.
@@ -69,25 +66,25 @@ The table below lists the capabilities of each meadow, and whether they are avai
- Yes
- Partially implemented
- Not available
* - LSF
* - LSF (Deprecated)
- Yes
- Yes
- Yes
- Yes
- Yes
* - SGE
* - SGE (Deprecated)
- Yes
- Yes
- Yes
- Yes
- Not implemented
* - HTCondor
* - HTCondor (Deprecated)
- Yes
- Yes
- Yes
- Yes
- Not implemented
* - PBSPro
* - PBSPro (Deprecated)
- Yes
- Yes
- Yes
@@ -99,7 +96,7 @@ The table below lists the capabilities of each meadow, and whether they are avai
- Yes
- Yes
- Yes
* - DockerSwarm
* - DockerSwarm (Deprecated)
- Yes
- Yes
- Not implemented
10 changes: 5 additions & 5 deletions docs/creating_pipelines/meadows_and_resources.rst
@@ -23,7 +23,7 @@ order for a Meadow to be available, two conditions must be met:

- The appropriate Meadow driver must be installed and accessible to Perl (e.g. in your $PERL5LIB).

- Meadow drivers for LSF and for the LOCAL Meadow are included with the eHive distribution. Other Meadow drivers are :ref:`available in their own repositories <other-job-schedulers>`.
- Meadow drivers for SLURM and for the LOCAL Meadow are included with the eHive distribution. Other Meadow drivers are :ref:`available in their own repositories <other-job-schedulers>`.

- The Beekeeper must be running on a head node that can submit jobs managed by the corresponding job management engine.

@@ -57,14 +57,14 @@ The Resource Description is a data structure (in practice written as a
perl hashref) that links Meadows to a job scheduler submission string
for that Meadow. For example, the following data structure defines a
Resource Class with a Resource Class Name '1Gb_job'. This Resource
Class has a Resource Description for running under the LSF scheduler,
and another description for running under the SGE scheduler:
Class has a Resource Description for running under the SLURM scheduler,
and another description for running under the LSF scheduler:

.. code-block:: perl

{
'1Gb_job' => { 'LSF' => '-M 1024 -R"select[mem>1024] rusage[mem=1024ma]"',
'SGE' => '-l h_vmem=1G',
'1Gb_job' => { 'SLURM' => ' --time=1:00:00 --mem=1000m',
'LSF' => '-M 1024 -R"select[mem>1024] rusage[mem=1024ma]"',
},
}

2 changes: 1 addition & 1 deletion docs/creating_pipelines/pipeconfigs.rst
@@ -222,7 +222,7 @@ Resource classes for a pipeline are defined in a PipeConfig's resource_classes m

return {
%{$self->SUPER::resource_classes},
'high_memory' => { 'LSF' => '-C0 -M16000 -R"rusage[mem=16000]"' },
'high_memory' => { 'SLURM' => ' --time=7-00:00:00 --mem=100000m' },
};
}
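
As a usage sketch (the ``assemble_contigs`` analysis name and ``My::Pipeline::RunnableDB::Assemble`` module below are hypothetical, not taken from eHive), an analysis selects a resource class through the ``-rc_name`` key of its entry in ``pipeline_analyses``, and Workers spawned for that analysis are then submitted with the scheduler options attached to that resource class:

.. code-block:: perl

    sub pipeline_analyses {
        my ($self) = @_;
        return [
            {   -logic_name => 'assemble_contigs',                    # hypothetical analysis name
                -module     => 'My::Pipeline::RunnableDB::Assemble',  # hypothetical Runnable module
                -rc_name    => 'high_memory',                         # requests the 'high_memory' resource class defined above
            },
        ];
    }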

27 changes: 13 additions & 14 deletions docs/dev/development_guidelines.rst
@@ -232,19 +232,15 @@ Internal versioning
eHive has a number of interfaces, which are mostly versioned. You can see
them by running ``beekeeper.pl --versions``::

CodeVersion 2.5
CompatibleHiveDatabaseSchemaVersion 92
CompatibleGuestLanguageCommunicationProtocolVersion 0.3
CodeVersion 2.7.0
CompatibleHiveDatabaseSchemaVersion 96
MeadowInterfaceVersion 5
Meadow::DockerSwarm 5.1 unavailable
Meadow::HTCondor 5.0 unavailable
Meadow::LOCAL 5.0 available
Meadow::LSF 5.2 unavailable
Meadow::PBSPro 5.1 unavailable
Meadow::SGE 4.0 incompatible
GuestLanguageInterfaceVersion 3
GuestLanguage[python3] 3.0 available
GuestLanguage[ruby] N/A unavailable
Meadow::SLURM 5.5 available
GuestLanguageInterfaceVersion 5
GuestLanguage[python3] 5.0 available


* *CodeVersion* is the software version (see how it is handled in the section
below).
@@ -265,16 +261,16 @@ Releases, code branching and GIT

There are three kinds of branches in eHive:

* ``version/X.Y`` represent released versions of eHive. They are considered
* ``version/X.Y.Z`` represent released versions of eHive. They are considered
*stable*, i.e. are feature-frozen, and only receive bug-fixes. Schema
changes are prohibited as they would break the database versioning
mechanism. Users on a given ``version/X.Y`` branch must be able to
mechanism. Users on a given ``version/X.Y.Z`` branch must be able to
blindly update their checkout without risking breaking anything. It is
forbidden to force-push these branches (they are in fact marked as
*protected* on GitHub).
* ``main`` is the staging branch for the next stable release of eHive. It
receives new features (incl. schema changes) until we decide to create a
new ``version/X.Y`` branch out of it. Like ``version/X.Y``, ``main`` is
new ``version/X.Y.Z`` branch out of it. Like ``version/X.Y.Z``, ``main`` is
*protected* and cannot be force-pushed.
* ``experimental/XXX`` are where *experimental* features are being
developed. These branches can be created, removed or rebased at will. If
@@ -289,7 +285,10 @@ commits, some bugs have to be fixed differently on different branches. If
that is the case, either fix the merge commit immediately, or do a merge
for the sake of it (``git merge -s ours``) and then add the correct
commits. Forcing merges to happen provides a clearer history and
facilitates tools like ``git bisect``.

This is, however, the historical practice. Since eHive version 2.7, all older
versions are deprecated and no longer maintained.

Experimental branches should be rebased onto main just before the final
merge (which then becomes a **fast-forward**). Together with the above
2 changes: 1 addition & 1 deletion docs/running_pipelines/error-recovery.rst
@@ -27,7 +27,7 @@ The recommended way to stop Workers is to set the analysis_capacity for Analyses

``tweak_pipeline.pl -url sqlite:///my_hive_db -SET analysis[logic_name].analysis_capacity=0``

In some situations it may be necessary to take more drastic action to stop the Workers in a pipeline. In order to do this, you may need to find the underlying processes and kill them using the command appropriate for your scheduler (e.g. ``bkill`` for LSF, ``qdel`` for PBS-like systems) - or using the ``kill`` command if the Worker is running in the LOCAL meadow. It may help to look up the Worker's process IDs in the "worker" table:
In some situations it may be necessary to take more drastic action to stop the Workers in a pipeline. In order to do this, you may need to find the underlying processes and kill them using the command appropriate for your scheduler (e.g. ``bkill`` for LSF, ``scancel`` for SLURM) - or using the ``kill`` command if the Worker is running in the LOCAL meadow. It may help to look up the Worker's process IDs in the "worker" table:

``db_cmd.pl -url sqlite:///my_hive_db -sql 'SELECT process_id FROM worker WHERE status in ("JOB_LIFECYCLE", "SUBMITTED")'``

3 changes: 1 addition & 2 deletions docs/running_pipelines/management.rst
@@ -187,8 +187,7 @@ analyses as required to provide the appropriate memory usage steps, e.g.

Relying on MEMLIMIT can be inconvenient at times:

* The mechanism may not be available on all job schedulers (of the ones
eHive support, only LSF has that functionality).
* The mechanism may not be available on all job schedulers.
* When LSF kills the jobs, the open file handles and database connections
  are interrupted, potentially leading to corrupted data and temporary
files hanging around.