Preparing for v2.9.0 #304

Merged: 86 commits, merged Jul 13, 2024

Commits
60aba88
Modify summarize_metrics to return a table of data rather than printi…
ksuderman Dec 11, 2023
b27956a
Add --markdown option to summarize commands
ksuderman Dec 11, 2023
228ac9f
Bump to dev version
ksuderman Dec 12, 2023
94f23e0
Finish summarize --markdown implementations
ksuderman Dec 12, 2023
837b60b
Bump dev version
ksuderman Dec 12, 2023
7bbafe4
Add units to markdown table header row. Better formatting for floats
ksuderman Dec 12, 2023
bfe272b
Better column formatting for markdown tables
ksuderman Dec 12, 2023
3ce1924
Merge pull request #243 from galaxyproject/242-results-to-markdown
ksuderman Dec 12, 2023
61b1b29
Change order that benchmarks are executed during an experiment.
ksuderman Dec 13, 2023
9d0eb6d
Add --timeout option to job.wait
ksuderman Dec 13, 2023
dfb2d50
Use --run-number to specify starting int when numbering benchmark runs
ksuderman Dec 13, 2023
e9d0370
Handle case when --run-number is not specified
ksuderman Dec 13, 2023
01f624c
Improve markdown for experiment summarize
ksuderman Dec 13, 2023
c371f14
Add --sort-by option to experiment.summarize
ksuderman Dec 13, 2023
061805d
Merge pull request #245 from galaxyproject/244-benchmark-run-order
ksuderman Dec 13, 2023
b9a62b6
Merge pull request #246 from galaxyproject/210-wait-timeout
ksuderman Dec 13, 2023
7487c89
Merge branch 'dev' into 169-run-number
ksuderman Dec 13, 2023
090c70b
Merge pull request #247 from galaxyproject/169-run-number
ksuderman Dec 13, 2023
a0b4ec7
Merge pull request #249 from galaxyproject/248-sort-by
ksuderman Dec 13, 2023
ac90880
Dev 2 version
ksuderman Dec 13, 2023
c3a5678
Fix exception generating markdown if cell is empty
ksuderman Dec 13, 2023
dfec3af
Merge pull request #255 from galaxyproject/254-markdown-exceptions
ksuderman Dec 13, 2023
1572f11
Version dev.3
ksuderman Dec 13, 2023
e8940b8
Dev version 4
ksuderman Dec 13, 2023
c0086e7
Add --sort-by to all summarize commands
ksuderman Dec 14, 2023
528bd44
Merge pull request #256 from galaxyproject/251-summarize-sort
ksuderman Dec 14, 2023
a253ed1
Fix header for markdown tables
ksuderman Dec 14, 2023
1a686b4
Merge pull request #260 from galaxyproject/258-markdown-table-header
ksuderman Dec 14, 2023
3d627d7
Add invocation.show
ksuderman Dec 15, 2023
0dfb4c1
Merge pull request #262 from galaxyproject/259-invocation-show
ksuderman Dec 15, 2023
dac22bc
Add help goal to the Makefile
ksuderman Dec 15, 2023
24e327a
Improve history name lookup
ksuderman Dec 15, 2023
7b1a3c9
Merge pull request #265 from galaxyproject/257-history-name
ksuderman Dec 15, 2023
4958a7a
Round up memory and runtime values if they would display zeroes
ksuderman Dec 15, 2023
b064013
Merge pull request #266 from galaxyproject/264-round-values
ksuderman Dec 15, 2023
d0c526b
Limit the number of attempts a job will be restarted.
ksuderman Dec 15, 2023
81bd807
Merge pull request #267 from galaxyproject/261-job-restarts
ksuderman Dec 15, 2023
32077fa
Add document to menu.yml for the --sort-by option
ksuderman Dec 15, 2023
7c2ad51
Add documentation for experiment.summarize --markdown
ksuderman Dec 15, 2023
1f09521
Merge pull request #268 from galaxyproject/263-update-menu
ksuderman Dec 15, 2023
211bab9
Pass env to all helm invocations
ksuderman Dec 18, 2023
3f6890a
Add try_for method to retry api calls
ksuderman Dec 18, 2023
0d00e65
Fix bug handling start at value
ksuderman Dec 18, 2023
382bfb3
Retry invoking and waiting for invocations
ksuderman Dec 18, 2023
ab3ec5c
Fix run numbering range.
ksuderman Dec 20, 2023
fe6567d
Retry getting jobs list
ksuderman Dec 20, 2023
d078c0b
Use argparse in dataset.list
ksuderman Jan 19, 2024
04763b4
Don't add one to the run number when generating the history name.
ksuderman Jan 19, 2024
76d15dc
Merge pull request #271 from galaxyproject/270-run-number
ksuderman Jan 19, 2024
1fc5c04
Bump version
ksuderman Jan 19, 2024
76abe10
Merge pull request #273 from galaxyproject/269-dataset-list
ksuderman Jan 20, 2024
9e62c70
Added code documentation.
ksuderman Jan 31, 2024
6c07eb7
Print the starting run number after its value has been checked.
ksuderman Jan 31, 2024
2b6bfac
Merge pull request #276 from galaxyproject/275-run-number
ksuderman Jan 31, 2024
29f5436
More code documentation.
ksuderman Jan 31, 2024
144ebaa
Rename list methods to prevent name collisions with the list type
ksuderman Mar 29, 2024
abfe344
Merge pull request #277 from galaxyproject/272-list-methods
ksuderman Mar 29, 2024
f3011b2
Merge branch 'dev' into documentation
ksuderman Mar 29, 2024
111fa50
Merge pull request #278 from galaxyproject/documentation
ksuderman Mar 29, 2024
6efb9ce
Allow the Galaxy master API key to be defined in the profile.
ksuderman May 22, 2024
5a85ae1
Add configuration as a command alias
ksuderman May 22, 2024
95008f9
Merge pull request #281 from galaxyproject/279-config-command
ksuderman May 22, 2024
1b4eeb5
Update year in copyright notices
ksuderman May 22, 2024
2326217
Merge pull request #282 from galaxyproject/280-copyright
ksuderman May 22, 2024
0533f8c
Update samples and cleanup unused
nuwang May 24, 2024
dca5e0c
Merge pull request #284 from galaxyproject/update_samples
ksuderman May 24, 2024
40792ca
Add --no-tools flag for workflow import and upload
ksuderman May 28, 2024
83a53f2
Merge pull request #290 from galaxyproject/85-master-api-key
ksuderman May 28, 2024
1770635
Merge pull request #289 from galaxyproject/288-no-tools-installed
ksuderman May 28, 2024
88e13b8
Bump dev version
ksuderman May 29, 2024
2ad1a1e
Update bump script
ksuderman May 29, 2024
676572c
Try to fix problems resolving dataset collection IDs
ksuderman May 29, 2024
9dca47c
Merge pull request #292 from galaxyproject/debugging
ksuderman May 29, 2024
fd36cd3
Bump build number
ksuderman May 29, 2024
6bc3439
Search for a local .abm directory before using the global directory
ksuderman Jun 4, 2024
5e9ab17
Allow multiple datasets to be imported at once.
ksuderman Jun 4, 2024
715098a
Use local .abm directory for configurations if it exists.
ksuderman Jun 5, 2024
774f956
Use gi.jobs.get_metrics(job_id) to get the metrics for a job before w…
ksuderman Jun 6, 2024
52424a9
Merge pull request #299 from galaxyproject/298-replace-show-job
ksuderman Jun 6, 2024
b2c71a9
Merge pull request #296 from galaxyproject/295-local-configuration
ksuderman Jun 6, 2024
1a86f2f
Allow --history to be specified by name
ksuderman Jul 13, 2024
6de42f9
Document --name option when uploading/downloading datasets
ksuderman Jul 13, 2024
71f9aea
Update requirements
ksuderman Jul 13, 2024
2b3e2be
Merge pull request #303 from galaxyproject/302-dataset-import-history
ksuderman Jul 13, 2024
da7f6f4
Format code with Black and iSort
ksuderman Jul 13, 2024
24b7b5b
Merge branch 'master' into dev
ksuderman Jul 13, 2024
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2021 Galaxy Project
Copyright (c) 2024 Galaxy Project

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
11 changes: 11 additions & 0 deletions Makefile
@@ -1,4 +1,15 @@
.PHONY: dist
help:
@echo
@echo "GOALS"
@echo " clean - deletes the dist directory and egg-info"
@echo " dist - creates the distribution package (wheel)"
@echo " format - runs Black and isort"
@echo " test-deploy - deploys to test.pypi.org"
@echo " deploy - deploys to pypi.org"
@echo " release - creates a GitHub release package"
@echo

dist:
python3 setup.py sdist bdist_wheel

2 changes: 1 addition & 1 deletion README.md
@@ -48,7 +48,7 @@ The `kubectl` program is only required when bootstrapping a new Galaxy instance,

### Credentials

You will need an [API key](https://training.galaxyproject.org/training-material/faqs/galaxy/preferences_admin_api_key.html) for every Galaxy instance you would like to interact with. You will also need the *kubeconfig* file for each Kubernetes cluster. The `abm` script loads the Galaxy server URLs, API keys, and the location of the *kubeconfig* files from a Yaml configuration file that it expects to find in `$HOME/.abm/profile.yml` or `.abm-profile.yml` in the current directory. You can use the `profile-sample.yml` file as a starting point and it includes the URLs for all Galaxy instances we have used to date (December 22, 2021 as of this writing).
You will need an [API key](https://training.galaxyproject.org/training-material/faqs/galaxy/preferences_admin_api_key.html) for every Galaxy instance you would like to interact with. You will also need the *kubeconfig* file for each Kubernetes cluster. The `abm` script loads the Galaxy server URLs, API keys, and the location of the *kubeconfig* files from a Yaml configuration file that it expects to find in `$HOME/.abm/profile.yml` or `.abm-profile.yml` in the current directory. You can use the `samples/profile.yml` file as a starting point and it includes the URLs for all Galaxy instances we have used to date (December 22, 2021 as of this writing).

:bulb: It is now possible (>=2.0.0) to create Galaxy users and their API keys directly with `abm`.

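The profile the README describes is a plain YAML map of Galaxy server URLs, API keys, and kubeconfig locations. A minimal sketch of loading it is shown below; the instance name, URL, key, kubeconfig path, and search order are placeholders and assumptions, not taken from the repository.

```python
import os

import yaml  # PyYAML

# Hypothetical profile contents; the field names and values are assumptions
# used only to illustrate the structure described in the README.
EXAMPLE_PROFILE = """
my-cluster:
  url: https://galaxy.example.org
  key: <your-galaxy-api-key>
  kube: ~/.kube/my-cluster.config
"""


def load_profile():
    """Load the abm profile from the current directory or $HOME/.abm,
    mirroring the two locations the README lists (search order assumed)."""
    for path in ('.abm-profile.yml', os.path.expanduser('~/.abm/profile.yml')):
        if os.path.exists(path):
            with open(path) as f:
                return yaml.safe_load(f)
    return None
```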
4 changes: 2 additions & 2 deletions abm/__main__.py
@@ -3,7 +3,7 @@
"""
The Automated Benchmarking Tool

Copyright 2023 The Galaxy Project. All rights reserved.
Copyright 2024 The Galaxy Project. All rights reserved.

"""

@@ -64,7 +64,7 @@ def command_list(commands: list):


def copyright():
print(f" Copyright 2023 The Galaxy Project. All Rights Reserved.\n")
print(f" Copyright 2024 The Galaxy Project. All Rights Reserved.\n")


def print_main_help(menu_data):
23 changes: 21 additions & 2 deletions abm/lib/__init__.py
@@ -4,14 +4,16 @@

sys.path.append(os.path.dirname(os.path.realpath(__file__)))

# from common import parse_profile

# Where the workflow invocation data returned by Galaxy will be saved.
INVOCATIONS_DIR = "invocations"
# Where workflow runtime metrics will be saved.
METRICS_DIR = "metrics"

# Global instance of a YAML parser so we can reuse it if needed.
parser = None


# Keys used in various dictionaries.
class Keys:
NAME = 'name'
RUNS = 'runs'
@@ -22,3 +24,20 @@ class Keys:
COLLECTION = 'collection'
HISTORY_BASE_NAME = 'output_history_base_name'
HISTORY_NAME = 'history_name'


# def get_master_api_key():
# '''
# Get the master API key from the environment or configuration file.
# '''
# if 'GALAXY_MASTER_API_KEY' in os.environ:
# return os.environ['GALAXY_MASTER_API_KEY']
# config_path = os.path.expanduser("~/.abm/config.yml")
# if not os.path.exists(config_path):
# raise RuntimeError(f"ERROR: Configuration file not found: {config_path}")
# with open(config_path, 'r') as f:
# config = yaml.safe_load(f)
# key = config.get('GALAXY_MASTER_API_KEY', None)
# if key == None:
# raise RuntimeError("ERROR: GALAXY_MASTER_API_KEY not found in config.yml")
# return key
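The commented-out helper above outlines how a master API key lookup could work (environment variable first, then `~/.abm/config.yml`). A runnable version of the same idea is sketched here; it is an illustration of the commented code, not code this PR ships, and it additionally assumes the `yaml` import that the sketch omits.

```python
import os

import yaml  # the commented-out sketch above also relies on PyYAML


def get_master_api_key() -> str:
    """Return the Galaxy master API key from the environment, falling back
    to ~/.abm/config.yml, as outlined in the commented-out code above."""
    if 'GALAXY_MASTER_API_KEY' in os.environ:
        return os.environ['GALAXY_MASTER_API_KEY']
    config_path = os.path.expanduser('~/.abm/config.yml')
    if not os.path.exists(config_path):
        raise RuntimeError(f"ERROR: Configuration file not found: {config_path}")
    with open(config_path) as f:
        config = yaml.safe_load(f)
    key = config.get('GALAXY_MASTER_API_KEY')
    if key is None:
        raise RuntimeError("ERROR: GALAXY_MASTER_API_KEY not found in config.yml")
    return key
```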
129 changes: 101 additions & 28 deletions abm/lib/benchmark.py
@@ -8,21 +8,18 @@
from bioblend.galaxy import GalaxyInstance, dataset_collections
from lib import INVOCATIONS_DIR, METRICS_DIR, Keys
from lib.common import (Context, _get_dataset_data, _make_dataset_element,
connect, print_json)
connect, print_json, try_for)
from lib.history import wait_for

log = logging.getLogger('abm')


def run_cli(context: Context, args: list):
"""
Runs a single workflow defined by *args[0]*
Command line handler to run a single benchmark.

:param args: a list that contains:
args[0] - the path to the benchmark configuration file
args[1] - the prefix to use when creating the new history in Galaxy
args[2] - the name of the experiment, if part of one. This is used to
generate output folder names.
:param context: a context object that defines how to connect to the Galaxy server.
:param args: parameters from the command line

:return: True if the workflows completed successfully. False otherwise.
"""
@@ -43,11 +40,15 @@ def run_cli(context: Context, args: list):


def run(context: Context, workflow_path, history_prefix: str, experiment: str):
# if len(args) > 1:
# history_prefix = args[1]
# if len(args) > 2:
# experiment = args[2].replace(' ', '_').lower()
"""
Does the actual work of running a benchmark.

:param context: a context object that defines how to connect to the Galaxy server.
:param workflow_path: path to the ABM workflow (benchmark) file. NOTE: this is NOT the Galaxy .ga file.
:param history_prefix: a prefix value used when generating new history names.
:param experiment: the name of the experiment (arbitrary string). Used to generate new history names.
:return: True if the workflow run completed successfully. False otherwise.
"""
if os.path.exists(INVOCATIONS_DIR):
if not os.path.isdir(INVOCATIONS_DIR):
print('ERROR: Can not save invocation status, directory name in use.')
@@ -76,7 +77,7 @@ def run(context: Context, workflow_path, history_prefix: str, experiment: str):
workflows = parse_workflow(workflow_path)
if not workflows:
print(f"Unable to load any workflow definitions from {workflow_path}")
return
return False

print(f"Found {len(workflows)} workflow definitions")
for workflow in workflows:
@@ -144,11 +145,13 @@ def run(context: Context, workflow_path, history_prefix: str, experiment: str):
dsid = find_collection_id(gi, dsname)
dsdata = _get_dataset_data(gi, dsid)
if dsdata is None:
raise Exception(
f"ERROR: unable to resolve {dsname} to a dataset."
)
dsid = dsdata['id']
dssize = dsdata['size']
# raise Exception(
# f"ERROR: unable to resolve {dsname} to a dataset."
# )
dssize = 0
else:
dsid = dsdata['id']
dssize = dsdata['size']
input_data_size.append(dssize)
print(f"Input collection ID: {dsname} [{dsid}] {dssize}")
inputs[input[0]] = {'id': dsid, 'src': 'hdca', 'size': dssize}
@@ -173,7 +176,7 @@ def run(context: Context, workflow_path, history_prefix: str, experiment: str):
histories = gi.histories.get_histories(name=spec['history'])
if len(histories) == 0:
print(f"ERROR: History {spec['history']} not foune")
return
return False
hid = histories[0]['id']
pairs = 0
paired_list = spec['paired']
@@ -183,7 +186,13 @@ def run(context: Context, workflow_path, history_prefix: str, experiment: str):
for key in item.keys():
# print(f"Getting dataset for {key} = {item[key]}")
value = _get_dataset_data(gi, item[key])
size += value['size']
if value is None:
print(
f"ERROR: Unable to find dataset {item[key]}"
)
return
if 'size' in value:
size += value['size']
elements.append(
_make_dataset_element(key, value['id'])
)
@@ -224,16 +233,20 @@ def run(context: Context, workflow_path, history_prefix: str, experiment: str):
else:
raise Exception(f'Invalid input value')
print(f"Running workflow {wfid} in history {new_history_name}")
invocation = gi.workflows.invoke_workflow(
f = lambda: gi.workflows.invoke_workflow(
wfid, inputs=inputs, history_name=new_history_name
)
invocation = try_for(f, 3)
id = invocation['id']
# invocations = gi.invocations.wait_for_invocation(id, 86400, 10, False)
f = lambda: gi.invocations.wait_for_invocation(id, 86400, 10, False)
try:
invocations = gi.invocations.wait_for_invocation(id, 86400, 10, False)
except:
invocations = try_for(f, 2)
except Exception as e:
print(f"Exception waiting for invocations")
pprint(invocation)
sys.exc_info()
raise e
print("Waiting for jobs")
if history_prefix is not None:
parts = history_prefix.split()
Expand Down Expand Up @@ -265,6 +278,14 @@ def run(context: Context, workflow_path, history_prefix: str, experiment: str):


def translate(context: Context, args: list):
"""
Translates the human-readable names of datasets and workflows into the Galaxy
IDs that are unique to each server.

:param context: the context object used to connect to the Galaxy server
:param args: [0] the path to the benchmarking YAML file to translate
:return: Nothing. Prints the translated workflow file to stdout.
"""
if len(args) == 0:
print('ERROR: no workflow configuration specified')
return
@@ -307,6 +328,14 @@ def translate(context: Context, args: list):


def validate(context: Context, args: list):
"""
Checks to see if the workflow and all datasets defined in the benchmark can
be found on the server.

:param context: the context object used to connect to the Galaxy instance
:param args: [0] the benchmark YAML file to be validated.
:return:
"""
if len(args) == 0:
print('ERROR: no workflow configuration specified')
return
@@ -412,10 +441,10 @@


def wait_for_jobs(context, gi: GalaxyInstance, invocations: dict):
"""Blocks until all jobs defined in the *invocations* to complete.
"""Blocks until all jobs defined in *invocations* are complete (in a terminal state).

:param gi: The *GalaxyInstance* running the jobs
:param invocations:
:param invocations: a dictionary containing information about the jobs invoked
:return:
"""
wfid = invocations['workflow_id']
@@ -429,6 +458,7 @@ def wait_for_jobs(context, gi: GalaxyInstance, invocations: dict):
jobs = gi.jobs.get_jobs(history_id=hid)
for job in jobs:
data = gi.jobs.show_job(job['id'], full_details=True)
data['job_metrics'] = gi.jobs.get_job_metrics(job['id'])
metrics = {
'run': run,
'cloud': cloud,
@@ -485,6 +515,11 @@ def wait_for_jobs(context, gi: GalaxyInstance, invocations: dict):


def parse_workflow(workflow_path: str):
"""
Loads the benchmark YAML file.
:param workflow_path: the path to the file to be loaded.
:return: a dictionary containing the benchmark.
"""
if not os.path.exists(workflow_path):
print(f'ERROR: could not find workflow file {workflow_path}')
return None
@@ -503,6 +538,14 @@


def find_workflow_id(gi, name_or_id):
"""
Resolves the human-readable name for a workflow into the unique ID on the
Galaxy instance.

:param gi: the connection object to the Galaxy instance
:param name_or_id: the name of the workflow
:return: The Galaxy workflow ID or None if the workflow could not be located
"""
try:
wf = gi.workflows.show_workflow(name_or_id)
return wf['id']
@@ -519,7 +562,14 @@


def find_dataset_id(gi, name_or_id):
# print(f"Finding dataset {name_or_id}")
"""
Resolves the human-readable name of a dataset into the unique ID on the
Galaxy instance.

:param gi: the connection object to the Galaxy instance
:param name_or_id: the name of the dataset.
:return: the Galaxy dataset ID or None if the dataset could not be located.
"""
try:
ds = gi.datasets.show_dataset(name_or_id)
return ds['id']
@@ -544,6 +594,14 @@


def find_collection_id(gi, name):
"""
Resolves a human-readable collection name into the unique Galaxy ID.

:param gi: the connection object to the Galaxy instance
:param name: the name of the collection to resolve
:return: The unique Galaxy ID of the collection or None if the collection
can not be located.
"""
kwargs = {'limit': 10000, 'offset': 0}
datasets = gi.datasets.get_datasets(**kwargs)
if len(datasets) == 0:
@@ -565,7 +623,22 @@


def test(context: Context, args: list):
id = 'c90fffcf98b31cd3'
"""
Allows running testing code from the command line.

:param context: a connection object to a Galaxy instance
:param args: varies
:return: varies, typically None.
"""
# id = 'c90fffcf98b31cd3'
# gi = connect(context)
# inputs = gi.workflows.get_workflow_inputs(id, 'PE fastq input')
# pprint(inputs)

gi = connect(context)
inputs = gi.workflows.get_workflow_inputs(id, 'PE fastq input')
pprint(inputs)
print("Calling find_collection_id")
dsid = find_collection_id(gi, args[0])
print(f"Collection ID: {dsid}")
print("Calling _get_dataset_data")
dsdata = _get_dataset_data(gi, dsid)
pprint(dsdata)
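Several calls in the benchmark code above wrap bioblend requests in a lambda and hand them to `try_for` from `lib.common` (for example `try_for(f, 3)` when invoking a workflow). The implementation of `try_for` is not shown in this diff; the following is a plausible sketch only, assuming a `try_for(func, attempts)` signature and a fixed delay between retries, which may differ from the real helper.

```python
import time


def try_for(func, attempts: int = 3, delay: float = 5.0):
    """Call func() up to attempts times, returning its result on the first
    success and re-raising the last exception if every attempt fails.
    The retry delay is an assumption; the real lib.common.try_for may differ."""
    last_error = None
    for n in range(attempts):
        try:
            return func()
        except Exception as e:
            last_error = e
            if n + 1 < attempts:
                time.sleep(delay)
    raise last_error
```

Used as in the diff above, `invocation = try_for(lambda: gi.workflows.invoke_workflow(wfid, inputs=inputs, history_name=new_history_name), 3)` retries transient Galaxy API failures before giving up.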
4 changes: 3 additions & 1 deletion abm/lib/cloudlaunch.py
@@ -8,6 +8,8 @@
from cloudlaunch_cli.main import create_api_client
from common import Context

# DEPRECATED - Cloudlaunch is no longer used to manage Galaxy clusters.

BOLD = '\033[1m'
CLEAR = '\033[0m'

@@ -40,7 +42,7 @@ def h1(text):
'''


def list(context: Context, args: list):
def do_list(context: Context, args: list):
archived = False
filter = None
status = lambda t: t.instance_status if t.instance_status else t.status