Merge pull request #1194 from phac-nml/object-store
Object store
ericenns authored Feb 3, 2023
2 parents 3c22445 + 4e23959 commit 36f67dd
Showing 136 changed files with 4,155 additions and 1,320 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci-test.yml
@@ -39,6 +39,7 @@ jobs:
"galaxy_testing",
"galaxy_pipeline_testing",
"open_api_testing",
"file_system_testing",
]

steps:
2 changes: 1 addition & 1 deletion .github/workflows/linting.yml
@@ -61,6 +61,6 @@ jobs:
with:
github_token: ${{ secrets.github_token }}
reporter: github-pr-review
fail_error: true
fail_on_error: true
level: error
checkstyle_config: './checkstyle.xml'
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@
* [Developer]: Replaced Apache OLTU with Nimbusds for performing OAuth2 authentication flow during syncing and Galaxy exporting. See [PR 1432](https://github.com/phac-nml/irida/pull/1432)
* [Developer/UI]: Performance enhancements to the metadata uploader. See [PR 1445](https://github.com/phac-nml/irida/pull/1445).
* [Developer/UI]: Fix for updating sample modified date when metadata is deleted. See [PR 1457](https://github.com/phac-nml/irida/pull/1457).
* [Developer]: Added support for cloud based storage. Currently, Microsoft Azure Blob and Amazon AWS S3 are supported. [See PR 1194](https://github.com/phac-nml/irida/pull/1194)

## [22.09.7] - 2022/01/24
* [UI]: Fixed bugs on NCBI Export page preventing the NCBI `submission.xml` file from being properly written. See [PR 1451](https://github.com/phac-nml/irida/pull/1451)
15 changes: 15 additions & 0 deletions build.gradle.kts
@@ -174,6 +174,17 @@ dependencies {
exclude(group = "jakarta.validation", module = "jakarta.validation-api")
}

// Microsoft Azure
implementation("com.azure:azure-storage-blob:12.18.0") {
exclude(group = "jakarta.xml.bind", module = "jakarta.xml.bind-api")
exclude(group = "jakarta.activation", module = "jakarta.activation-api")
}

// Amazon AWS
implementation("com.amazonaws:aws-java-sdk-s3:1.12.326") {
exclude(group = "commons-logging", module = "commons-logging")
}

// Customized fastqc
implementation(files("${projectDir}/lib/jbzip2-0.9.jar"))
implementation(files("${projectDir}/lib/sam-1.103.jar"))
@@ -401,6 +412,10 @@ val integrationTestsMap = mapOf(
"tags" to "IntegrationTest & Galaxy & Pipeline",
"excludeListeners" to "ca.corefacility.bioinformatics.irida.junit5.listeners.*"
),
"fileSystem" to mapOf(
"tags" to "IntegrationTest & FileSystem",
"excludeListeners" to "ca.corefacility.bioinformatics.irida.junit5.listeners.*"
),
)

integrationTestsMap.forEach {
22 changes: 22 additions & 0 deletions doc/administrator/galaxy/cleanup/index.md
@@ -66,3 +66,25 @@ Once this script is installed, it can be scheduled to run periodically by adding
This will clean up any **deleted** files every day at 2:00 am. Log files will be stored in `galaxy/galaxy_cleanup.log` and `galaxy/cleanup_datasets/*.log`.

For more information please see the [Purging Histories and Datasets](https://galaxyproject.org/admin/config/performance/purge-histories-and-datasets/) document. ***Note: the metadata about each analysis will still be stored and available in Galaxy, but the data file contents will be permanently removed.***

# Cleaning up temporary files

When using Galaxy with an IRIDA instance that uses cloud based storage (Azure, AWS, etc.), files are uploaded from IRIDA rather than linked, since they are stored in the cloud and not on a shared filesystem. Because these uploaded files accumulate in Galaxy, it is a good idea to clean them up periodically. An example script that can be used to do so is provided below:

```bash
#!/bin/bash

GALAXY_ROOT_DIR=/path/to/galaxy-dist
CLEANUP_LOG=$GALAXY_ROOT_DIR/irida_galaxy_tmp_files_cleanup.log
TMP_FILES_DIR=$GALAXY_ROOT_DIR/databases/tmp/
NUMBER_OF_DAYS_OLD=30

# Set CONDA_ROOT to the conda installation used by Galaxy
CONDA_ROOT=/path/to/miniconda3
source $CONDA_ROOT/bin/activate galaxy

echo -e "\nBegin temporary file cleanup at `date`" >> $CLEANUP_LOG
find $TMP_FILES_DIR -mindepth 1 -mtime +$NUMBER_OF_DAYS_OLD -delete

echo -e "\nEnd temporary file cleanup at `date`" >> $CLEANUP_LOG
```

This can be added as a cleanup script and scheduled to run using cron.
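For example, assuming the script above is saved as `/path/to/galaxy/irida_cleanup_tmp_files.sh` (a hypothetical path), a crontab entry like the following would run it daily at 2:00 am, matching the schedule used for the dataset cleanup above:

```bash
# Hypothetical crontab entry: run the temporary-file cleanup daily at 2:00 am
0 2 * * * /bin/bash /path/to/galaxy/irida_cleanup_tmp_files.sh
```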
4 changes: 2 additions & 2 deletions doc/administrator/galaxy/index.md
@@ -59,9 +59,9 @@ The overall architecture of IRIDA and Galaxy is as follows:

![irida-galaxy.jpg][]

1. IRIDA manages all input files for a workflow. This includes sequencing reads, reference files, and the Galaxy workflow definition file. On execution of a workflow, references to these files are sent to a Galaxy instance using the [Galaxy API][]. It is assumed that these files exist on a file system shared between IRIDA and Galaxy.
1. IRIDA manages all input files for a workflow. This includes sequencing reads, reference files, and the Galaxy workflow definition file. On execution of a workflow, the input files are made available to a Galaxy instance using the [Galaxy API][]. If IRIDA is using cloud based storage (Azure, AWS, etc.), the files are downloaded to the IRIDA server and then uploaded to Galaxy; otherwise only references to the files are sent, and it is assumed that the files exist on a filesystem shared between IRIDA and Galaxy.
2. All tools used by a workflow are assumed to have been installed in Galaxy during the setup of IRIDA. The Galaxy workflow is uploaded to Galaxy and the necessary tools are executed by Galaxy. Galaxy can be setup to either execute tools on a local machine, or submit jobs to a cluster.
3. Once the workflow execution is complete, a copy of the results are downloaded into IRIDA and stored in the shared filesystem.
3. Once the workflow execution is complete, a copy of the results is downloaded into IRIDA and either stored on the shared filesystem or uploaded to the cloud based storage used by IRIDA.

[Docker]: https://www.docker.com/
[irida-galaxy.jpg]: images/irida-galaxy.jpg
2 changes: 1 addition & 1 deletion doc/administrator/galaxy/setup/index.md
@@ -13,7 +13,7 @@ This document describes the necessary steps for installing and integrating [Gala
The following must be set up before proceeding with the installation.

1. A machine that has been set up to install Galaxy. This could be the same machine as the IRIDA web interface, or (recommended) a separate machine.
2. A shared filesystem has been set up between IRIDA and Galaxy. If Galaxy will be submitting to a compute cluster this filesystem must also be shared with the cluster.
2. If using a local filesystem rather than cloud based storage, a shared filesystem has been set up between IRIDA and Galaxy. If Galaxy will be submitting to a compute cluster, this filesystem must also be shared with the cluster.

* this comment becomes the table of contents.
{:toc}
4 changes: 3 additions & 1 deletion doc/developer/getting-started/index.md
@@ -199,9 +199,10 @@ Gradle will download all required dependencies and run the full suite of unit te
##### Integration tests
{:.no_toc}

IRIDA has 5 integration test tasks which splits the integration test suite into functional groups. This allows GitHub Actions to run the tests in parallel, and local test executions to only run the required portion of the test suite. The 5 tasks are the following:
IRIDA has 6 integration test tasks which split the integration test suite into functional groups. This allows GitHub Actions to run the tests in parallel, and local test executions to only run the required portion of the test suite. The 6 tasks are the following:

* `serviceITest` - Runs the service layer and repository testing.
* `fileSystemITest` - Runs the file system tests.
* `uiITest` - Integration tests for IRIDA's web interface.
* `restITest` - Tests IRIDA's REST API.
* `galaxyITest` - Runs tests for IRIDA communicating with Galaxy. This profile will automatically start a test galaxy instance to test with.
@@ -217,6 +218,7 @@ As the integration tests simulate a running IRIDA installation, in order to run

Where <TEST PROFILE> is one of the following:
* `service_testing` - Runs the `serviceITest` task
* `file_system_testing` - Runs the `fileSystemITest` task
* `ui_testing` - Runs the `uiITest` task
* `rest_testing` - Runs the `restITest` task
* `galaxy_testing` - Runs the `galaxyITest` task
47 changes: 45 additions & 2 deletions doc/developer/setup/index.md
@@ -122,13 +122,56 @@ docker run hello-world

### Configure Filesystem Locations

IRIDA stores much of its metadata in the relational database, but all sequencing and analysis files are stored on the filesystem. Directory configuration is:
IRIDA stores much of its metadata in the relational database. As of IRIDA 23.01, you can use cloud based storage (BETA) as well as a local filesystem to store sequencing, reference, and analysis output files. Currently, Azure Blob and AWS S3 storage are supported.

Directory configuration is:

* **Sequencing Data**: `sequence.file.base.directory`
* **Reference Files**: `reference.file.base.directory`
* **Analysis Output**: `output.file.base.directory`

If the directories that are configured do not exist (they don't likely exist if you don't configure them), IRIDA will default to automatically creating a temporary directory using Java's [`Files.createTempDirectory`](http://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#createTempDirectory-java.lang.String-java.nio.file.attribute.FileAttribute...-).
If using a local filesystem and the configured directories do not exist (they likely don't exist if you haven't configured them), IRIDA will default to automatically creating a temporary directory using Java's [`Files.createTempDirectory`](http://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#createTempDirectory-java.lang.String-java.nio.file.attribute.FileAttribute...-).
However, if you are using cloud based storage you will still need to set these directories in the configuration, as they make up the virtual path to each file; no local directories will be created.
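For example, with cloud based storage configured, the same three properties act only as virtual key prefixes (the paths below are illustrative placeholders, not defaults):

```
sequence.file.base.directory=/opt/irida/data/sequence_files
reference.file.base.directory=/opt/irida/data/reference_files
output.file.base.directory=/opt/irida/data/output_files
```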

To setup IRIDA to use cloud based file storage, follow the instructions below for the storage type.

### Setup using Azure Storage Blob

In the configuration file (such as irida.conf) you will need to add these configuration values:

* `irida.storage.type=azure`
* `azure.container.name=CONTAINER_NAME`, where CONTAINER_NAME is a container previously set up on Azure
* `azure.container.url=CONTAINER_ENDPOINT_URL`
* `azure.sas.token=SAS_TOKEN`, where the SAS_TOKEN has both read and write permissions

See [Azure Storage Setup](https://learn.microsoft.com/en-us/azure/storage/blobs/) for instructions on how to set up Blob storage.

Microsoft has also made available a storage emulator, `Azurite`, which can be used to develop and test Azure storage functionality on a local machine instead of requiring an Azure Storage account. See [Microsoft Azurite](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite).

To use `Azurite` you will need the `SAS` token for the blob container. It can be retrieved by setting up [Azure Storage Explorer](https://learn.microsoft.com/en-us/azure/vs-azure-tools-storage-manage-with-storage-explorer) and adding a new resource (`Local Storage Emulator`). Once that is set up, click the `Blob Containers` menu item in the explorer, right click on the container created by default (`test`), and click `Get Shared Access Signature`. In the popup window, select the date range of validity and the permissions (Read, Write, Delete, List, Add, and Create) for the token, then click `Create`.

Once you have Azurite set up and the container created, update these configuration values (in irida.conf, etc.):
* `irida.storage.type=azure`
* `azure.container.name=test`
* `azure.container.url=http://127.0.0.1:10000/devstoreaccount1/test?SAS_TOKEN_RETRIEVED_ABOVE`
* `azure.sas.token=SAS_TOKEN_RETRIEVED_ABOVE` where the SAS_TOKEN has both read/write permissions

If using `Azurite`, make sure it is running before starting up IRIDA.
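As a sketch, IRIDA's own integration test script starts Azurite as a Docker container; a similar invocation (the container name here is arbitrary) works for local development:

```bash
# Start the Azurite emulator in Docker; ports 10000/10001/10002 serve the
# blob/queue/table endpoints respectively
docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 \
    --name irida-azurite mcr.microsoft.com/azure-storage/azurite
```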

### Setup using Amazon AWS S3 Bucket Storage

In the configuration file (such as irida.conf) you will need to add these configuration values:

* `irida.storage.type=aws`
* `aws.bucket.name=BUCKET_NAME`, where BUCKET_NAME is an S3 bucket previously set up with read/write permissions
* `aws.bucket.region=BUCKET_REGION`
* `aws.access.key=ACCESS_KEY`
* `aws.secret.key=SECRET_KEY`

See [AWS S3 Bucket Storage Setup](https://docs.aws.amazon.com/AmazonS3/latest/userguide/GetStartedWithS3.html) for instructions on how to set up S3 storage.

No other configuration is necessary in IRIDA to use cloud based storage. After adding these values to the configuration file you should be able to start up IRIDA, and it will use the cloud based storage that is configured.
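For local development and testing against S3, an S3-compatible mock can stand in for a real bucket; IRIDA's integration test script uses the `adobe/s3mock` container, started roughly like this (the container name is arbitrary):

```bash
# Start S3Mock in Docker; port 9090 serves HTTP and 9191 serves HTTPS
docker run -d -p 9090:9090 -p 9191:9191 --name irida-s3mock -t adobe/s3mock
```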


### Testing IRIDA

27 changes: 25 additions & 2 deletions run-tests.sh
@@ -11,6 +11,8 @@ JDBC_URL=jdbc:mysql://$DATABASE_HOST:$DATABASE_PORT/$DATABASE_NAME
TMP_DIRECTORY=`mktemp -d /tmp/irida-test-XXXXXXXX`
chmod 777 $TMP_DIRECTORY # Needs to be world-accessible so that Docker/Galaxy can access

S3MOCK_DOCKER_NAME=irida-docker-s3Mock
AZURITE_DOCKER_NAME=irida-docker-azurite
GALAXY_DOCKER=phacnml/galaxy-irida-20.09:21.05.2-it
GALAXY_DOCKER_NAME=irida-galaxy-test
GALAXY_PORT=48889
@@ -104,6 +106,18 @@ test_service() {
return $exit_code
}

test_file_system() {
docker run -d -p 9090:9090 -p 9191:9191 --name $S3MOCK_DOCKER_NAME -t adobe/s3mock

docker run -d -p 10000:10000 -p 10001:10001 -p 10002:10002 --name $AZURITE_DOCKER_NAME mcr.microsoft.com/azure-storage/azurite
./gradlew clean check fileSystemITest -Dspring.datasource.url=$JDBC_URL -Dfile.processing.decompress=true -Dirida.it.rootdirectory=$TMP_DIRECTORY -Dspring.datasource.dbcp2.max-wait=$DB_MAX_WAIT_MILLIS $@
exit_code=$?

docker rm -f -v $S3MOCK_DOCKER_NAME
docker rm -f -v $AZURITE_DOCKER_NAME
return $exit_code
}

test_rest() {
./gradlew clean check restITest -Dspring.datasource.url=$JDBC_URL -Dfile.processing.decompress=true -Dirida.it.rootdirectory=$TMP_DIRECTORY -Dspring.datasource.dbcp2.max-wait=$DB_MAX_WAIT_MILLIS $@
exit_code=$?
@@ -169,7 +183,7 @@ test_open_api() {
}

test_all() {
for test_profile in test_rest test_service test_ui test_galaxy test_galaxy_pipelines test_open_api;
for test_profile in test_rest test_service test_ui test_galaxy test_galaxy_pipelines test_open_api test_file_system;
do
tmp_dir_cleanup
eval $test_profile
@@ -198,9 +212,11 @@
then
echo -e "\t--no-kill-docker: Do not kill Galaxy Docker after Galaxy tests have run."
echo -e "\t--no-headless: Do not run chrome in headless mode (for viewing results of UI tests)."
echo -e "\t--selenium-docker: Use selenium/standalone-chrome docker container for executing UI tests."
echo -e "\ttest_type: One of the IRIDA test types {service_testing, ui_testing, rest_testing, galaxy_testing, galaxy_pipeline_testing, open_api_testing, all}."
echo -e "\ttest_type: One of the IRIDA test types {service_testing, ui_testing, rest_testing, galaxy_testing, galaxy_pipeline_testing, open_api_testing, file_system_testing, all}."
echo -e "\t[gradle options]: Additional options to pass to 'gradle'. In particular, can pass '--test ca.corefacility.bioinformatics.irida.fully.qualified.name' to run tests from a particular class.\n"
echo -e "Examples:\n"
echo -e "$0 file_system_testing\n"
echo -e "\tThis will run the file system tests for IRIDA, cleaning up the test database/docker containers first.\n"
echo -e "$0 service_testing\n"
echo -e "\tThis will test the Service layer of IRIDA, cleaning up the test database/docker containers first.\n"
echo -e "$0 -d irida_integration_test2 galaxy_testing\n"
@@ -301,6 +317,13 @@ case "$1" in
exit_code=$?
posttest_cleanup
;;
file_system_testing)
shift
pretest_cleanup
test_file_system $@
exit_code=$?
posttest_cleanup
;;
all)
shift
pretest_cleanup
@@ -25,6 +25,7 @@
import ca.corefacility.bioinformatics.irida.pipeline.upload.galaxy.GalaxyWorkflowService;
import ca.corefacility.bioinformatics.irida.plugins.IridaPlugin;
import ca.corefacility.bioinformatics.irida.plugins.IridaPluginException;
import ca.corefacility.bioinformatics.irida.repositories.filesystem.IridaFileStorageUtility;
import ca.corefacility.bioinformatics.irida.repositories.sample.SampleRepository;
import ca.corefacility.bioinformatics.irida.service.AnalysisService;
import ca.corefacility.bioinformatics.irida.service.AnalysisSubmissionService;
@@ -105,6 +106,10 @@ public class AnalysisExecutionServiceConfig {
@Autowired
private List<AnalysisSampleUpdater> defaultAnalysisSampleUpdaters;

@Autowired
private IridaFileStorageUtility iridaFileStorageUtility;

private List<AnalysisSampleUpdater> loadPluginAnalysisSampleUpdaters() {
List<AnalysisSampleUpdater> pluginUpdaters = Lists.newLinkedList();

@@ -159,7 +164,7 @@ public AnalysisWorkspaceServiceGalaxy analysisWorkspaceService() {
return new AnalysisWorkspaceServiceGalaxy(galaxyHistoriesService, galaxyWorkflowService,
galaxyLibrariesService, iridaWorkflowsService, analysisCollectionServiceGalaxy(),
analysisProvenanceService(), analysisParameterServiceGalaxy,
sequencingObjectService);
sequencingObjectService, iridaFileStorageUtility);
}

@Lazy
@@ -171,6 +176,6 @@ public AnalysisProvenanceServiceGalaxy analysisProvenanceService() {
@Lazy
@Bean
public AnalysisCollectionServiceGalaxy analysisCollectionServiceGalaxy() {
return new AnalysisCollectionServiceGalaxy(galaxyHistoriesService);
return new AnalysisCollectionServiceGalaxy(galaxyHistoriesService, iridaFileStorageUtility);
}
}
@@ -76,7 +76,7 @@ public class ExecutionManagerConfig {

/**
* Builds a new ExecutionManagerGalaxy from the given properties.
*
*
* @return An ExecutionManagerGalaxy.
* @throws ExecutionManagerConfigurationException If no execution manager is configured.
*/
@@ -89,7 +89,7 @@ public ExecutionManagerGalaxy executionManager() throws ExecutionManagerConfigur

/**
* Builds a new ExecutionManagerGalaxy given the following environment properties.
*
*
* @param urlProperty The property defining the URL to Galaxy.
* @param apiKeyProperty The property defining the API key to Galaxy.
* @param emailProperty The property defining the account email in Galaxy.
@@ -111,7 +111,7 @@ private ExecutionManagerGalaxy buildExecutionManager(String urlProperty, String

/**
* Gets and validates a GalaxyAccountEmail from the given property.
*
*
* @param emailProperty The property to find the email address.
* @return A valid GalaxyAccountEmail.
* @throws ExecutionManagerConfigurationException If the properties value was invalid.
@@ -132,7 +132,7 @@ private GalaxyAccountEmail getGalaxyEmail(String emailProperty) throws Execution

/**
* Gets and validates a Galaxy API key from the given property.
*
*
* @param apiKeyProperty The API key property to get.
* @return A API key for Galaxy.
* @throws ExecutionManagerConfigurationException If the given properties value was invalid.
@@ -149,7 +149,7 @@ private String getAPIKey(String apiKeyProperty) throws ExecutionManagerConfigura

/**
* Gets and validates the given property for a Galaxy url.
*
*
* @param urlProperty The property with the Galaxy URL.
* @return A valid Galaxy URL.
* @throws ExecutionManagerConfigurationException If the properties value was invalid.
@@ -170,7 +170,7 @@ private URL getGalaxyURL(String urlProperty) throws ExecutionManagerConfiguratio

/**
* Gets and validates a property with the storage strategy for Galaxy.
*
*
* @param dataStorageProperty The property with the storage strategy for Galaxy.
* @return The corresponding storage strategy object, defaults to DEFAULT_DATA_STORAGE if invalid.
*/