Delete sequence files from file system via cleanup script #1469

Merged
merged 20 commits into from
Mar 9, 2023
Changes from 16 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -58,3 +58,4 @@ src/main/webapp/node_modules
/src/main/resources/application-local.properties
java_pid*.hprof
gradle.properties
.virtualenv
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -19,6 +19,7 @@
* [Developer]: Deprecated "/api/projects/{projectId}/samples/bySequencerId/{seqeuncerId}" in favour of "/api/projects/{projectId}/samples/bySampleName", which accepts a json property "sampleName"
* [Developer]: Fixed bug in setting `default_sequencing_object` and `default_genome_assembly` to `NULL` for a sample when the default sequencing object or genome assembly was removed. [See PR 1466](https://github.com/phac-nml/irida/pull/1466)
* [Developer]: Fixed bug preventing a `sample` with an analysis submission from being deleted. [See PR 1467](https://github.com/phac-nml/irida/pull/1467)
* [Developer]: Added script to do initial cleanup of sequence files from file system. [See PR 1469](https://github.com/phac-nml/irida/pull/1469)

## [22.09.7] - 2023/01/24
* [UI]: Fixed bugs on NCBI Export page preventing the NCBI `submission.xml` file from being properly written. See [PR 1451](https://github.com/phac-nml/irida/pull/1451)
4 changes: 4 additions & 0 deletions UPGRADING.md
@@ -4,6 +4,10 @@ Upgrading
This document summarizes the environmental changes that need to be made when
upgrading IRIDA that cannot be automated.

Unreleased
----------
* This upgrade deletes sequence files from the file system when they are removed from IRIDA. To clean up all previously removed sequence files, a script can be found under the `src/main/resources/scripts/sequence-files` folder in the IRIDA repo.

22.05 to 22.09
--------------
* This upgrade switches the OAuth2 implementation from using spring-security-oauth to spring-security-oauth2-authorization-server and spring-security-oauth2-resource-server. Due to the dependency updates we have changed the format of the OAuth2 access tokens, they are now JWT Tokens (https://jwt.io/introduction) and are encrypted/decrypted using a certificate within a java keystore. No default java keystore is provided, so administrators will need to update their deployments to configure an appropriate java keystore. The same java keystore will need to be present on all servers which allow api access, otherwise access tokens generated on one server will not work on any other server.
20 changes: 20 additions & 0 deletions src/main/resources/scripts/sequence-files/README.txt
@@ -0,0 +1,20 @@
Please follow the instructions below to run the purge_sequence_files.py script.
The assumption is that Python3 and pip3 are already installed.

Install virtual env.
$ pip3 install virtualenv

Create a virtual python environment.
$ python3 -m venv .virtualenv

Activate the environment.
$ source .virtualenv/bin/activate

Install libraries.
$ pip3 install -r requirements.txt

Run the script to purge the sequence files on the filesystem.
$ python3 purge_sequence_files.py --help

Deactivate the environment.
$ deactivate
61 changes: 61 additions & 0 deletions src/main/resources/scripts/sequence-files/purge_sequence_files.py
@@ -0,0 +1,61 @@
#!/usr/bin/python
import argparse
import mysql.connector
import os

def remove(path, purge):
    if purge:
        try:
            if os.path.isdir(path):
                os.rmdir(path)
            elif os.path.isfile(path):
                os.remove(path)
            else:
                print("Unable to delete ", path)
        except OSError as e:
            print(e)
    else:
        print(path)

def list_sequence_files(host, user, password, database):
    db = mysql.connector.connect(
        host=host,
        user=user,
        password=password,
        database=database
    )
    cursor = db.cursor()
    # TODO: Should we double check this file doesn't exist in the actual table in case it was manually restored?
    cursor.execute("SELECT DISTINCT file_path FROM sequence_file_AUD WHERE revtype=2")
    result = cursor.fetchall()
    cursor.close()
    db.close()
    return result
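
On the TODO above: one possible guard against manually restored files would be to drop any audited path that still appears in the live table. The sketch below is only an illustration. It assumes a live `sequence_file` table with a `file_path` column mirroring `sequence_file_AUD`, which this diff does not confirm, and the names `RESTORED_SAFE_QUERY` and `exclude_restored` are hypothetical.

```python
# Hypothetical query: assumes a live `sequence_file` table with a `file_path`
# column mirroring `sequence_file_AUD`; this diff does not confirm that schema.
RESTORED_SAFE_QUERY = (
    "SELECT DISTINCT aud.file_path FROM sequence_file_AUD aud "
    "WHERE aud.revtype=2 AND NOT EXISTS ("
    "SELECT 1 FROM sequence_file sf WHERE sf.file_path = aud.file_path)"
)


def exclude_restored(deleted_rows, live_rows):
    """Drop audited paths that still exist in the live table.

    Both arguments are lists of 1-tuples, the shape returned by
    cursor.fetchall() for a single-column SELECT.
    """
    live_paths = {row[0] for row in live_rows}
    return [row for row in deleted_rows if row[0] not in live_paths]
```

Either form could be used: the query does the filtering in the database, while `exclude_restored` filters in Python after two separate `fetchall()` calls.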

def main():
    parser = argparse.ArgumentParser(description="This program lists the sequence files and folders that have been previously deleted in IRIDA.")
    parser.add_argument('--purge', help="Deletes the sequence files and folders from the filesystem.", action="store_true")
    parser.add_argument('--baseDirectory', help="The sequence file base directory.", required=True)
    parser.add_argument('--host', default='localhost', help="The database host name.", required=False)
    parser.add_argument('--database', default='irida_test', help="The database name.", required=False)
    parser.add_argument('--user', default='test', help="The database user name.", required=False)
    parser.add_argument('--password', default='test', help="The database password.", required=False)

    args = parser.parse_args()
    rows = list_sequence_files(args.host, args.user, args.password, args.database)

    if rows:
        for row in rows:
            sequence_file_directory = os.path.dirname(os.path.dirname(args.baseDirectory + row[0]))
            for root, dirs, files in os.walk(sequence_file_directory, topdown=False):
                for name in files:
                    file = os.path.join(root, name)
                    remove(file, args.purge)
                for name in dirs:
                    directory = os.path.join(root, name)
                    remove(directory, args.purge)
            remove(sequence_file_directory, args.purge)
    print("All done.")

if __name__ == '__main__':
    main()
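
The script above builds each target directory by concatenating `--baseDirectory` with the audited `file_path`, so the result depends on whether a separator sits between the two strings. A minimal sketch of a more defensive resolution step follows. It joins the paths and refuses anything not strictly inside the base directory. The helper name `resolve_purge_directory` is hypothetical and not part of this PR.

```python
import os


def resolve_purge_directory(base_directory, file_path):
    """Return the directory two levels above the file, as in main() above.

    Joins the paths instead of concatenating strings, and refuses any
    result that is not strictly inside base_directory.
    """
    base = os.path.abspath(base_directory)
    # Strip leading separators so os.path.join does not discard the base.
    joined = os.path.normpath(os.path.join(base, file_path.lstrip("/\\")))
    target = os.path.dirname(os.path.dirname(joined))
    # commonpath compares whole path components, so "/data/seq2" is not
    # treated as being inside "/data/seq".
    if target == base or os.path.commonpath([base, target]) != base:
        raise ValueError("Refusing to purge outside base directory: " + target)
    return target
```

On a POSIX system, with a hypothetical base of `/data/seq` and an audited path of `/12/1/a.fastq`, this returns `/data/seq/12`. A path that resolves to the base directory itself, or to anywhere outside it, raises `ValueError` before anything is walked or removed.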
1 change: 1 addition & 0 deletions src/main/resources/scripts/sequence-files/requirements.txt
@@ -0,0 +1 @@
mysql-connector-python==8.0.22
Original file line number Diff line number Diff line change
@@ -68,12 +68,8 @@ public void testAnalysisDetails() {
assertEquals(6, page.getNumberOfListItemValues(), "There should be only 6 values for these labels");

String[] expectedAnalysisDetails = new String[] {
-"My Completed Submission",
-"4",
-"SNVPhyl Phylogenomics Pipeline (1.0.1)",
-"MEDIUM",
-"Oct 6, 2013, 10:01 AM",
-"a few seconds" };
+"My Completed Submission", "4", "SNVPhyl Phylogenomics Pipeline (1.0.1)", "MEDIUM",
+"Oct 6, 2013, 10:01 AM", "a few seconds" };
assertTrue(page.analysisDetailsEqual(expectedAnalysisDetails),
"The correct details are displayed for the analysis");
}
@@ -401,7 +397,6 @@ void testTreeOutput() throws IOException {
page.openTreeShapeDropdown();
assertEquals("Diagonal", page.getCurrentTreeShapeTitleAttr());


page.openMetadataDropdown();
assertEquals(4, page.getNumberOfMetadataFields());
page.selectedMetadataTemplate("Testing Template 1");
@@ -423,7 +418,6 @@ void testTreeOutput() throws IOException {
Collections.sort(allSelectedFields);
assertEquals(allFields, allSelectedFields);


page.openLegend();
assertTrue(page.legendContainsCorrectAmountOfMetadataFields());
}
@@ -434,8 +428,8 @@ public void testUnknownPipelineOutput() throws IOException, URISyntaxException,
IridaWorkflow unknownWorkflow;

// Register an UNKNOWN workflow
-Path workflowVersion1DirectoryPath = Paths
-.get(TestAnalysis.class.getResource("workflows/TestAnalysis/1.0").toURI());
+Path workflowVersion1DirectoryPath = Paths.get(
+TestAnalysis.class.getResource("workflows/TestAnalysis/1.0").toURI());

iridaWorkflowsService = new IridaWorkflowsService(new IridaWorkflowSet(Sets.newHashSet()),
new IridaWorkflowIdSet(Sets.newHashSet()));
@@ -473,12 +467,8 @@ public void testUnknownPipelineOutput() throws IOException, URISyntaxException,
assertEquals(6, page.getNumberOfListItemValues(), "There should be only 6 values for these labels");

String[] expectedAnalysisDetails = new String[] {
-"My Completed Submission UNKNOWN PIPELINE",
-"14",
-"Unknown Pipeline (Unknown Version)",
-"MEDIUM",
-"Oct 6, 2013, 10:01 AM",
-"a few seconds" };
+"My Completed Submission UNKNOWN PIPELINE", "14", "Unknown Pipeline (Unknown Version)", "MEDIUM",
+"Oct 6, 2013, 10:01 AM", "a few seconds" };
assertTrue(page.analysisDetailsEqual(expectedAnalysisDetails),
"The correct details are displayed for the analysis");
}
@@ -519,4 +509,4 @@ public void testGalaxyHistoryIdVisibleOnError() {
assertTrue(page.galaxyHistoryIdVisible(), "Galaxy History Id link should not be displayed");
}

-}
+}