Adding start and end dates to cleanup script #1487

Merged on Dec 7, 2023 (3 commits)
CHANGELOG.md: 1 change (1 addition, 0 deletions)
```diff
@@ -8,6 +8,7 @@
 * [Developer]: Added override flag that determines if files should be deleted from file storage. [See PR 1486](https://github.com/phac-nml/irida/pull/1486)
 * [Developer]: Fixed flaky `PipelinesPhylogenomicsPageIT#testPageSetup` test. See [PR 1490](https://github.com/phac-nml/irida/pull/1492)
 * [ALL]: Added LDAP/ADLDAP support.
+* [Developer]: Added start and end dates to the filesystem cleanup script. [See PR 1487](https://github.com/phac-nml/irida/pull/1487)
 
 ## [23.01.3] - 2023/05/09
 * [Developer]: Fixed issue with metadata uploader removing existing data. See [PR 1489](https://github.com/phac-nml/irida/pull/1489)
```
src/main/resources/scripts/sequence-files/purge_sequence_files.py: 16 changes (13 additions, 3 deletions)
```diff
@@ -1,5 +1,6 @@
 #!/usr/bin/python
 import argparse
+import datetime
 import mysql.connector
 import os
 
```
```diff
@@ -17,7 +18,7 @@ def remove(path, purge):
     else:
         print(path)
 
-def list_sequence_files(host, user, password, database):
+def list_sequence_files(startDate, endDate, host, user, password, database):
     db = mysql.connector.connect(
         host=host,
         user=user,
```
```diff
@@ -26,7 +27,14 @@ def list_sequence_files(host, user, password, database):
     )
     cursor = db.cursor()
     # TODO: Should we double check this file doesn't exist in the actual table in case it was manually restored?
-    cursor.execute("SELECT DISTINCT file_path FROM sequence_file_AUD WHERE revtype=2")
+    if(startDate and endDate):
+        cursor.execute("SELECT DISTINCT file_path FROM sequence_file_AUD WHERE revtype=2 AND modified_date BETWEEN %s AND %s", (startDate, endDate))
+    elif(startDate):
+        cursor.execute("SELECT DISTINCT file_path FROM sequence_file_AUD WHERE revtype=2 AND modified_date >= %s", (startDate,))
+    elif(endDate):
+        cursor.execute("SELECT DISTINCT file_path FROM sequence_file_AUD WHERE revtype=2 AND modified_date <= %s", (endDate,))
+    else:
+        cursor.execute("SELECT DISTINCT file_path FROM sequence_file_AUD WHERE revtype=2")
     result = cursor.fetchall()
     cursor.close()
     db.close()
```
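The four `cursor.execute` branches above differ only in their date conditions. As a hedged sketch, not the PR's code, the same query can be assembled from optional conditions in a single path; `build_deleted_files_query` and `BASE_QUERY` are hypothetical names. It keeps the diff's semantics, because `x BETWEEN a AND b` is equivalent to `x >= a AND x <= b`, and it still passes the dates as `%s` parameters instead of formatting them into the SQL string.

```python
import datetime
from typing import List, Optional, Tuple

# Hypothetical constant: the unfiltered query used in the diff.
BASE_QUERY = "SELECT DISTINCT file_path FROM sequence_file_AUD WHERE revtype=2"


def build_deleted_files_query(
    start_date: Optional[datetime.date],
    end_date: Optional[datetime.date],
) -> Tuple[str, tuple]:
    """Return (sql, params) for the deleted sequence file query.

    Each bound that is present appends one condition, so both bounds give
    an inclusive range, one bound gives an open-ended range, and no bounds
    return every deleted file, matching the four branches in the diff.
    """
    conditions: List[str] = []
    params: List[datetime.date] = []
    if start_date is not None:
        conditions.append("modified_date >= %s")
        params.append(start_date)
    if end_date is not None:
        conditions.append("modified_date <= %s")
        params.append(end_date)
    # Joining with " AND " leaves BASE_QUERY unchanged when no bounds are given.
    sql = " AND ".join([BASE_QUERY] + conditions)
    return sql, tuple(params)
```

With the names from the diff, the call inside `list_sequence_files` would become `sql, params = build_deleted_files_query(startDate, endDate)` followed by `cursor.execute(sql, params)`.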
```diff
@@ -35,14 +43,16 @@ def list_sequence_files(host, user, password, database):
 def main():
     parser = argparse.ArgumentParser(description="This program lists the sequence files and folders that have been previously deleted in IRIDA.")
     parser.add_argument('--purge', help="Deletes the sequence files and folders from the filesystem.", action="store_true")
+    parser.add_argument('--startDate', type=datetime.date.fromisoformat, help="The start date in format YYYY-MM-DD (inclusive).", required=False)
+    parser.add_argument('--endDate', type=datetime.date.fromisoformat, help="The end date in format YYYY-MM-DD (inclusive).", required=False)
     parser.add_argument('--baseDirectory', default='/tmp/irida/sequence-files', help="The sequence file base directory.", required=False)
     parser.add_argument('--host', default='localhost', help="The database host name.", required=False)
     parser.add_argument('--database', default='irida_test', help="The database name.", required=False)
     parser.add_argument('--user', default='test', help="The database user name.", required=False)
     parser.add_argument('--password', default='test', help="The database password.", required=False)
 
     args = parser.parse_args()
-    rows = list_sequence_files(args.host, args.user, args.password, args.database)
+    rows = list_sequence_files(args.startDate, args.endDate, args.host, args.user, args.password, args.database)
     if rows:
         for row in rows:
             sequence_file_directory = os.path.dirname(os.path.dirname(os.path.join(args.baseDirectory, row[0])))
```
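The new `--startDate` and `--endDate` options pass `datetime.date.fromisoformat` as the argparse `type`, so each value reaches `list_sequence_files` as a `datetime.date`, or as `None` when the option is omitted. The sketch below, assuming Python 3.7 or later, reproduces only those two options to show the parsing: a malformed value such as `2023-13-01` makes `fromisoformat` raise `ValueError`, which argparse reports as an `invalid fromisoformat value` error before exiting with status 2. The `date_bounds` helper is hypothetical and not part of the PR. It assumes `modified_date` is a DATETIME column, which the diff does not show; in that case MySQL compares a date-only value as midnight, so `modified_date <= endDate` would skip files deleted later on the end date, whereas an exclusive next-day bound used with `modified_date < %s` keeps the whole end day.

```python
import argparse
import datetime
from typing import Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Parser containing only the two date options added in the PR."""
    parser = argparse.ArgumentParser()
    # argparse calls the type callable on the raw string; a ValueError from
    # fromisoformat becomes an "invalid fromisoformat value" usage error.
    parser.add_argument('--startDate', type=datetime.date.fromisoformat, required=False)
    parser.add_argument('--endDate', type=datetime.date.fromisoformat, required=False)
    return parser


def date_bounds(
    start: Optional[datetime.date],
    end: Optional[datetime.date],
) -> Tuple[Optional[datetime.date], Optional[datetime.date]]:
    """Hypothetical helper, not part of the PR.

    Returns (start, end_exclusive) for use with "modified_date >= %s" and
    "modified_date < %s", assuming modified_date is a DATETIME column.
    end_exclusive is the day after the inclusive end date.
    """
    if start is not None and end is not None and start > end:
        raise ValueError("--startDate must not be later than --endDate")
    end_exclusive = end + datetime.timedelta(days=1) if end is not None else None
    return start, end_exclusive
```

For example, `--startDate 2023-01-01 --endDate 2023-01-31` yields the bounds `2023-01-01` and `2023-02-01`, which would cover every timestamp on January 31.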