Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a query for archivable histories based on user and history last active time #143

Merged
merged 2 commits into from
Jan 8, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
68 changes: 68 additions & 0 deletions parts/22-query.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4979,3 +4979,71 @@ query_tools-usage-per-month() { ##? [--startmonth=<YYYY>-<MM>] [--endmonth=<YYYY
month DESC
EOF
}

query_archivable-histories() { ##? [--user-last-active=360] [--history-last-active=360] [--size]
meta <<-EOF
AUTHORS: natefoo
ADDED: 22
EOF
handle_help "$@" <<-EOF
Get a list of archivable histories based on user and history age.

$ gxadmin query archivable-histories
...

The --size option can be used to show the size of the histories returned, but can significantly slow the
query.

One useful way to use this function is like so:

$ gxadmin tsvquery archivable-histories --size | \\
awk -F'\\t' '{print \$1; sum+=\$NF;} END {print "Total: " sum/1024^3 " GB" > "/dev/stderr";}' | \\
GALAXY_CONFIG_FILE=/gx/config/galaxy.yml xargs /gx/venv/bin/python3 | \\
/gx/galaxy/scripts/secret_decoder_ring.py encode

This outputs the total size to archive to stderr while encoding all history IDs on stdout for
consumption by API-based archival tools.
EOF

extra_selects=
extra_joins=
extra_conds=
group_by=
if [[ -n $arg_size ]]; then
extra_selects=',
sum(dataset.total_size) AS size
'
extra_joins='JOIN history_dataset_association on history.id = history_dataset_association.history_id
JOIN dataset on history_dataset_association.dataset_id = dataset.id'
extra_conds='AND NOT history_dataset_association.purged
AND NOT dataset.purged'
group_by='GROUP BY
history.id, galaxy_user.id
'
fi

email=$(gdpr_safe galaxy_user.email email)

read -r -d '' QUERY <<-EOF
SELECT
history.id,
$email,
date(galaxy_user.update_time) user_age,
date(history.update_time) history_age
$extra_selects
FROM
history
JOIN galaxy_user ON history.user_id = galaxy_user.id
$extra_joins
WHERE
NOT history.published
AND history.update_time < now() - interval '$arg_history_last_active days'
AND galaxy_user.update_time < now() - interval '$arg_user_last_active days'
$extra_conds
$group_by
ORDER BY
user_age ASC,
galaxy_user.email ASC,
history_age ASC
EOF
}
Loading