Add a query for archivable histories based on user and history last active time
natefoo committed Dec 14, 2023
1 parent 8aa5e41 commit be44144
Showing 1 changed file with 67 additions and 0 deletions.
67 changes: 67 additions & 0 deletions parts/22-query.sh
@@ -4905,3 +4905,70 @@ query_tpt-tool-memory() { ##? [--startyear=<YYYY>] [--endyear=<YYYY>] [--formula
EOF
}

query_archivable-histories() { ##? [--user-last-active=360] [--history-last-active=360] [--size]
meta <<-EOF
AUTHORS: natefoo
ADDED: 22
EOF
handle_help "$@" <<-EOF
Get a list of archivable histories based on user and history age.
$ gxadmin query archivable-histories
...
The --size option can be used to show the size of the histories returned, but can significantly slow the
query.
A useful way to combine this query with other tools:
$ gxadmin tsvquery archivable-histories --size | \\
awk -F'\\t' '{print \$1; sum+=\$NF;} END {print "Total: " sum/1024^3 " GB" > "/dev/stderr";}' | \\
GALAXY_CONFIG_FILE=/gx/config/galaxy.yml xargs /gx/venv/bin/python3 \\
/gx/galaxy/scripts/secret_decoder_ring.py encode
This outputs the total size to archive to stderr while encoding all history IDs on stdout for
consumption by API-based archival tools.
EOF

extra_selects=
extra_joins=
extra_conds=
group_by=
if [[ -n $arg_size ]]; then
extra_selects=',
sum(dataset.total_size) AS size
'
extra_joins='JOIN history_dataset_association on history.id = history_dataset_association.history_id
JOIN dataset on history_dataset_association.dataset_id = dataset.id'
extra_conds='AND NOT history_dataset_association.purged
AND NOT dataset.purged'
group_by='GROUP BY
history.id, galaxy_user.id
'
fi

email=$(gdpr_safe galaxy_user.email email)

read -r -d '' QUERY <<-EOF
SELECT
history.id,
$email,
date(galaxy_user.update_time) user_age,
date(history.update_time) history_age
$extra_selects
FROM
history
JOIN galaxy_user ON history.user_id = galaxy_user.id
$extra_joins
WHERE
NOT history.published
AND history.update_time < now() - interval '$arg_history_last_active days'
AND galaxy_user.update_time < now() - interval '$arg_user_last_active days'
$extra_conds
$group_by
ORDER BY
user_age ASC,
galaxy_user.email ASC,
history_age ASC
EOF
}
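
The awk step in the help text above sends history IDs to stdout and the size total to stderr, so the IDs can continue down a pipe while the total stays visible. A minimal sketch of that split follows. The sample rows and the helper names `sample_rows` and `split_ids_and_total` are made up for illustration and are not part of gxadmin; the column layout assumed here (history id, email, user_age, history_age, size in bytes) mirrors the SELECT list of the query when --size is given.

```shell
#!/usr/bin/env bash
# Sketch only: sample_rows stands in for the output of
# `gxadmin tsvquery archivable-histories --size`; the rows are invented.

# Print column 1 of each TSV row on stdout and the summed last column,
# converted from bytes to GB, on stderr.
split_ids_and_total() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -F'\t' '
        { print $1; sum += $NF }
        END { printf "Total: %.2f GB\n", sum / (1024 * 1024 * 1024) > "/dev/stderr" }
    '
}

# Hypothetical rows: id, email, user_age, history_age, size in bytes.
sample_rows() {
    printf '101\tuser1@example.org\t2022-01-05\t2021-11-30\t1073741824\n'
    printf '102\tuser2@example.org\t2022-02-10\t2022-01-15\t2147483648\n'
}

# IDs go to stdout (and on to the next command in a real pipeline);
# the total is written to stderr.
sample_rows | split_ids_and_total
```

Because the total goes to stderr, appending another pipe stage (such as the xargs call in the help text) would receive only the IDs.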
