Hard removal of items in the database (scalability discussion) #13992

jdavcs · 2022-06-01T17:12:01Z

hexylena · 2022-06-02T08:25:12Z

database objects (tables, columns) that consume the most resources

I made this graph a long time ago to answer the same question, hope it helps!
https://stats.galaxyproject.eu/d/000000019/galaxy-database?orgId=1&refresh=5m&viewPanel=40

jdavcs · 2022-06-02T14:11:12Z

I made this graph a long time ago to answer the same question, hope it helps!

Oh yes, absolutely, thanks! I have a spreadsheet with reasonably precise estimates of number of rows per table on main, but that one is from 2020, so I'll update it and then post it here. (while size per table is a more useful metric, a curious detail, at least from 2 years ago, was that 42 tables had zero rows).

hexylena · 2022-06-02T14:30:25Z

Really easy to get from the following query if you just need fast estimates:

SELECT reltuples, relname FROM pg_class where relname not like 'pg_%' and relname not like 'ix_%' and relname not like '%_pkey' order by reltuples desc;

   reltuples   |                             relname
---------------+-----------------------------------------------------------------
  5.796195e+08 | job_metric_numeric
 4.5395987e+08 | job_parameter
 3.0803376e+08 | job_state_history
 1.6113411e+08 | history_dataset_association_history
  1.511119e+08 | dataset_permissions
 1.0779887e+08 | history_dataset_association
  9.121957e+07 | job_to_input_dataset
  8.872237e+07 | dataset
  8.311228e+07 | dataset_collection_element
 7.9665984e+07 | job_to_output_dataset
 4.7278964e+07 | job
 4.2060216e+07 | job_to_output_dataset_collection
 4.1388948e+07 | job_metric_text
 3.2805136e+07 | cleanup_event_dataset_association
 3.2322692e+07 | implicit_collection_jobs_job_association
 2.5986542e+07 | galaxy_session
 2.0864564e+07 | cleanup_event_hda_association

~~I'll add it to gxadmin since that's a useful query to have.~~
Edit it's there. gxadmin query pg-rows-per-table

$ gxadmin query pg-rows-per-table | head
 table_schema |                           table_name                           |     rows
--------------+----------------------------------------------------------------+---------------
 galaxy       | job_metric_numeric                                             |  5.796195e+08
 galaxy       | job_parameter                                                  | 4.5395987e+08
 galaxy       | job_state_history                                              | 3.0803376e+08
 galaxy       | history_dataset_association_history                            | 1.6113411e+08
 galaxy       | dataset_permissions                                            |  1.511119e+08
 galaxy       | history_dataset_association                                    | 1.0779887e+08
 galaxy       | job_to_input_dataset                                           |  9.121957e+07
 galaxy       | dataset                                                        |  8.872237e+07

Edit 2: 50 empty tables

$ gxadmin query pg-rows-per-table | grep ' 0' | wc -l
50

jdavcs · 2022-06-02T14:55:36Z

Really easy to get from the following query if you just need fast estimates:

Thank you! Very easy indeed. So we have 110(!) empty tables (which includes a few that are not in the model, but still..)

natefoo · 2022-06-02T15:11:53Z

And for the record, here's the query I was using in Montpellier to get some of the space usage numbers:

galaxy_main=> SELECT *, pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS index
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS table
  FROM (
  SELECT *, total_bytes-index_bytes-coalesce(toast_bytes,0) AS table_bytes FROM (
      SELECT c.oid,nspname AS table_schema, relname AS table_name
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              , pg_indexes_size(c.oid) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r'
  ) a
) a order by a.total_bytes desc;

Just total bytes alone would be useful as a gxadmin query to use as a Grafana data source.

jdavcs added kind/enhancement area/admin area/database Galaxy's database or data access layer area/backend labels Jun 1, 2022

jdavcs added this to the 22.09 milestone Jun 1, 2022

jdavcs mentioned this issue Jun 1, 2022

Drop update_time from JobStateHistory #13997

Merged

5 tasks

mvdbeek assigned jdavcs Aug 30, 2022

jdavcs added the planning label Oct 20, 2022

martenson mentioned this issue Oct 28, 2022

Executive summary of 23.1 Backend Goals #14886

Closed

dannon modified the milestones: 23.0, 23.1 Feb 14, 2023

mvdbeek modified the milestones: 23.1, 23.2 Jul 21, 2023

jdavcs modified the milestones: 23.2, 24.0 Jan 10, 2024

jdavcs modified the milestones: 24.0, 24.1 Mar 1, 2024

jdavcs removed this from the 24.1 milestone May 14, 2024

jdavcs mentioned this issue Jan 15, 2025

Allow purging users, jobs completely from the DB #13738

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hard removal of items in the database (scalability discussion) #13992

Hard removal of items in the database (scalability discussion) #13992

jdavcs commented Jun 1, 2022 •

edited

Loading

hexylena commented Jun 2, 2022

jdavcs commented Jun 2, 2022

hexylena commented Jun 2, 2022 •

edited

Loading

jdavcs commented Jun 2, 2022

natefoo commented Jun 2, 2022

Hard removal of items in the database (scalability discussion) #13992

Hard removal of items in the database (scalability discussion) #13992

Comments

jdavcs commented Jun 1, 2022 • edited Loading

hexylena commented Jun 2, 2022

jdavcs commented Jun 2, 2022

hexylena commented Jun 2, 2022 • edited Loading

jdavcs commented Jun 2, 2022

natefoo commented Jun 2, 2022

jdavcs commented Jun 1, 2022 •

edited

Loading

hexylena commented Jun 2, 2022 •

edited

Loading