Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop database views #15876

Merged
merged 15 commits into from
Apr 26, 2023
Merged

Drop database views #15876

merged 15 commits into from
Apr 26, 2023

Conversation

jdavcs
Copy link
Member

@jdavcs jdavcs commented Mar 30, 2023

See alternative implementation in #15888

IMPORTANT: I've dropped this code (joinedload option for job_state_summary in history_contents manager). I'm not sure if this breaks something (or negatively affects performance). If this needs to be restored, my suggestion is to go with #15888, since this version does not use a mapped class and so it cannot pull these summary counts as a query load option.

Why drop views:

  • A view is a database-level abstraction we simply don't need: we define our abstractions in the code base, not the database. There's no difference in terms of convenience or performance between querying a database view or a SQLAlchemy construct like a Selectable.
  • Views add unnecessary complexity for mapping: it's a custom DDL element that requires some setup (see reference implementation on sqlalchemy's wiki), and, especially, database migrations (see Alembic's replaceable objects).

Unlike #15888, this is a more intrusive refactoring of HistoryDatasetCollectionJobStateSummary's implementation.

  1. Here we remove the mapped class and instead use a simple query. I think there is no need for a mapped class: there is no state that the ORM needs to track: it's just one readonly row containing 16 integer values which are aggregate counts. Executing a query seems like a better fit: is simple, and it's much more flexible than dealing with a mapped class. (also, as per SA's docs, this is not a recommended approach)
  2. I've simplified the query. It returns the same data (I've verified this on main). The previous version used an outer join, which resulted in containing rows that did not add anything to the counts (rows having no associated job contributed neither state counts, nor total count), which is why a group by clause was necessary. More importantly, the previous query could only be used in a join - i.e., the class was ONLY used to define a relationship with a parent class - which added the HDCA.id as the join criteria. Without it the query is ..expensive: cost=20432585 (so using the class as a standalone makes no sense).
  3. I've rearranged the joins to make the logic follow the join path: we select job state, joining along this path: icjja > icj > hdca. Again, there's no difference in what data is produced by the query.
  4. If there are no associated jobs, the simplified query returns a tuple of 15 nulls and one zero for total jobs. That's because we don't select the jobs we won't need in the first place, and the sum function returns a null if there are no rows to sum over. We fix this in the code by simply assigning zeros to all counts if the total count is zero.

SAME NOTE as in 15888: The part of the query that enumerates the job states (aggregate sum columns) is now generated dynamically based on members of model.Job.states. This was done as a safeguard against missing a state (assertion added) or listing a non-existent state (e.g. the original view's query had a bug: the state deleted_new has been replaced with deleting, which may have resulted in incorrect counts.)

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jdavcs jdavcs added kind/refactoring cleanup or refactoring of existing code, no functional changes area/database Galaxy's database or data access layer labels Mar 30, 2023
@jdavcs jdavcs added this to the 23.1 milestone Mar 30, 2023
@jdavcs
Copy link
Member Author

jdavcs commented Mar 31, 2023

Superseded by #15888. (both are valid alternatives)

@jdavcs jdavcs closed this Mar 31, 2023
@jdavcs
Copy link
Member Author

jdavcs commented Apr 1, 2023

Nope, let's try again. I think I found the problem.

@jdavcs jdavcs reopened this Apr 1, 2023
@jdavcs jdavcs force-pushed the dev_no_views branch 2 times, most recently from 8eac1a1 to 333bbc6 Compare April 1, 2023 04:32
@mvdbeek
Copy link
Member

mvdbeek commented Apr 5, 2023

This looks fine to me, but don't you need to explicitly drop the view in a new migration ?

@jdavcs
Copy link
Member Author

jdavcs commented Apr 10, 2023

@mvdbeek Thank you for looking at these!

I've added the revision for dropping the view (I had it in the PR's initial to-dos but then lost it in the edits.. Thanks!)

I've rewritten the revision script that edited the view (c39f1de47a04) without the use of alembic_util. I think that's all we need. I've tested it - works both ways. I've also dropped the alembic_util dependency in the next commit - I don't think we need it anymore.

@jdavcs jdavcs marked this pull request as ready for review April 10, 2023 22:08
@mvdbeek mvdbeek self-requested a review April 11, 2023 14:03
lib/galaxy/model/__init__.py Outdated Show resolved Hide resolved
lib/galaxy/model/__init__.py Outdated Show resolved Hide resolved
jdavcs and others added 3 commits April 20, 2023 10:12
value.value for value in ...values!

Co-authored-by: Nicola Soranzo <[email protected]>
@jdavcs
Copy link
Member Author

jdavcs commented Apr 20, 2023

@nsoranzo thanks for the suggestions! #15811 builds on top of this one - I'll rebase that one accordingly once this is merged.

@mvdbeek mvdbeek merged commit eb4cd72 into galaxyproject:dev Apr 26, 2023
@mvdbeek
Copy link
Member

mvdbeek commented Apr 26, 2023

Thanks @jdavcs!

@jdavcs
Copy link
Member Author

jdavcs commented Apr 26, 2023

Thank you for reviewing, @nsoranzo and @mvdbeek!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/database Galaxy's database or data access layer kind/refactoring cleanup or refactoring of existing code, no functional changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants