Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major Database Refactoring #4958

Merged
merged 17 commits into from
Nov 10, 2018
Merged

Major Database Refactoring #4958

merged 17 commits into from
Nov 10, 2018

Conversation

ewdurbin
Copy link
Member

working on resolving #4772

@ewdurbin

This comment has been minimized.

@ewdurbin

This comment has been minimized.

@ewdurbin

This comment has been minimized.

@steiza

This comment has been minimized.

@ewdurbin

This comment has been minimized.

@di
Copy link
Member

di commented Oct 27, 2018

@ewdurbin Are you auto-generating the migration from the models? Looks to me like there's a bit of disagreement between the two.

@ewdurbin
Copy link
Member Author

@di not completely no. there is currently some disagreement because I wanted to get what I had up for review before departing for a bit! fixing tests and syncing now!

@ewdurbin
Copy link
Member Author

I think that resolves most of the schema and model concerns. Now it's a matter of cleaning up the newly goof'd references. I plan to pick this up tomorrow morning if no-one runs with it.

Contributors: Feel free to pick branch from this and add commits in a separate PR! Just note that you've done so and reference this pull request. I'll happily close and pick up from your work in the morning :)

@ewdurbin ewdurbin added the help needed We'd love volunteers to advise on or help fix/implement this. label Oct 28, 2018
@ewdurbin
Copy link
Member Author

Could use some help pushing this along. There's something I don't seem to understand in the relationship between Dependency and Release. I'm not sure if that's a modeling issue or an issue with our test factories.

Copy link
Member

@dstufft dstufft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like some of the other tracebacks are due to project_id being null when a flush is happening. You might have to adjust some queries to handle this change.

warehouse/packaging/models.py Outdated Show resolved Hide resolved
warehouse/packaging/models.py Outdated Show resolved Hide resolved
@dstufft
Copy link
Member

dstufft commented Oct 28, 2018

I've fixed a few more of the tests. I think the remaining tests exist because we still have Release.name, File.name, File.version, but nothing is populating them. I see three possible options for fixing it:

  1. Adjust all of the queries we make, to pass those values in.
  2. Add a trigger in the database to automatically populate those columns from the referenced parent whenever the value is null.
  3. Remove Release.name, File.name, and File.version, and update all of our queries to use joins to get that information from the normalized location.

@dstufft
Copy link
Member

dstufft commented Oct 28, 2018

Oh, and (1) and (2) will need something to make sure that if we update Project.name or Release.version, it gets updated in the references Release and/or File objects.

(3) is maybe the right answer, but we'd have to see how it performed, doing those joins might increase the query duration by a non-trivial amount. I have no idea.

@dstufft
Copy link
Member

dstufft commented Nov 5, 2018

@dstufft dstufft removed the help needed We'd love volunteers to advise on or help fix/implement this. label Nov 6, 2018
@dstufft dstufft changed the title WIP: Migrate to UUID Primary Key for Project and Release models WIP: Major Database Refactoring Nov 6, 2018
@dstufft
Copy link
Member

dstufft commented Nov 6, 2018

I'm hijacking this PR to include a number of other changes that I think we should do, but I've avoided doing due to downtime they would likely require. Since this PR is going to require downtime already, merging all of these changes into a singe PR allows us to just take the downtime once.

@dstufft
Copy link
Member

dstufft commented Nov 6, 2018

Ok, so the effect of this PR:

  • Add a Project.id and Release.id column.
  • Replaces all of the FKs to Project and Release to use that new id column.
  • Deletes the Admin role, which is no longer being used.
  • Updates Role.user, Release.uploader, and JournalEntry.submitted_by to FK using User.id instead of User.username.
    • In the case of Release.uploader, this includes removing a gnarly join and just setting the uploader at upload time.
  • Renames packages to projects.
  • Renames accounts_user to users.
  • Renames accounts_email to user_emails.
  • Renames warehouse_admin_* to admin_*.

Replacing all uses of things like Release.name to be Release.project.name means we're now triggering joins (or extra queries) when we weren't previously. I've tried to go through and spot check to see if we've introduced any of those, and fixed any I found. However it's possible that some will still have been missed. We'll want to monitor performance after this to make sure we're not degrading our perf.

I expect this set of migrations will take a bit to run. On the dev db, my make initdb is taking about 10 minutes, and the production database is quite a bit larger.

@dstufft dstufft changed the title WIP: Major Database Refactoring Major Database Refactoring Nov 6, 2018
@ewdurbin
Copy link
Member Author

ewdurbin commented Nov 7, 2018

Given the complexity and potential for a long migration, I'd like to schedule downtime to complete this rollout.

This Saturday morning Eastern time is good for me, but I'd also like it if I could have some additional support from either @dstufft or @di during this time. Does Saturday 2018-11-10T10:00:00-05:00 work for either/both of you?

I'd estimate we schedule two hours of downtime for uploads, but most likely only use about half of that.

@dstufft
Copy link
Member

dstufft commented Nov 7, 2018

Maybe, there's something on my calendar, but I'm not sure what it is or if I have to go to it. Will have to check and get back.

@di
Copy link
Member

di commented Nov 7, 2018

I'll be at PyCon CA that weekend -- Sunday would be better for me.

@ewdurbin
Copy link
Member Author

ewdurbin commented Nov 7, 2018

Sunday is toast for me. Committed all afternoon.

@dstufft if something a bit later works for you Saturday I'm good all afternoon (12PM-5PM Eastern)

@ewdurbin
Copy link
Member Author

ewdurbin commented Nov 8, 2018

@ewdurbin
Copy link
Member Author

Deployment to prod disabled temporarily. Merging to send to test.pypi.org.

@ewdurbin ewdurbin merged commit 6cb8bf9 into master Nov 10, 2018
@ewdurbin ewdurbin deleted the surrogate_keys branch November 10, 2018 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants