-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate identity hashes #5256
Migrate identity hashes #5256
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Skipped Deployment
|
fides
|
Project |
fides
|
Branch Review |
refs/pull/5256/merge
|
Run status |
|
Run duration | 00m 35s |
Commit |
|
Committer | Adrian Galvan |
View all properties for this run ↗︎ |
Test results | |
---|---|
|
0
|
|
0
|
|
0
|
|
0
|
|
4
|
View all changes introduced in this branch ↗︎ |
@classmethod | ||
@abstractmethod | ||
def bcrypt_hash_value( | ||
cls, | ||
value: MultiValue, | ||
encoding: str = "UTF-8", | ||
) -> Optional[str]: | ||
"""Hash value using bcrypt.""" | ||
|
||
@classmethod | ||
@abstractmethod | ||
def hash_value( | ||
cls, | ||
value: MultiValue, | ||
encoding: str = "UTF-8", | ||
) -> Optional[str]: | ||
"""Hash value using SHA-256.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I set these as abstract because I expect any class that implements the HashMigrationMixin
to have these two functions. Not sure if there is a better way to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that looks right - you may need/want to declare the mixin class as an ABC
though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding it but I got this error
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
Going to save further research into this until I address the other issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@galvana this is looking quite good - a really nice implementation and an elegant way of handling the migration. very well documented throughout, and nicely designed - it's really easy to follow.
i've got some minor comments throughout, let me know what you think. i didn't test myself, but assume you've given it, or will give it, some rigorous manual testing, given how involved of a change it is.
i tried to leave some comments on places where we may want to mark code for eventual deprecation/removal since some of this is only needed temporarily by nature...but i certainly missed some spots. so that's just a general comment - it may be good taking a pass through and clearly marking which modules/classes/functions are just needed in an "interim" while we ensure migrations are completed....
src/fides/api/alembic/migrations/versions/a249a089f23b_add_identity_salt_table.py
Outdated
Show resolved
Hide resolved
) | ||
|
||
|
||
@celery_app.task(base=DatabaseTask, bind=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think this should be a celery task? you're not running it with celery, just our apscheduler
task runner which doesn't interact with celery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just following the pattern we had for other jobs, plus this has a nice way to get a database connection 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh, i do appreciate looking for consistency, but i'm pretty sure that this is rarely/never intended if the function isn't actually going to be run as a celery task.
as i look a bit closer into it, i suppose it's possible that this type of usage could be intentional, since we actually leverage a different db engine (our 'task engine') within the DatabaseTask.get_new_session
...so maybe there's a use case where we'd want to use that engine, even if the functions aren't running as celery tasks? it feels a bit contrived - but i'd like to check with @pattisdr if that's what was intended when adding the decorator to e.g. this function in the DSR 3.0 refactor - or if that's also just accidental usage!
i do wanna get clear on this, because as your comment indicates, if we don't start clearing this up, it's only going to cause more confusion as we add more scheduled tasks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too long ago to recall but I doubt it was intentional, just an easy way to get the session. I agree we're creating unneccesary confusion here by doing it like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @pattisdr -
in that case @galvana, i'd suggest we remove this. it should be pretty straightforward for us to get a well-handled db session - see an analogous example here in the discovery monitor executor (this function is also run as a scheduled task!). that's using a function defined in fidesplus
to retrieve the db session but we can/should add a @contextmanager
decorator to our deps.get_db()
in fides OSS, i think, so it can be used similarly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you two for the input! I've promoted get_db
to a context manager, let's just make sure all the tests pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @adamsachs. I addressed most of your concerns except for these two:
- Still using a Celery task for the migration, I'm just sticking to the convention for our other jobs
- I ran into conflicts adding
ABC
toHashMigrationMixin
and decided it wasn't worth troubleshooting further at this point in the sprint
src/fides/api/alembic/migrations/versions/a249a089f23b_add_identity_salt_table.py
Outdated
Show resolved
Hide resolved
src/fides/api/alembic/migrations/versions/a249a089f23b_add_identity_salt_table.py
Outdated
Show resolved
Hide resolved
@classmethod | ||
@abstractmethod | ||
def bcrypt_hash_value( | ||
cls, | ||
value: MultiValue, | ||
encoding: str = "UTF-8", | ||
) -> Optional[str]: | ||
"""Hash value using bcrypt.""" | ||
|
||
@classmethod | ||
@abstractmethod | ||
def hash_value( | ||
cls, | ||
value: MultiValue, | ||
encoding: str = "UTF-8", | ||
) -> Optional[str]: | ||
"""Hash value using SHA-256.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding it but I got this error
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
Going to save further research into this until I address the other issues.
) | ||
|
||
|
||
@celery_app.task(base=DatabaseTask, bind=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just following the pattern we had for other jobs, plus this has a nice way to get a database connection 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great @galvana , thanks for the extra care in testing some of my questions out! 👍
only lingering question for me is the celery decorator, don't need to hold up approval on that but i would like to get that resolved one way or another before we merge
fides
|
Project |
fides
|
Branch Review |
main
|
Run status |
|
Run duration | 00m 35s |
Commit |
|
Committer | Adrian Galvan |
View all properties for this run ↗︎ |
Test results | |
---|---|
|
0
|
|
0
|
|
0
|
|
0
|
|
4
|
View all changes introduced in this branch ↗︎ |
Closes PROD-2633
Description Of Changes
This PR migrates usages of the
bcrypt
hash toSHA-256 + salt
for non-credential data. Thebcrypt
hash function is a computationally intensive function that should only be used for passwords, client secrets, and other credentials. The use ofbcrypt
to hash identity data like emails and phone numbers was causing performance issues for any endpoint that hashed incoming identities either for saving or for searching.At a high level the migration approach is:
bcrypt
hash and aSHA-256
hashSHA-256
bcrypt
hashes toSHA-256
hashesSHA-256
hashesCode Changes
is_hash_migrated
column to select tables to track the migrationidentity_table
with a single encrypted value to store the system-wide identity salt@cache
decoratedget_identity_salt
function that also initializes the salt value on server startup if it doesn't existhash_with_salt
tohash_credential_with_salt
(bcrypt) for passwords andhash_value_with_salt
(SHA-256) for identity valuesHashMigrationMixin
with shared migration logicbcrypt_migration_task
jobSteps to Confirm
Completed hash migration for...
privacyrequest
table and make sure every row hasis_hash_migrated
set to trueThis is the easiest check but this migration applies to
Pre-Merge Checklist
CHANGELOG.md
main
downgrade()
migration is correct and works