Delete table database versioning (revised PR with unit tests) #2496
Decision after speaking to @PriyaBasker23: for now, we will just delete the table and data from the new version. This deviates from our original plan of deleting from all historical versions ("delete means delete"), but we had some doubts about implementing contradictory behaviour (i.e. creating a new major version while at the same time making breaking changes to existing versions doesn't make sense). Created #2500 to revisit the "delete means delete" behaviour. For now the exact logic doesn't matter as long as the endpoint behaves consistently.
Need to rebase this on Monday due to conflicts. Remaining stuff:
This allows us to create databases based on other databases.
Ensure we have create/get/list/delete operations for tables and databases. Additionally, add a clone_database operation for use during major version updates. In this scenario we need to create a copy of a database with a new version number.
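As a rough illustration of the database-level part of such a clone (not the code in this PR), the sketch below copies a Glue database's metadata under a new name. It assumes a boto3 Glue client is passed in; the function signature, the choice of copied keys, and the `product_v1`/`product_v2` names are hypothetical. Copying the tables themselves would be a separate step and is not shown.

```python
# Keys from the get_database response that Glue's DatabaseInput also accepts.
# Which keys the real clone_database copies is an assumption here.
DATABASE_INPUT_KEYS = ("Description", "LocationUri", "Parameters")


def clone_database(glue_client, source_name: str, target_name: str) -> dict:
    """Create `target_name` with the same metadata as `source_name`.

    `glue_client` is expected to behave like boto3.client("glue").
    Returns the DatabaseInput that was sent to Glue.
    """
    source = glue_client.get_database(Name=source_name)["Database"]

    # Start from the new name, then copy only the fields we know are valid input.
    database_input = {"Name": target_name}
    for key in DATABASE_INPUT_KEYS:
        if key in source:
            database_input[key] = source[key]

    glue_client.create_database(DatabaseInput=database_input)
    return database_input
```

With a real client this might be called as `clone_database(boto3.client("glue"), "product_v1", "product_v2")`; passing the client in also makes it easy to substitute a stub in unit tests.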
For now, we will just delete the table & data from the new version. This deviates from our original plan of deleting from all historical versions "delete means delete", but we had some doubts about implementing contradictory behaviour (i.e. creating a new major version while at the same time making breaking changes to existing versions doesn't make sense). This behaviour will be revisited in #2500
- Support for 7.5.0
- Fix clone operation passing invalid parameters
- Configure logger to use the console logger when testing locally
Co-authored-by: Murdo <[email protected]>
This takes the changes to versioning.py from #2465, adds unit tests, and reworks the behaviour slightly.
This could do with more integration testing. I have tested the happy path by running the container locally, and once this is rolled out to API Gateway we will do some further testing in order to document the scenarios here: https://dsdmoj.atlassian.net/wiki/spaces/DataPlatform/pages/4557963328/user+journeys+via+API+WIP
New behaviour for delete table
We decided on Thursday morning that the delete table endpoint should return early if a schema doesn't exist, even if there may be data hanging around; i.e. we shouldn't attempt to delete data for which there is no metadata. In future we will ensure that any generated schemas are written to S3 as a prerequisite of loading data into a table.
For now, we will just delete the table and data from the new version. This deviates from our original plan of deleting from all historical versions (i.e. "delete means delete"), but we had some doubts about implementing contradictory behaviour (i.e. creating a new major version while at the same time making breaking changes to existing versions doesn't make sense). I've created #2500 (💣 "Delete means delete" when deleting a table) to revisit the "delete means delete" behaviour. For now the exact logic doesn't matter as long as the endpoint behaves consistently.
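The two decisions above (return early when there is no schema; delete only from the new version) can be sketched with an in-memory stand-in. This is an illustration under assumptions, not the endpoint's implementation: `DataProduct`, its `versions` mapping, and the boolean return value are all hypothetical, and the real code works against S3 and Glue rather than a dict.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical stand-in: maps a major version number to its table names."""

    versions: dict[int, set[str]] = field(default_factory=dict)

    @property
    def latest(self) -> int:
        return max(self.versions)


def delete_table(product: DataProduct, table: str) -> bool:
    """Return True if a new major version without `table` was created."""
    current = product.latest

    # Return early: with no schema (metadata) for the table we delete nothing,
    # even if stray data might still exist.
    if table not in product.versions[current]:
        return False

    # Create the next major version as a copy of the current one...
    new_version = current + 1
    product.versions[new_version] = set(product.versions[current])

    # ...and remove the table from the new version only.
    # Historical versions are left untouched (see #2500).
    product.versions[new_version].discard(table)
    return True
```

For example, starting from a single version containing tables `a` and `b`, deleting `a` leaves version 1 unchanged and adds a version 2 containing only `b`.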
So the expected behaviour when deleting a table is now:
Copying tables between databases
Originally this had an exclude list of fields that should not be copied over when we copy tables. I've changed this to an include list so that the code is less likely to break completely if Amazon starts returning additional fields in the get table response. (When testing this I ran into several fields that weren't handled by the previous version of this code.)
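To illustrate why an include list is more robust, here is a minimal sketch. The key set below is taken from the fields Glue's `TableInput` structure accepts and stands in for the PR's actual include list, which may differ; the function name `to_table_input` is hypothetical.

```python
# Assumed include list: keys accepted by Glue's TableInput structure.
TABLE_INPUT_KEYS = frozenset(
    {
        "Name",
        "Description",
        "Owner",
        "LastAccessTime",
        "LastAnalyzedTime",
        "Retention",
        "StorageDescriptor",
        "PartitionKeys",
        "ViewOriginalText",
        "ViewExpandedText",
        "TableType",
        "Parameters",
        "TargetTable",
    }
)


def to_table_input(table: dict) -> dict:
    """Keep only the keys from a get_table response that create_table accepts.

    Anything not on the include list is dropped, including fields that
    did not exist when this list was written.
    """
    return {key: value for key, value in table.items() if key in TABLE_INPUT_KEYS}
```

With an exclude list, a newly introduced response field would be passed straight through to `create_table` and rejected; with the include list it is silently dropped, at the cost of having to add any new field we do want to copy.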
Refactor of glue_and_athena_utils
This now contains functions for creating, deleting, and fetching databases and tables. I've renamed some of the existing functions in this module to follow a common naming convention, but kept aliases with the old names.
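The aliasing approach can be sketched as below. Both function names are made up for illustration and are not the names used in `glue_and_athena_utils`; the body is a placeholder for a call to Glue.

```python
def get_table(database_name: str, table_name: str) -> str:
    """New-style name following the create/get/list/delete convention.

    Placeholder body: the real function would fetch the table from Glue.
    """
    return f"{database_name}.{table_name}"


# Old name kept as a plain alias so existing callers keep working
# while new code moves over to the renamed function.
fetch_table_legacy = get_table
```

Because the alias is the same function object, existing imports of the old name behave identically, and the alias can be removed once no callers remain.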