Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete table database versioning (revised PR with unit tests) #2496

Merged
merged 13 commits into from
Nov 28, 2023

Conversation

MatMoore
Copy link
Contributor

@MatMoore MatMoore commented Nov 23, 2023

This takes the changes to versioning.py from #2465, adds unit tests, and reworks the behaviour slightly.

This could do with more integration testing, but I have tested the happy path by running the container locally, and once this is rolled out to API gateway we will do some further testing in order to document the scenarios here: https://dsdmoj.atlassian.net/wiki/spaces/DataPlatform/pages/4557963328/user+journeys+via+API+WIP

New behaviour for delete table

  1. We decided Thursday morning that the delete table endpoint should return early if a schema doesn't exist, even if there may be data hanging around. I.e. we shouldn't attempt to delete data for which there is no metadata. In future we will ensure that any generated schemas are written to s3 as a prerequisite of loading data into a table.

  2. For now, we will just delete the table & data from the new version. This deviates from our original plan of deleting from all historical versions (i.e. "delete means delete"), but we had some doubts about implementing contradictory behaviour (i.e. creating a new major version while at the same time making breaking changes to existing versions doesn't make sense). I've created 💣 "Delete means delete" when deleting a table #2500 to revisit the delete means delete behaviour. For now the exact logic doesn't matter as long as the endpoint behaves consistently.

So the expected behaviour when deleting a table is now:

  1. If the schema is not found, do nothing
  2. Create a new version of the database with the new version number in
  3. If the glue table is there, delete it from the new version only
  4. Data will still be there in historical versions

Copying tables between databases

Originally this had an exclude list of fields that should not be copied over when we copy tables. I've changed this to an include list so that the code is less likely to break completely if amazon starts returning additional fields in the get table response. (When testing this I ran into several fields that weren't handled by the previous version of this code)

Refactor of glue_and_athena_utils

This now contains functions for creating, deleting, and fetching databases and tables. I've renamed some of the existing functions in this module to follow a common naming convention, but kept aliases with the old names.

@MatMoore MatMoore changed the title Delete table database versioning 2 Delete table database versioning (revised PR with unit tests) Nov 23, 2023
@MatMoore MatMoore force-pushed the delete-table-database-versioning-2 branch from 61e44f2 to ca03d48 Compare November 23, 2023 17:14
@MatMoore
Copy link
Contributor Author

MatMoore commented Nov 24, 2023

Decision after speaking to @PriyaBasker23

For now, we will just delete the table & data from the new version. This deviates from our original plan of deleting from all historical versions "delete means delete", but we had some doubts about implementing contradictory behaviour (i.e. creating a new major version while at the same time making breaking changes to existing versions doesn't make sense).

Created #2500 to revisit the delete means delete behaviour. For now the exact logic doesn't matter as long as the endpoint behaves consistently.

@MatMoore
Copy link
Contributor Author

MatMoore commented Nov 24, 2023

Need to rebase this monday due to conflicts.

Remaining stuff

  • Bump the version
  • Make sure glue errors are handled in the delete table lambda
  • Test the lambda

This allows us to create databases based on other databases.
Ensure we have create/get/list/delete operations for tables and
databases.

Additionally, add a clone_database operation for us during major version
updates. In this scenario we need to create a copy of a database with
a new version number.
@MatMoore MatMoore force-pushed the delete-table-database-versioning-2 branch from 5b29d71 to c0c1e1c Compare November 27, 2023 13:37
@MatMoore MatMoore force-pushed the delete-table-database-versioning-2 branch from c0c1e1c to a1c024b Compare November 27, 2023 13:44
For now, we will just delete the table & data from the new version. This
deviates from our original plan of deleting from all historical versions
"delete means delete", but we had some doubts about implementing
contradictory behaviour (i.e. creating a new major version while at the
same time making breaking changes to existing versions doesn't make
sense).

This behaviour will be revisited in #2500
@MatMoore MatMoore force-pushed the delete-table-database-versioning-2 branch from a1c024b to 37ee76f Compare November 27, 2023 13:56
@MatMoore MatMoore force-pushed the delete-table-database-versioning-2 branch from dd7429f to a62d02d Compare November 27, 2023 15:41
@MatMoore MatMoore force-pushed the delete-table-database-versioning-2 branch from a62d02d to 32f19e2 Compare November 27, 2023 15:48
- Support for 7.5.0
- Fix clone operation passing invalid parameters
- Configure logger to use the console logger when testing locally
@MatMoore MatMoore force-pushed the delete-table-database-versioning-2 branch from 0116cd1 to 876e7a3 Compare November 27, 2023 15:56
@MatMoore MatMoore marked this pull request as ready for review November 27, 2023 16:03
@MatMoore MatMoore requested a review from a team November 27, 2023 16:03
murdo-moj
murdo-moj previously approved these changes Nov 28, 2023
@MatMoore MatMoore merged commit 0a776e9 into main Nov 28, 2023
33 checks passed
@MatMoore MatMoore deleted the delete-table-database-versioning-2 branch November 28, 2023 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants