refactor(core): UUID suffix instead of Unix timestamp #72

Merged
merged 1 commit on Nov 10, 2023

emesar
Contributor

@emesar emesar commented Nov 9, 2023

I'm attempting parallel writes to the same target BigQuery table from multiple source taps (using Meltano). When a Unix timestamp is used as the suffix for the intermediate table, there is a significant chance that concurrent processes generate the same name (i.e., if they happen to call time.time() within the same second). The first process to finish loading records then drops the table, and a later process raises an exception, e.g.: google.api_core.exceptions.BadRequest: 400 Not found: Table project:dataset.table_name__1699498892 was not found.

I'm not sure if there was another motivation behind using the Unix timestamp as a suffix, but if it was purely for the sake of uniqueness, I'd like to recommend using a UUID v4 suffix instead, which is effectively guaranteed to be unique across concurrent processes.
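
To illustrate the collision, here is a minimal sketch rather than the code changed in this PR. The function names `timestamp_suffix_name` and `uuid_suffix_name`, the `now` parameter, and the double-underscore separator (taken from the table name in the error above) are assumptions made for the example.

```python
import time
import uuid
from typing import Optional


def timestamp_suffix_name(table_name: str, now: Optional[float] = None) -> str:
    """Hypothetical helper: suffix with the Unix time truncated to whole seconds.

    `now` exists only so the collision can be reproduced deterministically.
    """
    current = time.time() if now is None else now
    return f"{table_name}__{int(current)}"


def uuid_suffix_name(table_name: str) -> str:
    """Hypothetical helper: suffix with a random UUID v4.

    The hex form is used here only to keep the suffix to letters and digits.
    """
    return f"{table_name}__{uuid.uuid4().hex}"


# Two processes calling time.time() within the same second get the same name.
first = timestamp_suffix_name("table_name", now=1699498892.10)
second = timestamp_suffix_name("table_name", now=1699498892.95)
collides = first == second  # True: both are "table_name__1699498892"

# UUID v4 suffixes are random, so two calls yield different names in practice.
distinct = uuid_suffix_name("table_name") != uuid_suffix_name("table_name")
```

With the timestamp suffix, whichever process finishes first drops a table that the other is still using. With the UUID suffix, each process works on its own intermediate table.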

@z3z1ma
Owner

z3z1ma commented Nov 10, 2023

LGTM
