refactor(core): UUID suffix instead of Unix timestamp #72

emesar · 2023-11-09T03:32:39Z

I'm attempting parallel writes to the same target BigQuery table from multiple source taps (using Meltano). When a Unix timestamp is used as the suffix for the intermediate table, there is a significant chance that concurrent processes generate the same table name (i.e., if they happen to call time.time() within the same second). When that happens, the first process to finish loading records drops the table, and a later process raises an exception, e.g.: google.api_core.exceptions.BadRequest: 400 Not found: Table project:dataset.table_name__1699498892 was not found.

I'm not sure if there was another motivation behind using the Unix timestamp as a suffix, but if it was purely for the sake of uniqueness, I'd like to recommend using a UUID v4 suffix instead, which is effectively guaranteed to be unique across concurrent processes.
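
To illustrate the difference, here is a minimal sketch rather than the code changed in this PR. The function names timestamp_suffixed and uuid_suffixed are hypothetical, and the double-underscore separator is an assumption inferred from the table name in the error message above.

```python
import time
import uuid


def timestamp_suffixed(table_name: str) -> str:
    # Hypothetical helper: second-resolution suffix. Two processes that call
    # this within the same second get the same intermediate table name.
    return f"{table_name}__{int(time.time())}"


def uuid_suffixed(table_name: str) -> str:
    # Hypothetical helper: random UUID v4 suffix. The hex form has no dashes,
    # so the name stays limited to letters, digits, and underscores.
    return f"{table_name}__{uuid.uuid4().hex}"


if __name__ == "__main__":
    # Calls made back to back usually share a timestamp suffix,
    # while the UUID-based names differ on every call.
    print(timestamp_suffixed("table_name"), timestamp_suffixed("table_name"))
    print(uuid_suffixed("table_name"), uuid_suffixed("table_name"))
```

With the UUID variant, each process drops only the intermediate table it created, so one process finishing early should no longer remove a table another process is still loading into.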

z3z1ma · 2023-11-10T02:43:32Z

LGTM

refactor(core): UUID suffix instead of Unix timestamp

f376497

z3z1ma merged commit 2d59eae into z3z1ma:main Nov 10, 2023

maamoonhussain mentioned this pull request Jan 17, 2024

feat: add date suffix in addition to UUID suffix for a well organized data lake (GCS Load Patten) #77

Merged

