Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingest): redshift source gets external table types properly #3371

Merged

Conversation

treff7es
Copy link
Contributor

Earlier the redshift source failed to get varchar sizes and because of this all the varchars become NullType.
Here is the corresponding sqlalchemy code why this happened -> https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/blob/75db33620b87e732386b1b63986b67d058689f6c/sqlalchemy_redshift/dialect.py#L694

I changed the query back to the sqlalchemy default (https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/blob/75db33620b87e732386b1b63986b67d058689f6c/sqlalchemy_redshift/dialect.py#L777) and I unioned the external tables to it as well.
This way the varchar sizes showed up properly and even column encoding and dist/sort keys get populated.

The only caveat was that the external column types are different from the postgres types, so I had to map those to one of the redshift types. -> https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html (see datatypes part)

Here is a comparison how the columns looked like with the old query:
Screenshot 2021-10-12 at 9 45 33

How it looks like with the new query:
Screenshot 2021-10-12 at 9 46 24

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@treff7es treff7es changed the title fix(ingest): redshift usage properly counting sqls fix(ingest): redshift source parse varchars with size Oct 12, 2021
@treff7es treff7es force-pushed the redshift_source_varchar_fix branch from 2f15947 to 2d5acbd Compare October 12, 2021 18:27
@treff7es treff7es closed this Oct 13, 2021
@treff7es treff7es reopened this Oct 13, 2021
@treff7es treff7es force-pushed the redshift_source_varchar_fix branch from 2d5acbd to 662dd09 Compare October 13, 2021 17:09
@treff7es treff7es closed this Oct 14, 2021
@treff7es treff7es reopened this Oct 14, 2021
@treff7es treff7es force-pushed the redshift_source_varchar_fix branch from 678ab9c to 89e1db9 Compare October 21, 2021 17:44
@treff7es treff7es changed the title fix(ingest): redshift source parse varchars with size fix(ingest): redshift source gets external table types properly Oct 21, 2021
@github-actions
Copy link

github-actions bot commented Oct 26, 2021

Unit Test Results

  39 files  ±0    39 suites  ±0   18m 1s ⏱️ +5s
314 tests ±0  314 ✔️ ±0    0 💤 ±0  0 ±0 
963 runs  ±0  941 ✔️ ±0  22 💤 ±0  0 ±0 

Results for commit da8af55. ± Comparison against base commit 561c04b.

♻️ This comment has been updated with latest results.

@treff7es treff7es force-pushed the redshift_source_varchar_fix branch from 871e9c1 to da8af55 Compare October 26, 2021 06:51
@shirshanka shirshanka added the hacktoberfest-accepted Acceptance for hacktoberfest https://hacktoberfest.com/participation/ label Oct 27, 2021
Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shirshanka shirshanka merged commit cdaa957 into datahub-project:master Oct 27, 2021
@treff7es treff7es deleted the redshift_source_varchar_fix branch February 8, 2023 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hacktoberfest-accepted Acceptance for hacktoberfest https://hacktoberfest.com/participation/
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants