🐛 Source S3: timestamp parquet data #6613

Merged
3 commits merged on Oct 4, 2021


@antixar (Contributor) commented Oct 1, 2021

What

Describe what the change is solving
It helps to add screenshots if it affects the frontend.

How

Describe the solution

Recommended reading order

  1. x.java
  2. y.python

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions
  • Connector version bumped as described here

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to GitHub CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here

@CLAassistant commented Oct 1, 2021

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ antixar
❌ Maksym Pavlenok


Maksym Pavlenok does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@github-actions github-actions bot added the area/connectors Connector related issues label Oct 1, 2021
@antixar antixar self-assigned this Oct 1, 2021
@antixar (Contributor, Author) commented Oct 1, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1295416949
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1295416949
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              200     94    53%
	 source_acceptance_test/tests/test_full_refresh.py       18     11    39%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  41     24    41%
	 source_acceptance_test/utils/compare.py                 47     20    57%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     111     11    90%
	 ------------------------------------------------------------------------
	 TOTAL                                                  856    416    51%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                       Stmts   Miss  Cover
	 --------------------------------------------------------------
	 base_python/__init__.py                       13      0   100%
	 base_python/catalog_helpers.py                10      6    40%
	 base_python/cdk/__init__.py                    0      0   100%
	 base_python/cdk/abstract_source.py            83     59    29%
	 base_python/cdk/streams/__init__.py            0      0   100%
	 base_python/cdk/streams/auth/__init__.py       0      0   100%
	 base_python/cdk/streams/auth/core.py           8      1    88%
	 base_python/cdk/streams/auth/jwt.py            5      5     0%
	 base_python/cdk/streams/auth/oauth.py         37     26    30%
	 base_python/cdk/streams/auth/token.py          9      4    56%
	 base_python/cdk/streams/core.py               63     32    49%
	 base_python/cdk/streams/exceptions.py         10      2    80%
	 base_python/cdk/streams/http.py               67     33    51%
	 base_python/cdk/streams/rate_limiting.py      30     14    53%
	 base_python/cdk/utils/__init__.py              0      0   100%
	 base_python/cdk/utils/casing.py                4      0   100%
	 base_python/client.py                         56     33    41%
	 base_python/entrypoint.py                     70     56    20%
	 base_python/integration.py                    52     25    52%
	 base_python/logger.py                         33     19    42%
	 base_python/schema_helpers.py                 56     41    27%
	 base_python/source.py                         51     34    33%
	 main_dev.py                                    3      3     0%
	 --------------------------------------------------------------
	 TOTAL                                        660    393    40%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 source_s3/__init__.py                                                 2      0   100%
	 source_s3/s3_utils.py                                                20      3    85%
	 source_s3/s3file.py                                                  49      3    94%
	 source_s3/source.py                                                  23      0   100%
	 source_s3/source_files_abstract/__init__.py                           0      0   100%
	 source_s3/source_files_abstract/formats/abstract_file_parser.py      37      2    95%
	 source_s3/source_files_abstract/formats/csv_parser.py                71     20    72%
	 source_s3/source_files_abstract/formats/csv_spec.py                  14      0   100%
	 source_s3/source_files_abstract/formats/parquet_parser.py            63     46    27%
	 source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
	 source_s3/source_files_abstract/source.py                            40     18    55%
	 source_s3/source_files_abstract/spec.py                              42     22    48%
	 source_s3/source_files_abstract/storagefile.py                       16      0   100%
	 source_s3/source_files_abstract/stream.py                           183     11    94%
	 source_s3/stream.py                                                  43      3    93%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                               612    128    79%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 source_s3/__init__.py                                                 2      0   100%
	 source_s3/s3_utils.py                                                20     13    35%
	 source_s3/s3file.py                                                  49     26    47%
	 source_s3/source.py                                                  23      0   100%
	 source_s3/source_files_abstract/__init__.py                           0      0   100%
	 source_s3/source_files_abstract/formats/abstract_file_parser.py      37      0   100%
	 source_s3/source_files_abstract/formats/csv_parser.py                71     19    73%
	 source_s3/source_files_abstract/formats/csv_spec.py                  14      0   100%
	 source_s3/source_files_abstract/formats/parquet_parser.py            63      5    92%
	 source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
	 source_s3/source_files_abstract/source.py                            40     18    55%
	 source_s3/source_files_abstract/spec.py                              42     22    48%
	 source_s3/source_files_abstract/storagefile.py                       16      3    81%
	 source_s3/source_files_abstract/stream.py                           183     94    49%
	 source_s3/stream.py                                                  43     31    28%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                               612    231    62%

@jrhizor jrhizor temporarily deployed to more-secrets October 1, 2021 14:23 Inactive
@antixar antixar requested a review from Phlair October 1, 2021 14:47
@antixar antixar linked an issue Oct 1, 2021 that may be closed by this pull request
@Phlair (Contributor) left a comment

Looks good technically, but maybe it could be slightly better structured to be more readable? Feels like this is creating another location where parquet types are being defined on top of the PARQUET_TYPES map. Could this all be unified to be a bit more clear + some comments/docstrings on this logic?

…es_abstract/formats/parquet_parser.py

Co-authored-by: George Claireaux <[email protected]>
@antixar antixar temporarily deployed to more-secrets October 1, 2021 15:31 Inactive
@antixar (Contributor, Author) commented Oct 3, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1301286744
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1301286744
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              200     94    53%
	 source_acceptance_test/tests/test_full_refresh.py       18     11    39%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  41     24    41%
	 source_acceptance_test/utils/compare.py                 47     20    57%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     111     11    90%
	 ------------------------------------------------------------------------
	 TOTAL                                                  856    416    51%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                       Stmts   Miss  Cover
	 --------------------------------------------------------------
	 base_python/__init__.py                       13      0   100%
	 base_python/catalog_helpers.py                10      6    40%
	 base_python/cdk/__init__.py                    0      0   100%
	 base_python/cdk/abstract_source.py            83     59    29%
	 base_python/cdk/streams/__init__.py            0      0   100%
	 base_python/cdk/streams/auth/__init__.py       0      0   100%
	 base_python/cdk/streams/auth/core.py           8      1    88%
	 base_python/cdk/streams/auth/jwt.py            5      5     0%
	 base_python/cdk/streams/auth/oauth.py         37     26    30%
	 base_python/cdk/streams/auth/token.py          9      4    56%
	 base_python/cdk/streams/core.py               63     32    49%
	 base_python/cdk/streams/exceptions.py         10      2    80%
	 base_python/cdk/streams/http.py               67     33    51%
	 base_python/cdk/streams/rate_limiting.py      30     14    53%
	 base_python/cdk/utils/__init__.py              0      0   100%
	 base_python/cdk/utils/casing.py                4      0   100%
	 base_python/client.py                         56     33    41%
	 base_python/entrypoint.py                     70     56    20%
	 base_python/integration.py                    52     25    52%
	 base_python/logger.py                         33     19    42%
	 base_python/schema_helpers.py                 56     41    27%
	 base_python/source.py                         51     34    33%
	 main_dev.py                                    3      3     0%
	 --------------------------------------------------------------
	 TOTAL                                        660    393    40%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 source_s3/__init__.py                                                 2      0   100%
	 source_s3/s3_utils.py                                                20      3    85%
	 source_s3/s3file.py                                                  49      3    94%
	 source_s3/source.py                                                  23      0   100%
	 source_s3/source_files_abstract/__init__.py                           0      0   100%
	 source_s3/source_files_abstract/formats/abstract_file_parser.py      37      2    95%
	 source_s3/source_files_abstract/formats/csv_parser.py                71     20    72%
	 source_s3/source_files_abstract/formats/csv_spec.py                  14      0   100%
	 source_s3/source_files_abstract/formats/parquet_parser.py            61     44    28%
	 source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
	 source_s3/source_files_abstract/source.py                            40     18    55%
	 source_s3/source_files_abstract/spec.py                              42     22    48%
	 source_s3/source_files_abstract/storagefile.py                       16      0   100%
	 source_s3/source_files_abstract/stream.py                           183     11    94%
	 source_s3/stream.py                                                  43      3    93%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                               610    126    79%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 source_s3/__init__.py                                                 2      0   100%
	 source_s3/s3_utils.py                                                20     13    35%
	 source_s3/s3file.py                                                  49     26    47%
	 source_s3/source.py                                                  23      0   100%
	 source_s3/source_files_abstract/__init__.py                           0      0   100%
	 source_s3/source_files_abstract/formats/abstract_file_parser.py      37      0   100%
	 source_s3/source_files_abstract/formats/csv_parser.py                71     19    73%
	 source_s3/source_files_abstract/formats/csv_spec.py                  14      0   100%
	 source_s3/source_files_abstract/formats/parquet_parser.py            61      3    95%
	 source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
	 source_s3/source_files_abstract/source.py                            40     18    55%
	 source_s3/source_files_abstract/spec.py                              42     22    48%
	 source_s3/source_files_abstract/storagefile.py                       16      3    81%
	 source_s3/source_files_abstract/stream.py                           183     94    49%
	 source_s3/stream.py                                                  43     31    28%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                               610    229    62%
	 /actions-runner/_work/airbyte/airbyte/airbyte-integrations/connectors/source-s3/.venv/lib/python3.8/site-packages/coverage/data.py:118: CoverageWarning: Data file '/actions-runner/_work/airbyte/airbyte/airbyte-integrations/connectors/source-s3/.coverage.ip-10-0-38-89.7546.318456' doesn't seem to be a coverage data file: Couldn't use data file '/actions-runner/_work/airbyte/airbyte/airbyte-integrations/connectors/source-s3/.coverage.ip-10-0-38-89.7546.318456': no such table: coverage_schema
	   data._warn(str(exc))

@antixar (Contributor, Author) commented Oct 3, 2021

@Phlair, I've moved all type mappings (pyarrow, JSON) to a single place.
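A minimal sketch of what "a single place" for the type mappings could look like, assuming Parquet column types are named the way pyarrow prints them (for example `timestamp[us, tz=UTC]` or `date32[day]`). The table `PARQUET_TO_JSON_SCHEMA`, the fallback, and the helper `json_schema_for` are hypothetical and are not the connector's actual `PARQUET_TYPES` map; the point is only that every type, timestamps included, resolves to its JSON schema through one lookup.

```python
from typing import Any, Dict

# Hypothetical single lookup table (not the connector's real PARQUET_TYPES map):
# base pyarrow-style type name -> JSON schema fragment for that column.
PARQUET_TO_JSON_SCHEMA: Dict[str, Dict[str, Any]] = {
    "bool": {"type": "boolean"},
    "int8": {"type": "integer"},
    "int16": {"type": "integer"},
    "int32": {"type": "integer"},
    "int64": {"type": "integer"},
    "uint8": {"type": "integer"},
    "uint16": {"type": "integer"},
    "uint32": {"type": "integer"},
    "uint64": {"type": "integer"},
    "float": {"type": "number"},
    "double": {"type": "number"},
    "string": {"type": "string"},
    "large_string": {"type": "string"},
    "date32": {"type": "string", "format": "date"},
    "date64": {"type": "string", "format": "date"},
    "timestamp": {"type": "string", "format": "date-time"},
}

# Fallback for types the table does not know about (an assumption of this sketch).
DEFAULT_SCHEMA: Dict[str, Any] = {"type": "string"}


def json_schema_for(parquet_type: str) -> Dict[str, Any]:
    """Return the JSON schema fragment for a pyarrow-style type name."""
    # Parameterised types carry a suffix such as "timestamp[ms]",
    # "timestamp[us, tz=UTC]" or "date32[day]", so only the part
    # before "[" is used for the lookup.
    base = parquet_type.split("[", 1)[0].strip()
    # Return a copy so callers cannot mutate the shared table.
    return dict(PARQUET_TO_JSON_SCHEMA.get(base, DEFAULT_SCHEMA))
```

For example, `json_schema_for("timestamp[us, tz=UTC]")` returns `{"type": "string", "format": "date-time"}`. Keeping the unit and timezone out of the key means one entry covers every timestamp variant, which is the kind of duplication the review asked to remove.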

@antixar antixar temporarily deployed to more-secrets October 3, 2021 20:34 Inactive
@antixar antixar requested a review from Phlair October 3, 2021 20:34
@jrhizor jrhizor temporarily deployed to more-secrets October 3, 2021 20:35 Inactive
@Phlair (Contributor) left a comment

Like it, nice one!

@antixar antixar merged commit b80f81e into master Oct 4, 2021
@antixar antixar deleted the antixar/6594-source-s3-parquet-timestamp branch October 4, 2021 11:04
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
* fix datetime parquet data

* Update airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/formats/parquet_parser.py

Co-authored-by: George Claireaux <[email protected]>

* aggregate pyarrow types

Co-authored-by: Maksym Pavlenok <[email protected]>
Co-authored-by: George Claireaux <[email protected]>
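The "fix datetime parquet data" commit concerns record values as well as schemas. When Parquet rows are read through pyarrow into Python objects, timestamp columns come back as `datetime.datetime`, which `json.dumps` rejects with a `TypeError`. The following is a minimal sketch of a record-level conversion, assuming rows are already decoded into plain Python dicts. `to_json_value` and `to_json_record` are hypothetical helpers, not the connector's code.

```python
import datetime
import decimal
import json
from typing import Any, Dict


def to_json_value(value: Any) -> Any:
    """Convert one decoded Parquet value into a json.dumps-friendly value (hypothetical helper)."""
    # datetime.datetime subclasses datetime.date, so this branch covers
    # timestamps, dates and times; isoformat() keeps any timezone offset.
    if isinstance(value, (datetime.date, datetime.time)):
        return value.isoformat()
    # Decimals are emitted as strings to avoid float rounding (a choice made for this sketch).
    if isinstance(value, decimal.Decimal):
        return str(value)
    # Recurse into nested list and struct (dict) values.
    if isinstance(value, list):
        return [to_json_value(v) for v in value]
    if isinstance(value, dict):
        return {k: to_json_value(v) for k, v in value.items()}
    return value


def to_json_record(row: Dict[str, Any]) -> Dict[str, Any]:
    """Convert every column of a decoded row."""
    return {name: to_json_value(value) for name, value in row.items()}


if __name__ == "__main__":
    row = {
        "id": 1,
        "created_at": datetime.datetime(2021, 10, 1, 14, 23, tzinfo=datetime.timezone.utc),
    }
    # json.dumps(row) would raise TypeError; the converted record serialises fine.
    print(json.dumps(to_json_record(row)))
```

Run as a script, this prints `{"id": 1, "created_at": "2021-10-01T14:23:00+00:00"}`. ISO 8601 strings match a `"format": "date-time"` JSON schema for the column.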
Labels: area/connectors (Connector related issues)
Projects: None yet
Development: successfully merging this pull request may close these issues:
  • Source S3 is not able to read Parquet file
5 participants