Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: table-key-properties in metadata extra does not populate schema key_properties for tap-bigquery #2660

Closed
1 task
ReubenFrankel opened this issue Sep 11, 2024 · 5 comments · Fixed by #2770
Assignees
Labels
Milestone

Comments

@ReubenFrankel
Copy link
Contributor

ReubenFrankel commented Sep 11, 2024

Singer SDK Version

0.39.1

Is this a regression?

  • Yes

Python Version

3.9

Bug scope

Taps (catalog, state, etc.)

Operating System

Ubuntu 22.04 LTS

Description

I've been trying to use table-key-properties to configure key properties for a BigQuery stream:

plugins:
  extractors:
  - name: tap-bigquery
    variant: meltanolabs
    pip_url: git+https://github.com/Matatika/tap-bigquery.git
    metadata:
      '*-events_*':
        table-key-properties:
        - event_timestamp
        - event_name
        - event_bundle_sequence_id

In this case, key_properties is empty so when loading to a database target no primary keys are configured:

Generated catalog
{
  "streams": [
    {
      "tap_stream_id": "analytics_376142012-events_20240825",
      "table_name": "events_20240825",
      "replication_method": "",
      "key_properties": [],
      "schema": ...,
      "is_view": false,
      "stream": "analytics_376142012-events_20240825",
      "metadata": [
        ...,
        {
          "breadcrumb": [],
          "metadata": {
            "inclusion": "available",
            "table-key-properties": [
              "event_timestamp",
              "event_name",
              "event_bundle_sequence_id"
            ],
            "forced-replication-method": "",
            "schema-name": "analytics_376142012",
            "selected": true
          }
        },
        ...
      ],
      "selected": true,
      "table_key_properties": [
        "event_timestamp",
        "event_name",
        "event_bundle_sequence_id"
      ]
    }
  ]
}

Primary keys are configured when using key-properties/key_properties (not documented here):

    metadata:
      '*-events_*':
        key-properties:
        - event_timestamp
        - event_name
        - event_bundle_sequence_id
Generated catalog
{
  "streams": [
    {
      "tap_stream_id": "analytics_376142012-events_20240825",
      "table_name": "events_20240825",
      "replication_method": "",
      "key_properties": [
        "event_timestamp",
        "event_name",
        "event_bundle_sequence_id"
      ],
      "schema": ...,
      "is_view": false,
      "stream": "analytics_376142012-events_20240825",
      "metadata": [
        ...,
        {
          "breadcrumb": [],
          "metadata": {
            "inclusion": "available",
            "table-key-properties": [],
            "forced-replication-method": "",
            "schema-name": "analytics_376142012",
            "selected": true,
            "key-properties": [
              "event_timestamp",
              "event_name",
              "event_bundle_sequence_id"
            ]
          }
        },
        ...
      ],
      "selected": true
    }
  ]
}
    metadata:
      '*-events_*':
        key_properties:
        - event_timestamp
        - event_name
        - event_bundle_sequence_id
Generated catalog
{
  "streams": [
    {
      "tap_stream_id": "analytics_376142012-events_20240825",
      "table_name": "events_20240825",
      "replication_method": "",
      "key_properties": [
        "event_timestamp",
        "event_name",
        "event_bundle_sequence_id"
      ],
      "schema": ...,
      "is_view": false,
      "stream": "analytics_376142012-events_20240825",
      "metadata": [
        ...,
        {
          "breadcrumb": [],
          "metadata": {
            "inclusion": "available",
            "table-key-properties": [],
            "forced-replication-method": "",
            "schema-name": "analytics_376142012",
            "selected": true,
            "key_properties": [
              "event_timestamp",
              "event_name",
              "event_bundle_sequence_id"
            ]
          }
        },
        ...
      ],
      "selected": true
    }
  ]
}

Why does this work? What is the correct way to do this? I assume table-key-properties is the intended approach, so I don't know if I'm going to run into some undefined behaviour down the line.


Related:

Code

No response

Link to Slack/Linen

https://meltano.slack.com/archives/C069CQNHDNF/p1725976166433339

@ReubenFrankel
Copy link
Contributor Author

Worth noting that you can also set key properties via stream maps:

    config:
      stream_maps:
        '*-events_*':
          __key_properties__:
          - event_timestamp
          - event_name
          - event_bundle_sequence_id

@steven-luabase
Copy link

steven-luabase commented Nov 18, 2024

Running into the same issue with tap-postgres

Using table-key-properties does not seem to work

plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    metadata:
      '*-events_*':
        table-key-properties:
        - _id

whereas using just key-properties does work:

plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    metadata:
      '*-events_*':
        key-properties:
        - _id

@steven-luabase
Copy link

Also worth noting that for other taps, like tap-mssql table-key-properties does work whereas using just key-properties does not.

Maybe because it's using a different version of the sdk?

@edgarrmondragon
Copy link
Collaborator

Also worth noting that for other taps, like tap-mssql table-key-properties does work whereas using just key-properties does not.

Maybe because it's using a different version of the sdk?

Rather, that tap-mssql is not based on the Singer SDK to that would explain why this bug doesn't show there.

@edgarrmondragon
Copy link
Collaborator

I've started #2770 to address this. I'm thinking we'll want to deprecate the non-standard fields eventually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants