Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: use correct check for column prop in column schema #1347

Merged
merged 4 commits into from
May 12, 2024

Conversation

z3z1ma
Copy link
Collaborator

@z3z1ma z3z1ma commented May 11, 2024

Description

In dlt <= 0.4.9
I would construct schemas like so:

def describe_to_columns(
    describe: Describe, normalizer: t.Callable[[str], str] = str
) -> t.List[TColumnSchema]:
    """Convert a Salesforce describe to a list of SCT columns."""
    compound = {f["compoundFieldName"] for f in describe["fields"] if f["compoundFieldName"]}
    compound -= {"Name"}
    _, rk = get_projection_and_rk(describe)
    return [
        {
            "name": normalizer(field["name"]),
            "unique": field["unique"],
            "hard_delete": field["name"].lower() == "isdeleted",  # type: ignore
            "primary_key": field["name"].lower() == "id",
            "dedup_sort": "desc" if field["name"] is rk else None,
            **SF_TO_DLT.get(field["type"], {"data_type": "text"}),
        }
        for field in describe["fields"]
        if field["name"] != "attributes" and field["name"] not in compound
    ]

And this would work fine. However in 0.4.10, there is an irrecoverable error that took down our pipelines leveraging this technique. This is due to the check for column props which would detect multiple dedup_sort hints in the following schema (and in all schemas) incorrectly. There is no dedup_sort hint unless there is a truthy value that is one of asc or desc

{
    "name": "business_value_assessment_c",
    "columns": {
        "id": {
            "name": "id",
            "unique": False,
            "hard_delete": False,
            "primary_key": True,
            "dedup_sort": None,
            "data_type": "text",
            "nullable": False,
        },
        "is_deleted": {
            "name": "is_deleted",
            "unique": False,
            "hard_delete": True,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "bool",
        },
        "name": {
            "name": "name",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "text",
        },
        "created_date": {
            "name": "created_date",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "timestamp",
        },
        "created_by_id": {
            "name": "created_by_id",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "text",
        },
        "last_modified_date": {
            "name": "last_modified_date",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "timestamp",
        },
        "last_modified_by_id": {
            "name": "last_modified_by_id",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "text",
        },
        "system_modstamp": {
            "name": "system_modstamp",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "timestamp",
        },
        "last_activity_date": {
            "name": "last_activity_date",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "date",
        },
        "last_viewed_date": {
            "name": "last_viewed_date",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "timestamp",
        },
        "last_referenced_date": {
            "name": "last_referenced_date",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "timestamp",
        },
        "account_c": {
            "name": "account_c",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "text",
        },
        "completed_date_c": {
            "name": "completed_date_c",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "date",
        },
        "deliverable_c": {
            "name": "deliverable_c",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "text",
        },
        "description_c": {
            "name": "description_c",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "text",
        },
        "engaged_date_c": {
            "name": "engaged_date_c",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "date",
        },
        "opportunity_c": {
            "name": "opportunity_c",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "text",
        },
        "owner_c": {
            "name": "owner_c",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "text",
        },
        "status_c": {
            "name": "status_c",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "text",
        },
        "closed_reason_c": {
            "name": "closed_reason_c",
            "unique": False,
            "hard_delete": False,
            "primary_key": False,
            "dedup_sort": None,
            "data_type": "text",
        },
        "_dlt_load_id": {
            "name": "_dlt_load_id",
            "data_type": "text",
            "nullable": False,
        },
        "_dlt_id": {
            "name": "_dlt_id",
            "data_type": "text",
            "nullable": False,
            "unique": True,
        },
    },
    "write_disposition": "merge",
    "resource": "Business_Value_Assessment__c",
    "x-normalizer": {"seen-data": True},
    "x-merge-strategy": "delete-insert",
}

I have fixed this check and verified my pipelines are working again.

Copy link

netlify bot commented May 11, 2024

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 27ec1a3
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/66412aed52aca100081f909d

@rudolfix
Copy link
Collaborator

@z3z1ma thx for the fix, we'll soon rewrite the part that handles hints and merges columns and tables. ie. for all possible hints we'll add data type, default values, how they behave when columns are merged etc.. there are a few undefined behaviors still (mostly when merging columns).

now I've used existing helper function that checks is hint has default and added your dedup case to tests.

@rudolfix rudolfix force-pushed the fix/incorrect-cprop-check branch from 90fde3c to 27ec1a3 Compare May 12, 2024 20:47
@rudolfix rudolfix merged commit d26c5d0 into devel May 12, 2024
49 of 50 checks passed
@rudolfix rudolfix deleted the fix/incorrect-cprop-check branch May 12, 2024 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants