Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(virtual dataset sync): Sync virtual dataset columns when changing the SQL query #30903

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

fisjac
Copy link
Contributor

@fisjac fisjac commented Nov 12, 2024

SUMMARY

When changing the SQL query for virtual datasets, the user first needs to save the virtual dataset, then needs to reopen the "edit dataset" modal and manually sync the columns. This PR will introduce changes to automatically sync the columns upon saving a virtual dataset with a new SQL query.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

https://www.loom.com/share/b9a1121ec44b47438e2675334403ee79

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@villebro
Copy link
Member

This is a total pain, super happy to see this potentially get fixed soon! ❤️

@fisjac fisjac marked this pull request as ready for review November 22, 2024 21:24
@dosubot dosubot bot added the data:dataset Related to dataset configurations label Nov 22, 2024
@fisjac fisjac force-pushed the virtual-dataset-sync branch from 84e8a90 to 2041141 Compare November 25, 2024 15:53
@fisjac fisjac force-pushed the virtual-dataset-sync branch 2 times, most recently from fc3f83c to 3d82059 Compare November 27, 2024 06:04
catalog?: string;
schema?: string;
table_name?: string;
database?: Record<string, unknown>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we improve the typing here? We probably have a type for database defined somewhere

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fisjac +1 for this

extra: datasource.extra,
is_managed_externally: datasource.is_managed_externally,
external_url: datasource.external_url,
metrics: datasource?.metrics?.map((metric: Record<string, unknown>) => {
Copy link
Member

@kgabryje kgabryje Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
metrics: datasource?.metrics?.map((metric: Record<string, unknown>) => {
metrics: datasource?.metrics?.map((metric: Metric) => {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below comment regarding is_certified

}
return metricBody;
}),
columns: datasource?.columns?.map((column: Record<string, unknown>) => ({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
columns: datasource?.columns?.map((column: Record<string, unknown>) => ({
columns: datasource?.columns?.map((column: Column) => ({

(or ColumnMeta)

Copy link
Contributor Author

@fisjac fisjac Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some issues with doing this due to a part of the code that checks for an is_certified property on somewhat ambiguous Metric Column and Data items. I agree that these types should be tightened up, but I don't have enough context to start tweaking these in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fisjac should not be too hard to find out what the right type is? Maybe we need a partial type or to update the original type if a property appears to be missing. It defeats the purpose of using typescript if we opt for unknown types

@kgabryje
Copy link
Member

Small typing suggestions, other than that the code looks great 👍

@fisjac fisjac force-pushed the virtual-dataset-sync branch from b8228f8 to 47d5b3a Compare December 6, 2024 05:19
@dkrat7
Copy link
Contributor

dkrat7 commented Dec 6, 2024

Hi,
syncing columns without saving and reopening the editor is great improvement.

However, i have some concerns regarding autosyncing on save:

What will happen if I accidently make a change to a query which will result in zero rows returned - all columns including metadata will be permanently removed?

Sometimes I include Jinja code in a query with condition that dataset returns zero rows if some filter is not selected (for example) - I won't be able to save that query without losing columns?

@fisjac
Copy link
Contributor Author

fisjac commented Dec 6, 2024

@dkrat7 to my knowledge, columns are not dropped based on whether row level data are available for that column, but based upon whether the column is selected as part of the query. In your case, the columns will be dropped until the query is edited to include the desired columns.

Point being, if your query would include (or exclude) a column, then the virtual dataset will be synced to reflect the state of that query upon save.

@yousoph
Copy link
Member

yousoph commented Dec 6, 2024

/testenv up

@apache apache deleted a comment from github-actions bot Dec 10, 2024
@apache apache deleted a comment from github-actions bot Dec 10, 2024
@betodealmeida
Copy link
Member

What will happen if I accidently make a change to a query which will result in zero rows returned - all columns including metadata will be permanently removed?

It should be fine, since the cursor should still return a description of the columns even if the number of rows is 0. For example, for Postgres:

import psycopg2

query = "SELECT * FROM your_table WHERE 1=0;"  # Query guaranteed to return zero rows

db_config = {
    "dbname": "DB",
    "user": "user",
    "password": "password",
    "host": "host",
    "port": 5432,
}

with psycopg2.connect(**db_config) as conn:
    with conn.cursor() as cursor:
        cursor.execute(query)

        print("Cursor description:")
        for col in cursor.description:
            print(
                f"Name: {col.name}, Type: {col.type_code}, Display Size: {col.display_size}"
            )

For a test table this prints:

Cursor description:
Name: show_id, Type: 25, Display Size: None
Name: type, Type: 25, Display Size: None
Name: title, Type: 25, Display Size: None
Name: director, Type: 25, Display Size: None
Name: country, Type: 25, Display Size: None
Name: date_added, Type: 25, Display Size: None
Name: release_year, Type: 23, Display Size: None
Name: rating, Type: 25, Display Size: None
Name: duration, Type: 25, Display Size: None
Name: listed_in, Type: 25, Display Size: None
Name: description, Type: 25, Display Size: None
Name: iso_date_added, Type: 25, Display Size: None
Name: iso_release_year, Type: 25, Display Size: None
Name: xata_version, Type: 23, Display Size: None
Name: xata_createdat, Type: 1184, Display Size: None
Name: xata_updatedat, Type: 1184, Display Size: None
Name: xata_id, Type: 25, Display Size: None

Copy link
Member

@geido geido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fisjac. I am requesting changes as I am not sure we should delete the E2E tests. Also, some types need more love. Product feedback might also help to discuss the toasts. I feel, in general, this needs a bit more refinement.

// ***********************************************
// Tests for setting controls in the UI
// ***********************************************
import { interceptChart } from 'cypress/utils';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that we are deleting these E2E tests?

catalog?: string;
schema?: string;
table_name?: string;
database?: Record<string, unknown>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fisjac +1 for this

}
return metricBody;
}),
columns: datasource?.columns?.map((column: Record<string, unknown>) => ({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fisjac should not be too hard to find out what the right type is? Maybe we need a partial type or to update the original type if a property appears to be missing. It defeats the purpose of using typescript if we opt for unknown types

columnChanges.finalColumns.push(currentCol);
}
});
if (columnChanges.modified.length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if this is an opportunity to improve the user experience here and just show a generic message. From your sample video it looks really bad and possibly useless for the user to see such big success toasts with a bunch of columns in them. cc @kasiazjc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed that showing each of the columns changed winds up being a bit of an overkill from a user standpoint. Should we just do away with the toasts altogether in favor of something like "columns updated"?

@@ -94,31 +100,44 @@ class TextAreaControl extends Component {
if (this.props.readOnly) {
style.backgroundColor = '#f2f2f2';
}
const codeEditor = (
<div>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this div for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When trying to wrap a tooltip around the TextAreaEditor without a div, the tooltip doesn't land in the right place. Placing a div enables the tooltip to behave as intended.

maxLines={inModal ? 1000 : this.props.maxLines}
editorProps={{ $blockScrolling: true }}
const textArea = (
<div>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this div, what is this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same answer as above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data:dataset Related to dataset configurations packages size/XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants