Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate Row.as_dict() #309

Merged
merged 5 commits into from
Nov 7, 2024
Merged

Deprecate Row.as_dict() #309

merged 5 commits into from
Nov 7, 2024

Conversation

asnare
Copy link
Contributor

@asnare asnare commented Oct 23, 2024

This PR deprecates the Row.as_dict() method: this class is intended to be interchangeable with Spark's Row, which doesn't support .as_dict() method. Having it present leads to subtle bugs where code works when the execution-based backends are being used but fails when the Spark-based backend is used instead. Clients should instead use .asDict().

To signal the deprecation we use Python's warnings mechanism (including the use of the new annotation in Python 3.13, to help with static code analysis).

The Spark-based Row does not support this, so clients should use .asDict() instead.
@asnare asnare self-assigned this Oct 23, 2024
Copy link

github-actions bot commented Oct 23, 2024

✅ 36/36 passed, 4 skipped, 21m5s total

Running from acceptance #442

Copy link
Member

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not follow the reasoning for why there is a bug completely. Both the as_dict and asDict exists on the Row class here. asDict reuses as_dict currently. What does reverting that dependency resolve?

@asnare
Copy link
Contributor Author

asnare commented Oct 24, 2024

I do not follow the reasoning for why there is a bug completely. Both the as_dict and asDict exists on the Row class here. asDict reuses as_dict currently. What does reverting that dependency resolve?

The reason is that the Row implementation is supposed to emulate Spark's Row. The problem is with this code:

rows = sql_backend.fetch("...")
dicts = [row.as_dict() for row in rows]

This:

  • is type-correct for all non-abstract SQL backends: the return type is Iterator[Row] where Row is this implementation, not the Spark one.
  • will work in unit tests, because they typically use the non-Spark backends.
  • will work in most integration tests, because most code runs locally using the non-Spark backends.
  • will FAIL when run as part of a workflow on an actual cluster: the Spark backend returns its own Row and that does not have the as_dict() method.

Instead the code needs to be:

rows = sql_backend.fetch("...")
dicts = [row.asDict() for row in rows]

This will work everywhere.

As such I consider .as_dict() here to be dangerous and likely to cause subtle bugs (eg. databrickslabs/ucx#3046) that are very hard to catch. It's not safe to use (because the Spark backend doesn't support it) and hence this bug, deprecating it.

Copy link
Member

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for clarifying!

@nfx nfx merged commit 9aace9e into main Nov 7, 2024
7 of 8 checks passed
@nfx nfx deleted the deprecate-row-as-dict branch November 7, 2024 13:38
nfx added a commit that referenced this pull request Nov 8, 2024
* Added `escape_name` function to escape individual SQL names and `escape_full_name` function to escape dot-separated full names ([#316](#316)). Two new functions, `escape_name` and `escape_full_name`, have been added to the `databricks.labs.lsql.escapes` module for escaping SQL names. The `escape_name` function takes a single name as an input and returns it enclosed in backticks, while `escape_full_name` handles dot-separated full names by escaping each individual component. These functions have been ported from the `databrickslabs/ucx` repository and are designed to provide a consistent way to escape names and full names in SQL statements, improving the robustness of the system by preventing issues caused by unescaped special characters in SQL names. The test suite includes various cases, including single names, full names with different combinations of escaped and unescaped components, and special characters, with a specific focus on the scenario where the column name contains a period.
* Bump actions/checkout from 4.2.0 to 4.2.1 ([#304](#304)). In this pull request, the `actions/checkout` dependency is updated from version 4.2.0 to 4.2.1 in the `.github/workflows/release.yml` file. This update includes a new feature where `refs/*` are checked out by commit if provided, falling back to the ref specified by the `@orhantoy` user. This change improves the flexibility of the action, allowing users to specify a commit or branch for checkout. The pull request also introduces a new contributor, `@Jcambass`, who added a workflow file for publishing releases to an immutable action package. The commits for this release include changes to prepare for the 4.2.1 release, add a workflow file for publishing releases, and check out other `refs/*` by commit if provided, falling back to ref. This pull request has been reviewed and approved by Dependabot.
* Bump actions/checkout from 4.2.1 to 4.2.2 ([#310](#310)). This is a pull request to update the `actions/checkout` dependency from version 4.2.1 to 4.2.2, which includes improvements to the `url-helper.ts` file that now utilize well-known environment variables and expanded unit test coverage for the `isGhes` function. The `actions/checkout` action is commonly used in GitHub Actions workflows for checking out a repository at a specific commit or branch. The changes in this update are internal to the `actions/checkout` action and should not affect the functionality of the project utilizing this action. The pull request also includes details on the commits and compatibility score for the upgrade, and reviewers can manage and merge the request using Dependabot commands once the changes have been verified.
* Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 ([#307](#307)). In this release, the `databrickslabs/sandbox` dependency has been updated from version `acceptance/v0.3.0` to `0.3.1`. This update includes previously tagged commits, bug fixes for git-related libraries, and resolution of the `unsupported protocol scheme` error. The README has been updated with more information on using the `databricks labs sandbox` command, and installation instructions have been improved. Additionally, there have been dependency updates for `go-git` libraries and `golang.org/x/crypto` in the `/go-libs` and `/runtime-packages` directories. New commits in this release allow larger logs from acceptance tests and implement experimental OIDC refresh functionality. Ignore conditions have been applied to prevent conflicts with previous versions of the dependency. This update is recommended for users who want to take advantage of the latest bug fixes and improvements.
* Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2 ([#315](#315)). In this release, the `databrickslabs/sandbox` dependency has been updated from version `acceptance/v0.3.1` to `0.4.2`. This update includes bug fixes, dependency updates, and additional go-git libraries. Specifically, the `Run integration tests` job in the GitHub Actions workflow has been updated to use the new version of the `databrickslabs/sandbox/acceptance` Docker image. The updated version also includes install instructions, usage instructions in the README, and a modification to provide more git-related libraries. Additionally, there were several updates to dependencies, including `golang.org/x/crypto` version `0.16.0` to `0.17.0`. Dependabot, a tool that manages dependencies in GitHub projects, is responsible for the update and provides instructions for resolving any conflicts or merging the changes into the project. This update is intended to improve the functionality and reliability of the `databrickslabs/sandbox` dependency.
* Deprecate `Row.as_dict()` ([#309](#309)). In this release, we are introducing a deprecation warning for the `as_dict()` method in the `Row` class, which will be removed in favor of the `asDict()` method. This change aims to maintain consistency with Spark's `Row` behavior and prevent subtle bugs when switching between different backends. The deprecation warning will be implemented using Python's warnings mechanism, including the new annotation in Python 3.13 for static code analysis. The existing functionality of fetching values from the database through `StatementExecutionExt` remains unchanged. We recommend that clients update their code to use `.asDict()` instead of `.as_dict()` to avoid any disruptions. A new test case `test_row_as_dict_deprecated()` has been added to verify the deprecation warning for `Row.as_dict()`.
* Minor improvements for `.save_table(mode="overwrite")` ([#298](#298)). In this release, the `.save_table()` method has been improved, particularly when using the `overwrite` mode. If no rows are supplied, the table will now be truncated, ensuring consistency with the mock backend behavior. This change has been optimized for SQL-based backends, which now perform truncation as part of the insert for the first batch. Type hints on the abstract method have been updated to match the concrete implementations. Unit tests and integration tests have been updated to cover the new functionality, and new methods have been added to test the truncation behavior in overwrite mode. These improvements enhance the consistency and efficiency of the `.save_table()` method when using `overwrite` mode across different backends.
* Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 ([#305](#305)). In this release, we have updated the requirement for the `databrickslabs/sandbox` package to version `acceptance/v0.3.0` in the `downstreams.yml` file. This update is necessary to use the latest version of the package, which includes several bug fixes and dependency updates. The `databrickslabs/sandbox` package is used in the acceptance tests, which are run as part of the CI/CD pipeline. It provides a set of tools and utilities for developing and testing code in a sandbox environment. The changelog for this version includes the addition of install instructions, more git-related libraries, and the modification of the README to include information about how to use it with the `databricks labs sandbox` command. Specifically, the version of the `databrickslabs/sandbox` package used in the `acceptance` job has been updated from `acceptance/v0.1.4` to `acceptance/v0.3.0`, allowing the integration tests to be run using the latest version of the package. The ignore conditions for this PR ensure that Dependabot will resolve any conflicts that may arise and can be manually triggered with the `@dependabot rebase` command.

Dependency updates:

 * Bump actions/checkout from 4.2.0 to 4.2.1 ([#304](#304)).
 * Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 ([#305](#305)).
 * Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 ([#307](#307)).
 * Bump actions/checkout from 4.2.1 to 4.2.2 ([#310](#310)).
 * Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2 ([#315](#315)).
@nfx nfx mentioned this pull request Nov 8, 2024
nfx added a commit that referenced this pull request Nov 8, 2024
* Added `escape_name` function to escape individual SQL names and
`escape_full_name` function to escape dot-separated full names
([#316](#316)). Two new
functions, `escape_name` and `escape_full_name`, have been added to the
`databricks.labs.lsql.escapes` module for escaping SQL names. The
`escape_name` function takes a single name as an input and returns it
enclosed in backticks, while `escape_full_name` handles dot-separated
full names by escaping each individual component. These functions have
been ported from the `databrickslabs/ucx` repository and are designed to
provide a consistent way to escape names and full names in SQL
statements, improving the robustness of the system by preventing issues
caused by unescaped special characters in SQL names. The test suite
includes various cases, including single names, full names with
different combinations of escaped and unescaped components, and special
characters, with a specific focus on the scenario where the column name
contains a period.
* Bump actions/checkout from 4.2.0 to 4.2.1
([#304](#304)). In this
pull request, the `actions/checkout` dependency is updated from version
4.2.0 to 4.2.1 in the `.github/workflows/release.yml` file. This update
includes a new feature where `refs/*` are checked out by commit if
provided, falling back to the ref specified by the `@orhantoy` user.
This change improves the flexibility of the action, allowing users to
specify a commit or branch for checkout. The pull request also
introduces a new contributor, `@Jcambass`, who added a workflow file for
publishing releases to an immutable action package. The commits for this
release include changes to prepare for the 4.2.1 release, add a workflow
file for publishing releases, and check out other `refs/*` by commit if
provided, falling back to ref. This pull request has been reviewed and
approved by Dependabot.
* Bump actions/checkout from 4.2.1 to 4.2.2
([#310](#310)). This is a
pull request to update the `actions/checkout` dependency from version
4.2.1 to 4.2.2, which includes improvements to the `url-helper.ts` file
that now utilize well-known environment variables and expanded unit test
coverage for the `isGhes` function. The `actions/checkout` action is
commonly used in GitHub Actions workflows for checking out a repository
at a specific commit or branch. The changes in this update are internal
to the `actions/checkout` action and should not affect the functionality
of the project utilizing this action. The pull request also includes
details on the commits and compatibility score for the upgrade, and
reviewers can manage and merge the request using Dependabot commands
once the changes have been verified.
* Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1
([#307](#307)). In this
release, the `databrickslabs/sandbox` dependency has been updated from
version `acceptance/v0.3.0` to `0.3.1`. This update includes previously
tagged commits, bug fixes for git-related libraries, and resolution of
the `unsupported protocol scheme` error. The README has been updated
with more information on using the `databricks labs sandbox` command,
and installation instructions have been improved. Additionally, there
have been dependency updates for `go-git` libraries and
`golang.org/x/crypto` in the `/go-libs` and `/runtime-packages`
directories. New commits in this release allow larger logs from
acceptance tests and implement experimental OIDC refresh functionality.
Ignore conditions have been applied to prevent conflicts with previous
versions of the dependency. This update is recommended for users who
want to take advantage of the latest bug fixes and improvements.
* Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2
([#315](#315)). In this
release, the `databrickslabs/sandbox` dependency has been updated from
version `acceptance/v0.3.1` to `0.4.2`. This update includes bug fixes,
dependency updates, and additional go-git libraries. Specifically, the
`Run integration tests` job in the GitHub Actions workflow has been
updated to use the new version of the
`databrickslabs/sandbox/acceptance` Docker image. The updated version
also includes install instructions, usage instructions in the README,
and a modification to provide more git-related libraries. Additionally,
there were several updates to dependencies, including
`golang.org/x/crypto` version `0.16.0` to `0.17.0`. Dependabot, a tool
that manages dependencies in GitHub projects, is responsible for the
update and provides instructions for resolving any conflicts or merging
the changes into the project. This update is intended to improve the
functionality and reliability of the `databrickslabs/sandbox`
dependency.
* Deprecate `Row.as_dict()`
([#309](#309)). In this
release, we are introducing a deprecation warning for the `as_dict()`
method in the `Row` class, which will be removed in favor of the
`asDict()` method. This change aims to maintain consistency with Spark's
`Row` behavior and prevent subtle bugs when switching between different
backends. The deprecation warning will be implemented using Python's
warnings mechanism, including the new annotation in Python 3.13 for
static code analysis. The existing functionality of fetching values from
the database through `StatementExecutionExt` remains unchanged. We
recommend that clients update their code to use `.asDict()` instead of
`.as_dict()` to avoid any disruptions. A new test case
`test_row_as_dict_deprecated()` has been added to verify the deprecation
warning for `Row.as_dict()`.
* Minor improvements for `.save_table(mode="overwrite")`
([#298](#298)). In this
release, the `.save_table()` method has been improved, particularly when
using the `overwrite` mode. If no rows are supplied, the table will now
be truncated, ensuring consistency with the mock backend behavior. This
change has been optimized for SQL-based backends, which now perform
truncation as part of the insert for the first batch. Type hints on the
abstract method have been updated to match the concrete implementations.
Unit tests and integration tests have been updated to cover the new
functionality, and new methods have been added to test the truncation
behavior in overwrite mode. These improvements enhance the consistency
and efficiency of the `.save_table()` method when using `overwrite` mode
across different backends.
* Updated databrickslabs/sandbox requirement to acceptance/v0.3.0
([#305](#305)). In this
release, we have updated the requirement for the
`databrickslabs/sandbox` package to version `acceptance/v0.3.0` in the
`downstreams.yml` file. This update is necessary to use the latest
version of the package, which includes several bug fixes and dependency
updates. The `databrickslabs/sandbox` package is used in the acceptance
tests, which are run as part of the CI/CD pipeline. It provides a set of
tools and utilities for developing and testing code in a sandbox
environment. The changelog for this version includes the addition of
install instructions, more git-related libraries, and the modification
of the README to include information about how to use it with the
`databricks labs sandbox` command. Specifically, the version of the
`databrickslabs/sandbox` package used in the `acceptance` job has been
updated from `acceptance/v0.1.4` to `acceptance/v0.3.0`, allowing the
integration tests to be run using the latest version of the package. The
ignore conditions for this PR ensure that Dependabot will resolve any
conflicts that may arise and can be manually triggered with the
`@dependabot rebase` command.

Dependency updates:

* Bump actions/checkout from 4.2.0 to 4.2.1
([#304](#304)).
* Updated databrickslabs/sandbox requirement to acceptance/v0.3.0
([#305](#305)).
* Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1
([#307](#307)).
* Bump actions/checkout from 4.2.1 to 4.2.2
([#310](#310)).
* Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2
([#315](#315)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants