Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add decorator to guard public APIs #12635

Merged
merged 5 commits into from
Jan 22, 2021

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Jan 20, 2021

EDIT

Instead of the decorator approach described below I implemented the guard as a custom unit test.

SUMMARY

This PR adds a decorator called guard, used to protect public APIs from breaking changes without a major version bump (see #12566). If the API is changed the user receives a warning.

For example, we allow people to specify a custom function called SQL_QUERY_MUTATOR in their superset_config.py. This API is already broken, since the signature described in superset/config.py is out-of-sync with how it's actually called:

# superset/config.py
# A function that intercepts the SQL to be executed and can alter it.
# The use case is can be around adding some sort of comment header
# with information such as the username and worker node information
#
#    def SQL_QUERY_MUTATOR(sql, username, security_manager):
#        dttm = datetime.now().isoformat()
#        return f"-- [SQL LAB] {username} {dttm}\n{sql}"
SQL_QUERY_MUTATOR = None

If we look at where the function is called we can see that the signature should be slightly different:

# superset/sql_lab.py
SQL_QUERY_MUTATOR = config["SQL_QUERY_MUTATOR"]
...
if SQL_QUERY_MUTATOR:
        sql = SQL_QUERY_MUTATOR(sql, user_name, security_manager, database)

Notice the extra database argument, and the small difference between username and user_name. This was probably changed at some point without updating the config, breaking any existing SQL_QUERY_MUTATOR that don't expect the database argument!

To prevent this in the future, we create a dummy function and protect it:

@guard("W}V8JTUVzx4ur~{CEhT?")
def dummy_sql_query_mutator(
    sql: str,
    user_name: str,
    security_manager: SupersetSecurityManager,
    database: Database,
) -> str:
    """A no-op version of SQL_QUERY_MUTATOR"""
    return sql

We change the code a little bit so that the function is always called (either the custom or the dummy one):

SQL_QUERY_MUTATOR = config.get("SQL_QUERY_MUTATOR") or dummy_sql_query_mutator
...
sql = SQL_QUERY_MUTATOR(sql, user_name, security_manager, database)

Now what happens if we change the signature of the function? Let's try it:

@guard("W}V8JTUVzx4ur~{CEhT?")
def dummy_sql_query_mutator(
    sql: str,
    user_name: str,
    security_manager: SupersetSecurityManager,
    database: Database,
    foo: int,
) -> str:
    """A no-op version of SQL_QUERY_MUTATOR"""
    return sql

Changing the code like this will print the following warning:

/Users/beto/Projects/incubator-superset/superset/utils/decorators.py:125: UserWarning: The decorated object
`dummy_sql_query_mutator` (in /Users/beto/Projects/incubator-superset/superset/sql_lab.py line 54) has a
public interface which has currently been modified. This MUST only be released in a new major version of
Superset according to SIP-57. To remove this warning message update the hash in the `guard` decorator to
';>K{8wd)&8!=9?1qR@Ja'.

    @guard("W}V8JTUVzx4ur~{CEhT?")
    def dummy_sql_query_mutator(
        sql: str,
        user_name: str,
        security_manager: SupersetSecurityManager,
        database: Database,
        foo: int,
    ) -> str:
        """A no-op version of SQL_QUERY_MUTATOR"""
        return sql

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TEST PLAN

Changed signature and run unit tests, it printed the warning message above.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@ktmud
Copy link
Member

ktmud commented Jan 21, 2021

I know hashing is fast for most cases, but I'm wondering whether we can move this to be an offline job in some way?

I.e. instead of a decorator, store the hash for all public APIs in a separate file and run the checks in CI.

@betodealmeida
Copy link
Member Author

betodealmeida commented Jan 21, 2021

I know hashing is fast for most cases, but I'm wondering whether we can move this to be an offline job in some way?

I.e. instead of a decorator, store the hash for all public APIs in a separate file and run the checks in CI.

Good point... we could also just add unit tests that import the functions and compare their signature hashes to hard coded values, that might be a more efficient approach.

(Also, the hashes are computed just once when the files are "compiled", not when the functions are called.)

@codecov-io
Copy link

codecov-io commented Jan 21, 2021

Codecov Report

Merging #12635 (beef427) into master (e7def7e) will decrease coverage by 3.44%.
The diff coverage is 82.35%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #12635      +/-   ##
==========================================
- Coverage   66.80%   63.35%   -3.45%     
==========================================
  Files        1017      487     -530     
  Lines       49730    30014   -19716     
  Branches     4864        0    -4864     
==========================================
- Hits        33221    19016   -14205     
+ Misses      16386    10998    -5388     
+ Partials      123        0     -123     
Flag Coverage Δ
cypress ?
javascript ?
python 63.35% <82.35%> (-0.63%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/config.py 90.61% <ø> (ø)
superset/utils/public_interfaces.py 78.57% <78.57%> (ø)
superset/sql_lab.py 80.18% <100.00%> (+0.74%) ⬆️
superset/db_engines/hive.py 0.00% <0.00%> (-85.72%) ⬇️
superset/sql_validators/postgres.py 50.00% <0.00%> (-50.00%) ⬇️
superset/db_engine_specs/hive.py 54.61% <0.00%> (-29.24%) ⬇️
superset/databases/commands/create.py 83.67% <0.00%> (-8.17%) ⬇️
superset/databases/commands/update.py 85.71% <0.00%> (-8.17%) ⬇️
superset/connectors/sqla/models.py 84.31% <0.00%> (-6.28%) ⬇️
superset/databases/commands/test_connection.py 84.78% <0.00%> (-4.35%) ⬇️
... and 549 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e7def7e...beef427. Read the comment docs.

@betodealmeida
Copy link
Member Author

I reviewed superset/config.py today with @eschutho, and it looks like we need to guard a good number of objects:

  • STATS_LOGGER
  • EVENT_LOGGER
  • CUSTOM_SECURITY_MANAGER
  • SQLALCHEMY_CUSTOM_PASSWORD_STORE
  • GET_FEATURE_FLAGS_FUNC
  • LOGGING_CONFIGURATOR
  • SQLLAB_CTAS_SCHEMA_NAME_FUNC
  • CSV_TO_HIVE_UPLOAD_DIRECTORY_FUNC
  • ALLOWED_USER_CSV_SCHEMA_FUNC
  • FLASK_APP_MUTATOR
  • TRACKING_URL_TRANSFORMER
  • DB_CONNECTION_MUTATOR
  • SQL_QUERY_MUTATOR
  • WEBDRIVER_AUTH_FUNC
  • SQLA_TABLE_MUTATOR
  • QUERY_COST_FORMATTERS_BY_ENGINE

@betodealmeida betodealmeida requested a review from ktmud January 22, 2021 18:02
Comment on lines +32 to +36
@pytest.mark.parametrize("interface,expected_hash", list(hashes.items()))
def test_public_interfaces(interface, expected_hash):
"""Test that public interfaces have not been accidentally changed."""
current_hash = compute_hash(interface)
assert current_hash == expected_hash, get_warning_message(interface, current_hash)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmud this is what I had in mind; now we can just update hashes and each one will get its own unit test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Not that it matters much, but can pytest display which parameter failed the test in case of assertion errors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, it treats each parameter as a separate test called unit_test_name[parameters]: https://docs.pytest.org/en/stable/parametrize.html#pytest-mark-parametrize

Copy link
Member

@ktmud ktmud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we clean up the decorator?

@betodealmeida
Copy link
Member Author

Should we clean up the decorator?

We should. I'm emotionally attached to it, but I'll remove it. :-P

@betodealmeida betodealmeida merged commit 4255c22 into apache:master Jan 22, 2021
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.2.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 1.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants