Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(embedded): adding logic to check dataset used by filters #24808

Merged
merged 8 commits into from
Jul 31, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 23 additions & 2 deletions superset/security/manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
# under the License.
# pylint: disable=too-many-lines
"""A set of constants and methods to manage permissions and security"""
import json
import logging
import re
import time
Expand Down Expand Up @@ -2062,8 +2063,28 @@ def can_access_based_on_dashboard(datasource: "BaseDatasource") -> bool:
.filter(Dashboard.id.in_(dashboard_ids))
)

exists = db.session.query(query.exists()).scalar()
return exists
if db.session.query(query.exists()).scalar():
return True

# check for datasets that are only used by filters
Copy link
Member

@john-bodley john-bodley Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic makes sense. The one thing I struggle with regarding this method (for both the existing and proposed logic) is how is agnostic of the specific dashboard in question and thus iterates over all dashboards said user has access to. This raises two questions i) correctness, and ii) efficiency.

Currently I can't formulate a situation where (i) is a problem, however for (ii) this method seems highly inefficient, e.g. we loop over all the dashboards a user has access to in relation to said dataset/datasource, whereas in actuality we likely know the context a priori.

Note in #24789 this method is slated for removal, but the addition of the integration test will ensure that the logic will be preserved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley I totally agree that this approach is far from ideal from a performance point of view. I think with the bigger changes that are going to be discussed, we can reformulate this process and make sure this validation happens with a target dashboard ID in mind. I saw in #24804 that you have updated the raise_for_access function so it can also receive a dashboard, so I think it would be easier to implement this improvement once those changes (and other decisions made in regards to expected behavior) are made.

Personally this is my second contribution to Superset so I would rather avoid doing bigger estructural changes until I get more familiar with the code as a whole.

dashboards_json = (
db.session.query(Dashboard.json_metadata)
.filter(Dashboard.id.in_(dashboard_ids))
.all()
)
for json_ in dashboards_json:
try:
json_metadata = json.loads(json_.json_metadata)
for filter_ in json_metadata.get("native_filter_configuration", []):
filter_dataset_ids = [
target.get("datasetId") for target in filter_.get("targets", [])
]
if datasource.id in filter_dataset_ids:
return True
except ValueError:
pass

return False

@staticmethod
def _get_current_epoch_time() -> float:
Expand Down
46 changes: 46 additions & 0 deletions tests/integration_tests/security/guest_token_security_tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,18 +15,21 @@
# specific language governing permissions and limitations
# under the License.
"""Unit tests for Superset"""
import json
from unittest import mock

import pytest
from flask import g

from superset import db, security_manager
from superset.connectors.sqla.models import SqlaTable
from superset.daos.dashboard import EmbeddedDashboardDAO
from superset.dashboards.commands.exceptions import DashboardAccessDeniedError
from superset.exceptions import SupersetSecurityException
from superset.models.dashboard import Dashboard
from superset.security.guest_token import GuestTokenResourceType
from superset.sql_parse import Table
from superset.utils.database import get_example_database
from tests.integration_tests.base_tests import SupersetTestCase
from tests.integration_tests.fixtures.birth_names_dashboard import (
load_birth_names_dashboard_with_slices,
Expand Down Expand Up @@ -233,3 +236,46 @@ def test_raise_for_dashboard_access_as_guest_no_rbac(self):

db.session.delete(dash)
db.session.commit()

def test_can_access_datasource_used_in_dashboard_filter(self):
"""
Test that a user can access a datasource used only by a filter in a dashboard
they have access to.
"""
# Create a test dataset
test_dataset = SqlaTable(
database_id=get_example_database().id,
schema="main",
table_name="test_table_embedded_filter",
)
db.session.add(test_dataset)
db.session.commit()

# Create an embedabble dashboard with a filter powered by the test dataset
test_dashboard = Dashboard()
test_dashboard.dashboard_title = "Test Embedded Dashboard"
test_dashboard.json_metadata = json.dumps(
{
"native_filter_configuration": [
{"targets": [{"datasetId": test_dataset.id}]}
]
}
)
test_dashboard.owners = []
test_dashboard.slices = []
test_dashboard.published = False
db.session.add(test_dashboard)
db.session.commit()
self.embedded = EmbeddedDashboardDAO.upsert(test_dashboard, [])

# grant access to the dashboad
g.user = self.authorized_guest
g.user.resources = [{"type": "dashboard", "id": str(self.embedded.uuid)}]
g.user.roles = [security_manager.get_public_role()]

# The user should have access to the datasource via the dashboard
security_manager.raise_for_access(datasource=test_dataset)

db.session.delete(test_dashboard)
db.session.delete(test_dataset)
db.session.commit()