Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Address regression introduced in #24789 #25008

Merged
merged 1 commit into from
Aug 18, 2023

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Aug 16, 2023

SUMMARY

In #24996, @jfrag1 identified an issue with #24789 as there were no guarantees that form-data provided was backed by a trusted source.

Per #24996 (comment), I looked into seeing whether the access request could be more context aware, though regrettably the caller is of type QueryContext which is dashboard agnostic. The proposed fix, per @nytai's suggestion, was to add an additional check to verify that the dashboard chart (per the dashboardId and slice_id form-data fields) is associated with the datasource in question.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Added unit tests.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley added the v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch label Aug 16, 2023
@john-bodley john-bodley marked this pull request as ready for review August 16, 2023 19:53
form_data
and (dashboard_id := form_data.get("dashboardId"))
and (dashboard := DashboardDAO.find_by_id(dashboard_id))
and any(slc.datasource == datasource for slc in dashboard.slices)
and self.can_access_dashboard(dashboard)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that occurred to me recently, shouldn't we only be doing this check if dashboard rbac is enabled (or if the user is a guest user for embedded purposes).

For example before #24789 there was the following logic:

should_check_dashboard_access = (
    feature_flag_manager.is_feature_enabled("DASHBOARD_RBAC")
    or self.is_guest_user()
)

if not (
    self.can_access_schema(datasource)
    or self.can_access("datasource_access", datasource.perm or "")
    or self.is_owner(datasource)
    or (
         should_check_dashboard_access
         and self.can_access_based_on_dashboard(datasource)

but now, we're checking if if the user can access the dataset via the dashboard no matter what

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfrag1 I updated the PR description.

Copy link
Member

@jfrag1 jfrag1 Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that we're applying security rules intended for DASHBOARD_RBAC even when the flag isn't enabled. With the flag disabled, a user has access to any published dashboard containing a chart they have access to. With this logic, the user would then automatically get access to all other charts/datasets on the dashboard (albeit only with the dashboard context included), which I don't think should happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to bring back should_check_dashboard_access

Copy link
Member Author

@john-bodley john-bodley Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfrag1 the self.can_access_dashboard(dashboard) check is still being invoked and will always raise if a user doesn't have access to the dashboard regardless of what the context is.

The other formulation I can think of is to use the referrer to identify the dashboard in question, however I guess per here one can also spoof the referrer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that, here's a more concrete example to illustrate what I'm trying to get at:

DASHBOARD_RBAC is disabled. There's a published dashboard with 2 charts, powered by datasets A and B. A user has access to dataset A, but not dataset B. The user has access to the dashboard because they have access to dataset A. The question I'd then like to pose is, what is the expected behavior when the user goes to view the dashboard?

My understanding is that the expected behavior is that the user can view the chart powered by dataset A, but the chart powered by dataset B shows an error because they don't have access to the dataset.

However, after #24789 removed the should_check_dashboard_access check, the user would be able to view both charts on the dashboard. The user's access to dataset A grants them access to the dashboard, which then grants them access to all other charts on the dashboard, which I'd argue is another regression from #24789

Copy link
Member Author

@john-bodley john-bodley Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfrag1 I think the problem we're all struggling with is what the expected behavior should be. Superset's security manager was already complex and difficult to grok—which doesn't bode well from an understanding perspective—which we made worse with the introduction of the dashboard RBAC feature and embedded dashboards (which potentially have access patterns which conflict with the core manager). As the rules become more complex and/or conflicting we end up playing the game of Whac-A-Mole—where closing one attack vector likely opens another.

Fundamentally I think this—per your problem statement—comes down to "context". Let's say your two charts are A' and B' which are powered by datasets A and B respectively. With dashboard RBAC enabled the user can view chart A' in the "context" of the dashboard, but not in standalone/explorer mode. The challenge is the request is coming from the ChartDataCommand command which is dashboard agnostic.

I here the point you're making. There's an issue where the access to a dashboard is circumventing the need to check whether the user has access to the datasource. So in addition to checking whether the user can access the dashboard we also need to check (if the check is at the datasource, query-context, or viz level) whether the dashboard access check is specific to RBAC, i.e.,

and self.can_access_dashboard(dashboard)

should likely be,

and self.can_access_dashboard(dashboard) 
and is_feature_enabled("DASHBOARD_RBAC")
and dashboard.roles

Again we're adding complex logic to try to model a complex security manager which likely exposes more corner cases/issues.

Copy link
Member

@nytai nytai Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few scenarios we need to enumerate and document, this is what makes the most sense to me.

  • DASHBOARDS_RBAC disabled: superset should behave the same way it did before this feature was introduced and permissions are applied at the dataset level via "datasource access on x"
  • DASHBOARD_RBAC enabled:
    • The dashboard has roles: If the user has rbac access via their role, all charts/fitlers/etc in the dashboard should render and the user has access to all of them.
    • The dashboard doesn't have any attached roles: the access perms should work the same way they do as if the feature is disabled: charts only render if the user has dataset access, users see the dashboard in their list if they have the "can read Dashboard" permission

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nytai I think that makes a lot of sense! Just a quick question, if DASHBOARD_RBAC is enabled and the dashboard has roles, the regular/default access validation should still work, right? For example:

  • If the FF is enabled and the dashboard has roles, both someone with role-based access and a user with regular access (granted via datasource access on x/schema access on x) should work.

Another thing to consider is if we still want to handle guest users (embedded) and DASHBOARD_RBAC access similarly. Guest users can't access explore, they are only granted access to dashboards during the token generation, and should be able to access the dashboard entirely (including filters and etc). On the other hand, DASHBOARD_RBAC users are authenticated users that can access explore/other parts of the app, so here the context is really important.

@@ -1858,9 +1858,13 @@ def raise_for_access(
or self.can_access("datasource_access", datasource.perm or "")
or self.is_owner(datasource)
or (
# Check whether the datasource is associated with a dashboard in a
# trustworthy manner, i.e., we need to validate that the specified
# dashboard is actually associated with said datasource.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand this logic completely, but if dashboardId can be spoofed in the form what's preventing users from reading any datasource?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's validation in the lines below that:
a. The user has access the the dashboard, and
b. The datasource being requested is present on the dashboard

@john-bodley
Copy link
Member Author

@betodealmeida and @jfrag1 the caller for this check is coming from here, which mentions,

Take a query context constructed in the client and return payload data response for the given query

where access is checked here. The challenge here is given that the query context is constructed in the client then isn't any self._query_context.raise_for_access() check is subject to being spoofed?

@john-bodley john-bodley force-pushed the john-bodley--fix-24789 branch from f2e6fb7 to c3e19d2 Compare August 17, 2023 17:31
@pull-request-size pull-request-size bot added size/M and removed size/S labels Aug 17, 2023
@john-bodley john-bodley force-pushed the john-bodley--fix-24789 branch from c3e19d2 to 84ffd94 Compare August 17, 2023 19:37
@pull-request-size pull-request-size bot added size/L and removed size/M labels Aug 17, 2023
and (dashboard := DashboardDAO.find_by_id(dashboard_id))
and self.can_access_dashboard(dashboard)
and (
dashboard_ := self.get_session.query(Dashboard)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DashboardDAO and ChartDAO have additional filtering and/or security checks when fetching objects and thus I had to revert to using the raw SQLAlchemy ORM for fetching the dashboard and chart associated with the specified IDs.

@john-bodley
Copy link
Member Author

john-bodley commented Aug 17, 2023

Thanks—drum roll please—to @betodealmeida, @eschutho, @jfrag1 , @michael-s-molina, @nytai , and @villebro for jumping on the call earlier to discuss an interim solution to this regression which should (🤞) allow us to roll forward.

I've updated the PR to extend the previous logic (stemming from @nytai's suggestion) I had drafted to ensure that:

  1. Per @villebro's suggestion we also leverage the chart information and verify that i) the chart is associated with the dashboard, and ii) the chart is associated with the said datasource. This is a tighter handshake than what we had previously, i.e., ensuring that the datasource was associated with said dashboard.
  2. Per @nytai's comment I added a check that made sure the dashboard fallback is only invoked when the DASHBOARD_RBAC feature is enabled and the dashboard has associated roles.

Note this PR does not address granting access for native filters when the DASHBOARD_RBAC feature is enabled. That issue seems orthogonal to the regression this PR is trying to address.

@john-bodley john-bodley force-pushed the john-bodley--fix-24789 branch from 84ffd94 to c465fc5 Compare August 17, 2023 19:49
Copy link
Member

@jfrag1 jfrag1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This updated logic looks good to me - I think there's just one more loose end other than native filters, which is ensuring embedded guest users are granted access to datasources (for charts and native filters) with the proper dashboard context. I'm happy to work on this as a follow-up once this PR is merged

@mistercrunch mistercrunch added 🍒 3.0.2 🍒 3.0.3 🍒 3.0.4 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.1.0 labels Mar 8, 2024
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch 🍒 3.0.0 🍒 3.0.1 🍒 3.0.2 🍒 3.0.3 🍒 3.0.4 🚢 3.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants