Data projection with views #6181
@peternied Thanks for the proposal. Could you elaborate on the use case? I don't see the connection between rotated indexes and materialized views. Can you also elaborate on the security feature use cases?
@peternied Reading through your description, it does look like an access control / security use case. Could you help us understand the permission model better?
I attended the Search Relevance - Triage & Backlog Review meeting today and had an opportunity to bring up this issue; thanks @macohen.
I (@peternied) will continue to iterate on this issue as I get more information. Thanks!
I really like this idea, and I think it could be combined with search pipelines to open up some really exciting possibilities. I'm imagining a scenario where:
I wanted to provide an update before the holidays: I've got a functional POC in OpenSearch [1] and the Security Plugin [2], alongside a design breakdown and implementation plan for an experimental release [3]. After returning from the break, we will look into a demo and gather more feedback.
```mermaid
sequenceDiagram
    participant Client
    participant HTTP_Request as ActionHandler
    participant Cluster_Metadata as Cluster Metadata Store
    participant Data_Store as Indices

    Client->>HTTP_Request: View List/Get/Update/Create/Delete<BR>/views or /views/{view_id}
    HTTP_Request->>Cluster_Metadata: Query Views
    alt Update/Create/Delete
        Cluster_Metadata->>Cluster_Metadata: Refresh Cluster
    end
    Cluster_Metadata-->>HTTP_Request: Return
    HTTP_Request-->>Client: Return

    Client->>HTTP_Request: Search View<br>/views/{view_id}/search
    HTTP_Request->>Cluster_Metadata: Query Views
    Cluster_Metadata-->>HTTP_Request: Return
    HTTP_Request->>HTTP_Request: Rewrite Search Request
    HTTP_Request->>HTTP_Request: Validate Search Request
    HTTP_Request->>Data_Store: Search indices
    Data_Store-->>HTTP_Request: Return
    HTTP_Request-->>Client: Return
```
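To make the "Search View" half of the diagram concrete, here is a minimal sketch of the lookup, rewrite, and validate steps. It is not the POC's code: the `View` and `SearchRequest` shapes, the `search_view` function, and the "must target at least one index" validation rule are all hypothetical, and a plain dict stands in for the cluster metadata store.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    # Hypothetical view record, as it might be stored in cluster metadata.
    name: str
    targets: list[str]  # concrete index names the view reads from


@dataclass
class SearchRequest:
    indices: list[str]
    query: dict = field(default_factory=dict)


class ViewNotFound(Exception):
    pass


def search_view(views: dict[str, View], view_id: str, query: dict) -> SearchRequest:
    """Follow the /views/{view_id}/search steps from the diagram and return
    the request that would be sent on to the indices."""
    # Query Views: look the view up in (a stand-in for) cluster metadata.
    view = views.get(view_id)
    if view is None:
        raise ViewNotFound(view_id)
    # Rewrite Search Request: replace the view with its backing indices.
    request = SearchRequest(indices=list(view.targets), query=query)
    # Validate Search Request: refuse a rewrite that targets nothing (assumed rule).
    if not request.indices:
        raise ValueError(f"view {view_id!r} has no target indices")
    return request
```

For example, `search_view({"logs": View("logs", ["http-logs-2023-01-20"])}, "logs", {"match_all": {}})` returns a request aimed at `http-logs-2023-01-20`. Keeping the rewrite separate from executing the search mirrors the diagram, where validation happens before any index is touched.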
@peternied - Thank you for proposing this idea. While some of the aspects around access control / security make sense to me, I am unable to see other benefits of views compared to aliases/index patterns for OpenSearch. Does it make more sense to extend aliases for this purpose instead of introducing another concept?
SQL literally joins multiple tables to query related information across them through a single virtual table. I don't see any such correlation across indices or shards in OpenSearch. The only join OpenSearch supports is the parent/child relation, which is limited to a single shard, not even a single index.
Views in SQL are nothing but named queries, and schema-on-write makes it easy to translate queries on views into bigger queries on the original datasets. I am unable to understand how we are planning to aggregate information across potentially unrelated indices.
We should be able to achieve this using an alias!?
Can you expand more on this? The above diagram gives some idea of the request/response flow for the CRUD API, but I am really interested in how we are planning to compose results from potentially completely different indices without tying them to a specific schema.
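The "views are named queries" point can be seen in a few lines of SQLite from Python's standard library. This is only an illustration of the SQL side of the comparison; the `users`/`orders` tables and the `user_totals` view are made up for the example.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'ada'), (2, 'linus');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);

    -- The view stores only the query text; no data is copied.
    CREATE VIEW user_totals AS
        SELECT u.name AS name, SUM(o.total) AS total
        FROM users u JOIN orders o ON o.user_id = u.id
        GROUP BY u.name;
""")

# Querying the view expands into the join over the underlying tables,
# which works because both tables have a fixed, known schema.
rows = conn.execute("SELECT name, total FROM user_totals ORDER BY name").fetchall()
print(rows)  # [('ada', 12.5), ('linus', 3.0)]
```

The join condition `o.user_id = u.id` depends on both schemas being known up front, which is exactly the property the comment questions for arbitrary OpenSearch indices.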
This is where the alias fits perfectly. We can restrict the permissions to an alias for specific users/groups without worrying about the underlying indices the alias is getting mapped to.
Can you please expand on this to help me understand why this is viable only with a new access model rather than with aliases or index patterns? Also, even if there are minor limitations with aliases, we should look to augment them instead of introducing a completely new concept called "views".
I believe you're confusing unions with joins, especially if you consider the common log analytics use case of OpenSearch. If I am looking for a monthly aggregation of 4xx/5xx HTTP status codes within log* indices, that is nothing but unioning the results from different indices. SQL joins, by contrast, are used to run operations on related, albeit very different, data sets.
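A small sketch of the union case described above: documents from every index matching a pattern are pooled and counted, and no document is ever matched against a document in another index. The index names, the `status` field, and the `error_counts_by_month` helper are hypothetical, and `fnmatch` stands in for OpenSearch's own wildcard resolution.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical documents per monthly index; only the status field matters here.
indices = {
    "logs-2023-01": [{"status": 200}, {"status": 404}, {"status": 503}],
    "logs-2023-02": [{"status": 500}, {"status": 404}, {"status": 404}],
    "metrics-2023-01": [{"status": 500}],
}


def error_counts_by_month(indices: dict[str, list[dict]], pattern: str) -> Counter:
    """Count 4xx/5xx responses per month across every index matching pattern.

    This is a union: documents from each matching index are pooled and
    aggregated; no document is matched against a document in another index.
    """
    counts: Counter = Counter()
    for name, docs in indices.items():
        if not fnmatchcase(name, pattern):
            continue
        month = name.split("-", 1)[1]  # "logs-2023-01" -> "2023-01"
        for doc in docs:
            if 400 <= doc["status"] < 600:
                counts[(month, f"{doc['status'] // 100}xx")] += 1
    return counts


print(error_counts_by_month(indices, "logs-*"))
```

A join, in contrast, would need a key shared between indices (like `user_id` in a SQL schema) to pair documents up, which this aggregation never uses.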
Yup - I did! I'll correct that in the previous comment.
@peternied -- This is a good point. Can we manage permissions on aliases? I feel like there was some other reason why aliases are not a good fit, but I'm struggling to remember.
@msfroh, maybe this sparks something: in the Security Plugin, aliases don't have a permissions concept around them. When you use an alias, or an index pattern So a user could run a query In my mind, there is an existing conceptual model that our users are aware of. I can see the argument that an opt-in model is not a feature, but a bug. There are other manageability issues and historical features that we might not want to support, but I think those concerns can be built up and mitigated.
IMO, that is the correct behavior. Even in the SQL world, permissions need to be granted to the person executing the query for every object referenced by the view, unless the referenced object is owned by the view owner, in which case the authorization decision is made using ownership chains. Should we introduce the concept of ownership to views and indices in OpenSearch?
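Ownership chaining is easiest to see in a toy model. The sketch below is a simplification loosely modeled on how SQL Server describes ownership chains, not any OpenSearch feature: the `DbObject` type, the `can_select` function, and the object and user names are hypothetical. The caller always needs a grant on the object it queries; a referenced object is re-checked only when its owner differs from the owner of the object referencing it.

```python
from dataclasses import dataclass, field


@dataclass
class DbObject:
    name: str
    owner: str
    references: list["DbObject"] = field(default_factory=list)
    grants: set[str] = field(default_factory=set)  # users granted SELECT


def _allowed(user: str, obj: DbObject) -> bool:
    return user == obj.owner or user in obj.grants


def _chain_ok(user: str, parent: DbObject, child: DbObject) -> bool:
    # Same owner: the chain is unbroken, so child's permissions are skipped.
    # Different owner: the chain is broken, so the caller needs its own grant.
    if child.owner != parent.owner and not _allowed(user, child):
        return False
    return all(_chain_ok(user, child, ref) for ref in child.references)


def can_select(user: str, obj: DbObject) -> bool:
    """The caller always needs permission on the object it queries directly;
    objects it references are checked along the ownership chain."""
    if not _allowed(user, obj):
        return False
    return all(_chain_ok(user, obj, ref) for ref in obj.references)
```

With `orders` owned by `dba` and a `dba`-owned view over it granted to `bob`, `bob` can read through the view without any grant on `orders`; if the view instead referenced a table owned by someone else, `bob` would also need a grant on that table.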
:+1: Views are "moving targets", and non-designated users may gain unexpected permissions.
I think this may not be applicable to OpenSearch at large; it may change in the future, but identity is optional. And it still opens up a hole in the system, since the view could be created with
I have been thinking of security as a new feature for aliases, instead of changing the existing model.
I am wondering if there should be an explicit views qualifier while querying them. Probably, the end user should be agnostic of whether they are querying a view or an index?
Does that mean views can run into a similar scenario as aliases if the view -> index mapping changes?
The alias model does not bundle permissions - the individual indices behind the alias are checked.
IIRC, the specific point of a view in the context of this issue is that the permissions are on the view. If the view is updated to point to a different index (or indices), then, yes, the user would be able to query that different index (through the view).
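The difference between the two models described above can be sketched as two permission checks. This is a deliberately simplified model, not the Security Plugin's logic: `Permissions`, `can_search_alias`, `can_search_view`, and all index/view names are hypothetical. It also shows the escalation concern: re-pointing a view at an index the user was never granted does not change the outcome of the view check.

```python
from dataclasses import dataclass, field


@dataclass
class Permissions:
    # Hypothetical grants for one user; real Security Plugin roles are richer.
    indices: set[str] = field(default_factory=set)
    views: set[str] = field(default_factory=set)


def can_search_alias(perms: Permissions, aliases: dict[str, list[str]], alias: str) -> bool:
    """Alias model: every index the alias resolves to is checked."""
    return all(index in perms.indices for index in aliases[alias])


def can_search_view(perms: Permissions, views: dict[str, list[str]], view: str) -> bool:
    """View model as proposed: only the grant on the view is checked, so the
    backing indices in views[view] never reach the permission check."""
    if view not in views:
        raise KeyError(view)
    return view in perms.views
```

If `perms = Permissions(indices={"logs-a"}, views={"logs-view"})`, an alias over `["logs-a", "index-a"]` is rejected because `index-a` is not granted, while `logs-view` stays searchable even after `index-a` is appended to its targets.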
I believe a view could use index patterns, right? If so, no updates are needed.
I am wondering how we are enforcing permissions on the index in this case. Can this result in some escalation of privilege? A user might have been explicitly denied access to index A but might have access to a view pointing to index A. I guess IAM resolves this with the pass role permission.
@msfroh @reta @jainankitk This is a really good discussion. I've created an RFC [1] to discuss the problem space; I think that will lead to better alignment before I jump into low-level implementation details.
Thanks @peternied for getting this started. Can we also add the question below to the RFC above or to a separate issue? "I am wondering if there should be an explicit views qualifier while querying them. Probably, the end user should be agnostic of whether they are querying a view or an index?"
Until we are aligned on the problem being solved, I don't think we can reason about this implementation detail; let's circle back to this one. I think a broader API/naming review will be required, and this topic will come up during those discussions.
Is your feature request related to a problem? Please describe.
Sometimes there are clear relationships between indices, e.g. http-logs-2023-01-20 http-logs-2023-01-21. As data gets reshaped or physically moved there is a desire to preserve how the data is referenced. OpenSearch Dashboards has a feature around this called index patterns that doesn't exist in the backend.
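A rough sketch of what a backend "index pattern" could mean: the pattern is resolved against whatever indices exist at query time, so the next day's index is included without redefining anything. `resolve_pattern` is a hypothetical helper, and `fnmatch` is only a stand-in for OpenSearch's wildcard handling.

```python
from fnmatch import fnmatchcase


def resolve_pattern(pattern: str, existing_indices: list[str]) -> list[str]:
    """Resolve a wildcard pattern against the index names that exist now."""
    # fnmatch is a stand-in for OpenSearch's wildcard handling; it also gives
    # meaning to "?" and "[...]", which OpenSearch index expressions may not.
    return sorted(name for name in existing_indices if fnmatchcase(name, pattern))


indices = ["http-logs-2023-01-20", "http-logs-2023-01-21", "billing-2023-01"]
print(resolve_pattern("http-logs-*", indices))
# ['http-logs-2023-01-20', 'http-logs-2023-01-21']

# A new daily index is picked up on the next resolution with no redefinition.
indices.append("http-logs-2023-01-22")
print(resolve_pattern("http-logs-*", indices))
```

Because resolution is deferred, the grouping stays stable for readers while ingestion keeps creating new physical indices underneath it.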
If there were a way to create a logical grouping of these physical storage mediums, the responsibilities of data usage and ingestion could be separated. I think this would be a big win for lowering the maintenance burden of OpenSearch clusters over time.
Describe the solution you'd like
In SQL there are tables and views; views offer flexibility and centralized management (see the great answers to the Stack Overflow question What is a good reason to use SQL views?). Pulling from the great answer by user210748, I'd suggest this system does the following:
Describe alternatives you've considered
Aliases
OpenSearch already has aliases that represent a virtualized view; maybe they could be built up to offer these additional features. However, there are some quirks, such as is_write_index, that we might want to be careful around.
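One example of the is_write_index quirk is how a write sent to an alias picks its target index. The sketch below is a simplified model of the documented behavior as I understand it: a lone index with the flag unset is the implicit write index, an index flagged `true` wins, and an alias over several indices with none flagged rejects writes. `AliasEntry` and `resolve_write_index` are hypothetical names, not OpenSearch APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AliasEntry:
    index: str
    is_write_index: Optional[bool] = None  # None: flag not set on this index


def resolve_write_index(entries: list[AliasEntry]) -> str:
    """Return the index that receives writes sent to an alias (simplified model)."""
    explicit = [e.index for e in entries if e.is_write_index]
    if explicit:
        return explicit[0]  # at most one index may be the write index
    if len(entries) == 1 and entries[0].is_write_index is None:
        return entries[0].index  # a lone index is the implicit write index
    raise ValueError("alias has no write index; write requests are rejected")
```

This write-side behavior has no obvious meaning for a read-only projection, which is one reason extending aliases into views may need care.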
Data streams
Data streams are a virtualized view focused on managing the physical storage; maybe they could be built up to handle data projection and filtering.
Additional context
Coming from the security plugin, there are features for document level security (DLS), field level security (FLS), and field masking. These features are built into index permissions, and they are somewhat clunky: a query to apply DLS has to be double-encoded in the JSON body. Views could easily encompass these scenarios. Modeling view creation and management separately from managing permissions to the views is a cleaner separation than what is available in the security plugin.
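The "double-encoded" DLS complaint is easiest to see by building a role body. This sketch assumes the Security Plugin role shape as I understand it (`index_permissions` entries with `index_patterns`, `dls`, and `allowed_actions`, where `dls` is a string containing JSON); the `department` field and index pattern are made up.

```python
import json

# The query we actually want to apply as document-level security.
dls_query = {"bool": {"must": {"match": {"department": "sales"}}}}

# Role definition: the "dls" value is a *string* holding JSON, so the
# query is serialized once into that string...
role = {
    "index_permissions": [
        {
            "index_patterns": ["http-logs-*"],
            "dls": json.dumps(dls_query),
            "allowed_actions": ["read"],
        }
    ]
}

# ...and serialized again when the role itself is sent as a JSON body,
# which is where the escaped quotes come from.
body = json.dumps(role)
print(body)
```

A view that carries its filter as a regular JSON object would avoid the inner `json.dumps` step entirely.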