[RFC] Enable OpenSearch Dashboards to support multiple OpenSearch clusters #1388

Closed
zengyan-amazon opened this issue Mar 25, 2022 · 11 comments
Labels: discuss; multiple datasource (multiple datasource project); RFC (Substantial changes or new features that require community input to garner consensus.); v2.4.0 ('Issues and PRs related to version v2.4.0')

Comments

@zengyan-amazon
Member

Problem Statement

OpenSearch Dashboards (OSD for short) was designed and implemented to work with a single OpenSearch cluster. Users who have multiple OpenSearch clusters need to navigate between Dashboards endpoints to visualize their data. This experience is not user friendly and adds overhead, because users need to maintain multiple OpenSearch Dashboards instances, one for each OpenSearch cluster.

We want OpenSearch Dashboards users to have a single Dashboards instance that can visualize data in different OpenSearch clusters. In this RFC, an OpenSearch cluster that stores raw data for analysis is called a data source.

The proposal here is to give OpenSearch Dashboards the capability to let users dynamically manage their data sources. Users can then build visualizations and dashboards against data in those data sources, and put those visualizations into a single dashboard.

Proposed Solution

We propose to add a new data-source type to Dashboards saved objects. It includes the data source URL, capabilities (such as which plugins are available), and the credentials used to access the data source (credentials will be encrypted by OSD when persisted). An index-pattern can then refer to a data-source and, based on this reference, the Dashboards server can execute the query against that specific data-source.

For instance, a data-source object may look like:

{
  "type": "data-source",
  "data-source": {
    "title": "demo-data-source",
    "host": "https://my.opensearch.domain/",
    "auth_type": "basicauth",
    "credentials": {
      "username": "dashboards_user",
      "password": "password"
    },
    "capabilities": {
      "alerting": {
        "enabled": true,
        "version": "1.2"
      },
      "ism": {
        "enabled": true,
        "supported_actions": [
          "roll_over",
          "shrink"
        ]
      }
    }
  },
  ...
} 

And we will add a reference to data-source in index-pattern, so that an index-pattern object will look like:

{
  "type": "index-pattern",
  "index-pattern": {
    "title": "demo-index-pattern",
    "fields": {
      ...
    },
    "dataSource": "data-source-obj-id" 
  },
  "references" : [
    {
      "id": "data-source-obj-id",
      "name": "kibanaSavedObjectMeta.dataSource",
      "type": "data-source"
    }
  ],
  ...
}

With the new data-source model added, visualizations can get the data source reference id from the index pattern and pass it to the OSD server along with the query. The OSD server can then get the data source attributes using the saved object service and query that specific data source.
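The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the interface names, `getDataSourceId`, `resolveHost`, and the in-memory `Map` standing in for the saved object service are all hypothetical and are not taken from the PoC.

```typescript
// Hypothetical shapes loosely based on the JSON examples in this RFC;
// they are not the final saved object schema.
interface SavedObjectReference {
  id: string;
  name: string;
  type: string;
}

interface IndexPatternSavedObject {
  type: "index-pattern";
  title: string;
  references: SavedObjectReference[];
}

interface DataSourceAttributes {
  title: string;
  host: string;
  auth_type: string;
}

// Stand-in for the saved object service: a plain in-memory map keyed by object id.
type DataSourceStore = Map<string, DataSourceAttributes>;

// Returns the id of the referenced data source, or undefined when the
// index pattern carries no data-source reference.
function getDataSourceId(indexPattern: IndexPatternSavedObject): string | undefined {
  return indexPattern.references.find((ref) => ref.type === "data-source")?.id;
}

// Resolves the host that a query for this index pattern should be sent to,
// or undefined when no data source can be resolved.
function resolveHost(
  indexPattern: IndexPatternSavedObject,
  store: DataSourceStore
): string | undefined {
  const id = getDataSourceId(indexPattern);
  return id === undefined ? undefined : store.get(id)?.host;
}
```

In this sketch the client side only needs the reference id; the host and any credentials stay on the server side behind the store lookup.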

The new data-source model will change the user experience. Users need to create data sources before they can create an index pattern. Then, when creating an index pattern, users will need to select the data source that the index pattern will be associated with. After that, the visualization and dashboard building experience will remain the same as it is today.

A PoC for adding data-source model and use it in index-pattern and visualization can be found at: https://github.com/zengyan-amazon/OpenSearch-Dashboards/tree/ext-data-source-discover

There is a caveat: data-source includes user credentials, which need to be encrypted and handled carefully. That may break the general data handling in the saved object service, as data-source needs special handling. Alternatively, we may end up letting OSD manage another secure index (or data store) to handle data-source credentials.
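As a rough illustration of encrypting a credential before it is persisted, here is a minimal sketch using AES-256-GCM from Node's built-in `crypto` module. The RFC does not specify an algorithm or key management scheme, so the cipher choice, the `EncryptedValue` shape, and both function names are assumptions made for this example only.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "crypto";

// Hypothetical persisted form of an encrypted credential field.
interface EncryptedValue {
  iv: string;   // base64 initialization vector, unique per encryption
  tag: string;  // base64 GCM authentication tag
  data: string; // base64 ciphertext
}

// Encrypts a credential with AES-256-GCM. `key` must be 32 bytes;
// how that key is provisioned and stored is out of scope for this sketch.
function encryptCredential(plaintext: string, key: Buffer): EncryptedValue {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Decrypts a value produced by encryptCredential. Throws if the key is wrong
// or the stored value was tampered with, because the auth tag will not verify.
function decryptCredential(value: EncryptedValue, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(value.iv, "base64"));
  decipher.setAuthTag(Buffer.from(value.tag, "base64"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(value.data, "base64")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}
```

An authenticated mode is used here so that a modified saved object fails on decryption instead of yielding garbage; whether this fits the generic saved object pipeline is exactly the open question raised above.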

Scope

  • For this RFC, we focus on supporting data sources that are compatible with OpenSearch 1.x APIs. We will try to keep the design and implementation extensible enough to support other data sources, but that is not a design goal.
  • Handling credentials securely, for example through encryption, is in scope.
  • Support of non-visualization plugins, such as alerting, to connect to different OpenSearch data sources is in scope.

FAQ

Is it required to have data source defined for all index patterns? What if I don't want this capability?

The plan is to make this multiple data source feature configurable, so that users can enable or disable it in OSD's yml config file.

Also, we want to maintain backward compatibility, so that users can upgrade safely. When an index pattern doesn't have a data source, it can fall back to using the same OpenSearch endpoint as its saved object store.
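The fallback behavior could look something like the sketch below. The `enabled` flag, the `MultiDataSourceConfig` shape, and `selectEndpoint` are hypothetical names chosen for illustration; the RFC does not define the actual config key.

```typescript
// Hypothetical feature flag, as it might be read from OSD's yml config file.
interface MultiDataSourceConfig {
  enabled: boolean;
}

// Chooses the endpoint a query should go to.
// Falls back to the default OpenSearch endpoint (the one backing the saved
// object store) when the feature is disabled or the index pattern has no
// data source attached.
function selectEndpoint(
  config: MultiDataSourceConfig,
  defaultHost: string,
  dataSourceHost?: string
): string {
  if (!config.enabled || dataSourceHost === undefined) {
    return defaultHost;
  }
  return dataSourceHost;
}
```

With this shape, index patterns created before an upgrade keep working unchanged, because they simply have no data source host to pass in.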

I enabled the security plugin for both OpenSearch Dashboards and my OpenSearch clusters. Can OpenSearch Dashboards use my OSD credentials to query OpenSearch data sources?

This is more of an implementation-level detail. It can work with basic auth, but it is not likely to work for users who log into OSD using SSO such as OIDC or SAML. We want to provide the simplest experience to users, and will figure out more details during the design and implementation phases.

@zengyan-amazon zengyan-amazon added discuss RFC Substantial changes or new features that require community input to garner consensus. labels Mar 25, 2022
@zengyan-amazon zengyan-amazon self-assigned this Mar 25, 2022
@seraphjiang
Member

@zhongnansu would you take a look

@zhongnansu
Member

@zengyan-amazon @seraphjiang Thanks for putting up the detailed proposal. I do have some questions that I'd like to discuss here.

  1. Does this allow OSD to support cross-cluster features (queries, visualizations, etc.)? Can one index-pattern be associated with indices from different clusters? From the RFC, I don't think cross-cluster features will be supported; I just want to confirm. Also, I wonder if this is a valid user need.
  2. For the capabilities field in the data-source model, why do we need to define plugin availability explicitly if OSD is connecting to one cluster at a time? I thought this meta info could always be retrieved on demand by calling the _cat/plugins API on the OpenSearch endpoint defined in the data-source.

@zengyan-amazon
Member Author

@zhongnansu these are good points and callouts.

Does this allow OSD to support cross-cluster features (queries, visualizations, etc.)? Can one index-pattern be associated with indices from different clusters? From the RFC, I don't think cross-cluster features will be supported; I just want to confirm. Also, I wonder if this is a valid user need.

I would suggest starting without cross-cluster support; we can then discuss whether to add that capability to OSD later. Cross-cluster support can be a follow-up project, which may also require aggregation in OSD and a lot of other considerations. We can keep it in mind during design and implementation to leave as much flexibility as possible for the future. Keeping OSD flexible and extensible is always one of our principles.

For the capabilities field in the data-source model, why do we need to define plugin availability explicitly if OSD is connecting to one cluster at a time? I thought this meta info could always be retrieved on demand by calling the _cat/plugins API on the OpenSearch endpoint defined in the data-source.

This is more of a design detail. The point is that OSD needs a way to determine whether a specific capability is supported in a given data source. Defining it in the data source is one approach; using the _cat/plugins API is another.

We can discuss the pros and cons in the design phase. For example, the config-in-data-source approach is not dynamic, while the API approach assumes that the identity used by the data source (or maybe the OSD server) has permission to list all plugins in each backend cluster, which may or may not be a valid assumption.
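To make the trade-off concrete, here is a hedged sketch of both approaches side by side. All names are hypothetical. In particular, `ListPlugins` is an injected function standing in for whatever client call would list the plugins installed on the backend cluster (for example via the cat plugins API); no real client API is assumed.

```typescript
// Hypothetical shape of the `capabilities` field stored on a data source.
interface Capability {
  enabled: boolean;
  version?: string;
}
type Capabilities = Record<string, Capability>;

// Approach 1: static lookup in the saved data-source object.
// Cheap and needs no extra permissions, but goes stale if the cluster changes.
function isSupportedStatic(capabilities: Capabilities, name: string): boolean {
  return capabilities[name]?.enabled === true;
}

// Approach 2: dynamic lookup by asking the cluster which plugins it has.
// `ListPlugins` is an injected stand-in for the real client call.
type ListPlugins = () => Promise<string[]>;

async function isSupportedDynamic(
  listPlugins: ListPlugins,
  pluginName: string
): Promise<boolean> {
  try {
    const plugins = await listPlugins();
    return plugins.includes(pluginName);
  } catch {
    // For example, the data source identity lacks permission to list plugins.
    // This sketch treats that as "not supported" rather than failing the request.
    return false;
  }
}
```

Treating a permission failure as "not supported" is one possible policy; a real design might instead surface the error or fall back to the static field.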

@zhongnansu
Member

Created a proposal for better client management in the multiple data source project. A POC is needed.
Tracking here: #1499

@peternied
Member

@zengyan-amazon This seems like an opportunity to reinvent the primitive data type used to power OpenSearch Dashboards queries; index-patterns are an OpenSearch concept. What do you think about embedding index-patterns into the data-source definition?

When adding support for other sources like SQL tables, DynamoDB, or Cosmos DB, there would be a common interface. Another way to frame this problem is how to write an OpenSearch data-source.

@zengyan-amazon
Member Author

@peternied A data source here describes an OpenSearch endpoint, which may have multiple index-patterns. Index-pattern is the foundation of visualizations, so embedding index-pattern into the data source definition may not be a good idea given the current model in OpenSearch Dashboards.

If we really want to support other sources like SQL databases, DynamoDB, or others, we may still consider using composition, making the data source an attribute of index-pattern, and remodeling index-pattern to make it generic enough to support other use cases.

@bjo004

bjo004 commented Jun 4, 2022

@zengyan-amazon @seraphjiang Thanks for putting up the detailed proposal. I do have some questions that I'd like to discuss here.

  1. Does this allow OSD to support cross-cluster features (queries, visualizations, etc.)? Can one index-pattern be associated with indices from different clusters? From the RFC, I don't think cross-cluster features will be supported; I just want to confirm. Also, I wonder if this is a valid user need.
  2. For the capabilities field in the data-source model, why do we need to define plugin availability explicitly if OSD is connecting to one cluster at a time? I thought this meta info could always be retrieved on demand by calling the _cat/plugins API on the OpenSearch endpoint defined in the data-source.

I can confirm that this is a valid user need. I'm dealing with a large number of OpenSearch clusters deployed in Kubernetes, and it would be very nice to have one OSD in front of them all and not have to configure SSO for each OSD.

Kind regards,

Bankole.

@dblock
Member

dblock commented Aug 31, 2022

I see that the proposal has a separate UX for credentials and data sources. I think this is a bad idea.

  1. Bad user experience. How many data sources will a typical cluster have? I bet no more than 5, so why would users have to configure credentials in a separate panel, associate them with a data source, etc?
  2. It's a security problem that implies everybody is an admin and anyone who can create a data source can see all credentials. Charlie creates credentials C, attaches them to data source D1. Now Alice attaches credentials C to data source D2, so Alice can now get a copy of C.
  3. It's a 1-way door. Once you can associate a set of credentials with multiple data sources you cannot go back to a 1:1 relationship because users rely on the many:1 behavior.

I think that for the first cut you should simplify and not build a credentials panel, but let users configure credentials in the data source editor UX. You can still store credentials in a separate object so that you can build a credentials management panel in the future.

  1. It's a lot simpler for users to edit credentials in the data source editor.
  2. It enables a security model where Charlie can create a set of credentials that will never be accessible by anyone other than Charlie. Charlie owns the data source they create, nobody else needs to modify/see it.
  3. It's a 2-way door. You can build a 1:1 data-source:credentials now, and always expand it to many:1, but not the other way around.
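The 1:1 model suggested here might be sketched as below: credentials are stored as a separate object but are created together with the data source and are readable only by its owner. Everything in this sketch (the `DataSourceRegistry` class, its methods, the in-memory maps, plain-text storage) is a hypothetical illustration, not part of the proposal or the PoC.

```typescript
interface Credentials {
  username: string;
  password: string; // kept in plain text only to keep the sketch short
}

interface DataSource {
  id: string;
  owner: string;
  host: string;
}

class DataSourceRegistry {
  private dataSources = new Map<string, DataSource>();
  // Credentials live in their own map (a separate object), keyed 1:1 by data source id.
  private credentials = new Map<string, Credentials>();
  private nextId = 1;

  // Creating a data source and setting its credentials is a single step,
  // mirroring one editor UX instead of a separate credentials panel.
  create(owner: string, host: string, credentials: Credentials): DataSource {
    const dataSource: DataSource = { id: `ds-${this.nextId++}`, owner, host };
    this.dataSources.set(dataSource.id, dataSource);
    this.credentials.set(dataSource.id, { ...credentials });
    return dataSource;
  }

  // Only the owner of a data source can read its credentials.
  getCredentials(requester: string, dataSourceId: string): Credentials | undefined {
    const dataSource = this.dataSources.get(dataSourceId);
    if (dataSource === undefined || dataSource.owner !== requester) {
      return undefined;
    }
    return this.credentials.get(dataSourceId);
  }
}
```

Because there is no API for attaching an existing credentials object to a second data source, the scenario where Alice reuses Charlie's credentials cannot arise in this sketch, while the separate credentials map leaves room to add a many:1 relationship later.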

@zhongnansu zhongnansu added the multiple datasource multiple datasource project label Sep 1, 2022
@seraphjiang seraphjiang added the v2.4.0 'Issues and PRs related to version v2.4.0' label Sep 12, 2022
@seraphjiang
Member

seraphjiang commented Sep 21, 2022

Here is the link to track the feedback:
#2400

@joshuarrrr
Member

@zengyan-amazon This issue is labeled v2.4.0. Is it ready to be closed, or should we update/remove the release label?

@seraphjiang
Member

Closing this as it is an RFC; tasks will be tracked in another meta issue.
@zengyan-amazon @joshuarrrr @seanneumann @kristenTian

7 participants