[RFC] Vega support with MDS #5927
Comments
dataSourceId looks like a good call here. Could we conduct an e2e PoC next?
@huyaboo So we are proposing to add a new data source field to the script during creation of the visualization.
I would recommend we use |
@BionIT, we should follow the same conventions that we use for |
Looks like we already have an index property. @huyaboo, could you help check whether this property supports a plain index or an index pattern? cc: @kgcreative @BionIT
export interface UrlObject {
Yeah Vega plugin supports index patterns since fetching the data is treated like running a search query. Any information in the vega spec is persisted so using an |
Thanks. Could we see an example that uses an index pattern in Vega today, without MDS? A recorded video would help, if possible.
@huyaboo @kgcreative @bandinib-amzn @BionIT By the way, when I ask whether Vega supports index patterns, I mean an index pattern that was created and stored as a saved object, not an arbitrary target string. A user-created index pattern may contain scripted fields.
nitpick: let's use "Vega is not supported with MDS", to be consistent.
Thanks @kgcreative and team for the brainstorming. I'd like to provide more information to see if it helps us make the call.
Prepending the datasource to the index could be one option; however, the syntax may confuse users who rely on cross-cluster search. Whether Vega supports cross-cluster is unknown. That may be beyond the scope of this feature request, and we could track it separately.
Currently, data-source-id is the key used to retrieve the detailed information. I agree with @kgcreative that we could look up the ID by name. However, this introduces another _search API dependency for this feature. Besides the overhead in the Vega viz, edit, and rendering pages, we may also need to take saved object import/export for MDS into consideration. @BionIT @huyaboo @bandinib-amzn, could you meet, dive a little deeper, and come up with a proposal quickly?
Refer to more detail here
Our end vision is to move to a true multi-datasource world. However, there are ambiguous parts in both the use cases and the technical details. From the technical side, we might rely on Vega/Vega-lite to support MDS.
From the use case side, we will need more use cases to help us validate and prioritize. cc: @kgcreative @zengyan-amazon @BionIT @huyaboo @bandinib-amzn
After some conversations with Vega users, the tradeoff with using dataSourceName may be worth it purely from a usability point of view.
This could be a potential option as well. Right now the Vega visualization only accepts OpenSearch queries.
After some consideration, I'm not sure if any approach can mitigate issues when importing/exporting. The revised proposal will support querying data from multiple indices from multiple datasources (see the
We do not need to rely on Vega/Vega-lite for this. The current plugin already supports querying multiple indices from local cluster by making the |
@huyaboo if it is confirmed that Vega doesn't support index patterns, let's create an issue to track this feature request. We should focus on the enhancement in 2.13 and make incremental progress.
@huyaboo did you get a chance to take a look at https://opensearch.org/blog/enhancement-multiple-data-source-import-saved-object/ ? I hope the change can be compatible with the enhanced import/export feature delivered by @BionIT and @yujin-emma. If so, there is no extra work. Otherwise, please check with @BionIT and @yujin-emma to ensure that customers who live in a non-MDS world can export their Vega saved objects and import them into a world with MDS enabled.
Proposal
Currently, Multiple DataSource (MDS) does not support Vega visualizations. Thus, we propose to create a new optional field within the Vega spec, `data_source_name` (tentatively named), that is passed within the `url` body of the `data` body. This field will take in a datasource name and, under the hood, the visualization will be able to retrieve data from the index located in that datasource. This enables a user to retrieve data from one or more indices from one or more datasources to create custom Vega visualizations.
A Vega spec supported by this feature may define the `data` field as a single `url` object or as an array that contains multiple `url` objects.
Background
Vega is a declarative visualization grammar that can be used to create and share custom interactive visualizations. Vega-lite is a similar but distinct, more lightweight visualization grammar. Both of these grammars are supported within Dashboards, with the caveat that the data is retrieved from the index BEFORE any rendering happens. This means that data is NOT dynamically loaded. Additionally, Vega is not supported with MDS since the local cluster is the assumed datasource. This proposal will fix the latter (with the former being out of scope).
It is also important to clarify what is meant by Vega support for MDS, which can be interpreted in two different ways:
Option 1: Vega can support visualizations which fetch data from multiple datasources
In this example, the Vega visualization is stored in Datasource A but references data from Datasources A, B, and C
Option 2: Vega can support visualizations which reference data from the local cluster or any remote datasource (but not both)
In this example, the Vega visualization is stored in Datasource A but references data only from Datasource B and not any other datasource
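One way to read the difference between the two options is by how many distinct datasources a single visualization may reference. The sketch below is only illustrative: the `UrlRef` type and both helper functions are hypothetical names, and it assumes that a missing `data_source_name` means the local cluster.

```typescript
// Hypothetical shape: only the fields needed for this illustration.
interface UrlRef {
  index: string;
  data_source_name?: string; // absent is assumed to mean the local cluster
}

const LOCAL_CLUSTER = "(local cluster)";

// Returns the distinct datasources referenced by a visualization's url objects.
function distinctDataSources(urls: UrlRef[]): string[] {
  return urls
    .map((u) => (u.data_source_name === undefined ? LOCAL_CLUSTER : u.data_source_name))
    .filter((name, i, all) => all.indexOf(name) === i);
}

// Option 2 allows at most one datasource per visualization; Option 1 has no such limit.
function fitsOption2(urls: UrlRef[]): boolean {
  return distinctDataSources(urls).length <= 1;
}

// Mirrors the Option 1 example: data from Datasources A, B, and C.
const multiSource: UrlRef[] = [
  { index: "logs-a", data_source_name: "Datasource A" },
  { index: "logs-b", data_source_name: "Datasource B" },
  { index: "logs-c", data_source_name: "Datasource C" },
];

// Mirrors the Option 2 example: data only from Datasource B.
const singleSource: UrlRef[] = [{ index: "logs-b", data_source_name: "Datasource B" }];
```

Under these assumptions, `fitsOption2(multiSource)` is false and `fitsOption2(singleSource)` is true, so any spec that is valid under Option 2 is also valid under Option 1.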
The proposal seeks to support Option 1. While there are more limitations with Option 1 (see the Limitations section), having the option to fetch data from any index from any datasource (provided the user has the permissions) provides a robust visualization experience.
Approach
When Dashboards parses the Vega spec to render the visualization, it parses the URL object and passes the object into the search API, which uses `IOpenSearchSearchRequest` as a parameter. This interface provides a field `dataSourceId` that will tell Dashboards to use the data source client. All that the Vega plugin would need to do is check if MDS is enabled and, if so, retrieve the associated datasource id from the `data_source_name` field and pass it into the search query.
1. Add a `datasource` field to the `UrlObject`
2. Have the saved objects client get the associated `dataSourceId` from the `data_source_name` via `find`
3. Then, in the `_searchAPI.search()` method, we can pass in the `dataSourceId` as a parameter
Thus, when the user wants to write a visualization with data from another datasource, they can do something like the following (mockup)
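As a rough illustration of this flow, the sketch below resolves a datasource name to an id and attaches it to a search request. It is not the plugin's actual code: `UrlObject`, `DataSourceSavedObject`, and `SearchRequest` are simplified, hypothetical stand-ins for the real Dashboards types, the `title` attribute is assumed to hold the datasource name, and the list of datasources is assumed to have been fetched already (for example through the saved objects client's `find`). The mock data entries, index names, and ids are invented for the example.

```typescript
// Simplified, hypothetical stand-ins; the real Dashboards types carry more fields.
interface UrlObject {
  index: string;
  body?: Record<string, unknown>;
  data_source_name?: string; // the proposed optional field (tentative name)
}

interface DataSourceSavedObject {
  id: string;
  title: string; // assumed to hold the user-facing datasource name
}

interface SearchRequest {
  index: string;
  body?: Record<string, unknown>;
  dataSourceId?: string;
}

// Maps a datasource name to its id, given datasources that were already fetched.
function resolveDataSourceId(name: string, dataSources: DataSourceSavedObject[]): string {
  const matches = dataSources.filter((ds) => ds.title === name);
  if (matches.length !== 1) {
    throw new Error(`Expected exactly one datasource named "${name}", found ${matches.length}`);
  }
  return matches[0].id;
}

// Builds the request that would be handed to the search API.
function toSearchRequest(
  url: UrlObject,
  mdsEnabled: boolean,
  dataSources: DataSourceSavedObject[]
): SearchRequest {
  const request: SearchRequest = { index: url.index, body: url.body };
  if (!mdsEnabled || url.data_source_name === undefined) {
    return request; // no dataSourceId: the local cluster is used
  }
  return { ...request, dataSourceId: resolveDataSourceId(url.data_source_name, dataSources) };
}

// Mock data entries of a Vega spec: one local query, one against a named datasource.
const mockData: Array<{ name: string; url: UrlObject }> = [
  { name: "local_logs", url: { index: "logs-*" } },
  { name: "remote_logs", url: { index: "logs-*", data_source_name: "Datasource B" } },
];

const knownDataSources: DataSourceSavedObject[] = [{ id: "id-b", title: "Datasource B" }];
const requests = mockData.map((d) => toSearchRequest(d.url, true, knownDataSources));
```

Throwing when a name matches zero or several datasources is one possible design choice; it surfaces the ambiguity that a name-based lookup introduces, which an id-based lookup would not have.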
Limitations
This approach provides greater flexibility in enabling users to make visualizations. However, with great power comes great responsibility.
- `data_source_name` is specific to an OpenSearch cluster. If users export/import into another cluster and the same datasource names are not configured, the visualizations cannot find the data and thus return errors. This is a tradeoff that can be made, but in the future this would ideally need to be mitigated. See the section Importing Vega saved objects below for ways to mitigate some of these issues.
- `data_source_name`s to each `url` that contains one is cumbersome, especially when multiple `url` objects are involved.
Importing Vega saved objects
In addition to the above requirements, the Vega visualization should have support for importing saved objects. As mentioned in Limitations, full support is a challenge since the current import logic supports only one datasource. Following similar logic as #5712, this issue will take into consideration the following scenarios:
- `data_source_name` will not be present in these visualizations, the field can be added to the Vega spec directly.
- `data_source_name` that uses the previous datasource will be updated to use the new `data_source_name`.
Alternatives
Initially the decision was made to use `data_source_id` rather than `data_source_name`. This was because `data_source_id` enforces a unique datasource to query from and does not require an extra find query to locate the datasource. However, other plugins refer to a datasource by name, not by id, and having the name be the identifier is more user friendly.
Open Question(s)
`datasource` can be a bit ambiguous here. What are some alternative field names? I'm thinking `data_source_id` would help disambiguate this field.