Background
Users store logs, event data, and other data in object stores, like S3, for analysis with batch-based analytics tooling (e.g., Spark). We see an opportunity to unlock complex querying (compute) and improve OpenSearch's visibility into data at rest. Given the popularity of Spark’s open source community and AWS’ investment in Spark, we are exploring an integration between OpenSearch and Spark. Spark’s query engine and indexing are expected to mitigate query latency issues for data resting outside of OpenSearch.
Proposed Solution
In this release, we are building on the API-based Spark work in 2.9. OpenSearch Dashboards users will be able to add a new Spark connection to OpenSearch, set up skipping indexes, materialized views, and covering indexes, and search external sources from Observability Logs using PPL. Users in Discover who select a Spark data source will be moved to Observability Logs, where they will be able to query using PPL.
Use Cases
Admins will be able to create a new Spark connection and limit access to the connection using OpenSearch fine-grained access control (FGAC)
Admins can accelerate queries using skipping indexes, materialized views, and covering indexes, limit who can manage indexes using OpenSearch FGAC, and set index trim settings for performance and privacy purposes
OpenSearch Dashboards users in Discover who select an external source via Spark will be switched to Observability Logs, where they will use OpenSearch PPL as the query language (SQL will be supported through APIs)
Users querying in Observability Logs will be able to direct query external sources
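To make the acceleration use case concrete, here is a minimal conceptual sketch of how a skipping index can reduce the amount of external data a query has to read. It assumes the index keeps a min/max summary of one column per file. The `FileStats` class, the `files_to_scan` helper, the `status` column, and the `s3://example-bucket/...` paths are all hypothetical names for illustration; this is not the Spark integration's actual implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical skipping-index entry: min/max of a 'status' column per file."""
    path: str
    min_status: int
    max_status: int


def files_to_scan(index, lo, hi):
    """Return paths of files whose [min, max] range overlaps the query range [lo, hi].

    Files whose range cannot contain a matching row are skipped without being read.
    """
    return [f.path for f in index if f.max_status >= lo and f.min_status <= hi]


# Illustrative index over three files (made-up paths and values).
skipping_index = [
    FileStats("s3://example-bucket/logs/part-0.parquet", 200, 204),
    FileStats("s3://example-bucket/logs/part-1.parquet", 200, 503),
    FileStats("s3://example-bucket/logs/part-2.parquet", 301, 404),
]

# A query filtering on 500 <= status <= 599 only needs to read part-1.
candidates = files_to_scan(skipping_index, 500, 599)
print(candidates)
```

The trade-off in this sketch is that the index is small and cheap to maintain but only narrows the set of files to read. Materialized views and covering indexes instead store query results or column data ahead of time, trading more storage for lower query latency.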
We weren't able to finish everything we wanted to for the release. We will launch and document what we have iteratively, and we will push formal support out to an upcoming release.
Resources