create a correlation field mapper #8712

Open
wants to merge 29 commits into
base: main

YANG-DB
Member

@YANG-DB YANG-DB commented Jul 16, 2023

New Correlation Mapping Type Support

As part of the Integration campaign and the Integration RFC, we introduced the SimpleSchema for the Observability domain, which is built on the concept of a well-structured index defined by a schema.


The general context:

Schema

A schema is associated with an index through the mapping configuration.

This mapping structure is also composable through the composed_of template capability, which is used extensively to assemble the mappings of the various log types.

Another concept behind the schema is the capability of reflecting relationships.

Relationships

Relationships are associations between fields. When both fields exist within the same mapping, we call the relationship an alias; when the fields are defined in different mappings, we call it a correlation.

Goal

Our goal is to formalize and generalize the semantic concept of a relationship, whether it is within the same mapping file or between different mapping files.

Correlation Engine

The emergence of the correlation-engine further demonstrates the importance of schema awareness of such relationships and of defining them explicitly within the document mapping.

Once such a mapping type is defined, correlation rules can be generated automatically, providing a strong capability for RCA and diagnostics across investigation domains, whether Observability or Security related.


Planned Steps

This new schema concept will be delivered in two parts:
Part 1:

  • add support for the existing alias field type through a dedicated GetFieldsAlias API
  • extend the field capabilities API to reflect alias fields, both by listing a concrete field's aliases and by stating explicitly when a field is of type alias

Part 2:
Building on Part 1, add similar capabilities for the correlation field type:

  • add support for the new correlation field type through a dedicated GetFieldsCorrelation API
  • extend the field capabilities API to reflect correlation fields, both by listing a concrete field's correlations and by stating explicitly when a field is of type correlation
Current Correlations representation

Correlations are currently represented in a proprietary way, by adding this information to the index mapping template's metadata.

What we want to achieve:

  • Add a FieldCorrelationMapper that extends the FieldAliasMapper.
    FieldCorrelationMapper has the following mapping schema:
{
  "logs": {
    ...
    "mappings": {
      ...
        "traceId": {
          "ignore_above": 256,
          "type": "keyword"
        },
        "spanId": {
          "ignore_above": 256,
          "type": "keyword"
        },
        "traceIdFk": {
          "type": "correlation",
          "path": "traceId",
          "schema_pattern": "traces",
          "remote_path": "traceId"
        },
        "spanIdFk": {
          "type": "correlation",
          "path": "spanId",
          "schema_pattern": "traces",
          "remote_path": "spanId"
        }
    }
  }
}

The spanIdFk and traceIdFk fields act as correlation fields: each one connects a local field (path) to a remote field (schema_pattern:remote_path).
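As an illustration of how such a definition could be consumed, here is a minimal Python sketch that reads the correlation fields out of a mapping and prints the local-to-remote links. It is not the mapper implementation from this PR. The `Correlation` tuple, the `extract_correlations` helper, and the flat `logs_properties` dictionary are hypothetical names, and the sketch assumes the fields sit side by side in one flat dictionary, since the example above elides the surrounding structure.

```python
from typing import Dict, List, NamedTuple


class Correlation(NamedTuple):
    field: str          # name of the correlation field itself
    local_path: str     # local field it points at ("path")
    remote_schema: str  # remote mapping it refers to ("schema_pattern")
    remote_path: str    # field inside the remote mapping ("remote_path")


def extract_correlations(properties: Dict[str, dict]) -> List[Correlation]:
    """Collect every field whose type is "correlation" (hypothetical helper)."""
    found = []
    for name, definition in properties.items():
        if definition.get("type") == "correlation":
            found.append(
                Correlation(
                    field=name,
                    local_path=definition["path"],
                    remote_schema=definition["schema_pattern"],
                    remote_path=definition["remote_path"],
                )
            )
    return found


# Assumed flat view of the fields from the "logs" example above.
logs_properties = {
    "traceId": {"ignore_above": 256, "type": "keyword"},
    "spanId": {"ignore_above": 256, "type": "keyword"},
    "traceIdFk": {
        "type": "correlation",
        "path": "traceId",
        "schema_pattern": "traces",
        "remote_path": "traceId",
    },
    "spanIdFk": {
        "type": "correlation",
        "path": "spanId",
        "schema_pattern": "traces",
        "remote_path": "spanId",
    },
}

for c in extract_correlations(logs_properties):
    print(f"{c.field}: {c.local_path} -> {c.remote_schema}:{c.remote_path}")
```

Each printed line pairs a local field with its `schema_pattern:remote_path` target, which is the information an auto-generated correlation rule would presumably need.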

Part 1

This part focuses on enhancing the alias field type:

Get Field Aliases API (New)

Consider the following mapping:

{
  "properties": {
    "field1": {
      "type": "text"
    },
    "alias": {
      "type": "alias",
      "path": "field1"
    },
    "aliasTwo": {
      "type": "alias",
      "path": "field1"
    },
    "subFieldAlias": {
      "type": "alias",
      "path": "obj.subfield"
    },
    "obj": {
      "properties": {
        "subfield": {
          "type": "keyword"
        },
        "subFieldInnerAlias": {
          "type": "alias",
          "path": "obj.subfield"
        }
      }
    }
  }
}

The following API was added to get a field's aliases:
GET /_mapping/field/{fields}/aliases, where fields and index each accept a single name, multiple names, or regular expressions.

For example, GET /_mapping/field/['field1', 'obj.subfield']/aliases returns:

{
  "index" : {
    "mappings" : {
      "aliasTwo" : {
        "full_name" : "aliasTwo",
        "mapping" : {
          "aliasTwo" : {
            "type" : "alias",
            "path" : "field1"
          }
        }
      },
      "obj.subFieldInnerAlias" : {
        "full_name" : "obj.subFieldInnerAlias",
        "mapping" : {
          "subFieldInnerAlias" : {
            "type" : "alias",
            "path" : "obj.subfield"
          }
        }
      },
      "subFieldAlias" : {
        "full_name" : "subFieldAlias",
        "mapping" : {
          "subFieldAlias" : {
            "type" : "alias",
            "path" : "obj.subfield"
          }
        }
      },
      "alias" : {
        "full_name" : "alias",
        "mapping" : {
          "alias" : {
            "type" : "alias",
            "path" : "field1"
          }
        }
      }
    }
  }
}
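The lookup behind this response can be sketched in Python as a walk over the mapping that keeps every alias field whose path is one of the requested fields. This is only an approximation of the behaviour described above, not the code in this PR. The helper names `iter_fields` and `field_aliases` are hypothetical, and the sketch matches exact field names only, leaving out the multi-index and regular-expression handling.

```python
from typing import Dict, Iterable, Iterator, Tuple


def iter_fields(properties: Dict[str, dict], prefix: str = "") -> Iterator[Tuple[str, str, dict]]:
    """Yield (full_name, leaf_name, definition) for each field, descending into objects."""
    for name, definition in properties.items():
        full_name = prefix + name
        if "properties" in definition:
            # Object field: recurse and extend the dotted prefix.
            yield from iter_fields(definition["properties"], full_name + ".")
        else:
            yield full_name, name, definition


def field_aliases(properties: Dict[str, dict], targets: Iterable[str]) -> Dict[str, dict]:
    """Return the alias fields whose path is one of the requested concrete fields."""
    wanted = set(targets)
    result = {}
    for full_name, leaf, definition in iter_fields(properties):
        if definition.get("type") == "alias" and definition.get("path") in wanted:
            result[full_name] = {"full_name": full_name, "mapping": {leaf: definition}}
    return result


# The "properties" section of the example mapping above.
example_properties = {
    "field1": {"type": "text"},
    "alias": {"type": "alias", "path": "field1"},
    "aliasTwo": {"type": "alias", "path": "field1"},
    "subFieldAlias": {"type": "alias", "path": "obj.subfield"},
    "obj": {
        "properties": {
            "subfield": {"type": "keyword"},
            "subFieldInnerAlias": {"type": "alias", "path": "obj.subfield"},
        }
    },
}

aliases = field_aliases(example_properties, ["field1", "obj.subfield"])
print(sorted(aliases))
```

For the two requested fields this yields the same four entries as the response above: alias, aliasTwo, subFieldAlias and obj.subFieldInnerAlias.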

Get Field Capabilities API

This API was enhanced to expose a field's aliases:

The call GET old_index/_field_caps?fields=distance,route_length_miles
returns the capabilities of the distance and route_length_miles fields as they are defined across the different mappings. It also lists the aliases associated with each concrete field, if any exist.

When an alias field is queried for its capabilities, the API explicitly states that it is of type alias and lists the indices in whose mappings it functions as an alias:

{
  "indices": [
    "old_index"
  ],
  "fields": {
    "distance": {
      "double": {
        "type": "double",
        "alias": false,
        "searchable": true,
        "aggregatable": true,
        "aliases": [
          "old_index:route_length_miles"
        ]
      }
    },
    "route_length_miles": {
      "double": {
        "type": "double",
        "alias": true,
        "searchable": true,
        "aggregatable": true,
        "aliases": [],
        "alias_indices": [
          "old_index"
        ]
      }
    }
  }
}
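The alias-related parts of this response can be approximated with a reverse lookup from each concrete field to the alias fields that point at it. The Python sketch below rests on assumptions: the `alias_view` helper is hypothetical, the mapping of old_index is not shown in this PR and is assumed to hold a double field `distance` plus an alias `route_length_miles` pointing at it, only a single index with top-level fields is handled, and the `searchable` and `aggregatable` flags are left out.

```python
from typing import Dict, List


def alias_view(index: str, properties: Dict[str, dict], fields: List[str]) -> Dict[str, dict]:
    """Build the alias-related part of a field-capabilities style response."""
    # Reverse index: concrete field name -> "index:alias" entries pointing at it.
    reverse: Dict[str, List[str]] = {}
    for name, definition in properties.items():
        if definition.get("type") == "alias":
            reverse.setdefault(definition["path"], []).append(f"{index}:{name}")

    view = {}
    for field in fields:
        definition = properties[field]
        is_alias = definition.get("type") == "alias"
        # An alias reports the type of the concrete field it resolves to.
        concrete = properties[definition["path"]] if is_alias else definition
        entry = {
            "type": concrete["type"],
            "alias": is_alias,
            "aliases": [] if is_alias else reverse.get(field, []),
        }
        if is_alias:
            entry["alias_indices"] = [index]
        view[field] = {concrete["type"]: entry}
    return view


# Assumed mapping for old_index (not shown in the PR description).
old_index_properties = {
    "distance": {"type": "double"},
    "route_length_miles": {"type": "alias", "path": "distance"},
}

caps = alias_view("old_index", old_index_properties, ["distance", "route_length_miles"])
print(caps)
```

The concrete field carries its aliases in `aliases`, while the alias field is flagged with `alias: true` and names the index in `alias_indices`, mirroring the response above.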

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…r with extra fields for the relationship correlation

Signed-off-by: YANGDB <[email protected]>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

YANG-DB added 4 commits July 18, 2023 16:25
…correlation_field_mapping

# Conflicts:
#	server/src/internalClusterTest/java/org/opensearch/search/fieldcaps/FieldCapabilitiesIT.java
…correlation_field_mapping

Signed-off-by: YANGDB <[email protected]>

# Conflicts:
#	server/src/internalClusterTest/java/org/opensearch/search/fieldcaps/FieldCapabilitiesIT.java

@YANG-DB YANG-DB marked this pull request as ready for review July 18, 2023 23:39

YANG-DB added 2 commits July 21, 2023 22:44
…correlation_field_mapping

# Conflicts:
#	server/src/main/java/org/opensearch/action/fieldcaps/FieldCapabilities.java
#	server/src/test/java/org/opensearch/index/mapper/FieldCorrelationMapperTests.java
#	server/src/test/java/org/opensearch/index/mapper/FieldTypeLookupTests.java

YANG-DB added 2 commits July 21, 2023 23:50
…correlation_field_mapping

Signed-off-by: YANGDB <[email protected]>

# Conflicts:
#	server/src/main/java/org/opensearch/action/fieldcaps/FieldCapabilities.java
#	server/src/test/java/org/opensearch/index/mapper/FieldCorrelationMapperTests.java
#	server/src/test/java/org/opensearch/index/mapper/FieldTypeLookupTests.java

@nknize
Collaborator

nknize commented Jul 28, 2023

The failure is because there aren't version checks around the StreamInput / StreamOutput serialization in this PR. So when the BWC tests run (which spins up a 3.0 / 2.10 mixed cluster environment) a 3.0 Node is trying to send a variable over the wire to a 2.10 Node that doesn't have the unmarshalling logic.

> Task :qa:mixed-cluster:v2.10.0#mixedClusterTest
> Task :plugins:repository-azure:azureThirdPartyDefaultXmlTest
> Task :distribution:packages:buildNoJdkArm64Rpm
> Task :distribution:archives:buildLinuxPpc64leTar

> Task :qa:mixed-cluster:v2.10.0#mixedClusterTest FAILED
»  org.opensearch.bootstrap.StartupException: java.lang.IllegalArgumentException: unknown setting [opensearch.experimental.feature.search_pipeline.enabled] did you mean any of [opensearch.experimental.feature.remote_store.enabled, opensearch.experimental.feature.extensions.enabled, opensearch.experimental.feature.identity.enabled, opensearch.experimental.feature.telemetry.enabled]?
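The kind of version check described in the comment above can be illustrated with a small, self-contained Python sketch. It does not use the OpenSearch StreamInput / StreamOutput classes. The version tuples, the `write_caps` / `read_caps` functions and the extra `is_alias` flag are all hypothetical stand-ins for a field that only newer nodes understand.

```python
import io
import struct
from typing import Optional, Tuple

Version = Tuple[int, int]
V_2_10: Version = (2, 10)
V_3_0: Version = (3, 0)  # hypothetical: first version that knows the new flag


def write_caps(out: io.BytesIO, wire_version: Version, searchable: bool, is_alias: bool) -> None:
    out.write(struct.pack("?", searchable))
    # Only emit the new field when the receiving side is able to decode it.
    if wire_version >= V_3_0:
        out.write(struct.pack("?", is_alias))


def read_caps(inp: io.BytesIO, wire_version: Version) -> Tuple[bool, Optional[bool]]:
    (searchable,) = struct.unpack("?", inp.read(1))
    is_alias = None
    # Mirror the writer: read the new field only if it was written.
    if wire_version >= V_3_0:
        (is_alias,) = struct.unpack("?", inp.read(1))
    return searchable, is_alias


def roundtrip(sender: Version, receiver: Version) -> Tuple[int, Tuple[bool, Optional[bool]]]:
    # Both sides agree on the lower of the two versions for the wire format.
    wire_version = min(sender, receiver)
    buf = io.BytesIO()
    write_caps(buf, wire_version, True, True)
    size = buf.tell()
    buf.seek(0)
    return size, read_caps(buf, wire_version)


print(roundtrip(V_3_0, V_2_10))  # mixed cluster: new flag is skipped
print(roundtrip(V_3_0, V_3_0))   # both nodes new: new flag is sent
```

Without the guard in `write_caps`, the newer node would put an extra byte on the wire that the older node never reads, which is the sort of mismatch a mixed-cluster test is meant to catch.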

@YANG-DB YANG-DB requested a review from sohami as a code owner August 11, 2023 18:43

@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/security-analytics.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

BUILD SUCCESSFUL in 24m 37s


@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git]

BUILD SUCCESSFUL in 23m 38s


@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git]

BUILD SUCCESSFUL in 23m 9s

@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

BUILD SUCCESSFUL in 32m 28s


@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity. Remove stalled label or comment or this will be closed in 7 days.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Sep 11, 2023
@ashking94
Member

@YANG-DB Is this still being worked on? If not, let's close this.

@ticheng-aws
Contributor

Hi @YANG-DB, do we have any updates?

@ticheng-aws ticheng-aws added the enhancement Enhancement or improvement to existing feature or request label Jan 6, 2024
@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Jan 13, 2024
@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Feb 19, 2024
Labels
enhancement Enhancement or improvement to existing feature or request stalled Issues that have stalled
4 participants