Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add source_path information to field_caps API #49264

Closed
jimczi opened this issue Nov 18, 2019 · 10 comments
Closed

Add source_path information to field_caps API #49264

jimczi opened this issue Nov 18, 2019 · 10 comments
Assignees
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@jimczi
Copy link
Contributor

jimczi commented Nov 18, 2019

Currently, the _field_caps API presents a transparent view of fields where aliases and multi-fields are not linked. However for some applications it is important to be able to retrieve the source field that was used to populate a specific field since the value of a sub-field (inside a multi-field) or an alias is not present in the _source. For this reason we have some duplicated logic in ml and sql that tries to infer the source path using some fragile heuristics. In order to simplify this retrieval this issue is a proposal to add a source_path section to the fields inside the field_caps response. This way it would be easy for consumers of this API to redirect some fields to their source path when values need to be retrieved:

{
  "indices" : [ "index1", "index2" ],
  "fields" : {
    "alias_field" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true,
        "source_path": "concrete_field"
      }
    },
    "some_field.multi_field" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true,
        "source_path": "some_field"
      }
    }
  }
}

This information should be easy to add for alias field and sub-fields inside a multi-field. It might be more problematic for copy_to since the target could have multiple source.

@jimczi jimczi added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types labels Nov 18, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@jtibshirani
Copy link
Contributor

One complexity to note is that a field could have a different source_path across indices. For example, a field could be mapped as an field alias in one index, and as a concrete field in another. (We actually expect this set-up to be common, as it is part of the workflow for renaming a field with time-based indices). So I think the response format would need to support separate source paths for different indexes.

@jpountz
Copy link
Contributor

jpountz commented Nov 18, 2019

Would it be good enough to return an array that contains both the old and new name in that case? I believe we would need the source_path to be an array anyway because copy_to means that a single field might have sources in multiple fields? And we would compute the union when merging field caps from multiple indices?

@jtibshirani
Copy link
Contributor

jtibshirani commented Dec 10, 2019

We discussed offline and came up with the proposal that source_path could be treated differently than the existing components in the response. We largely expect components like searchable and aggregatable to be consistent across indices, and what's most helpful to the user is a merged view of their values. But for source_path, there is no expectation of consistency across indices, and in some common use cases (like the field alias case described above), the values will differ.

Here's a sketch for how it could look to include index information for source_path. The last section all_field represents a field where other field values have been copied into it through copy_to.

{
  "indices" : [ "index1", "index2" ],
  "fields" : {
    "alias_field" : {
      "keyword" : {
        "searchable" : true,
        "aggregatable" : true,
        "source_path": [{
            "paths": ["concrete_field"],
            "indices": ["index1"]
        }]
      }
    },
    "some_field.multi_field" : {
      "keyword" : {
        "searchable" : true,
        "aggregatable" : true,
        "source_path": [{
            "paths": ["some_field"],
            "indices": ["index1", "index2"]
        }]
      }
    }
  },
   "all_field" : {
     "text" : {
       "searchable" : true,
       "aggregatable" : true,
       "source_path": [{
           "paths": ["title_field", "abstract_field", "body_field"],
           "indices": ["index1"],
       }, {
           "paths": ["title_field", "body_field"],
           "indices": ["index2"]
       }]
     }
   }
}

@jpountz
Copy link
Contributor

jpountz commented Dec 12, 2019

This makes sense to me. I wonder how to handle meta fields (e.g. _id or _index) and constant_keyword fields too.

For meta fields, maybe we don't need to do anything and just expect clients to rely on the assumption that fields that start with an underscore are a meta field and expected to be returned directly as part of the hit?

Regarding constant_keyword maybe we could slightly modify your proposal to make it possible to return a value directly, e.g.

{
  "indices" : [ "index1", "index2" ],
  "fields" : {
    "my_keyword" : {
      "constant_keyword" : {
        "searchable" : true,
        "aggregatable" : true,
        "source": [{
            "value": ["foo"],
            "indices": ["index1"]
        }]
      },
      "keyword" : {
        "searchable" : true,
        "aggregatable" : true,
        "source": [{
            "path": ["my_keyword"],
            "indices": ["index2"]
        }]
      }
    },
    "alias_field" : {
      "keyword" : {
        "searchable" : true,
        "aggregatable" : true,
        "source": [{
            "paths": ["concrete_field"],
            "indices": ["index1"]
        }]
      }
    },
    "some_field.multi_field" : {
      "keyword" : {
        "searchable" : true,
        "aggregatable" : true,
        "source": [{
            "paths": ["some_field"],
            "indices": ["index1", "index2"]
        }]
      }
    }
  },
   "all_field" : {
     "text" : {
       "searchable" : true,
       "aggregatable" : true,
       "source": [{
           "paths": ["title_field", "abstract_field", "body_field"],
           "indices": ["index1"],
       }, {
           "paths": ["title_field", "body_field"],
           "indices": ["index2"]
       }]
     }
   }
}

Another question is whether we should always return the source information, or whether we should skip it in the common case that the source path is equal to the field name on all indices?

@jtibshirani jtibshirani self-assigned this Jan 10, 2020
jtibshirani added a commit that referenced this issue Feb 4, 2020
Currently, the same class `FieldCapabilities` is used both to represent the
capabilities for one index, and also the merged capabilities across indices. To
help clarify the logic, this PR proposes to create a separate class
`IndexFieldCapabilities` for the capabilities in one index. The refactor will
also help when adding `source_path` information in #49264, since the merged
source path field will have a different structure from the field for a single index.

Individual changes: 
* Add a new class IndexFieldCapabilities.
* Remove extra constructor from FieldCapabilities.
* Combine the add and merge methods in FieldCapabilities.Builder.
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this issue Feb 4, 2020
Currently, the same class `FieldCapabilities` is used both to represent the
capabilities for one index, and also the merged capabilities across indices. To
help clarify the logic, this PR proposes to create a separate class
`IndexFieldCapabilities` for the capabilities in one index. The refactor will
also help when adding `source_path` information in elastic#49264, since the merged
source path field will have a different structure from the field for a single index.

Individual changes:
* Add a new class IndexFieldCapabilities.
* Remove extra constructor from FieldCapabilities.
* Combine the add and merge methods in FieldCapabilities.Builder.
jtibshirani added a commit that referenced this issue Feb 4, 2020
Currently, the same class `FieldCapabilities` is used both to represent the
capabilities for one index, and also the merged capabilities across indices. To
help clarify the logic, this PR proposes to create a separate class
`IndexFieldCapabilities` for the capabilities in one index. The refactor will
also help when adding `source_path` information in #49264, since the merged
source path field will have a different structure from the field for a single index.

Individual changes:
* Add a new class IndexFieldCapabilities.
* Remove extra constructor from FieldCapabilities.
* Combine the add and merge methods in FieldCapabilities.Builder.
@wylieconlon
Copy link

It seems like this proposal would work for alias and multi-mapped fields like the default text/keyword multi-mapping, and I'm looking forward to using it in Lens.

What is the expected behavior when the field is mapped with optional properties like index: false or doc_values: true? Are there edge cases to be careful of?

@jtibshirani
Copy link
Contributor

Thanks for taking a look @wylieconlon !

What is the expected behavior when the field is mapped with optional properties like index: false or doc_values: true? Are there edge cases to be careful of?

This proposal centers on adding information about the field's path in _source and won't change the existing behavior of the field caps API around parameters like index and doc_values. These parameters are reflected in the searchable, aggregatable, non_searchable_indices, etc. components of the field caps response. Let me know if I'm missing part of your question.

@Bargs
Copy link

Bargs commented Feb 12, 2020

Answering a couple questions asked over email:

Is adding information about the 'source path' enough, or would you need explicit indication that the field is an alias?

I think we might still need an explicit indication that the field is an alias. Unless I'm missing something, the currently proposed API response wouldn't allow us to determine which fields are aliases and which are copy_to. I don't have an example off the top of my head, but perhaps we'd want to treat those two differently somewhere in Kibana. A "sub_type" property on the source_path objects might be generally useful. Although it is possible to determine which fields are multi fields currently, the logic for doing so is a bit hairy and it would be nice to have a central place to check for any sub type. To be clear, this is what I'm thinking:

{
  "indices": [
    "index1",
    "index2"
  ],
  "fields": {
    "alias_field": {
      "keyword": {
        "searchable": true,
        "aggregatable": true,
        "source_path": [
          {
            "paths": [
              "concrete_field"
            ],
            "indices": [
              "index1"
            ],
            "sub_type": "alias"
          }
        ]
      }
    },
    "some_field.multi_field": {
      "keyword": {
        "searchable": true,
        "aggregatable": true,
        "source_path": [
          {
            "paths": [
              "some_field"
            ],
            "indices": [
              "index1",
              "index2"
            ],
            "sub_type": "multi"
          }
        ]
      }
    }
  },
  "all_field": {
    "text": {
      "searchable": true,
      "aggregatable": true,
      "source_path": [
        {
          "paths": [
            "title_field",
            "abstract_field",
            "body_field"
          ],
          "indices": [
            "index1"
          ],
          "sub_type": "copy_to"
        },
        {
          "paths": [
            "title_field",
            "body_field"
          ],
          "indices": [
            "index2"
          ],
          "sub_type": "copy_to"
        }
      ]
    }
  }
}

The multi-index case is a bit tricky. In particular, a field can be mapped as an alias in one index, and as a concrete field in another. We currently propose to return the full source path information separated out by index, but I'm curious if it'd be just as helpful to return a merged list of source paths across all indices.

I like how you're currently separating it out by index. Since we don't know of 100% of the places we'll need to use this info in Kibana, I think the more info the better.

Aside from my sub_type suggestion I think the API looks great and it should definitely give us the info we need in Kibana.

@jpountz
Copy link
Contributor

jpountz commented Mar 6, 2020

@jtibshirani @jimczi Should we close this in favor of #49028?

@jtibshirani
Copy link
Contributor

That works for me, we can always reopen if the idea becomes relevant again.

@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch
Projects
None yet
Development

No branches or pull requests

7 participants