Aggregating _ignored field values #59946

Closed
aclowkey opened this issue Jul 21, 2020 · 12 comments · Fixed by #101373

@aclowkey

When dealing with lots of data and lots of users, having the ignore_malformed option is great!
Using it in combination with the _ignored field can give a lot of information about what is wrong with the data.

But unfortunately we can't aggregate on it, so it's hard to give an overview of which fields had issues in the last X hours.

I think it would be very useful to add a .keyword subfield for the _ignored field, or to make the _ignored field tokenised.
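
For illustration, the overview described above might look like the following request once `_ignored` can be aggregated. This is only a sketch: the `@timestamp` field, the `ignored_overview_request` helper, and the aggregation name `ignored_fields` are assumptions, and the terms aggregation on `_ignored` is exactly what this issue asks for, so it should not be expected to work before that is implemented.

```python
import json


def ignored_overview_request(hours, size=20):
    """Build a search body counting ignored field names over the last `hours` hours.

    Hypothetical helper: assumes documents carry an `@timestamp` date field
    and that `_ignored` is aggregatable, which is what this issue requests.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    # keep only documents that had at least one ignored field
                    {"exists": {"field": "_ignored"}},
                    # restrict to the last X hours (date math on an assumed field)
                    {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},
                ]
            }
        },
        "aggs": {
            # one bucket per field name recorded in `_ignored`
            "ignored_fields": {"terms": {"field": "_ignored", "size": size}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(ignored_overview_request(6), indent=2))
```

The printed JSON can be pasted as the body of a `POST /<index>/_search` request; each bucket key would be a field name and its `doc_count` the number of documents in which that field was ignored.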

@aclowkey added the >enhancement and needs:triage labels Jul 21, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine added the Team:Analytics label Jul 21, 2020
@jimczi added the :Search Foundations/Mapping and team-discuss labels and removed the needs:triage label Jul 21, 2020
@jtibshirani
Contributor

We had an initial discussion and agreed that it seems useful to be able to aggregate over the _ignored field. We didn't reach a final conclusion but plan to continue the discussion. Other notes:

  • We'd like to avoid adding another option to the mapping like "doc_values": true. However, we weren't sure whether it was worth the cost to always enable doc_values.
  • One idea was to always enable doc_values, but stop adding the field as a stored field. The fetch phase would switch to retrieve _ignored from doc values.

@aclowkey
Author

aclowkey commented Aug 4, 2020 via email

@davishmcclurg

We recently started looking at using ignore_malformed in Enterprise Search. Being able to aggregate on the _ignored field would be super helpful for showing users what field types make sense for their data. Let me know if you're interested and I can explain a little bit more about our use case.

@jimczi
Contributor

jimczi commented Aug 14, 2020

We discussed this internally and agreed to always enable doc_values, but stop adding the field as a stored field. The fetch phase would switch to retrieving _ignored from doc values, as Julie proposed. @aclowkey would you still be interested in addressing this in your PR?

@aclowkey
Author

aclowkey commented Aug 17, 2020

@jimczi I addressed it. I hope I didn't misunderstand you.
Please review

@javanna
Member

javanna commented Apr 28, 2021

This came up again because we also use the _ignored field for fields that have a script configured. If the script fails, the document may still get indexed (with on_script_error set to continue), but the field name goes into the _ignored field. Yet another reason to enable doc_values for the _ignored field.
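
As a rough sketch of the scripted-field case, the mapping below defines a field that is computed by an index-time script and sets `on_script_error` to `continue`, so a failing script records the field name in `_ignored` instead of rejecting the document. The field names `status_code` and `status_class`, the script body, and the `scripted_field_mapping` helper are made up for illustration and have not been run against a cluster.

```python
import json


def scripted_field_mapping(on_script_error="continue"):
    """Mapping sketch for a keyword field populated by an index-time script.

    Hypothetical example: `status_code` and `status_class` are invented names.
    """
    if on_script_error not in ("fail", "continue"):
        raise ValueError("on_script_error must be 'fail' or 'continue'")
    return {
        "mappings": {
            "properties": {
                "status_code": {"type": "long"},
                "status_class": {
                    "type": "keyword",
                    # Illustrative Painless script evaluated at index time; it is
                    # expected to throw when `status_code` has no value
                    "script": {
                        "source": "emit((doc['status_code'].value / 100) + 'xx')"
                    },
                    # "continue": index the document anyway and list the field
                    # in `_ignored`; "fail" rejects the whole document
                    "on_script_error": on_script_error,
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(scripted_field_mapping(), indent=2))
```

With a mapping like this, documents whose script failed would show `status_class` in `_ignored`, which is the kind of value an aggregation over `_ignored` would surface.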

@javanna removed the :Analytics/Aggregations label Apr 28, 2021
@elasticmachine removed the Team:Analytics label Apr 28, 2021
@cbuescher
Member

I came across this issue while looking at the '_ignored' field mapper for #78981 and was wondering if we still want to pursue it.
Talking to @romseygeek, he suggested that we might be able to use runtime fields for aggregation, but currently we don't seem to have access to _ignored there either.

The following ideas came up while chatting:

  • we might be able to access stored fields from scripts; this may not currently work in the runtime fields context, but other script contexts might already support it
  • we could maybe add a special accessor to get _ignored
  • we could proceed as planned and enable doc_values, but stop adding the field as a stored field

If we find a solution that keeps '_ignored' as a stored field, as it is today, and aggregates using runtime fields, this would have the additional benefit of working with existing indexes.

@cbuescher
Member

but currently we don't seem to have access to _ignored there either.

I played around a bit more with runtime field scripts and think I missed an option. Using:

POST /test/_search
{
  "query": {
    "exists": {
      "field": "_ignored"
    }
  },
  "runtime_mappings": {
    "my_ignored": {
      "type": "keyword",
      "script": { "source": """emit(params['_fields']._ignored.value)""" }
    }
  },
  "aggs": {
    "a1": {
      "terms": {
        "field": "my_ignored"
      }
    }
  }
}

I was now able to aggregate over at least one value from the '_ignored' field. I didn't get multiple values working (which I think would be necessary for an aggregation), but that might already be possible with the right scripting magic.

@nik9000
Member

nik9000 commented Oct 20, 2021

    "script" : { "source" : """emit(params['_fields']._ignored.value)""" }

I think

"""
for (def v : params['_fields']._ignored.values) {
  emit(v);
}
"""

We'd talked about a version of emit that took a list but never built it. It's more complex than it sounds.
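
Putting the loop into the earlier search request gives roughly the following. This is a sketch assembled from the two snippets in this thread: the `params['_fields']._ignored.values` accessor is taken from the comments as written and is not verified here, and the `ignored_runtime_agg_request` helper and `EMIT_ALL_IGNORED` constant are names invented for the example.

```python
import json

# Painless source from the comment above: emit every value of `_ignored`,
# not just the first one
EMIT_ALL_IGNORED = (
    "for (def v : params['_fields']._ignored.values) {\n"
    "  emit(v);\n"
    "}"
)


def ignored_runtime_agg_request(runtime_field="my_ignored", agg_name="a1"):
    """Build a search body that aggregates `_ignored` through a runtime field."""
    return {
        # only documents with at least one ignored field are relevant
        "query": {"exists": {"field": "_ignored"}},
        "runtime_mappings": {
            runtime_field: {
                "type": "keyword",
                "script": {"source": EMIT_ALL_IGNORED},
            }
        },
        # terms aggregation over the runtime field rather than `_ignored` itself
        "aggs": {agg_name: {"terms": {"field": runtime_field}}},
    }


if __name__ == "__main__":
    print(json.dumps(ignored_runtime_agg_request(), indent=2))
```

Because the runtime field is defined in the request, this approach would not require any mapping change, which matches the point above about working with existing indexes.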

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@eyalkoren added the Team:obs-knowledge label Nov 29, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)
