[Discover] Show ignored field values #115040

Merged
merged 51 commits into from
Oct 19, 2021

Contributor

@timroes timroes commented Oct 14, 2021

Summary

Fixes #101232

This PR adds support for the ignored_field_values API added in elastic/elasticsearch#74121. Discover can now show values that were ignored, e.g. because they were too large or too long. We track this as a bug fix, since the old _source implementation in Discover showed these values, and their absence was often perceived as a regression when switching to the new fields API implementation in Discover. Those values should now show up again in Discover (and in CSV reporting).

Ignored field values now show a warning in Discover, and filtering is disabled on them:

screenshot-20211018-091456

Limitations:

  • Ignored values inside nested fields won't be shown by this PR. We decided not to support them in the API, since we no longer know which "object" inside the nested field an ignored value came from, and thus can't reasonably merge them back together.
  • We decided not to support showing those values if you have an "object column" in your Discover table (e.g. object as a column with the test data below). You can no longer add object columns when running with the fields API, so we consider them to exist only for backwards compatibility with old saved searches and won't add this functionality to them.

Content of this PR

  • Enables merging ignored_field_values in the flattenHit implementation in tabifyDocs (the one that should be used nowadays) behind a parameter. I think the default behavior should remain ignoring these values.
  • Changes how we format whole documents in Discover to use flattenField instead of formatHit (which does not support that parameter yet).
  • Changes the Discover logic to no longer use the _source field formatter at all but to keep all of that logic inside its own plugin. Since there are no further consumers of this "pretty rendered _source" outside Discover, we could simplify the _source field formatter to a simple JSON.stringify (which helps get rid of formatHit).
  • Removes formatHit and formatField completely from DataViews, since Discover and the old _source field formatter were their last consumers.
  • Removes the indexPattern parameter from field formatters, since it was only used by the _source field formatter.

For more details please have a look at my inline comments on this PR and the comments I added to the code.

How to test?

You will require some field values which are ignored either because they are too long or they are malformed. I've used the following commands to get different types of ignore_above ignored data:

Console commands for inserting data
PUT discover-test
{
  "mappings": {
    "properties": {
      "number": {
        "type": "long"
      },
      "text": {
        "type": "keyword"
      },
      "short": {
        "type": "keyword",
        "ignore_above": 8
      },
      "nested": {
        "type": "nested",
        "properties": {
          "short": {
            "type": "keyword",
            "ignore_above": 8
          },
          "more": {
            "type": "keyword"
          }
        }
      },
      "object": {
        "type": "object",
        "properties": {
          "short": {
            "type": "keyword",
            "ignore_above": 8
          },
          "more": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

PUT discover-test/_doc/1?refresh=true
{
  "number": 123,
  "text": "test foo bar",
  "short": ["only", "valid"],
  "nested": [
    { "short": ["only", "valid"], "more": ["some", "more"] },
    { "short": ["too long value"], "more": ["some", "more"] }
  ],
  "object": [
    { "short": ["only", "valid"], "more": ["some", "more"] },
    { "short": ["too long value"], "more": ["some", "more"] }
  ]
}

PUT discover-test/_doc/2?refresh=true
{
  "number": [123],
  "text": ["test foo bar"],
  "short": "valid",
  "nested": { "short": ["too long value"], "more": ["some", "more"] },
  "object": { "short": ["too long value"], "more": ["some", "more"] }
}

PUT discover-test/_doc/3?refresh=true
{
  "number": [],
  "text": [null, null],
  "short": "invalid one",
  "nested": [
    { "short": "this one too long too" },
    { "short": "too long value" }
  ],
  "object": [
    { "short": "this one too long too" },
    { "short": "too long value" }
  ]
}

PUT discover-test/_doc/4?refresh=true
{
  "number": null,
  "text": [""],
  "short": ["mixed", "valid and invalid"],
  "nested": [
    { "short": ["mixed", "with some valid some invalid"], "more": ["some", "more"] },
     { "short": ["mixed", "with some valid some invalid"], "more": ["some", "more"] }
  ],
  "object": [
    { "short": ["mixed", "with some valid some invalid"], "more": ["some", "more"] },
     { "short": ["mixed", "with some valid some invalid"], "more": ["some", "more"] }
  ]
}

PUT discover-test/_doc/5?refresh=true
{
  "short": """[2021-10-14T18:45:33.441+02:00][INFO ][plugins.taskManager] TaskManager is identified by the Kibana UUID: 132a44cb-715c-4b78-9e9a-3a8b56411d4b
[2021-10-14T18:45:33.512+02:00][WARN ][plugins.security.config] Session cookies will be transmitted over insecure connections. This is not recommended.
[2021-10-14T18:45:33.524+02:00][WARN ][plugins.security.config] Session cookies will be transmitted over insecure connections. This is not recommended.
[2021-10-14T18:45:33.567+02:00][INFO ][plugins.ruleRegistry] Write is disabled; not installing common resources shared between all indices
[2021-10-14T18:45:33.835+02:00][INFO ][plugins.reporting.config] Chromium sandbox provides an additional layer of protection, and is supported for Linux Arch Linux Rolling OS. Automatically enabling Chromium sandbox.
[2021-10-14T18:45:33.848+02:00][INFO ][plugins.ruleRegistry] Write is disabled; not installing resources for index .alerts-observability.uptime.alerts
[2021-10-14T18:45:33.848+02:00][INFO ][plugins.ruleRegistry] Write is disabled; not installing resources for index .alerts-observability.logs.alerts
[2021-10-14T18:45:33.848+02:00][INFO ][plugins.ruleRegistry] Write is disabled; not installing resources for index .alerts-observability.metrics.alerts
[2021-10-14T18:45:33.849+02:00][INFO ][plugins.ruleRegistry] Write is disabled; not installing resources for index .alerts-observability.apm.alerts
[2021-10-14T18:45:33.943+02:00][INFO ][savedobjects-service] Waiting until all Elasticsearch nodes are compatible with Kibana before starting saved objects migrations...
[2021-10-14T18:45:33.943+02:00][INFO ][savedobjects-service] Starting saved objects migrations
[2021-10-14T18:45:34.028+02:00][INFO ][savedobjects-service] [.kibana] INIT -> OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT. took: 77ms.
[2021-10-14T18:45:34.029+02:00][INFO ][savedobjects-service] [.kibana_task_manager] INIT -> OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT. took: 77ms.
[2021-10-14T18:45:34.095+02:00][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT -> OUTDATED_DOCUMENTS_SEARCH_READ. took: 67ms.
[2021-10-14T18:45:34.096+02:00][INFO ][savedobjects-service] [.kibana_task_manager] OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT -> OUTDATED_DOCUMENTS_SEARCH_READ. took: 67ms.
[2021-10-14T18:45:34.164+02:00][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_SEARCH_READ -> OUTDATED_DOCUMENTS_SEARCH_CLOSE_PIT. took: 69ms.
[2021-10-14T18:45:34.169+02:00][INFO ][savedobjects-service] [.kibana_task_manager] OUTDATED_DOCUMENTS_SEARCH_READ -> OUTDATED_DOCUMENTS_SEARCH_CLOSE_PIT. took: 73ms.
[2021-10-14T18:45:34.253+02:00][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_SEARCH_CLOSE_PIT -> UPDATE_TARGET_MAPPINGS. took: 89ms.
[2021-10-14T18:45:34.255+02:00][INFO ][savedobjects-service] [.kibana_task_manager] OUTDATED_DOCUMENTS_SEARCH_CLOSE_PIT -> UPDATE_TARGET_MAPPINGS. took: 86ms.
[2021-10-14T18:45:34.422+02:00][INFO ][savedobjects-service] [.kibana] UPDATE_TARGET_MAPPINGS -> UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK. took: 169ms.
[2021-10-14T18:45:34.423+02:00][INFO ][savedobjects-service] [.kibana_task_manager] UPDATE_TARGET_MAPPINGS -> UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK. took: 168ms.
[2021-10-14T18:45:34.480+02:00][INFO ][savedobjects-service] [.kibana_task_manager] UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK -> DONE. took: 57ms.
[2021-10-14T18:45:34.480+02:00][INFO ][savedobjects-service] [.kibana_task_manager] Migration completed after 528ms
[2021-10-14T18:45:34.532+02:00][INFO ][savedobjects-service] [.kibana] UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK -> DONE. took: 110ms.
[2021-10-14T18:45:34.533+02:00][INFO ][savedobjects-service] [.kibana] Migration completed after 582ms"""
}
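
To predict which of the "short" values above end up ignored, the `ignore_above: 8` rule can be approximated in a few lines. This is only a sketch: `splitByIgnoreAbove` is a hypothetical helper (not part of Kibana or Elasticsearch), and it assumes the limit applies to each array element separately and that the JavaScript string length is a good enough stand-in for the character count.

```typescript
// Hypothetical helper: approximates which keyword values `ignore_above` would skip.
function splitByIgnoreAbove(
  value: string | string[],
  ignoreAbove: number
): { kept: string[]; ignored: string[] } {
  const values = Array.isArray(value) ? value : [value];
  const kept: string[] = [];
  const ignored: string[] = [];
  for (const v of values) {
    // Assumption: string length approximates the character count used by the mapping.
    if (v.length > ignoreAbove) {
      ignored.push(v);
    } else {
      kept.push(v);
    }
  }
  return { kept, ignored };
}

// "short" values of documents 1, 3 and 4 from the commands above, with ignore_above: 8
console.log(splitByIgnoreAbove(['only', 'valid'], 8));
console.log(splitByIgnoreAbove('invalid one', 8));
console.log(splitByIgnoreAbove(['mixed', 'valid and invalid'], 8));
```

Under those assumptions, document 1 keeps both values, document 3 loses its only value, and document 4 keeps "mixed" while "valid and invalid" is ignored.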

Out of scope

Out of scope of this PR (issues to be created):

@timroes timroes added the Feature:Discover, release_note:fix, v8.0.0, v7.16.0, and Team:DataDiscovery labels Oct 14, 2021
@timroes timroes added the buildkite-ci and auto-backport labels Oct 14, 2021
Member

@ppisljar ppisljar left a comment

░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
░░░░░░░░░░░████░░░░░░░░░░░░░░░░░░░░
░░░░░░░░░███░██░░░░░░░░░░░░░░░░░░░░
░░░░░░░░░██░░░█░░░░░░░░░░░░░░░░░░░░
░░░░░░░░░██░░░██░░░░░░░░░░░░░░░░░░░
░░░░░░░░░░██░░░███░░░░░░░░░░░░░░░░░
░░░░░░░░░░░██░░░░██░░░░░░░░░░░░░░░░
░░░░░░░░░░░██░░░░░███░░░░░░░░░░░░░░
░░░░░░░░░░░░██░░░░░░██░░░░░░░░░░░░░
░░░░░░░███████░░░░░░░██░░░░░░░░░░░░
░░░░█████░░░░░░░░░░░░░░███░██░░░░░░
░░░██░░░░░████░░░░░░░░░░██████░░░░░
░░░██░░████░░███░░░░░░░░░░░░░██░░░░
░░░██░░░░░░░░███░░░░░░░░░░░░░██░░░░
░░░░██████████░███░░░░░░░░░░░██░░░░
░░░░██░░░░░░░░████░░░░░░░░░░░██░░░░
░░░░███████████░░██░░░░░░░░░░██░░░░
░░░░░░██░░░░░░░████░░░░░██████░░░░░
░░░░░░██████████░██░░░░███░██░░░░░░
░░░░░░░░░██░░░░░████░███░░░░░░░░░░░
░░░░░░░░░█████████████░░░░░░░░░░░░░
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

Contributor

@jloleysens jloleysens left a comment

Reporting changes LGTM

@ryankeairns
Contributor

Tooltips and disabling of filter actions look good. Noticed an extra 'and' here:

Screen Shot 2021-10-18 at 8 46 19 AM

Member

@kertal kertal left a comment

so far it's looking pretty good, still trying to find the edge case that nobody ever thought of 🔎

@timroes
Contributor Author

timroes commented Oct 19, 2021

@elasticmachine merge upstream

@@ -109,6 +115,28 @@ export function flattenHit(
flatten(hit.fields || {});
if (params?.source !== false && hit._source) {
flatten(hit._source as Record<string, any>);
} else if (params?.includeIgnoredValues && hit.ignored_field_values) {
// If enabled merge the ignored_field_values into the flattened hit. This will
Contributor

maybe I am missing something, but shouldn't there be a check if fields are being read from fields API then?

Contributor Author

ignored_field_values will only be set when the request used the fields API, so we treat it the same way as the rest of the flatten logic: we ignore the actual request and look purely at the response. For example, we also merge fields above regardless of what the request might have been, and simply merge it if it's available.

The only special case (as stated in the comment) is that a hit might theoretically contain _source, fields, and ignored_field_values. In that case we don't want to merge the ignored_field_values, since all those values are already in _source, which is why we only run this logic if we didn't merge _source above.
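
The precedence described in this reply can be sketched as follows. This is a simplified, self-contained illustration and not the code from this PR: the `Hit` and `FlattenParams` types, the `flattenHitSketch` name, and the shallow copy used in place of real flattening are assumptions. Only the property names `fields`, `_source`, `ignored_field_values` and the `source` / `includeIgnoredValues` parameters are taken from the diff.

```typescript
// Hypothetical, simplified hit shape; real Elasticsearch hits carry more properties.
interface Hit {
  fields?: Record<string, unknown[]>;
  _source?: Record<string, unknown>;
  ignored_field_values?: Record<string, unknown[]>;
}

interface FlattenParams {
  source?: boolean;
  includeIgnoredValues?: boolean;
}

// Sketch only: copies top-level keys instead of recursively flattening nested objects.
function flattenHitSketch(hit: Hit, params?: FlattenParams): Record<string, unknown> {
  const flat: Record<string, unknown> = { ...(hit.fields ?? {}) };

  if (params?.source !== false && hit._source) {
    // _source already contains every value, including the ignored ones.
    Object.assign(flat, hit._source);
  } else if (params?.includeIgnoredValues && hit.ignored_field_values) {
    // Only reached when _source was not merged above.
    for (const [fieldName, ignored] of Object.entries(hit.ignored_field_values)) {
      const existing = flat[fieldName];
      if (existing === undefined) {
        flat[fieldName] = ignored;
      } else if (Array.isArray(existing)) {
        flat[fieldName] = [...existing, ...ignored];
      } else {
        flat[fieldName] = [existing, ...ignored];
      }
    }
  }
  return flat;
}

const exampleHit: Hit = {
  fields: { short: ['mixed'] },
  ignored_field_values: { short: ['valid and invalid'] },
};
console.log(flattenHitSketch(exampleHit, { includeIgnoredValues: true }));
console.log(flattenHitSketch(exampleHit));
```

With the flag set, the ignored value is appended to the value from `fields`; with the flag omitted, only the `fields` value is returned, which mirrors the opt-in default proposed in the PR description.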

Object.entries(hit.ignored_field_values).forEach(([fieldName, fieldValue]) => {
if (flat[fieldName]) {
// If there was already a value from the fields API, make sure we're merging both together
if (Array.isArray(flat[fieldName])) {
Contributor

nit: I think this would be more readable if it was all one if-statement:

const valueExists = !!flat[fieldName]
if (valueExists && Array.isArray(flat[fieldName])) 
....
else if (valueExists)
...
else
...

Contributor Author

I have mixed feelings about that. Making the second branch depend on the first (in the sense that the first one is not hit and we then recheck half of its condition) does not increase readability, imho. We also don't do this in other places, e.g. https://github.com/timroes/kibana/blob/ignored-field-values/src/plugins/discover/public/application/helpers/format_hit.ts#L55-L61, which could be transformed the same way into:

if (displayKey && fieldsToShow.includes(key)) {
  pairs.push([displayKey, formattedValue]);
} else if (!displayKey) {
  pairs.push([key, formattedValue]);
}

So maybe we can leave that to a democratic vote :)

🚀 - Flatten this out into one dimension as suggested above
🎉 - Keep it in the current nested way

cc @kertal @dmitriynj

Contributor

"I don't think it would be more readable the way you suggest, Maja, I'd like to keep it as it is" would have been a perfectly fine response

@@ -141,10 +152,9 @@ export const TableRow = ({
);
} else {
columns.forEach(function (column: string) {
- // when useNewFieldsApi is true, addressing to the fields property is safe
- if (useNewFieldsApi && !mapping(column) && !row.fields![column]) {
+ if (useNewFieldsApi && !mapping(column) && row.fields && !row.fields[column]) {
Contributor

@majagrubic majagrubic Oct 19, 2021

afaik, the fields API should always return at least an empty object. Maybe worth checking with the ES team to see if this scenario is intentional?
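
The difference between the two conditions in this hunk only shows up when a row has no `fields` property at all. A minimal sketch, assuming a simplified `Row` type and leaving out the `useNewFieldsApi` and `mapping(column)` checks; both helper names are hypothetical:

```typescript
// Hypothetical, simplified row shape; real Discover rows carry more properties.
interface Row {
  fields?: Record<string, unknown[]>;
}

// Old style: the non-null assertion silences the compiler but not the runtime.
function isMissingAsserted(row: Row, column: string): boolean {
  return !row.fields![column];
}

// New style: a row without `fields` is simply not reported as missing the column.
function isMissingGuarded(row: Row, column: string): boolean {
  return Boolean(row.fields && !row.fields[column]);
}

const withoutFields: Row = {};
console.log(isMissingGuarded(withoutFields, 'short')); // false

let threw = false;
try {
  isMissingAsserted(withoutFields, 'short');
} catch (e) {
  // Reading a property of undefined raises a TypeError at runtime.
  threw = e instanceof TypeError;
}
console.log(threw); // true
```

Whether such a row can actually occur is exactly the open question in this comment; the guard only makes the code robust if it does.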

* will be merged into the flattened document. This will only have an effect if
* the `hit` has been retrieved using the `fields` option.
*/
includeIgnoredValues?: boolean;
Contributor

Any particular reason we don't make this default to true?

Contributor Author

I've talked a bit about this in the PR description. I believe that merging a fields API response together with ignored values is more of an edge case that we need in Discover (where users are more interested in "the document" than in the fields), so I don't think it should be on by default. Values coming from the ignored API might potentially cause other weird side effects, so imho merging them into the flattened object should be opt-in.

case IgnoredReason.MALFORMED:
return multiValue
? i18n.translate('discover.docView.table.ignored.multiMalformedTooltip', {
defaultMessage: `This field has one or more malformed values that can't be searched or filtered.`,
Contributor

Nit: for consistency, can this message be "One or more values in this field...", same as for the other two?

Contributor Author

cc @gchaps I took your recommendations here, so you might want to give your opinion on this.

Contributor

Either format works. I went with the text "This field has one or more malformed values that can't be searched or filtered." because it is shorter.

Comment on lines +23 to +27
jest.mock('../../../../../kibana_services', () => ({
...jest.requireActual('../../../../../kibana_services'),
getServices: () => jest.requireActual('../../../../../__mocks__/services').discoverServiceMock,
}));

Member

I think we should refactor the mock to something like getServicesMock, that would also allow you to configure it, if needed ... but not in this PR .. just loud thinking ..
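
A configurable mock along the lines suggested here could look roughly like this. Everything in the sketch is hypothetical: the `ServicesSketch` shape, its members, the default values, and the `getServicesMock` signature are assumptions for illustration and do not reflect Discover's actual services or its existing mock.

```typescript
// Hypothetical service shape; Discover's real services object is much larger.
interface ServicesSketch {
  uiSettings: { get: (key: string) => unknown };
  maxRows: number;
}

const defaultServices: ServicesSketch = {
  uiSettings: { get: () => undefined },
  maxRows: 500,
};

// Returns a fresh object per call so one test's overrides cannot leak into another.
function getServicesMock(overrides: Partial<ServicesSketch> = {}): ServicesSketch {
  return { ...defaultServices, ...overrides };
}

// A test that needs a specific setting overrides only that part of the mock.
const services = getServicesMock({
  uiSettings: { get: (key) => (key === 'some:setting' ? true : undefined) },
});
console.log(services.uiSettings.get('some:setting'), services.maxRows);
```

The design choice is a plain factory with partial overrides, so each test states only what differs from the defaults.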

Member

@kertal kertal left a comment

Code LGTM, tested locally and a-la-carte with Chrome, Safari, Firefox works as expected, didn't manage to break it, great work of simplifying and documenting, and overall restoring behavior of Discover the way it should be! 🥳

Contributor

@dimaanj dimaanj left a comment

LGTM, tested locally in Chrome.

Contributor

@majagrubic majagrubic left a comment

Tested this quite extensively in Chrome on Mac OS X, looks good.

@kibanamachine
Contributor

💛 Build succeeded, but was flaky

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id         before  after  diff
dataViews  43      42     -1
discover   397     400    +3
total                     +2

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id            before  after  diff
data          2803    2791   -12
dataViews     541     526    -15
fieldFormats  250     246    -4
total                        -31

Any counts in public APIs

Total count of every any typed public API. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats any for more detailed information.

id         before  after  diff
dataViews  6       5      -1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id        before   after    diff
discover  328.9KB  332.2KB  +3.4KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id            before   after    diff
data          465.6KB  465.8KB  +186.0B
dataViews     38.8KB   38.0KB   -829.0B
discover      22.3KB   22.4KB   +88.0B
fieldFormats  49.0KB   48.4KB   -698.0B
total                           -1.2KB

Unknown metric groups

API count

id            before  after  diff
data          3193    3181   -12
dataViews     683     668    -15
fieldFormats  288     284    -4
total                        -31

References to deprecated APIs

id         before  after  diff
dataViews  237     231    -6

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@timroes timroes merged commit e8663d4 into elastic:master Oct 19, 2021
@timroes timroes deleted the ignored-field-values branch October 19, 2021 14:43
@kibanamachine
Contributor

💔 Backport failed

Status Branch Result
7.x Commit could not be cherrypicked due to conflicts

To backport manually run:
node scripts/backport --pr 115040

jloleysens added a commit to jloleysens/kibana that referenced this pull request Oct 19, 2021
…-link-to-kibana-app

* 'master' of github.com:elastic/kibana: (30 commits)
  Fix potential error from undefined (elastic#115562)
  [App Search, Crawler] Fix validation step panel padding/whitespace (elastic#115542)
  [Cases][Connectors] ServiceNow ITOM: MVP (elastic#114125)
  Change default session idle timeout to 8 hours. (elastic#115565)
  Upgrade EUI to v39.1.1 (elastic#114732)
  [App Search] Wired up organic results on Curation Suggestions view (elastic#114717)
  [i18n] remove i18n html extractor (elastic#115004)
  [Logs/Metrics UI] Add deprecated field configuration to Deprecations API (elastic#115103)
  [Transform] Add alerting rules management to Transform UI (elastic#115363)
  Update UI links to Fleet and Agent docs (elastic#115295)
  [ML] Adding ability to change data view in advanced job wizard (elastic#115191)
  Change deleteByNamespace to include legacy URL aliases (elastic#115459)
  [Unified Integrations] Remove and cleanup add data views (elastic#115424)
  [Discover] Show ignored field values (elastic#115040)
  [ML] Stop reading the ml.max_open_jobs node attribute (elastic#115524)
  [Discover] Improve doc viewer code in Discover (elastic#114759)
  [Security Solutions] Adds security detection rule actions as importable and exportable (elastic#115243)
  [Security Solution] [Platform] Migrate legacy actions whenever user interacts with the rule (elastic#115101)
  [Fleet] Add telemetry for integration cards (elastic#115413)
  🐛 Fix single percentile case when ES is returning no buckets (elastic#115214)
  ...

# Conflicts:
#	x-pack/plugins/reporting/public/management/__snapshots__/report_listing.test.tsx.snap
timroes pushed a commit that referenced this pull request Oct 19, 2021
* WIP replacing indexPattern.flattenHit by tabify

* Fix jest tests

* Read metaFields from index pattern

* Remove old test code

* remove unnecessary changes

* Remove flattenHitWrapper APIs

* Fix imports

* Fix missing metaFields

* Add all meta fields to allowlist

* Improve inline comments

* Move flattenHit test to new implementation

* Add deprecation comment to implementation

* WIP - Show ignored field values

* Disable filters in doc_table

* remove redundant comments

* No, it wasn't

* start warning message

* Enable ignored values in CSV reports

* Add help tooltip

* Better styling with warning plus collapsible button

* Disable filtering within table for ignored values

* Fix jest tests

* Fix types in tests

* Add more tests and documentation

* Remove comment

* Move dangerouslySetInnerHTML into helper method

* Extract document formatting into common utility

* Remove HTML source field formatter

* Move formatHit to Discover

* Change wording of ignored warning

* Add cache for formatted hits

* Remove dead type

* Fix row_formatter for objects

* Improve mobile layout

* Fix jest tests

* Fix typo

* Remove additional span again

* Change mock to revert test

* Improve tests

* More jest tests

* Fix typo

* Change wording

* Remove dead comment

Co-authored-by: Kibana Machine <[email protected]>
# Conflicts:
#	src/plugins/data_views/public/index.ts
Labels
auto-backport, buildkite-ci, Feature:Discover, release_note:fix, Team:DataDiscovery, v7.16.0, v8.0.0
Development

Successfully merging this pull request may close these issues.

Values above ignore_above mapping limit don't show up in Discover anymore
10 participants