Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discuss] Enhancement/saved object search #15175

Closed

Conversation

chrisronline
Copy link
Contributor

@chrisronline chrisronline commented Nov 27, 2017

Resolves #14729

This will be filled out more, but this is a first stab at how we can possibly make this better. The changes are fairly simple - we're just adding an unanalyzed version of the title field for all saved objects in the .kibana index. Then, we're including that field in our searching.

See this comment for the use cases I want to solve for.

The larger change in this PR is ensuring we can make the above mapping change and that it will automatically affect all existing saved objects on a minor or patch release. To accomplish that, I extended/refactored the kibana_index_patch.js file to make adding a new type of patch easier and adding the work to do the patch required for this PR.

Going tag a few people to get some thoughts

@spalger @tylersmalley @jbudz @stacey-gammon

@chrisronline chrisronline force-pushed the enhancement/saved_object_search branch from a8f51b7 to 583d4d5 Compare November 30, 2017 19:15
@chrisronline chrisronline changed the title Enhancement/saved object search [Discuss] Enhancement/saved object search Dec 4, 2017
'search',
];

export class PatchMissingTitleKeywordFields extends KibanaIndexPatch {
Copy link
Contributor

@spalger spalger Dec 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think that subclassing is kind of a confusing way to configure something. It also makes it more difficult for plugins to add their own configurations if we decide to expose this to plugins down the road (made that mistake with FieldFormats).

What if we just exported configurations that received a context of some sort as an argument? patchKibanaIndex could create instances of KibanaIndexPatch with these configurations internally, if desired.

import { get } from 'lodash';
 
const propertiesWithTitles = [
  'index-pattern',
  'dashboard',
  'visualization',
  'search',
];

export const missingTitleKeywordFieldsPath = {
  async getUpdatedPatchMappings(context) {
    const {
      currentMappingsDsl,
      getRootProperties
    } = context;

    const properties = getRootProperties(currentMappingsDsl);
    const mappings = {};

    for (const property of propertiesWithTitles) {
      const hasKeyword = !!get(properties, `${property}.properties.title.fields.keyword`);
      if (hasKeyword) {
        continue;
      }

      const titleMapping = get(properties, `${property}.properties.title`);
      mappings[property] = {
        properties: {
          title: {
            ...titleMapping,
            fields: {
              keyword: {
                type: 'keyword',
              }
            }
          }
        }
      };
    }

    // Make sure we return a falsy value
    if (!Object.keys(mappings).length) {
      return;
    }

    return mappings;
  },
  
  async applyChanges(context) {
    const {
      log,
      callCluster,
      patchedMappings,
      indexName,
      rootEsType,
    } = context;
    
    const properties = Object.keys(patchedMappings);
    const types = properties.map(type => ({ match: { type } }));
 
    log(['info', 'elasticsearch'], {
      tmpl: `Updating by query for Saved Object types "<%= names.join('", "') %>"`,
      names: properties,
    });
 
    await callCluster('updateByQuery', {
      conflicts: 'proceed',
      index: indexName,
      type: rootEsType,
      body: {
        query: {
          bool: {
            should: types,
          },
        },
      },
    });
  }
})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for the suggestion. I've been in the FieldFormat stuff recently and don't want to walk that path if I can avoid it. I'll work on this

'dashboard',
'visualization',
'search',
];
Copy link
Contributor

@spalger spalger Dec 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not apply this to all properties with a title.keyword field?

Copy link
Contributor

@spalger spalger Dec 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using getProperty this could be something like:

const propertiesWithTitleKeyword = Object.keys(getRootProperties(mappings))
  .filter(property => Boolean(getProperty(mappings, [property, 'title', 'keyword'])))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

const mappings = {};

for (const property of propertiesWithTitles) {
const hasKeyword = !!get(properties, `${property}.properties.title.fields.keyword`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using getProperty(this.currentMappingsDsl, [property, 'title', 'keyword'])

@@ -45,8 +45,9 @@ export function getQueryParams(mappings, type, search, searchFields) {
if (search) {
bool.must = [
{
simple_query_string: {
query_string: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain the reason for this a bit? @elastic/kibana-discovery any idea how we can prevent this from being necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would need to know requirements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this so we can support letting the user prefix the query with a wildcard, (request).

So, if we had the following four visualizations stored: My Vis, My_Vis, My-Vis, MyVis, the user could query against *Vis and return all four results.

If we use simple_query_string, it seems to filter out the wildcard and only returns two results (My Vis and My-Vis).

This mirrors the search behavior you'd see in a text editor which felt natural and right to me, but it's fair to suggest we do not want to support it.

Copy link
Contributor

@spalger spalger Dec 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, leading wildcard is generally regarded as a problem and simple_query_string probably doesn't allow it. I think that for Kibana's use case it would make sense to use a tokenizer that converts My Vis, My_Vis, My-Vis, and MyVis all into my and vis... Is that an option given the fact that we're going to update_by_query anyway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ngrams could be helpful https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html

But I also wonder whether a leading wildcard will really be a problem over the set of vis documents. .kibana is generally going to be really small. We could add a config to turn wildcards off if we're really paranoid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not concerned about the performance of leading wildcards in saved objects, but I am concerned about switching away from simple_query_string, didn't know you could configure simple_query_string to allow leading wildcards

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can support leading wildcards with just simple_query_string:
"reason": "[simple_query_string] unsupported field [allow_leading_wildcard]"

@spalger Can you help me understand the dangers of switching from simple_query_string to query_string?

The docs say:

Unlike the regular query_string query, the simple_query_string query will never throw an exception, and discards invalid parts of the query.

But it feels we easily mitigate the exception by catching it within Kibana server and returning an empty result set?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the ramifications of moving from simple_query_string to query_string, but I'm almost certain it's more than just about ignoring errors... Maybe @dakrone could help us understand the impact switching might have on Kibana?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming into this with no history, so I'm not really sure exactly where this is going to be used but...

  • SQS allows you to turn features on and off using flags (not sure if this is useful to you)
  • SQS tries its hardest not to throw exceptions when using invalid query syntax, QS does no such thing and will flip out at the slightest syntax error (again, if you're programatically generating things it might be fine)
  • SQS doesn't support every feature that QS does (like the leading wildcard, as you've been discussing)
  • SQS and QS have different syntax, something like (foo | bar) + baz being different than (foo OR bar) AND baz

To weigh in my unsolicited opinion, If you are comfortable with ngrams as @Bargs recommended, that would be a much easier way to support sub-string search without resorting to fancy querying. You could even have multiple fields and search both, just boosting the non-ngram version.

Hopefully that helps, happy to clarify more if I can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the information @dakrone!!

I'll continue to investigate ngrams, but it feels like, for this use case, switching from SQS to QS is a viable option still.

query: search,
minimum_should_match: search.split(' ').length,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the purpose of this to make every term required? You could just change the default_operator option if that's the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice!

@chrisronline
Copy link
Contributor Author

chrisronline commented Dec 5, 2017

I updated the description, but this is the use case I'm solving for with this PR: #14729 (comment). Feel free to suggest that this shouldn't be the goal or we should move a different direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants