
ElasticSearch Support #589

Merged 6 commits from ssd/fcs into master on Oct 30, 2015

Conversation

stevendanna
Contributor

This patchset adds support for using ElasticSearch as the chef index
backend used by the Chef Server.

To facilitate using multiple backends, chef_index is now the main entry
point for nearly all outside access to the chef_index app. The only
exception currently is the status module.

Since both Solr and Elasticsearch use Lucene-based queries, the
query-building code has been moved into chef_index_query to facilitate
sharing. Common HTTP handling code was moved to chef_index_http.
However, I opted for a bit of duplication in some cases over complicated
dispatch, since these modules are still intertwined.
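The single-entry-point pattern described above can be sketched as follows. This is a hypothetical illustration in Python, not the actual Erlang modules; the class and method names are invented, and only the shape (one facade dispatching to provider backends, with Lucene query building shared between them) reflects the description.

```python
class LuceneQuery:
    """Shared query builder: both Solr and Elasticsearch accept Lucene syntax."""

    def __init__(self, term, field=None):
        self.term = term
        self.field = field

    def render(self):
        # "field:term" if a field is given, otherwise a bare term query.
        return f"{self.field}:{self.term}" if self.field else self.term


class SolrBackend:
    def search_url(self, query):
        return f"/solr/select?q={query.render()}"


class ElasticsearchBackend:
    def search_url(self, query):
        return f"/chef/_search?q={query.render()}"


class ChefIndex:
    """Facade: callers go through this entry point, never a backend directly."""

    def __init__(self, provider):
        self._backend = {"solr": SolrBackend(),
                         "elasticsearch": ElasticsearchBackend()}[provider]

    def search_url(self, query):
        return self._backend.search_url(query)
```

Under this shape, switching providers is a configuration change in one place, while the shared query builder keeps the two backends from drifting apart.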

Reviewer Notes

stevendanna and others added 3 commits October 29, 2015 12:38
The "total" field is not guaranteed to match the actual count of
returned objects. In practice, this has been working for Solr since
we force an explicit commit between deletes and creates. However,
these tests fail when run against CloudSearch, which does not support
an explicit commit.

NB: The total is reported directly from Solr/CloudSearch and may
include deleted nodes that are filtered from the result. However, we
cannot adjust the total or we risk clients hitting pagination-related
errors, since Solr/CloudSearch controls pagination.
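The pagination concern in this commit note can be sketched with hypothetical numbers (Python for brevity): the backend pages results by start/rows offsets derived from the total it reported, so rewriting the total to the filtered count would make clients stop paging early.

```python
def offsets(total, rows):
    """Start offsets a client requests when paging `total` results, `rows` at a time."""
    return list(range(0, total, rows))

backend_total = 12   # hypothetical: what Solr/CloudSearch reports, deleted docs included
adjusted_total = 9   # hypothetical: what we'd report after subtracting filtered deletes
rows = 5

# Paging against the backend's own total visits every offset it will serve:
assert offsets(backend_total, rows) == [0, 5, 10]
# Paging against an adjusted total never requests offset 10, so the
# final page the backend would have served is silently dropped:
assert offsets(adjusted_total, rows) == [0, 5]
```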
@marcparadise
Member

Re: the concatenation - from some testing, it looks like our best bet will be:

%% Emits one entry for the Elasticsearch bulk API: an action/metadata
%% line followed by the document source, each newline-terminated.
make_doc_for_add(Command = #chef_idx_expand_doc{id = Id, type = Type, search_provider = elasticsearch}) ->
    MetaFieldsPL = meta_fields(Command),
    Content = [MetaFieldsPL |
               [maybe_data_bag_field(elasticsearch, Type) |
                [{<<"content">>, iolist_to_binary(make_content(elasticsearch, Command, MetaFieldsPL))}]]],
    [jiffy:encode(index_data(Id)), <<"\n">>, jiffy:encode({Content}), <<"\n">>].

%% The action/metadata line telling Elasticsearch where to index the document.
index_data(Id) ->
    {[{<<"index">>,
       {[{<<"_index">>, <<"chef">>},
         {<<"_type">>, <<"object">>},
         {<<"_id">>, Id}]}}]}.
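The two-line payload the Erlang snippet above builds can be mirrored in Python to make the bulk-API wire format explicit. The `_index`/`_type` values come from the snippet; the id and content values here are made-up examples.

```python
import json

def make_doc_for_add(doc_id, content):
    """Build one Elasticsearch bulk-API entry: an action/metadata line
    followed by the document source line, each terminated by a newline."""
    action = {"index": {"_index": "chef", "_type": "object", "_id": doc_id}}
    source = {"content": content}
    return json.dumps(action) + "\n" + json.dumps(source) + "\n"

# Hypothetical id and flattened content value, purely for illustration:
payload = make_doc_for_add("a1b2c3", "name__=__web1")
```

Several such entries can be concatenated and POSTed to `_bulk` in a single request, which is why each entry must carry its own trailing newline.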

@tylercloke
Contributor

Things generally look good to me! Pretty straightforward. I'm kind of concerned about the amount of code that feels boilerplate-y with adding a service and adding an erchef endpoint, but I'm not really sure what would be easy to abstract. Looking pretty solid.

@stevendanna
Contributor Author

I'm kind of concerned about the amount of code that feels boilerplate-y between adding a service and erchef endpoint, but I'm not really sure what would be easy to abstract.

Which bits are you concerned about in particular? I might be able to pull over some of the abstraction we were using previously.

@tylercloke
Contributor

I guess looking a second time I'm not sure what I'd actually abstract, but:

  1. Make a same-y LWRP
  2. Make a same-y preflight check
  3. Make a same-y erchef endpoint

Just feels like a lot of steps that aren't different enough to merit how much code they actually are. It feels like we could generate a lot of that, but I'm not sure if it saves us enough / we do it often enough to merit looking into something like generators for those things.

Like I said, I'm not sure what I'd actually abstract, it just feels like a lot of code doing similar things which gives a bad smell.

@stevendanna
Contributor Author

@tylercloke Ahh. Yeah, my supreme hope is that we keep the number of backend services to a minimum so that we don't feel the need to abstract too much there. However, I agree that the preflight validator felt pretty copy-pasta as I was writing it. It's interesting because it is very similar but slightly different. For instance, I have a preflight check for whether the external-url changed, but rather than disallowing it, I just tell the user to run a reindex if they want to do that. I'll do some thinking on this point.

@tylercloke
Contributor

I don't think there is any serious action to take. That was just my impression reading through the last two major PRs so I thought I'd point it out.

@marcparadise
Member

We could probably look into making some custom rebar templates for the most common stuff on the Erlang side.

ChangeLog-Entry: Chef Server now supports Elasticsearch as a search
indexing backend in addition to solr.
Rebar3 won't run these tests unless they are named after an existing
module.
`chef-server-ctl reconfigure` can now correctly create the required
elasticsearch search index for chef-server.
stevendanna added a commit that referenced this pull request Oct 30, 2015
@stevendanna stevendanna merged commit 6183e7c into master Oct 30, 2015
@stevendanna stevendanna deleted the ssd/fcs branch November 9, 2015 13:05