BulkRequest Handling on Failure #666

Closed
sundarv85 opened this issue Dec 20, 2017 · 3 comments
sundarv85 commented Dec 20, 2017

Which version of Elastic are you using?

[x] elastic.v5 (for Elasticsearch 5.x)

v5.0.30

Please describe the actual behavior

When some bulk requests fail:

ERRO[0551] Elastic Flushing Failed                       err=elastic: Error 400 (Bad Request): Validation Failed: 1: index is missing;2: index is missing;3: index is missing;4: index is missing;5: index is missing;6: index is missing;7: index is missing;8: index is missing;9: index is missing;10: index is missing;11: index is missing;12: index is missing;13: index is missing;14: index is missing;

The retry loops forever. In this case, since the records have an incorrect index and type, the request never succeeds (of course, this should not happen in a production env).

Please describe the expected behavior

  1. Could we have a configurable maximum number of retries for bulk updates?
  2. Another question: there were 30 records in this request, and only 14 of them failed. When the retry happens, all 30 are resent. Could the 14 failures be identified and only those retried?
  3. Another interesting behaviour: I'm using the following code:
	afterCallback := func(id int64, requests []elastic.BulkableRequest, response *elastic.BulkResponse, err error) {
		if err != nil {
			log.WithFields(log.Fields{"id": id, "err": err.Error()}).Error("Elastic Flushing Failed")
			// Dump the source of every request in the failed batch.
			for k := range requests {
				src, rerr := requests[k].Source()
				if rerr == nil {
					for i := range src {
						fmt.Printf("Request[%d] Source[%d]: %s\n", k, i, src[i])
					}
				} else {
					fmt.Printf("Bulkable Request[%d] Not Returned: %s\n", k, rerr.Error())
				}
			}

			// List the individual items the bulk response reports as failed.
			if response != nil {
				failed := response.Failed()
				for i := range failed {
					if failed[i].Error != nil {
						fmt.Printf("Failed[%d]: %+v\n", i, failed[i].Error)
					}
				}
			}
		}
	}

and I'm not seeing any "Failed" prints. So how can I find which of these 14 were the real failures?

Any steps to reproduce the behavior?

One of my teammates hit this issue during development. To reproduce it for testing, I gave an incorrect type to certain records and an incorrect index to some others, and added them to the bulk request.

olivere commented Dec 27, 2017

Okay, I see. This looks like we need configurable backoff/retries for the Bulk API as well (just like the Scroll API).

With regards to your questions:

  1. Max limit of retries: You could do that once we have configurable backoff/retries for the Bulk API. Maybe it's worth adding a configurable backoff/retry for every API, using the client's settings as the default.

  2. Retrying: I really don't want Elastic to be "clever" and pick up the things that "should" happen. I try to simply make Elastic a transparent layer for working with Elasticsearch. I think it's better to let the application pick the correct logic on what should happen in case of an error.

  3. Could you try to log the response and set up a unit test in a separate issue? With that we can check if there's something wrong. Either Elasticsearch changed its response structure or your code is missing something. There already is a test for the "failed" case, so I'm not sure what's wrong here...
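To illustrate point 2, the application-side logic could pick out only the failed documents for retry by matching IDs from the bulk response against the original batch. The doc and failedItem types below are hypothetical stand-ins for the library's BulkableRequest and the items returned by response.Failed():

```go
package main

import "fmt"

// doc is a stand-in for a document sent in the bulk request.
type doc struct {
	ID   string
	Body string
}

// failedItem is a stand-in for a failed item in the bulk response.
type failedItem struct {
	ID     string
	Reason string
}

// selectFailed returns only the docs whose IDs appear among the
// failed items, so just those can be re-queued instead of all 30.
func selectFailed(sent []doc, failed []failedItem) []doc {
	failedIDs := make(map[string]bool, len(failed))
	for _, f := range failed {
		failedIDs[f.ID] = true
	}
	var retry []doc
	for _, d := range sent {
		if failedIDs[d.ID] {
			retry = append(retry, d)
		}
	}
	return retry
}

func main() {
	sent := []doc{{ID: "1"}, {ID: "2"}, {ID: "3"}}
	failed := []failedItem{{ID: "2", Reason: "index is missing"}}
	fmt.Println(len(selectFailed(sent, failed)))
}
```

This keeps the selection logic in the application, matching the stance above that the library itself stays a transparent layer.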

olivere added a commit that referenced this issue Jan 8, 2018
This commit extends the `PerformRequestOptions` to pass a custom
`Retrier` per request. This is enabled for the Scroll and Bulk API for
now.

See #666 and #610
olivere added a commit that referenced this issue Jan 8, 2018
This commit backports the `PerformRequestOptions` structure from v6 and
introduces `PerformRequestWithOptions`, which allows us to extend the
list of parameters passed when executing an HTTP request.

This change allows us to now also pass a custom `Retrier` per request,
which we allow for Scroll API and Bulk API via `Retrier(...)` on the
service respectively.

Backported from d2219c2.

See also #666 and #610
olivere commented Jan 8, 2018

I just released 5.0.60 and 6.1.1, which both allow setting a per-request Retrier (with configurable Backoff). For now, only the Bulk API and Scroll API make use of it. Use Retrier(...) on the BulkService or ScrollService to pass your own retrier for the given service.

Have you looked into point 3 above? Are you able to create a test that reproduces the problem?

olivere commented Jan 18, 2018

Closing for inactivity. Let me know if I can do anything, and I'll re-open.
