Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Applying destination index pipelines after a processor changes the target index #57968

Closed
tvernum opened this issue Jun 11, 2020 · 11 comments · Fixed by #59522
Closed

Applying destination index pipelines after a processor changes the target index #57968

tvernum opened this issue Jun 11, 2020 · 11 comments · Fixed by #59522
Assignees
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team

Comments

@tvernum
Copy link
Contributor

tvernum commented Jun 11, 2020

When evaluating ingest pipelines, we determine the full set of pipelines upfront (taking into account default_pipeline and final_pipeline) based on the index name that is included in the incoming request.

While executing those pipelines it is possible that a processor will change the destination index , for example (trivial example)

PUT /_ingest/pipeline/redirect
{
  "description": "Redirect to another index",
  "processors": [{ "set": { "field": "_index", "value": "new-destination" } }]
}

If that happens it is possible that the new destination index has its own set of default/final pipelines, but we do not execute them.

It would be preferable if the pipelines attached to the new destination were executed, although there are some questions/issues to work through

  • if the default_pipeline redirects to a new index, should we still execute the final_pipeline of the original index?
  • should we execute the default_pipeline of the new destination, or just the final_pipeline?
  • how do we prevent the document from being infinitely redirected between indices (could we set a rule that the pipeline of the destination index may not change the destination a second time?)
@tvernum tvernum added >bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP needs:triage Requires assignment of a team area label labels Jun 11, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Ingest)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jun 11, 2020
@tvernum
Copy link
Contributor Author

tvernum commented Jun 11, 2020

CC: @ruflin

@ruflin
Copy link
Member

ruflin commented Jun 11, 2020

@exekias @andresrc This is relevant for our discussion to potentially use a "routing" ingest pipeline.

@martijnvg
Copy link
Member

I think we should aim to keep logic of executing pipelines after the _index has been changed simple.

Prior to performing ingest, a list of pipelines is determined. These pipelines are executed. If
at the end the _index has been changed then only the final pipeline of this index (if exists) should
be executed.

if the default_pipeline redirects to a new index, should we still execute the final_pipeline of the original index?

Given with the above logic, yes.

should we execute the default_pipeline of the new destination, or just the final_pipeline?

I think only the final pipeline should be executed.

how do we prevent the document from being infinitely redirected between indices (could we set a rule that the pipeline of the destination index may not change the destination a second time?)

I think it is fine if request or default pipelines change the index. However I think that final pipelines shouldn't be changing the index.

@gwbrown gwbrown removed the needs:triage Requires assignment of a team area label label Jul 6, 2020
@martijnvg
Copy link
Member

We discussed this issue with a broader audience and think that the re-execution logic of pipeline should be limited:

  • When a final pipeline is executed, it always should be the last pipeline being executed. Also the fact that a final pipeline can change the _index key during ingest is a bug, which needs to be addressed.
  • If the request pipeline changes the _index then this changes the default pipeline and final pipeline to be executed based on whether the new index has a default or final pipeline. If the original index was configured with a default or final pipeline, then these pipelines aren't executed.
  • If the default pipeline changes the _index then it changes the final pipeline. If there is a final pipeline for the original index then that pipeline isn't executed and if there is a final for the new index then that pipeline is executed.

The above restrictions would avoid infinitely redirections between pipelines.

@ruflin @tvernum What do you think about this?

@tvernum
Copy link
Contributor Author

tvernum commented Jul 8, 2020

Just to check: In the current proposal, if a default pipeline redirects to another index then the default pipeline on that new destination index would not be executed. Is that correct?

I do wonder whether that works for Ingest Manager - My understanding is that they use both default and final pipelines, and would want to rely on both being executed after routing to a new index.
Perhaps they could use the pipeline processor to work around that, but that assumes that the routing process knows which pipelines it needs to execute.
May be could introduce an option on the pipeline processor to lookup the default pipeline for an index and execute that... as long as there's some way for Ingest Manager/Fleet to say "please execute the default pipeline on the new destination index", even if it needs to be explicit.

@martijnvg
Copy link
Member

In the current proposal, if a default pipeline redirects to another index then the default pipeline on that new destination index would not be executed. Is that correct?

Yes, that is the idea.

I do wonder whether that works for Ingest Manager

@ruflin Can you comment on whether the proposal works for ingest manager?

@ph
Copy link
Contributor

ph commented Jul 9, 2020

@martijnvg Looking at your proposal I think this makes sense. @andresrc and @exekias do this still work in the routing idea? I think so.

@roncohen
Copy link

for ingest manager, since we control the pipelines, we could reasonably make the necessary adjustments to the final destination "final" pipelines to make this work. Sketching the solution out:

  • we have a "generic" data stream with a default/required pipeline that knows about the packages installed.
  • we send all data there, and the pipeline will redirect to other data streams by checking specific fields in the document that indicate which input/package they came from and changing _index accordingly
  • we have added the necessary parsing logic to the final pipelines of the data streams that they get directed to

One downside that comes to mind is that if a user wants to bypass the "final" pipeline for some reason, they can't. As an example, as far as I understand, reindexing will apply the "final" pipeline again, which could be problematic.

I don't want to derail the conversation too much, but I think it's worth considering if we need to take a step back and think about building the right abstractions into Elasticsearch for a routing/parsing pipeline like this instead of making do with what we have - similar to what we did with data streams - which I think will end up being incredibly valuable for our users over the long term because we now have an abstraction that fits much better with what our users are trying to achieve and it paves the way for us to deliver new features on this abstraction that would have been otherwise very difficult.

@martijnvg
Copy link
Member

One downside that comes to mind is that if a user wants to bypass the "final" pipeline for some reason, they can't. As an example, as far as I understand, reindexing will apply the "final" pipeline again, which could be problematic.

The idea of final pipelines is that it can't be bypassed, so maybe in that case a default pipeline should be used?

I don't want to derail the conversation too much, but I think it's worth considering if we need to take a step back and think about building the right abstractions into Elasticsearch for a routing/parsing pipeline like this instead of making do with what we have - similar to what we did with data streams - which I think will end up being incredibly valuable for our users over the long term because we now have an abstraction that fits much better with what our users are trying to achieve and it paves the way for us to deliver new features on this abstraction that would have been otherwise very difficult.

👍 I think it is a good idea to take a good look at the current implementation and consider alternatives for it.

probakowski added a commit that referenced this issue Jul 16, 2020
This change fixes final pipelines if destination index is changed during pipeline run:
-final pipelines can't change destination anymore, exception is thrown if they try to
-if request/default pipeline changes destination final pipeline from old index won't be executed
-if request/default pipeline changes destination and new index has final pipeline it will be executed
-default pipeline from new index won't be executed
Additionally TransportBulkAction.resolvePipelines was moved to IngestService as it's needed for resolving pipelines from new index. Tests were moved accordingly.

Closes #57968
probakowski added a commit to probakowski/elasticsearch that referenced this issue Jul 17, 2020
…59522)

This change fixes final pipelines if destination index is changed during pipeline run:
-final pipelines can't change destination anymore, exception is thrown if they try to
-if request/default pipeline changes destination final pipeline from old index won't be executed
-if request/default pipeline changes destination and new index has final pipeline it will be executed
-default pipeline from new index won't be executed
Additionally TransportBulkAction.resolvePipelines was moved to IngestService as it's needed for resolving pipelines from new index. Tests were moved accordingly.

Closes elastic#57968
probakowski added a commit that referenced this issue Jul 17, 2020
…9522) (#59746)

* Fix handling of final pipelines when destination is changed (#59522)

This change fixes final pipelines if destination index is changed during pipeline run:
-final pipelines can't change destination anymore, exception is thrown if they try to
-if request/default pipeline changes destination final pipeline from old index won't be executed
-if request/default pipeline changes destination and new index has final pipeline it will be executed
-default pipeline from new index won't be executed
Additionally TransportBulkAction.resolvePipelines was moved to IngestService as it's needed for resolving pipelines from new index. Tests were moved accordingly.

Closes #57968
probakowski added a commit to probakowski/elasticsearch that referenced this issue Jul 17, 2020
…astic#59522) (elastic#59746)

* Fix handling of final pipelines when destination is changed (elastic#59522)

This change fixes final pipelines if destination index is changed during pipeline run:
-final pipelines can't change destination anymore, exception is thrown if they try to
-if request/default pipeline changes destination final pipeline from old index won't be executed
-if request/default pipeline changes destination and new index has final pipeline it will be executed
-default pipeline from new index won't be executed
Additionally TransportBulkAction.resolvePipelines was moved to IngestService as it's needed for resolving pipelines from new index. Tests were moved accordingly.

Closes elastic#57968
probakowski added a commit that referenced this issue Jul 17, 2020
…9522) (#59756)

* Fix handling of final pipelines when destination is changed (#59522)

This change fixes final pipelines if destination index is changed during pipeline run:
-final pipelines can't change destination anymore, exception is thrown if they try to
-if request/default pipeline changes destination final pipeline from old index won't be executed
-if request/default pipeline changes destination and new index has final pipeline it will be executed
-default pipeline from new index won't be executed
Additionally TransportBulkAction.resolvePipelines was moved to IngestService as it's needed for resolving pipelines from new index. Tests were moved accordingly.

Closes #57968
@ruflin
Copy link
Member

ruflin commented Aug 17, 2020

@tvernum I file an issue around being able to specify multiple ingest pipelines in a data stream and perhaps this could also help with the problem here: #61185

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants