Invoke default pipeline of new index #85931

Closed

felixbarny
Member

@felixbarny felixbarny commented Apr 15, 2022

When a processor changes the _index field, only the final pipeline of the new index is executed.
This change preserves that behavior and introduces a new _redirect metadata field that behaves similarly to setting _index.
The difference is that when _redirect is set, the default pipeline of the target index is also executed. This PR also detects and prevents loops in index redirection.

Example: index_1 and index_2 each have a default and a final pipeline. The default pipeline of index_1 sets the _redirect field to index_2. The pipelines then execute in this order: index_1.default, index_2.default, index_2.final.
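
To make the difference between the two fields concrete, here is a minimal, self-contained sketch of the execution order described above. It is not the code in this PR: the class `RedirectOrderSketch`, the `Change` enum, and the `executionOrder` method are hypothetical names, the pipeline names simply follow the `index_1.default` / `index_2.final` convention from the example, and only a single hop is modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
public class RedirectOrderSketch {

    // How the default pipeline of the source index changes the destination.
    enum Change { NONE, SET_INDEX, SET_REDIRECT }

    // Returns the pipelines that run, in order, for a document sent to sourceIndex.
    static List<String> executionOrder(String sourceIndex, String targetIndex, Change change) {
        List<String> executed = new ArrayList<>();
        // The default pipeline of the index the request was sent to always runs first.
        executed.add(sourceIndex + ".default");

        String finalIndex = sourceIndex;
        if (change != Change.NONE) {
            finalIndex = targetIndex;
            // Only _redirect triggers the default pipeline of the target index;
            // setting _index skips straight to the target's final pipeline.
            if (change == Change.SET_REDIRECT) {
                executed.add(targetIndex + ".default");
            }
        }
        // The final pipeline of whichever index the document ends up in runs last.
        executed.add(finalIndex + ".final");
        return executed;
    }

    public static void main(String[] args) {
        // prints [index_1.default, index_2.final]
        System.out.println(executionOrder("index_1", "index_2", Change.SET_INDEX));
        // prints [index_1.default, index_2.default, index_2.final]
        System.out.println(executionOrder("index_1", "index_2", Change.SET_REDIRECT));
    }
}
```

The sketch assumes the target's default pipeline does not redirect again; chained redirects are where loop detection becomes necessary.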

Required for

Previous discussions:

@felixbarny felixbarny added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v8.3.0 labels Apr 15, 2022
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Apr 15, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the external-contributor Pull request authored by a developer outside the Elasticsearch team label Apr 15, 2022
@elasticsearchmachine
Collaborator

Hi @felixbarny, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine changed the base branch from master to main July 22, 2022 23:07
@mark-vieira mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022
@rahuldimri

@felixbarny is there anything I can help with?

@felixbarny
Member Author

@rahuldimri thanks for asking. We're still in internal discussion on how to solve #63798. This PR is part of a prototype which may not make it in as-is.

Can I ask you why you're interested in this PR? Do you have a use case that could be solved by this? If so, what's your use case?

@csoulios csoulios added v8.6.0 and removed v8.5.0 labels Sep 21, 2022
@kingherc kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022
@rjernst rjernst added v8.8.0 and removed v8.7.0 labels Feb 8, 2023
@felixbarny felixbarny force-pushed the invoke-default-pipeline-of-new-index branch from 323c923 to 539d6ad Compare February 21, 2023 07:58
@felixbarny felixbarny requested a review from dakrone February 21, 2023 11:07
@@ -39,6 +39,7 @@
  */
 public class Metadata {
     protected static final String INDEX = "_index";
     protected static final String REDIRECT = "_redirect";
Member Author

I'm unsure whether _redirect should also be added to IndexRequest. That could expose _redirect to the _bulk API, where it doesn't make much sense. It's also more of an alias for _index with slightly different semantics.

@dakrone do you have a suggestion?

executePipelines(pipelines.iterator(), hasFinalPipeline, indexRequest, ingestDocument, documentListener);

LinkedHashSet<String> indexRecursionDetection = new LinkedHashSet<>();
indexRecursionDetection.add(indexRequest.index());
Member

More general question: I assume that during routing, index() is always the data stream name for data streams, and that the data stream is only resolved to a write index after processing has finished? I'm asking here because the data stream name, not the write index, is what should be part of the index recursion detection, since I assume the write index could even change during ingest.

Member Author

Yes, _index isn't the internal write index of the data stream.
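
As a rough illustration of the recursion detection discussed in this thread, the following standalone sketch tracks visited targets in a `LinkedHashSet`, keyed by the name as written to _index or _redirect (for a data stream, the data stream name rather than a backing write index). It is an assumption-laden sketch, not the PR's implementation: `RedirectLoopSketch`, `resolve`, the `redirects` map, and the error message are all hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Map;

// Hypothetical illustration only; not the actual IngestService code.
public class RedirectLoopSketch {

    // Follows redirects from startIndex and returns the final target.
    // The redirects map is a stand-in for "the default pipeline of KEY sets _redirect to VALUE".
    static String resolve(String startIndex, Map<String, String> redirects) {
        // LinkedHashSet keeps insertion order, so the visited path can be reported as-is.
        LinkedHashSet<String> visited = new LinkedHashSet<>();
        visited.add(startIndex);

        String current = startIndex;
        String next;
        while ((next = redirects.get(current)) != null) {
            // add() returns false if the target was already visited, which means a loop.
            if (!visited.add(next)) {
                throw new IllegalStateException(
                    "index cycle detected: " + String.join(" -> ", visited) + " -> " + next);
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        // prints logs-b
        System.out.println(resolve("logs-a", Map.of("logs-a", "logs-b")));
        try {
            resolve("logs-a", Map.of("logs-a", "logs-b", "logs-b", "logs-a"));
        } catch (IllegalStateException e) {
            // prints index cycle detected: logs-a -> logs-b -> logs-a
            System.out.println(e.getMessage());
        }
    }
}
```

Seeding the set with the name of the originally requested index mirrors the `indexRecursionDetection.add(indexRequest.index())` call in the diff above.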

@felixbarny
Member Author

Closing in favor of #94000

@felixbarny felixbarny closed this Mar 1, 2023
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management Meta label for data/management team v8.8.0
10 participants