[ML] Overview of reindex issues with NLP #113948

Open
maxhniebergall opened this issue Oct 2, 2024 · 1 comment
Assignees
Labels
>bug Feature:NLP Features and issues around NLP >feature :ml Machine learning Team:ML Meta label for the ML team

Comments

Contributor

maxhniebergall commented Oct 2, 2024

Background

Reindex lets users create a new index from data that is already in Elasticsearch. This is especially useful when moving to semantic search, because users have often already implemented text search and want to embed their existing data into a new index. Unfortunately, reindex has flaws that make it difficult or impossible to use with larger datasets or when machine learning models are used to produce embeddings.
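For readers unfamiliar with the workflow, here is a minimal sketch of what such a reindex request might look like. It only builds the request for `POST /_reindex` and sends nothing. The index names and the `embed-text` pipeline name are hypothetical, and the sketch assumes an ingest pipeline with an inference processor already exists to produce the embeddings.

```python
import json


def build_reindex_request(source_index, dest_index, pipeline,
                          batch_size=100, requests_per_second=None, slices=None):
    """Build (method, path, query params, body) for a reindex through an ingest pipeline.

    All argument values are illustrative assumptions, not taken from the issue.
    """
    body = {
        # "size" inside "source" is the scroll batch size used by reindex.
        "source": {"index": source_index, "size": batch_size},
        # "pipeline" routes every document through an ingest pipeline,
        # which is where an inference processor would add embeddings.
        "dest": {"index": dest_index, "pipeline": pipeline},
    }
    # Run as a background task so a long-running reindex does not hold the HTTP call open.
    params = {"wait_for_completion": "false"}
    if requests_per_second is not None:
        params["requests_per_second"] = str(requests_per_second)
    if slices is not None:
        params["slices"] = str(slices)
    return "POST", "/_reindex", params, body


if __name__ == "__main__":
    method, path, params, body = build_reindex_request(
        "articles", "articles-semantic", "embed-text", batch_size=50)
    print(method, path, params)
    print(json.dumps(body, indent=2))
```

A smaller batch size is often chosen when each document has to pass through a model, since every batch must be embedded before the next scroll page is requested.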

Problems

Resiliency - Issues with failures and errors

Issues with size

Issues with performance

Issues with scroll

  • It's possible to hit the scroll limit if you have a lot of shards (see "Empty scroll contexts don't count", #86407).
  • Scroll holds its search context in memory for a fixed keep-alive period that isn't tied to the completion of the reindex, so the context can expire before a slow reindex finishes or linger after it is done.
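To make the two scroll points concrete, here is a rough sketch of the scroll lifecycle as plain request descriptions (nothing is sent). The helper names are hypothetical. The context estimate is a deliberate simplification that assumes one open context per shard per active scroll, and the limit of 500 is an assumed value for `search.max_open_scroll_context` that should be checked against your own cluster settings.

```python
def open_scroll(index, keep_alive="1m", page_size=100):
    """First page: the `scroll` parameter sets how long the context is kept alive."""
    return ("POST", f"/{index}/_search", {"scroll": keep_alive},
            {"size": page_size, "query": {"match_all": {}}})


def next_page(scroll_id, keep_alive="1m"):
    """Each follow-up call resets the keep-alive timer for another `keep_alive`."""
    return ("POST", "/_search/scroll", {},
            {"scroll": keep_alive, "scroll_id": scroll_id})


def clear_scroll(scroll_id):
    """Explicitly free the context instead of waiting for the keep-alive to lapse."""
    return ("DELETE", "/_search/scroll", {}, {"scroll_id": scroll_id})


def estimated_open_contexts(shards, concurrent_scrolls):
    """Rough upper bound, assuming one context per shard per active scroll."""
    return shards * concurrent_scrolls


def may_hit_scroll_limit(shards, concurrent_scrolls, max_open=500):
    """`max_open` is an assumed limit; real limits are enforced per node."""
    return estimated_open_contexts(shards, concurrent_scrolls) > max_open


if __name__ == "__main__":
    print(open_scroll("articles", keep_alive="5m"))
    print(next_page("<scroll_id from previous response>", keep_alive="5m"))
    print(clear_scroll("<scroll_id from previous response>"))
    print(may_hit_scroll_limit(shards=120, concurrent_scrolls=5))
```

The keep-alive only has to cover the time needed to process one page, but when each page waits on model inference that time is hard to predict, which is the mismatch the second bullet describes.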

Possible solutions in the works?

#27724 (comment)

@maxhniebergall maxhniebergall added :ml Machine learning >bug >feature Feature:NLP Features and issues around NLP labels Oct 2, 2024
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Oct 2, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)
