Indexing: Support automatic reindex for objects created while Solr is down (Near Realtime Search) #702
We'll compare these fields to know if Solr data is stale
Log instead of throwing an Exception.
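The commit message above trades an exception for a log entry, so a Solr outage no longer breaks the operation that triggered indexing. Below is a minimal sketch of that pattern. It assumes a hypothetical `SafeIndexer.indexQuietly` wrapper and models the real Solr call as a plain `Runnable`, so it is an illustration rather than Dataverse's actual code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: the indexing call is modelled as a Runnable so the
// example stays self-contained; in practice it would be a call into Solr.
class SafeIndexer {
    private static final Logger logger = Logger.getLogger(SafeIndexer.class.getName());

    /**
     * Runs the indexing operation and logs any failure instead of rethrowing it,
     * so the caller (e.g. the code that just saved a dataset) can carry on.
     * Returns true on success, false if indexing failed.
     */
    static boolean indexQuietly(String objectId, Runnable indexOperation) {
        try {
            indexOperation.run();
            return true;
        } catch (RuntimeException e) {
            // The object is now stale in Solr; a later reindex pass must pick it up.
            logger.log(Level.WARNING, "Indexing failed for " + objectId, e);
            return false;
        }
    }
}
```

The boolean result matters: logging alone hides the failure, so the caller still needs some way to remember that the object must be reindexed later.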
@pdurbin This appears to have been partly addressed by asynchronous indexing. Closing, but feel free to comment and reopen as needed with more detail on what needs to happen.
@kcondon This is the issue I was using to track the idea of handling indexing failure. For example, what if Solr is down? Some reindexing should happen, hopefully automatically (maybe the object gets put in a queue?). If this is not a priority, I'm fine with leaving this issue closed. I defer to you on this. Issue #2322, about fault tolerance, is related and still open (it is addressed by pull request #2985).
Author Name: Philip Durbin (@pdurbin)
Original Redmine Issue: 4160, https://redmine.hmdc.harvard.edu/issues/4160
Original Date: 2014-06-30
Original Assignee: Philip Durbin
Dataverse 4.0 requires "near realtime search" because the moment dataverses, datasets, or files are added, updated, or deleted the "cards" and facet counts must immediately reflect the change.
In order to support near realtime search, we must handle indexing failure and re-try the indexing operation.
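Re-trying a failed indexing operation could look roughly like the sketch below, which uses bounded retries with exponential backoff. The class name, `maxAttempts`, and `initialDelayMillis` are hypothetical illustration parameters, not existing Dataverse settings, and the indexing call is abstracted as a `BooleanSupplier` that returns true on success.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: retry an indexing attempt a bounded number of times,
// doubling the wait between attempts so a recovering Solr is not hammered.
class IndexRetry {

    /**
     * Calls attempt until it returns true or maxAttempts is reached.
     * Sleeps initialDelayMillis before the second attempt, then doubles the delay.
     * Returns true if some attempt succeeded.
     */
    static boolean retryWithBackoff(BooleanSupplier attempt, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true; // indexed successfully
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(delay); // give Solr time to come back
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
                delay *= 2; // exponential backoff
            }
        }
        return false; // still failing: the object should be recorded as unindexed
    }
}
```

In-process retries only help with short outages; if Solr stays down longer than the total backoff window, a persistent record of unindexed objects (such as the timestamp comparison used in DVN 3.x) is still needed.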
As we are designing this system, we should probably consider other cases where detecting failure of a network service and re-trying is desirable, such as:
We should also consider using notifications for cases where re-indexing has been attempted several times but continues to fail.
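One way to decide when to send such a notification is to count consecutive failures per object and notify once a threshold is reached. The sketch below assumes a hypothetical `ReindexFailureTracker` with an in-memory map; the `notifier` callback stands in for whatever notification mechanism would actually be used, and a real implementation would probably persist the counts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: track consecutive reindex failures per object and
// notify exactly once when the failure count first reaches the threshold.
class ReindexFailureTracker {
    private final Map<String, Integer> failures = new HashMap<>();
    private final int threshold;
    private final Consumer<String> notifier;

    ReindexFailureTracker(int threshold, Consumer<String> notifier) {
        this.threshold = threshold;
        this.notifier = notifier;
    }

    /** Records the outcome of one reindex attempt for the given object. */
    void record(String objectId, boolean succeeded) {
        if (succeeded) {
            failures.remove(objectId); // reset once indexing works again
            return;
        }
        int count = failures.merge(objectId, 1, Integer::sum);
        if (count == threshold) {
            notifier.accept("Reindexing " + objectId + " has failed " + count + " times");
        }
    }

    int failureCount(String objectId) {
        return failures.getOrDefault(objectId, 0);
    }
}
```

Notifying only when the count equals the threshold (rather than every time it exceeds it) avoids flooding recipients while Solr remains down.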
In DVN 3.x there is a method called getUnindexedStudies at https://github.com/IQSS/dvn/blob/3.6.1/DVN-root/DVN-web/src/main/java/edu/harvard/iq/dvn/core/index/IndexServiceBean.java#L1061 that uses the following query to determine which studies need to be re-indexed:
List<Study> studies = (List<Study>) em.createQuery("SELECT s from Study s where s.lastIndexTime < s.lastUpdateTime OR s.lastIndexTime is NULL").getResultList();
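The query's staleness rule can be restated in plain Java, which may help when applying the same idea to Dataverse 4.0 objects. This is a minimal sketch, assuming a hypothetical `IndexedObject` type with `lastUpdateTime` and `lastIndexTime` fields that mirror the DVN 3.x `Study` columns; it is not the DVN or Dataverse entity model.

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the rule the DVN 3.x query expresses:
//   lastIndexTime < lastUpdateTime OR lastIndexTime IS NULL
// IndexedObject is a hypothetical stand-in for Study/Dataset entities.
class StaleIndexCheck {

    static final class IndexedObject {
        final String id;
        final Instant lastUpdateTime;
        final Instant lastIndexTime; // null if the object has never been indexed

        IndexedObject(String id, Instant lastUpdateTime, Instant lastIndexTime) {
            this.id = id;
            this.lastUpdateTime = lastUpdateTime;
            this.lastIndexTime = lastIndexTime;
        }
    }

    /** True if the object was never indexed or was updated after its last indexing. */
    static boolean isStale(IndexedObject o) {
        if (o.lastIndexTime == null) {
            return true;
        }
        // Like SQL, a null lastUpdateTime makes the comparison false, not an error.
        return o.lastUpdateTime != null && o.lastIndexTime.isBefore(o.lastUpdateTime);
    }

    /** Ids of all objects that need to be re-indexed, in input order. */
    static List<String> staleIds(List<IndexedObject> objects) {
        return objects.stream()
                .filter(StaleIndexCheck::isStale)
                .map(o -> o.id)
                .collect(Collectors.toList());
    }
}
```

Because the comparison is strict, an object whose index time equals its update time counts as up to date, matching the `<` in the original query.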
Another approach could be to use a database table as a queue, though this approach could be problematic: https://blog.engineyard.com/2011/5-subtle-ways-youre-using-mysql-as-a-queue-and-why-itll-bite-you/
See also:
http://lucene.472066.n3.nabble.com/strategies-for-managing-Solr-indexing-failures-and-retries-td4139186.html
Related issue(s): #229
Redmine related issue(s): 3643