Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fulltext search: Show display of index creation progress #7961

Closed
tobiasdiez opened this issue Aug 2, 2021 · 13 comments · Fixed by #7981
Closed

Fulltext search: Show display of index creation progress #7961

tobiasdiez opened this issue Aug 2, 2021 · 13 comments · Fixed by #7981

Comments

@tobiasdiez
Copy link
Member

Is your suggestion for improvement related to a problem? Please describe.
Currently, no progress or indication of the background index creation is shown. This may irritate some users because they may think that there is a performance bug in jabref due to the high usage of resources.

Describe the solution you'd like
Show an indicator of the progress (how many files to go etc) in the progress icon in the upper right corner of the screen.

@btut @koppor @calixtus

@btut
Copy link
Contributor

btut commented Aug 3, 2021

Right now the progress is shown as indeterminate, as it is quite hard to estimate (the last file in the queue could be the size of all others combined). Something like indexing file 13/42 would be easy, but not much more expressive than just the indeterminate loader IMO.

@koppor
Copy link
Member

koppor commented Aug 3, 2021

Something like indexing file 13/42 would be easy,

I like that, because user can expect that file sizes vary. Similar to the progress of a backup tool.

If users would like to have more, the progress could add the file sizes. Like an asynchronous update of the total number of files and total file size like rclone does.

@tobiasdiez
Copy link
Member Author

The currentfile / all files counter sounds like a sensible thing.
I should also add that the progress icon in the upper right corner currently does not reflect that the indexing is going on, i.e. it shows the tick although if you click on it then it shows the progress bar for "indexing pdfs".

@btut
Copy link
Contributor

btut commented Aug 3, 2021

Then let's do the currentfile / all files counter for now.

@btut
Copy link
Contributor

btut commented Aug 11, 2021

Sorry for the delay. I implemented the currentfile / all files in #7981.

I should also add that the progress icon in the upper right corner currently does not reflect that the indexing is going on, i.e. it shows the tick although if you click on it then it shows the progress bar for "indexing pdfs".

This was intended behaviour actually. I wanted it to actually be completely in the background and only have the progress shown on demand. Now you might ask why and I am afraid I don't have a satisfactory answer. I now agree that having the indicator reflect the progress of the indexing as well would be helpful. Some changes are needed for that though because currently, all index-operations are hidden in a single task. That task is just a placeholder so the indexer shows up in the list of background tasks, but the actual operations are done in tasks that are not displayed at all and just bind the progress and message to the placeholder tasks.

This was done because:

  • The index should only be accessed by a single thread at a time, so operations are queued in the IndexingTaskManager
  • I didn't want the list of background-tasks to overflow from index-operations.

This means:

  • There will always be one indexing-background-task shown in the list of background tasks and it may display whatever operation currently being done.
  • There is no way to have the individual tasks influence the progress indicator, as that shows the average process for all running background tasks, but the task shown is just the placeholder task (never running).

We could solve this by:

  • Showing all the operation-tasks individually (but still queueing them).
  • Introduce a new status property (to replace the running property). This would affect all other BackgroundTasks as well.
  • Keep the current behaviour (only show single task for each database, don't have the indicator reflect the progress).

@tobiasdiez
Copy link
Member Author

I agree that you don't want to have the index operation for each file showing up in the progress pane. It's just too much noise. Thus, I would say showing one "Indexing" task per database is perfectly fine.
But what I don't understand is why you then cannot "have the indicator reflect the progress". With #7981 you do now have an indexing task that has a progress bar that reflects the overall progress. Why not bind the indicator to the progress of this task?

@btut
Copy link
Contributor

btut commented Aug 11, 2021

I agree that you don't want to have the index operation for each file showing up in the progress pane. It's just too much noise. Thus, I would say showing one "Indexing" task per database is perfectly fine.

I guess the usual use-case is not to add lots of individual files in a single use of JabRef, is that correct? Just to clarify, when starting JabRef, there would still be just one task per database updating the index (in case pdf's changed since JabRef was last started). Then, additional tasks would be added for each new/removed pdf. So it's not terribly overwhelming, but I still think having one task per database would be better. If so, we could add a second indicator in the message, something like:
5 of 32 files added to the index. 2 operations pending
to reflect that the indexing will not be done once all files are added.

But what I don't understand is why you then cannot "have the indicator reflect the progress". With #7981 you do now have an indexing task that has a progress bar that reflects the overall progress. Why not bind the indicator to the progress of this task?

I'm struggling to explain myself here since there are two kinds progress indicators, so let's give them names for now:

  • Average progress indicator: The one shown on the top right, giving the average progress of running background tasks
  • Individual progress indicator: The one next to a background task in the menu that opens when clicking the Average progress indicator.

#7981 only updates the individual progress indicator.

The average progress indicator currently averages the individual progress of running progress tasks. The thing is, we cannot write the 'running' property. It is set by Task and since it is only a placeholder task it is never actually run and therefore false. We would need another way of filtering out finished tasks. This would be the second option in my list of ideas:

  • Introduce a new status property (to replace the running property). This would affect all other BackgroundTasks as well.

but I now realized that this would be even more difficult than expected, since for the progress-indicator we only have a list of Tasks, not BackgroundTasks, so we cannot add a property there.

@btut
Copy link
Contributor

btut commented Aug 16, 2021

@tobiasdiez do you have any more thoughts on this? I am struggling to decide how to move forward.

@tobiasdiez
Copy link
Member Author

It is set by Task and since it is only a placeholder task it is never actually run and therefore false.

Why is it only a placeholder task?

I'm not that familiar with how you implemented the index background tasks (and don't really have the time to dive into this), but maybe something like the following gives some ideas:

  • You have one global queue of pdfs that need to be indexed.
  • You have one background tasks that if run checks the queue and indexes all pdfs in the queue (sequentially, in the same thread). This task can use the size of the queue to update it's overall progress.
  • If other operations change the linked pdfs (e.g. new pdf linked, or new entry with pdf added), then they add the corresponding files to the global queue, and make sure that the background tasks is running.
  • Upon opening you may need an additional background task that runs through the entries and checks if all pdfs are indexed, and if not then put them in the queue.

You need to be a bit careful with what kind of queue you are using, since it's accessed from different threads but something like ConcurrentLinkedQueue should handle that.
I know you are currently using a different overall structure, which is working in general. So I'm not really proposing to replace everything, but maybe you can take some elements of the above suggestion and use them.

@btut
Copy link
Contributor

btut commented Aug 18, 2021

Overall I think your approach would probably be more comprehensible and easier to maintain (no need for the placeholder task), but:

You have one background tasks that if run checks the queue and indexes all pdfs in the queue (sequentially, in the same thread). This task can use the size of the queue to update it's overall progress.

The problem that I am currently trying to solve is that the 'average progress indicator' in the menu shows no task running while indexing because of the use of the placeholder task that is never running. With your suggestion, the issue would be the other way around, with the indexing task always running. It's less of a problem, since when the task is idle it's progress would be 100%, but it would still affect the average progress. ... and the more I write about this the less this sounds like an actual problem so let me try to implement your approach :)

Thanks again!

@tobiasdiez
Copy link
Member Author

The background task can also simply stop once the queue is empty, and be restarted if another PDF is added to the queue. I think this is more effective than having the background task check the queue every few milliseconds.

@btut
Copy link
Contributor

btut commented Aug 18, 2021

The background task can also simply stop once the queue is empty, and be restarted if another PDF is added to the queue.

Oh, I thought tasks can only run once.

I think this is more effective than having the background task check the queue every few milliseconds.

Definitely. I expected there to be a locking flag in CuncurrentLinkedQueue to avoid the busy wait, but stoping and restarting sounds like a great way to solve this! Thanks!

@btut
Copy link
Contributor

btut commented Aug 18, 2021

Update: I implemented your idea in d229eee. It simplifies a lot and works like a charm (except a workaround needed to avoid multiplying the tasks when restarting 5811f87)! Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants