Ver3 and later will not cope with LARGE .bib files on 32bit win7 Laptop #1715

Closed
ajbelle opened this issue Aug 10, 2016 · 12 comments
Labels
groups search status: waiting-for-feedback The submitter or other users need to provide more information about the issue

Comments

@ajbelle

ajbelle commented Aug 10, 2016

On my 64-bit Win7 desktop PC I have used every JabRef 3.x release up to 3.5, and all of them are significantly faster than the pre-3.0 versions. My .bib file is 49.34 MB and has 29,933 entries. However, on my Dell Latitude dual-core laptop running 32-bit Win7 and Java 8 Update 91, editing this file has not been possible since version 3!

JabRef does display the pop-up saying that some entries still have no BibTeX key, but it gets no further. JabRef.exe memory usage is just over 300 MB, with the dual-core laptop sitting at 30% or more CPU usage (meaning the executable is using two CPUs).

The exception is version 3.0, which loads slowly at less than 300 MB, but any action such as opening an entry, scrolling, editing, or saving takes memory and CPU usage to 300+ MB and 30+%, and locks up JabRef for several minutes until the next input is possible. It is not disk-bound, and the total RAM used on the system is 1.2 GB out of 3 GB, so it seems to max out a single CPU, which goes to 100% frequency and full load on each lock-up. Thus I conclude it is not faulty code per se, but code that pushes the hardware beyond its limits, where this hardware does not interface efficiently between the two threads.

Rolling back to version 2.9.2 returns performance to a level comparable with my 64-bit desktop PC. The only significant difference in the performance monitor for version 2 is that CPU usage peaks at 25%, suggesting JabRef does not attempt to use more than one CPU. The RAM usage is higher, at 356 MB. This is the reason I believe the problem is JabRef-related: all versions before 3.0 worked fine on this hardware, and no other known changes have been made to the laptop.

My conclusion is that, to increase speed, version 3 uses multiple threads, and in the 32-bit builds this is locking up my laptop. Would a preference setting to disable multithreading on low-spec PCs solve this issue?

PS: The latest versions all work great and fast on the laptop with trivial .bib files such as one would generate for a single paper.

@koppor
Member

koppor commented Aug 10, 2016

We did not change that much from 2.11.1 to 3.0. Did you try 2.10 and 2.11.1, too? Could it be that the issue was introduced in one of these versions?

We internally collect bib files of others for testing. I can't find one of yours. Would you mind sharing it with us? You can send it via email to me personally.

@ajbelle
Author

ajbelle commented Aug 11, 2016

THX @koppor. You may be right because returning to my desktop PC I found that I had corrupted files (a small example of which I attach). If you search with a text editor on the last entry JabRef loads you will find the following entry is incomplete and is then followed by a complete version!!! This has been done by JabRef_windows-x64_3_6dev--snapshot--2016-08-10--master--4db7557.exe which I installed to regain access to GOOGLE Search on my desktop box. @#*&%^& took out my main file which I am now reconstructing :-(

This is the second time I have found that the older version 2.9.2 has been able to fix something version 3 no longer seems able to. Somehow JabRef_windows-x64_3_6dev--snapshot--2016-08-10--master--4db7557.exe makes a 'half corrupt version', which I managed to reopen with 2.9.2 and save to reduce lost work. JabRef_windows-x64_3_6dev--snapshot--2016-08-10--master--4db7557.exe can open it, but saves it in a more corrupt state that not even it can open. None of the version 3 releases can open any of the mangled files, but version 2.9.2 could. Robustness is more important than speed to me.

Sorry it is still confusing to me. Will retest Ver3+ on the laptop with a non-corrupt file and update tomorrow. I have not put in an issue on the development release file corruption.

CorruptExample.zip

@stefan-kolb stefan-kolb added the status: waiting-for-feedback The submitter or other users need to provide more information about the issue label Aug 11, 2016
@oscargus
Contributor

A hint here is that the incomplete entry ends just before the @ in the email address. No idea what that means, but it may give a hint to those more knowledgeable about the parser/writer.

Did the previous version of your .bib file by any chance have quotation marks around that field?

@matthiasgeiger
Member

Okay I can reproduce the behavior that breaks your file @ajbelle

The issue is tracked at #1716, as it is unrelated to the performance issue you originally reported here.

@ajbelle
Author

ajbelle commented Aug 11, 2016

@matthiasgeiger THX and yes. I checked again last night with uncorrupted files and the behaviour reported in my first post still occurs on the 32bit Win7 duo-core (4 CPU) laptop.

@oscargus no changes to the .bib file, but matthias already identified the issue, which makes sense to me.

Only version 3.0 gets the large .bib file open (I tried a slightly smaller version with the same results), but then any interaction immediately locks the processor at 25% or more for 30+ seconds, as if it is processing the whole file for every user input. My question would be: "What is version 3 doing for each user input that requires extensive CPU power and that earlier versions did not do?" Version 2.9.2 gets it open in 'normal' time and has no delay on trivial user inputs like opening a different entry.
That is all I can contribute, as I have deadlines for the next six months or I fail my PhD :-( JabRef is very important to me.

PS: I notice that when I use multiple cores on my i7 Win7 desktop for a six-CPU CFD run, the performance of post-3.0 JabRef takes a hit, hinting at the behaviour witnessed on the laptop and suggesting the increased CPU requirement is universal. This did not happen before version 3.
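
For readers comparing the percentages quoted in this thread: assuming Task Manager reports a process's CPU usage as a share of all logical CPUs combined, and assuming the laptop exposes 4 logical CPUs as stated above, the figures can be converted into "fully busy CPUs" with a small calculation. The function name is made up for illustration.

```python
def busy_logical_cpus(total_percent: float, logical_cpus: int) -> float:
    """Convert a whole-machine CPU percentage into the equivalent
    number of fully busy logical CPUs."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return total_percent * logical_cpus / 100


# Figures quoted in this thread, assuming 4 logical CPUs (an assumption
# taken from the "4 CPU" remark above, not a measured value).
for percent in (25, 30):
    print(percent, "% ->", busy_logical_cpus(percent, 4), "busy logical CPUs")
```

Under those assumptions, 25% corresponds to exactly one saturated logical CPU, which fits a single-threaded bottleneck, while 30% means slightly more than one thread is doing work.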

@koppor
Member

koppor commented Aug 11, 2016

@ajbelle Good luck with your PhD! It would be really nice if you tested JabRef 2.10 and 2.11.1, available at https://sourceforge.net/projects/jabref/files/jabref/. This would help us to track down the root cause of the high CPU load.

@ajbelle
Author

ajbelle commented Aug 15, 2016

@koppor THX Oliver. I installed 2.10 and 2.11.1 on the 32-bit Win7 laptop and tested as you asked. The results were the same as with version 2.9.2, with slight differences in the timing of when things finally showed up, but operation was fine. I hope this helps.

On my main i7 machine I have noticed that with each release the amount of work JabRef seems to do automatically for each operation increases (within groups it seems to re-sort on the fly, with entries moving and jumping about for a few seconds before you can do the next step). There seems to be more lag in general. This is not noticeable with small .bib files like the ones I imagine programmers test with.

As an aside, due to the JabRef UI logic, I grabbed all 30,000 entries with "Select All", which I thought applied to just a group, and pasted them into an equivalent file!!!! It took JabRef 3.5 on my i7 Win7 desktop PC to 100% CPU usage for hours before it finished (it didn't display the "Resolve duplicates" screen it should have). I was impressed JabRef managed it at all. I have never seen JabRef consume more than two CPUs :-)

@ajbelle
Author

ajbelle commented Aug 28, 2016

Update: I migrated version 3.5 to my home desktop PC, a 64-bit Intel i7 running Win7 Home Edition. The news is not quite as bad as on the 32-bit laptop, but after I created a few more groups in the database this PC too started to respond to user input with long delays. The only difference I can see from my uni PC is that it has only 4 GB of RAM instead of 32 GB. Checking usage indicates that it is operating within memory limits without excessive disk access.

My uninformed impression is that, in order to gain speed, JabRef pre-processes things like groups on every update cycle, but with a large .bib file and multiple groups this massively increases the memory/CPU load, locking JabRef until it is complete. If @koppor's large .bib file doesn't exhibit this issue, it may be that it has no groups. I set up 11 groups, and after each new one JabRef seemed to exhibit more lag. Also, about half my entries have an Abstract or Review field, which may add to the load: the 30k-entry file shrank from 44.6 MB to 20.3 MB after removing them. I wonder if these fields could be relegated to a secondary file holding only the Bibtexkey, Abstract, and Review, as I only need them when looking at a paper in depth. In global searches they pull up too many false positives.
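
A minimal sketch of the secondary-file idea above, written as a standalone script rather than anything JabRef provides. It assumes field values are brace-delimited and that the text `abstract = {` or `review = {` never appears inside another field's value; `split_heavy_fields` and the regular expressions are hypothetical names for illustration only.

```python
import bisect
import re

# Assumption: entries look like "@type{key," and heavy fields use braces.
ENTRY_START = re.compile(r"@\w+\s*\{\s*([^,\s{}]+)\s*,")
FIELD_START = re.compile(r"\b(abstract|review)\s*=\s*\{", re.IGNORECASE)


def split_heavy_fields(bib_text):
    """Return (slim_text, heavy), where heavy maps bibtexkey -> {field: value}."""
    starts = [(m.start(), m.group(1)) for m in ENTRY_START.finditer(bib_text)]
    offsets = [offset for offset, _ in starts]
    slim_parts, heavy, pos = [], {}, 0
    while True:
        match = FIELD_START.search(bib_text, pos)
        if match is None:
            slim_parts.append(bib_text[pos:])
            break
        # Walk forward until the field's opening brace is balanced again.
        depth, i = 1, match.end()
        while i < len(bib_text) and depth > 0:
            if bib_text[i] == "{":
                depth += 1
            elif bib_text[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            # Unbalanced braces: keep the remainder untouched rather than guess.
            slim_parts.append(bib_text[pos:])
            break
        # Attribute the field to the closest preceding entry key.
        idx = bisect.bisect_right(offsets, match.start()) - 1
        key = starts[idx][1] if idx >= 0 else "?"
        heavy.setdefault(key, {})[match.group(1).lower()] = bib_text[match.end():i - 1]
        slim_parts.append(bib_text[pos:match.start()])
        # Drop the comma that followed the removed field, if there is one.
        while i < len(bib_text) and bib_text[i] in " \t":
            i += 1
        if i < len(bib_text) and bib_text[i] == ",":
            i += 1
        pos = i
    return "".join(slim_parts), heavy
```

Calling `split_heavy_fields(open("library.bib", encoding="utf-8").read())` (the file name is a placeholder) would give a slimmed-down text to load in JabRef plus a dictionary that could be written to a side file. Work on a copy, since a simple scanner like this can be confused by unusual entries.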

I conclude the locking is not a "bug", but how JabRef uses up Windows resources when presented with my .bib file. Thus it is a question of whether a warning about size limits on lesser specification PCs is required, or if better programming can provide an option to select manual update instead of a complete refresh for every user input cycle.

@oscargus
Contributor

@ajbelle You are probably correct that we do not have databases that big with many groups. However, it is clearly in our interest to get hold of one, so if you could send it to @koppor (email in his profile) it would really be helpful (or maybe you already have).

Although we have generated "large" artificial databases and improved performance quite a bit based on those in some areas, they clearly didn't cover all the relevant aspects (as obvious from your performance problems).

My guess is that the review and abstract fields are not really causing the performance problem: while they consume quite a bit of memory, as you indicate, not that many operations are really done on them. Instead, I would expect that some of the code monitoring, e.g., added entries may be triggered in an inefficient way when loading such a big database, and that the updates of dynamic groups could perhaps be triggered more efficiently. However, these are parts where I personally have limited insight.

@ajbelle
Author

ajbelle commented Sep 5, 2016

Thank you @oscargus.

I confirm that it is the groups that are killing JabRef. With the attached file (which contains just the groups I set up), even my university PC with 32 GB of RAM started to show UI lag! I have had to remove the groups, which makes JabRef less suited to the purpose I had hoped for and significantly increases the manual work I have to do.

The reason I have not sent my database is that it 'part belongs to a research group' and I don't have authority to share it with Oliver @koppor. I will see if I can somehow provide, in confidence, a version that avoids this problem. It is full of garbage that trips up JabRef due to its import from Endnote, pulling direct from the internet and PDFs, plus my own special fields and formats ;-) JabRef crashes on me at least twice per day!

Your improved search with anyfield and anykeyword, per issue #1633, may reduce the complexity of the search expressions and hopefully the CPU load of the searches.

A setting to disable automatic group updates (and maybe other background auto-updates in response to UI actions) for larger files would be a good solution (if the update code is in a single module). Even with my cut-down 32 MB, 22k-entry file, JabRef exhibits bad delays. Power users would be happy to trigger updates manually to avoid the unworkable lag, IMHO.
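
To make the request concrete, here is a toy model of such a switch. It is not JabRef's actual code: `GroupIndex`, `auto_update`, and `simulate` are hypothetical names, and the model simply assumes that a full refresh evaluates every group against every entry.

```python
from typing import Callable, Dict, List

Entry = Dict[str, str]
Matcher = Callable[[Entry], bool]


class GroupIndex:
    """Hypothetical group-membership cache with an auto-update switch."""

    def __init__(self, groups: Dict[str, Matcher], auto_update: bool = True):
        self.groups = groups
        self.auto_update = auto_update
        self.members: Dict[str, List[int]] = {name: [] for name in groups}
        self.dirty = False
        self.evaluations = 0  # matcher calls, used as a stand-in for CPU cost

    def refresh(self, entries: List[Entry]) -> None:
        """Full pass: re-evaluate every group against every entry."""
        for name, matches in self.groups.items():
            hits = []
            for i, entry in enumerate(entries):
                self.evaluations += 1
                if matches(entry):
                    hits.append(i)
            self.members[name] = hits
        self.dirty = False

    def entry_changed(self, entries: List[Entry]) -> None:
        """Called after each edit; refreshes only when auto_update is on."""
        if self.auto_update:
            self.refresh(entries)
        else:
            self.dirty = True  # the user triggers refresh() later


def simulate(auto_update: bool, edits: int) -> int:
    """Count matcher calls for a run of edits on 1,000 entries and 11 groups."""
    entries = [{"keywords": "cfd" if i % 2 else "other"} for i in range(1000)]
    groups = {f"group{n}": (lambda e: "cfd" in e["keywords"]) for n in range(11)}
    index = GroupIndex(groups, auto_update)
    for _ in range(edits):
        index.entry_changed(entries)
    if index.dirty:
        index.refresh(entries)  # one manual update at the end
    return index.evaluations
```

In this model, `simulate(True, 5)` performs 55,000 matcher calls while `simulate(False, 5)` performs 11,000, so the cost of automatic updates grows with entries × groups × edits, whereas a manual update pays the entries × groups cost once.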

AJB_Groups.bib.txt

@lenhard lenhard added the groups label Dec 6, 2016
@lenhard
Member

lenhard commented Dec 6, 2016

@ajbelle: Is this problem still present in the current 3.7? There have been some performance improvements between 3.6 and 3.7 (not really related to groups, though). I'd like to verify that we still need to work on this.

@lenhard
Member

lenhard commented Dec 16, 2016

I am closing this now due to a lack of reaction. Feel free to reopen if the problem persists in the most recent version of JabRef.

@lenhard lenhard closed this as completed Dec 16, 2016
@koppor koppor added the search label Nov 19, 2019
6 participants