Bad performance when importing Netscape bookmarks > 3000 entries #985
I agree this is a very important issue for new and serious users. I've been struggling with importing 4583 links from Diigo for almost a week now. First I had some issues with server and file sizes (#969), and now I'm stuck with probable parsing errors as described in #902. I also have some code in descriptions, but also various other characters that should probably be escaped. I've manually escaped all. Since Delicious, Diigo and maybe other services don't properly escape problematic characters on export, would it be a bad idea for the Shaarli Netscape parser to just accept anything between
Writing a parser is far from being a trivial task, especially with formats as crappy as the Netscape one. The current parser supports most standard use cases:
Which leaves us with a couple of edge cases when users attempt to import large dumps with funky data... I've started writing a grammar-based lexer/parser for the Netscape format at https://github.com/virtualtam/hoa-netscape-bookmark-parser ; this will take time for experimenting and testing, but if it works we'll end up with a library that's way better and more maintainable :)
After manually escaping all problematic characters, I then tried to split the original file (4583 links, 1098716 bytes) in two, and this time I successfully imported it that way. Therefore I can confirm this issue. It would also be nice if the Shaarli message could provide more details about the error, because
Please add this in your config file before running the import:
I've tried that already and the output is even less informative. So here is the exact log while failing to import the file (the one that I've escaped manually, but didn't split in two smaller ones - which works).
Well, this log means that no content at all is parsed after being sanitized. Maybe it's related to the PR I just made (shaarli/netscape-bookmark-parser#43).

@virtualtam The performance issue comes from the history writing. I'll submit a PR to fix it.
With large imports it has a large impact on performance and isn't really useful. Instead, write an IMPORT event, which lets clients using the history service resync their DB. -> 15k link import done in 6 seconds. Fixes shaarli#985
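The idea behind the fix can be sketched as follows (a simplified Python model for illustration, not Shaarli's actual PHP code; the class and event names are assumptions): instead of appending one history event per imported link, record a single IMPORT event for the whole batch, and let history consumers resynchronize their database when they see it.

```python
class History:
    """Toy history log: each add() stands in for a (slow) disk append."""
    def __init__(self):
        self.events = []

    def add(self, event, detail=None):
        self.events.append((event, detail))

def import_links_naive(history, links):
    # One history write per link: O(n) appends dominate large imports.
    for link in links:
        history.add("CREATED", link)

def import_links_batched(history, links):
    # Save the links themselves, then record a single IMPORT event;
    # clients resync their whole DB instead of replaying n events.
    for link in links:
        pass  # ...parse and persist the link itself...
    history.add("IMPORT")

links = [f"https://example.com/{i}" for i in range(15000)]
naive, batched = History(), History()
import_links_naive(naive, links)
import_links_batched(batched, links)
```

The design trade-off: per-link events give clients fine-grained deltas, but for a 15k-link import a full resync triggered by one IMPORT event is far cheaper than 15k history writes.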
Performance issue
When importing Netscape bookmark files featuring a considerable number of entries (>3000), the following happens:

- the PHP `max_execution_time` limit (default: 30 seconds) is reached;
- a `504 Bad Gateway` error is raised due to the application not responding.

I've observed this behaviour on the following environments:
This is an issue, as the import feature can be a game changer for users looking to migrate their data to Shaarli (#902, #969)
Troubleshooting
I've written a script to generate random/fake bookmark dumps of configurable size: generate_netscape_bookmarks.py
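The script itself isn't reproduced in this thread; a minimal sketch of what such a generator could look like (the file layout and attribute choices are assumptions, not necessarily those of the actual generate_netscape_bookmarks.py):

```python
import random
import string

def random_word(length=8):
    """Return a random lowercase word of the given length."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

def generate_dump(count):
    """Build a Netscape-format bookmark dump with `count` fake entries."""
    lines = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for i in range(count):
        url = f"https://example.com/{random_word()}"
        tags = ",".join(random_word(5) for _ in range(3))
        lines.append(
            f'<DT><A HREF="{url}" ADD_DATE="{1466711000 + i}" '
            f'PRIVATE="0" TAGS="{tags}">{random_word(12)}</A>'
        )
        lines.append(f"<DD>{random_word(40)}")
    lines.append("</DL><p>")
    return "\n".join(lines)

dump = generate_dump(3000)
```

Generating dumps of increasing sizes (1000, 3000, 10000 entries) makes it easy to pinpoint where the import time stops scaling linearly.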
Performance starts degrading as soon as the imported file features more than ~3000 entries, and only gets worse beyond that point :(
The NetscapeBookmarkParser library behaves decently when parsing these files, so the root cause probably lies in the `NetscapeBookmarkUtils->import()` method, the usual suspects being the `foreach() { ... }` loop, the final disk write operation, or both.