[bug] Very long lines (or maybe very long attributes?) are silently corrupted, on parsing and immediately saving file #2201
Comments
Sadly, it appears to be triggered by long attributes, so it is not feasible to reduce the test case further. Please let me know if further detail, testing, or a better test case is needed.
Hi, @matkoniecz, thanks for opening this issue, and I'm sorry that you're having trouble. I'll try to help. I can't reproduce what you're seeing at the moment without a copy of the "bug.svg" file. Can you share it? One guess is that you may be running up against libxml2 size limitations. You should be able to verify that by looking at … You may also want to try setting the …
It is at https://github.com/matkoniecz/nokogiri_testcase/blob/master/bug.svg
Is it not fetched by …
And yes, BTW, any idea how I can modify …
Ah, sorry, I was looking for a hyperlink in your original post and missed this. Thanks.
Well, again, if there's a problem parsing, then it should show up in …
This is covered pretty well in the tutorial I linked to: https://nokogiri.org/tutorials/parsing_an_html_xml_document.html#parse-options
Try something like:
Maybe https://nokogiri.org/tutorials/parsing_an_html_xml_document.html should mention checking for errors, since this failure is silent and may damage data? I was really surprised that it happened silently. And yes, there is …
Thanks for the help! My code will now have …
Hopefully there is no need to check other places for silent failures.
In the Parse Options section, you'll find this description:
If you want to change Nokogiri's behavior to raise an exception when warnings or errors are detected, set the …

    svg_merged = Nokogiri::XML(text) { |config| config.noblanks.huge.norecover }

I'm glad we were able to figure this out!
Note: I've created an issue to better document these parse options: sparklemotion/nokogiri.org#34
Thanks! Is it possible to at least log some errors when one hits …
I noticed it, but I did not think that Nokogiri would corrupt XML on its own :/ One of the first things I tried was checking with another tool whether the XML input was corrupt.
It would be challenging to figure out whether an error registered by libxml2 at parse time is due to the size limits. Because it's hard to determine the source of an error, we handle them all the same way: by adding them to the …
Please describe the bug
Very long lines (or maybe very long attributes?) are silently corrupted when a file is parsed and immediately saved.
"/> at the end of an element with a long attribute is turned into "/>
Help us reproduce what you're seeing
To run this code with data that triggers it (at least on my computer):
Expected output that demonstrates the bug (note "/> vs "/>):
There is also a sample file that does not trigger it. In general, once the line is less outrageously long, the corruption disappears.
Expected behavior
No corruption, or at least a crash rather than silent data corruption.
Environment
Additional context
This is a file from actual use, not some deliberately silly attempt.
I took ocean data from OpenStreetMap, converted it to GeoJSON, and then generated an SVG from it.
I am trying to use Nokogiri to merge this SVG file with other parts of the map (as one of the final steps toward burning this map onto wood with a laser cutter).
Lubuntu Image Viewer also refuses to work with this file, but at least it visibly crashes.
Inkscape opens both files without complaint.
For my use case I would be fine with a crash (simplifying the input data was on my to-do list anyway; 10MB on one line is quite weird), but silent data corruption is irritating. That said, supporting such input would probably be nice.