undefined method `traverse' #283
Comments
Ruby 1.9 and libxml 2.7.3 |
Hello! It looks like you're using the sanitize gem. In order for us here at Nokogiri HQ to determine what's going on, we need a reproducible test case. Can you submit a short snippet of code that reproduces what you're seeing? A link to a gist would be perfect. Thanks so much! |
Ok, I have a reproducible test case now which causes a seg fault, not just a backtrace. The problem is that I have a large (200k) text file which has the messages which cause the crash. That would be difficult to put in a Gist. I will email it to you. The crash does not happen on OSX with libxml 2.7.6 but does happen on Ubuntu Hardy with 2.7.3 (with 1.4.1 and 1.4.2). |
Upgraded to libxml 2.7.7 with nokogiri 1.4.2 and it still crashes. Same stack. |
Can you send us the test case? |
Mailed it to flavorjones. |
Were you guys ever able to reproduce this? |
Same for me with nokogiri 1.4.2. Downgrading nokogiri to 1.4.1 removes the issue. Regards, Christian |
@mperham I am not able to reproduce. Have you tried with 1.9.2-preview3? Or with trunk ruby? |
@datenimperator do you happen to have a script that will reproduce the problem? |
Yes, I get the same crash with:
on Ubuntu. OSX still works fine.
|
The following smaller test case reproduces the crash for me: Just run as 'ruby clean.rb'. It looks like one of the children iterated over on xml/node.rb line 592 is not a valid Ruby object as it crashes even if I just 'p' the child.
|
It has to do with the transformers passed into the sanitize call. Can you look at the cleaner() method in the gist and let me know if it looks correct? I'm trying to replace BR and P tags with a single space so HTML like this " Hello
|
I was able to reduce the crash on ubuntu to this:
Ruby 1.8.7, libxml 2.7.5. I will try with 2.7.7 later. I need a break now, that was hard work. :-) |
@tenderlove: thanks. I'll get on it. |
Looks like text-node-merging madness. Reproduced with both libxml 2.7.5 and 2.7.7. |
Ah, OK. This example crashes because the tree is being modified in a very particular way as it's being traversed. Inserting a text node merges the string with any adjacent text nodes, which can result in nodes disappearing. In the above example, the result will be a single text node containing "CA Sr Technical Architect", and one of the original text nodes will be freed. We should be able to work around this. Let me play with it. In the meantime, though, a great workaround would be to use xpath to replace these nodes:
|
handle merged text nodes better. closed by 2867d4b. |
it's probably worth looking at the commit to understand exactly why this case crashes. |
Thanks for the hard work! |
You are most welcome! |
Seeing this when upgrading from 1.4.1 to 1.4.2: