Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

undefined method `traverse' #283

Closed
mperham opened this issue May 28, 2010 · 21 comments
Closed

undefined method `traverse' #283

mperham opened this issue May 28, 2010 · 21 comments

Comments

@mperham
Copy link

mperham commented May 28, 2010

Seeing this when upgrading from 1.4.1 to 1.4.2:

04:53:37.481200 2104 page.rb:158: undefined method `traverse' for 30912:Fixnum
04:53:37.481252 2104 page.rb:159: /home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:592:in `block in traverse'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:214:in `block in each'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:213:in `upto'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:213:in `each'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:592:in `traverse'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:592:in `block in traverse'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:214:in `block in each'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:213:in `upto'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:213:in `each'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:592:in `traverse'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:592:in `block in traverse'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:214:in `block in each'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:213:in `upto'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:213:in `each'
/home/onespot/.bundle/ruby/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:592:in `traverse'
/home/onespot/.bundle/ruby/1.9.1/gems/sanitize-1.2.1/lib/sanitize.rb:129:in `clean_node!'
/home/onespot/.bundle/ruby/1.9.1/gems/sanitize-1.2.1/lib/sanitize.rb:105:in `clean!'
@mperham
Copy link
Author

mperham commented May 28, 2010

Ruby 1.9 and libxml 2.7.3

@flavorjones
Copy link
Member

Hello!

It looks like you're using the sanitize gem. In order for us here at Nokogiri HQ to determine what's going on, we need a reproducible test case. Can you submit a short snippet of code that reproduces what you're seeing? A link to a gist would be perfect.

Thanks so much!

@mperham
Copy link
Author

mperham commented May 28, 2010

Ok, I have a reproducible test case now which causes a seg fault, not just a backtrace. The problem is that I have a large (200k) text file which has the messages which cause the crash. That would be difficult to put in a Gist. I will email it to you.

The crash does not happen on OSX with libxml 2.7.6 but does happen on Ubuntu Hardy with 2.7.3 (with 1.4.1 and 1.4.2).

@mperham
Copy link
Author

mperham commented May 28, 2010

Upgrading to 2.7.7 with 1.4.2 and it still crashes. Same stack.

@tenderlove
Copy link
Member

Can you send use the test case?

@mperham
Copy link
Author

mperham commented May 28, 2010

Mailed it to flavorjones.

@mperham
Copy link
Author

mperham commented Jun 1, 2010

Were you guys ever able to reproduce this?

@datenimperator
Copy link

Same for me.

nokogiri 1.4.2
ruby 1.8.7 (2010-01-10 patchlevel 249) [i686-darwin10]
libxml2 2.7.7

Downgrading nokogiri to 1.4.1 removes the issue. Regards,

Christian

@tenderlove
Copy link
Member

@mperham I am not able to reproduce. Have you tried with 1.9.2-preview3? Or with trunk ruby?

@tenderlove
Copy link
Member

@datenimperator do you happen to have a script that will reproduce the problem?

@mperham
Copy link
Author

mperham commented Jun 2, 2010

Yes, I get the same crash with:

rvm install 1.9.2
rvm 1.9.2
gem install nokogiri sanitize
ruby clean.rb

on Ubuntu. OSX still works fine.

$ nokogiri -v
--- 
warnings: []

nokogiri: 1.4.2
ruby: 
  version: 1.9.2
  platform: x86_64-linux
libxml: 
  binding: extension
  compiled: 2.7.7
  loaded: 2.7.7

@mperham
Copy link
Author

mperham commented Jun 2, 2010

The following smaller test case reproduces the crash for me:

http://gist.github.com/422886

Just run as 'ruby clean.rb'. It looks like one of the children iterated over on xml/node.rb line 592 is not a valid Ruby object as it crashes even if I just 'p' the child.

c:0017 p:---- s:0063 b:0063 l:000062 d:000062 CFUNC  :p
c:0016 p:0012 s:0059 b:0059 l:000d80 d:000058 BLOCK  /opt/ruby-1.9.1-p378/lib/ruby/gems/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:592
c:0015 p:0015 s:0056 b:0056 l:000046 d:000055 BLOCK  /opt/ruby-1.9.1-p378/lib/ruby/gems/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:214

@mperham
Copy link
Author

mperham commented Jun 2, 2010

It has to do with the transformers passed into the sanitize call. Can you look at the cleaner() method in the gist and let me know if it looks correct? I'm trying to replace BR and P tags with a single space so HTML like this "

Hello
World

" will look like " Hello World". This code throws a backtrace, instead of seg faulting:

n = env[:node]
txt = Nokogiri::XML::Text.new(' ', n.document)
n.children.each do |c|
  txt.add_child(c)
end
if n.children.size > 0 # Added this if so we don't add child text to empty parent text.
  txt.add_child(Nokogiri::XML::Text.new(' ', n.document))
end
{ :node => txt }

/opt/ruby-1.9.1-p378/lib/ruby/gems/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:593:in `block in traverse': undefined method `traverse' for 7252:Fixnum (NoMethodError)
from /opt/ruby-1.9.1-p378/lib/ruby/gems/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:214:in `block in each'
from /opt/ruby-1.9.1-p378/lib/ruby/gems/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:213:in `upto'
from /opt/ruby-1.9.1-p378/lib/ruby/gems/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node_set.rb:213:in `each'
from /opt/ruby-1.9.1-p378/lib/ruby/gems/1.9.1/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:593:in `traverse'
from /opt/ruby-1.9.1-p378/lib/ruby/gems/1.9.1/gems/sanitize-1.2.1/lib/sanitize.rb:129:in `clean_node!'
from /opt/ruby-1.9.1-p378/lib/ruby/gems/1.9.1/gems/sanitize-1.2.1/lib/sanitize.rb:105:in `clean!'
from /opt/ruby-1.9.1-p378/lib/ruby/gems/1.9.1/gems/sanitize-1.2.1/lib/sanitize.rb:98:in `clean'
from /opt/ruby-1.9.1-p378/lib/ruby/gems/1.9.1/gems/sanitize-1.2.1/lib/sanitize.rb:49:in `clean'

@tenderlove
Copy link
Member

I was able to reduce the crash on ubuntu to this:

require 'nokogiri'

html = 'CA<br/>Sr Technical Architect'

fragment = Nokogiri::HTML::DocumentFragment.parse(html)
fragment.traverse do |child|
  next unless 'br' == child.name

  child.replace Nokogiri::XML::Text.new(' ', child.document)
end

Ruby 1.8.7, libxml 2.7.5. I will try with 2.7.7 later. I need a break now, that was hard work. :-)

@flavorjones
Copy link
Member

@tenderlove: thanks. I'll get on it.

@flavorjones
Copy link
Member

Looks like text-node-merging madness. Reproduced with both libxml 2.7.5 and 2.7.7.

@flavorjones
Copy link
Member

Ah, OK. This example crashes because the tree is being modified in a very particular way as it's being traversed.

Inserting a text node merges the string with any adjacent text nodes, which can result in nodes disappearing. In the above example, the result will be a single text node containing "CA Sr Technical Architect", and one of the original text nodes will be freed.

We should be able to work around this. Let me play with it.

In the meantime, though, a great workaround would be to use xpath to replace these nodes:

fragment.xpath("//br").each { |node| node.replace Text.new(' ', node.document) }

@flavorjones
Copy link
Member

handle merged text nodes better. closed by 2867d4b.

@flavorjones
Copy link
Member

it's probably worth looking at the commit to understand exactly why this case crashes.

@mperham
Copy link
Author

mperham commented Jun 24, 2010

Thanks for the hard work!

@flavorjones
Copy link
Member

You are most welcome!

flavorjones added a commit that referenced this issue Jul 3, 2023
flavorjones added a commit that referenced this issue Jul 3, 2023
**What problem is this PR intended to solve?**

Closes #2916

Related to #283 and #595

**Have you included adequate test coverage?**

Yes

**Does this change affect the behavior of either the C or the Java
implementations?**

Brings C in line with Java behavior.
flavorjones added a commit that referenced this issue Jul 5, 2023
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants