Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel.Array() method fails to wrap text node #679

Closed
mislav opened this issue May 15, 2012 · 25 comments
Closed

Kernel.Array() method fails to wrap text node #679

mislav opened this issue May 15, 2012 · 25 comments

Comments

@mislav
Copy link

mislav commented May 15, 2012

require 'nokogiri'
doc = Nokogiri('<a>b</a>')
text_node = doc.at('/a/text()')
Array(text_node)
#=> []

I expected: [<text_node>]

# Nokogiri (1.5.2)
    ---
    warnings: []
    nokogiri: 1.5.2
    ruby:
      version: 1.9.3
      platform: x86_64-darwin11.2.0
      description: ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin11.2.0]
      engine: ruby
    libxml:
      binding: extension
      compiled: 2.7.8
      loaded: 2.7.8
@Serabe
Copy link
Contributor

Serabe commented May 15, 2012

@tenderlove maybe, moving the to_ary up to Nokogiri:: XML::Node would be best since now the behaviour is inconsistent between Nokogiri classes.

Though, I would rather suggest @mislav using Array[] since it doesn't rely on objects having a method and reverting this commit.

@tenderlove
Copy link
Member

Moving this implementation of to_ary up to Node seems wrong. Not all nodes are text nodes, and not all nodes should be treated as such when converting to an array.

The other thing we could do is make to_ary return [self], which would result in Array() returning an array with the text node inside. Either way, meh.

/cc @flavorjones @ender672

@tenderlove
Copy link
Member

Oops, forgot to cc @yokolet

@Serabe
Copy link
Contributor

Serabe commented May 15, 2012

I think the problem is using a core method in a way is not intended to.

Kernel#Array()
Returns arg as an Array. First tries to call arg.to_ary, then arg.to_a.

Array::[]
Returns a new array populated with the given objects.

to_ary and to_a does not make sense in a simple object just to use parenthesis instead of brackets.

@mislav
Copy link
Author

mislav commented May 15, 2012

In case it isn't obvious, the only reason I'm using Array() in the first place is because I don't know if the passed object is a single node or an array of nodes.

Array(elements).each do |element|
  # ...
end

@Serabe
Copy link
Contributor

Serabe commented May 15, 2012

@mislav at always returns nil or a Node.

@mislav
Copy link
Author

mislav commented May 15, 2012

@Serabe at is only for the purpose of demonstration of the bug. My user code is different. At several points, I want to avoid the check:

elements = [elements] unless Array === elements
elements.each ...

@Serabe
Copy link
Contributor

Serabe commented May 15, 2012

@mislav what I am saying is that either it is not a bug (which I believe) or that to_ary method should be moved to Nodesince a Text node is not that different from a simple Node.

@tenderlove
Copy link
Member

@Serabe The current behavior is obviously a bug. Why would you expect that Array(node) would return an empty list?

I'm reverting my patch for now, until we get more input. :-)

@tenderlove tenderlove reopened this May 15, 2012
@tenderlove tenderlove reopened this May 15, 2012
@mislav
Copy link
Author

mislav commented May 15, 2012

That I have no opinion about. It's for you to discuss. The only thing I care about is that I don't end up with empty arrays when I use Array() with Nokogiri objects, since that breaks the principle of least surprise for me.

@Serabe
Copy link
Contributor

Serabe commented May 15, 2012

@tenderlove because if you expect to Array(any_node) to return [any_node] you should add to_a to Node, not to Text.

@tenderlove
Copy link
Member

@Serabe Node already implements to_a. Here we need to implement to_ary.

@Serabe
Copy link
Contributor

Serabe commented May 15, 2012

@tenderlove then docs are wrong in Kernel#Array

@tenderlove
Copy link
Member

@Serabe how so?

@Serabe
Copy link
Contributor

Serabe commented May 15, 2012

@tenderlove afaik, Kernel#Array should call to_a if object does not respond to_ary or returns nil. Let me check rubyspec.

@Serabe
Copy link
Contributor

Serabe commented May 15, 2012

@tenderlove the problem is that to_ain Node returns []. So it is a bug (my apologies about this).

@tenderlove
Copy link
Member

irb(main):001:0> require 'nokogiri'
=> true
irb(main):002:0> doc = Nokogiri('<a>b</a>')
=> #<Nokogiri::XML::Document:0x3fc0e5c48570 name="document" children=[#<Nokogiri::XML::Element:0x3fc0e5c481ec name="a" children=[#<Nokogiri::XML::Text:0x3fc0e5c35fb0 "b">]>]>
irb(main):003:0> text_node = doc.at('/a/text()')
=> #<Nokogiri::XML::Text:0x3fc0e5c35fb0 "b">
irb(main):004:0> text_node.to_a
=> []
irb(main):005:0>

@Serabe
Copy link
Contributor

Serabe commented May 15, 2012

@tenderlove and it is the expected behaviour since to_a comes from Enumerable and each goes through attributes. Then to_ary would go in Node, in which Enumerable is being included.

@tenderlove
Copy link
Member

@Serabe no problem! 😄

Our implementation of to_a returns a list of all the attributes for a particular node. to_a doesn't really make sense in the context of a text node since a text node can never have attributes. I think in this case we either need to remove to_a on the text node, or return [self] from to_ary so that Array() works.

@Serabe
Copy link
Contributor

Serabe commented May 15, 2012

@tenderlove then I would expect any node (since any node can come from a search) to respond to to_ary as [self] not just Text. Otherwise changing the query in @mislav's code could break it.

@flavorjones
Copy link
Member

@tenderlove, @Serabe - sorry for jumping in so late.

My understanding is that, because XML::Node implements #each, and also mixes in Enumerable, then we get an implementation of #to_a. Ya?

I think we should reconsider a Node being Enumerable in Nokogiri 2.0. My strawman proposal is to remove it in favor of a better-defined Attributes class which would be Enumerable (and very Hash-y).

This would help us address some other lingering problems around, e.g., how SAX parsers handle node attributes; and the odd API we have around Node#attributes and XML::Attr.

Thoughts?

/cc @ender672 @yokolet

@Serabe
Copy link
Contributor

Serabe commented May 24, 2012

+1 I really like your idea.
El 24/05/2012 19:47, "Mike Dalessio" <
[email protected]>
escribió:

@tenderlove, @Serabe - sorry for jumping in so late.

My understanding is that, because XML::Node implements #each, and also
mixes in Enumerable, then we get an implementation of #to_a. Ya?

I think we should reconsider a Node being Enumerable in Nokogiri 2.0. My
strawman proposal is to remove it in favor of a better-defined Attributes
class which would be Enumerable (and very Hash-y).

This would help us address some other lingering problems around, e.g., how
SAX parsers handle node attributes; the odd API we have around
Node#attributes and XML::Attr.

Thoughts?

/cc @ender672 @yokolet


Reply to this email directly or view it on GitHub:
https://github.com/tenderlove/nokogiri/issues/679#issuecomment-5912872

@tenderlove
Copy link
Member

I think we should reconsider a Node being Enumerable in Nokogiri 2.0. My strawman proposal is to remove it in favor of a better-defined Attributes class which would be Enumerable (and very Hash-y).

👍

I'd like to keep the hashy syntax for nodes, but removing enumerable seems fine.

@flavorjones
Copy link
Member

Slated for 2.0

@flavorjones flavorjones removed the 2.0 label Jan 2, 2015
@flavorjones
Copy link
Member

Closing, captured in ROADMAP.md for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants