Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add global XPath functions handler #1894

Closed

Conversation

jonathanhefner
Copy link
Contributor

@jonathanhefner jonathanhefner commented Apr 17, 2019

What problem is this PR intended to solve?

Allow global definition of XPath functions, e.g.

module MyFunctions
  def regex(node_set, pattern)
    node_set.find_all { |node| node['some_attribute'] =~ /#{pattern}/ }
  end
end

Nokogiri::XML::XPathFunctions.include(MyFunctions)

node.search('.//title[regex(., "\w+")]', 'div.employee:regex("[0-9]+")')

This is a revival of #464 with a different approach. Although that PR is shown as merged, I think that's a GitHub bug. The original implementation was mbklein/nokogiri@8fb3da3 which has not been merged.

Have you included adequate test coverage?

I think so. I refrained from adding tests which would overlap with existing tests, but I would be happy to add more, if you have suggestions.

Does this change affect the C or the Java implementations?

No. Apparently, yes. The Java implementation injects XML namespaces into XPath queries based on the Ruby methods the function handler defines. Previously, that list of methods was derived from the function handler class, which omits methods delegated by SimpleDelegator. Now the list of methods is derived directly from the function handler object.

@codeclimate
Copy link

codeclimate bot commented Apr 17, 2019

Code Climate has analyzed commit c3376dd and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (80% is the threshold).

This pull request will bring the total coverage in the repository to 93.5% (0.0% change).

View more on Code Climate.

@jonathanhefner jonathanhefner force-pushed the xpath-functions branch 3 times, most recently from 28ddb33 to c3376dd Compare April 18, 2019 02:44
@@ -114,9 +118,15 @@ public IRubyObject evaluate(ThreadContext context, IRubyObject expr, IRubyObject
if (!handler.isNil()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition should now always be true (i.e. handler will always be an instance of XPathFunctions). Should I remove this if?

@flavorjones
Copy link
Member

Thank you for submitting this PR! I'll take a look as soon as I can, this is a super-busy week for me.

@flavorjones
Copy link
Member

Sorry this has gone so long without review; I'm tagging it for v1.11.0 which I'm hoping to release at the end of 2019, so will review in that timeframe.

@flavorjones flavorjones added this to the v1.11.0 milestone Dec 4, 2019
Base automatically changed from master to main January 17, 2021 21:52
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

เล่นไรกันวะ

@flavorjones flavorjones modified the milestones: v1.12.0, v1.13.0 Aug 2, 2021
@flavorjones flavorjones modified the milestones: v1.13.0, v1.14.0 Jan 6, 2022
@flavorjones
Copy link
Member

I've rebased this onto current main.

@flavorjones
Copy link
Member

@flavorjones
Copy link
Member

OK, the method lookup in the XmlXpathContext.java:evalute() block doesn't work properly. I spent a few minutes exploring how to get the public methods out of the block but couldn't quite figure it out.

Here's the failing test that needs to be added:

      def test_search_with_xpath_query_uses_global_custom_selectors_with_arguments_without_namespace
        skip("Testing fallback behavior in JRuby") unless Nokogiri.jruby?

        XPathFunctions.class_eval do
          def our_filter(*args)
            my_filter(*args)
          end
        end

        set = @xml.search('//employee/address[our_filter(., "domestic", "Yes")]', @handler)
        refute_empty(set)
        set.each do |node|
          assert_equal("Yes", node["domestic"])
        end
      end

I also want to point out that the intention is to deprecate use of custom XPath functions without the nokogiri: prefix -- see #2147. We might reasonably wait until that lands and then take another look at this.

@jonathanhefner
Copy link
Contributor Author

@flavorjones Oh wow, I forgot about this one! I am fine with waiting until after #2147 — no rush. Still, I was curious to see if I could get it working, so I pushed a commit to fix any failing tests, including the one you mentioned (test_search_with_xpath_query_uses_global_custom_selectors_with_arguments_without_namespace).

@flavorjones
Copy link
Member

Thank you! I'll take another look

@flavorjones flavorjones modified the milestones: v1.15.0, v1.14.0 Aug 31, 2022
@flavorjones flavorjones modified the milestones: v1.14.0, v1.15.0 Nov 22, 2022
@flavorjones flavorjones modified the milestones: v1.15.0, v1.16.0 Apr 28, 2023
@flavorjones
Copy link
Member

flavorjones commented Jun 24, 2024

OK, I'm sorry this has been open for so long.

After a lot of thought, I'm leaning towards not accepting this feature into Nokogiri.

I think, over the last decade, Nokogiri has moved towards the bottom of most people's stack -- that is, the majority of dependencies on Nokogiri are indirect (via rails-html-sanitizer, for example) and not direct. As a result, I've grown less enthusiastic about introducing global behavior changes, simply because I want to reduce the blast radius should there be some sort of unexpected conflict between two libraries both trying to use this functionality. (I'm imagining two libraries each trying to implement a function with a common name, and a user having no way to diagnose or work around the problem.)

I think that @mbklein's suggestion of making this into a document-specific generator decorator is a better idea from the perspective of a) explicitly mixing in behavior and b) limiting blast radius.

I'm really sorry that this decision took so long to make, and I want to say thanks very much for putting in the work on this PR to try to improve Nokogiri!

I'd be happy to keep discussing this, but I'm going to close this in the meantime.

Thank you again!

Small edits for clarity made on 2026-06-27.

@jonathanhefner
Copy link
Contributor Author

@flavorjones Thank you for explaining your thought process! 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants