use jar-dependencies for requiring jars #1253

Closed
mkristian opened this issue Feb 21, 2015 · 20 comments

@mkristian
Contributor

this gives back control over which jars are loaded and when. you can use jbundler or maven or gradle to manage the jars. actually you manage gems and jars in almost the same manner.

it also gives you a chance to work around certain classloader problems when you really need to not delegate the classloading to the parent classloader.

currently I am working on psych to do the same
https://github.com/mkristian/psych/blob/jruby-build/Rakefile#L41
since we want to have psych as a default gem in jruby-9k

jruby-openssl is an example which still packs the jars inside the gem but uses jar-dependencies when it is available (so older jrubies work as well)
https://github.com/jruby/jruby-openssl/blob/master/jruby-openssl.gemspec#L23
https://github.com/jruby/jruby-openssl/blob/master/lib/jopenssl/load.rb#L8
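
A minimal sketch of that fallback pattern, not the actual code from jruby-openssl: use `require_jar` when jar-dependencies is available, otherwise require the jar packed inside the gem. `load_jar`, the `loader` argument and the two stand-in loader classes are hypothetical names, introduced only so both branches can be run without JRuby.

```ruby
# Hypothetical sketch; names are illustrative and not taken from jruby-openssl.
# `loader` stands in for whatever object would normally perform the require.
def load_jar(loader, group_id, artifact_id, version, bundled_path)
  if loader.respond_to?(:require_jar)
    # jar-dependencies is available: let it resolve and track the jar.
    loader.require_jar(group_id, artifact_id, version)
  else
    # Older JRuby without jar-dependencies: require the jar packed in the gem.
    loader.require(bundled_path)
  end
end

# Stand-ins used only to demonstrate both branches.
class ManagedLoader
  def require_jar(group_id, artifact_id, version)
    "managed:#{group_id}:#{artifact_id}:#{version}"
  end
end

class PlainLoader
  def require(path)
    "bundled:#{path}"
  end
end

puts load_jar(ManagedLoader.new, 'xalan', 'xalan', '2.7.2', 'xalan.jar')
puts load_jar(PlainLoader.new, 'xalan', 'xalan', '2.7.2', 'xalan.jar')
```

In a real gem the two branches would call jar-dependencies' `require_jar` and plain `require`; the coordinates used here are only sample values.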

@mkristian
Contributor Author

see also guyboertje/jrjackson#35

@flavorjones
Member

Hi there,

Sorry again, but you'll need to fully explain what the problem you're trying to solve is. Not being a full-time Java developer, I have no idea what jar-dependencies is, what problem it is supposed to help, what the tradeoffs are to supporting it, nor what alternative solutions were considered.

Let's please just start with a clear description of the problem? We can then proceed to a discussion of solutions. Thanks for your patience.

@mkristian
Contributor Author

mkristian commented Mar 3, 2015 via email

@mkristian
Contributor Author

@jvshahid will continue my thingy from #1395 (comment) here

declaring jar dependencies inside the gemspec file

as usual there is a ruby side of things and a java side of things

ruby tools

having those jar dependencies declared inside the nokogiri gemspec allows tools like jbundler or jar-dependencies to manage jars in a similar manner to gems. both tools can use a Jarfile to set up jars for a jruby project - see https://gist.github.com/mkristian/a46851705427f68ce310. they obey the jars declared inside a gemspec when used with bundler's Gemfile, and they can lock down your jar versions for a gem project which has nokogiri as a transitive dependency.

loading jars via the jar-dependencies gem also ensures that each jar is loaded with only a single version. any attempt to load the same jar with another version will only produce a warning and will not load that jar. just an extra safety net.
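
To make the safety net concrete, here is a small model of the "one version per jar" rule. It is not jar-dependencies' implementation; `JarRegistry` and `version_of` are hypothetical names, and the class only records versions instead of loading anything.

```ruby
# Hypothetical model of the "one version per jar" rule. This is not how
# jar-dependencies is implemented; it only illustrates the behaviour.
class JarRegistry
  def initialize
    @loaded = {} # "group:artifact" => version that was recorded first
  end

  # Returns true when the jar is newly recorded. Returns false when it was
  # already recorded, and warns if the requested version differs.
  def require_jar(group_id, artifact_id, version)
    key = "#{group_id}:#{artifact_id}"
    current = @loaded[key]
    if current.nil?
      @loaded[key] = version
      true
    else
      if current != version
        warn "#{key} already loaded with version #{current}, ignoring #{version}"
      end
      false
    end
  end

  # Version recorded for the given jar, or nil if it was never requested.
  def version_of(group_id, artifact_id)
    @loaded["#{group_id}:#{artifact_id}"]
  end
end

registry = JarRegistry.new
registry.require_jar('xerces', 'xercesImpl', '2.12.0') # recorded
registry.require_jar('xerces', 'xercesImpl', '2.11.0') # warns, not recorded
puts registry.version_of('xerces', 'xercesImpl')
```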

java tools

both maven and gradle have plugins which handle gems and jars in the same manner: gems and jars are just different kinds of dependencies with a different packaging format and are treated alike otherwise. but they depend on having the jars declared inside the gemspec to be able to integrate them into their dependency resolution. for embedded jruby this is even more important than for a ruby project. a java application which uses embedded jruby comes with its own set of dependencies, and those dependencies from the java application are already part of the jruby runtime, as per the design of the jruby classloaders. here it would be important to tell jar-dependencies which jars are already loaded by the underlying container.

possible issues

of course forcing a different version of xerces, or using xercesMinimal instead of xerces, can cause nokogiri to produce bugs, as it then uses a different version than the one shipped with nokogiri itself. on the other hand this is a common situation with java projects.

what needs to be done

  • declare the jars inside the gemspec
  • load the jars via require_jar(group_id, artifact_id, version)
  • store the jars under ./lib/#{group_id}/#{artifact_id}/#{version}/#{artifact_id}-#{version}.jar
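
As an illustration of the storage layout in the last bullet, the path can be derived from the coordinates like this. `jar_path` is a hypothetical helper, and turning the dots of the group id into directories (Maven style) is an assumption, not something the list above states.

```ruby
# Relative location of a jar inside the gem, following the layout above.
# Assumption: dots in the group id become directory separators (Maven style).
def jar_path(group_id, artifact_id, version, root: 'lib')
  File.join(root, group_id.tr('.', '/'), artifact_id, version,
            "#{artifact_id}-#{version}.jar")
end

puts jar_path('xalan', 'serializer', '2.7.2')
# => lib/xalan/serializer/2.7.2/serializer-2.7.2.jar
puts jar_path('com.thaiopensource', 'jing', '20091111')
# => lib/com/thaiopensource/jing/20091111/jing-20091111.jar
```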

this is the manual way of doing it. look at https://github.com/mkristian/jar-dependencies/tree/master/examples/gem-with-java-extension-and-jar-dependencies/using-rake-compiler for another way of doing it.

the whole thing comes with another gem dependency, but jruby-1.7.x and jruby-9k already ship this extra gem as a default gem.

in short

  • declare the jars inside the gemspec - for dependency management tools
  • load jars via jar-dependencies - allows overriding the default from the gem, which is used by jar dependency management tools as well
  • store the jars slightly differently
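
For the first bullet, a gemspec declaration might look roughly like the sketch below. The gem name, version, summary and author are placeholders; the `'jar group, artifact, version'` requirement string is the form jar-dependencies reads from a gemspec, and the coordinates are sample values only.

```ruby
require 'rubygems'

# Sketch of a gemspec that declares a jar next to its gem dependencies.
# Name, version, summary and authors are placeholders for illustration.
spec = Gem::Specification.new do |s|
  s.name     = 'example-jruby-gem'
  s.version  = '0.1.0'
  s.summary  = 'Example gem declaring jar dependencies'
  s.authors  = ['Example Author']
  s.files    = []
  s.platform = 'java'

  # Jar declaration as a requirement string: "jar group, artifact, version".
  s.requirements << 'jar xalan, xalan, 2.7.2'

  # The gem that reads the declaration and loads the jar at runtime.
  s.add_runtime_dependency 'jar-dependencies'
end

puts spec.requirements.inspect
puts spec.dependencies.map(&:name).inspect
```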

the only impact for the average nokogiri user is that the jar-dependencies gem gets activated and used as well. once the jars are loaded there is no further impact on nokogiri.

@jvshahid
Member

@headius what do you think about this proposal? I personally like it, but I'm worried about wide adoption and edge cases that could arise from using jbundler and jar-dependencies. Your thoughts would be very helpful.

@jvshahid
Member

/cc @yokolet

@mkristian
Contributor Author

@jvshahid FYI all jars which are embedded in jruby and on maven-central are loaded via jar-dependencies: jline-2.11.jar, the 2 bouncy-castle jars from jruby-openssl, and snakeyaml-1.14 for psych gem

@mkristian
Contributor Author

any decision ?

@flavorjones
Member

I picked this up with @jvshahid this morning. Won't make the v1.9.0 release this week; but hoping to adopt this approach for the following release.

@headius
Contributor

headius commented Jan 14, 2020

I started looking into this and I think the main challenge is figuring out which maven coordinates to use for all the libraries.

I started trying to sort it out but some of the version information is lost to time. I am using https://search.maven.org to look for artifacts.

  • isorelax.jar: This library appears to have been abandoned by original authors and now the maven coordinates are all over the place. A number of additional groupId's have pushed versions of this library and it's unclear if any of them qualify as canonical. The version number specified in the jar is "20041111" which does not appear to correspond to any published maven artifact.
  • jing.jar: There's no version information in the jar or git commits, but the main artifact appears to be published by the com.thaiopensource groupId, which has only one version: "20091111". There are other artifacts pushed by other organizations.
  • nekodtd.jar and nekohtml.jar: No version information in the jar, but there are pushed artifacts under the nekohtml organization. However both jars have been updated with forked versions from @jvshahid that do not appear to have ever been pushed to Maven Central.
  • serializer.jar: This appears to be "xalan:serializer:2.7.2".
  • xalan.jar: This appears to be "xalan:xalan:2.7.2".
  • xercesImpl.jar: I'm not sure where the filename comes from, but based on the contents of the jar and the commit logs this appears to be "xerces:xercesImpl:2.12.0"
  • xml-apis.jar: This one has peculiar lineage. Version 1.4.01 is mentioned for several components within this jar, which would point us toward "xml-apis:xml-apis:1.4.01", but the SAX component within the jar claims a version of 2.0.2. The xml-apis artifact 2.0.2 was released in 2005, but the 1.4.01 version was released in 2011. 😳

Without at least getting the @jvshahid forked artifacts pushed to Maven Central, I'm not sure this can move forward (unless we only jar-deps the ones we can figure out?)

@headius
Contributor

headius commented Jan 14, 2020

The following patch appears to produce a gem that installs all the jar dependencies. Note that these versions may or may not be right based on my previous comment (and the @jvshahid versions are not in here at all):

diff --git a/Rakefile b/Rakefile
index b81d6fde..e23374ef 100644
--- a/Rakefile
+++ b/Rakefile
@@ -133,7 +133,23 @@ HOE = Hoe.spec 'nokogiri' do
   ]
   self.clean_globs += Dir.glob("ports/*").reject { |d| d =~ %r{/archives$} }
 
-  unless java?
+  if java?
+    self.extra_deps += [
+      ['jar-dependencies', "~> 0.4.0"],
+    ]
+    def self.add_dependencies
+      super
+
+      spec.requirements << 'jar com.sun.xml.bind.jaxb, isorelax, 20090621' # unknown where to find original 20041111
+      spec.requirements << 'jar com.thaiopensource, jing, 20091111'
+      spec.requirements << 'jar nekohtml, nekodtd, 0.1.11' # FIXME,  not using jvshahid's fork!
+      spec.requirements << 'jar nekohtml, nekohtml, 1.9.6.2' # FIXME,  not using jvshahid's fork!
+      spec.requirements << 'jar xalan, serializer, 2.7.2'
+      spec.requirements << 'jar xalan, xalan, 2.7.2'
+      spec.requirements << 'jar xerces, xercesImpl, 2.12.0'
+      spec.requirements << 'jar xml-apis, xml-apis, 1.4.01'
+    end
+  else
     self.extra_deps += [
       ["mini_portile2",    "~> 2.4.0"], # keep version in sync with extconf.rb
     ]
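
To show how a tool could read these declarations back out of a gemspec, here is a small parser for the three-field form used in the patch. `JarCoordinate` and `jar_coordinates` are hypothetical names; this is not how jar-dependencies parses requirements, and any richer requirement syntax it may accept is not handled.

```ruby
JarCoordinate = Struct.new(:group_id, :artifact_id, :version)

# Extracts jar coordinates from requirement strings of the form
# "jar group, artifact, version". Anything else is skipped.
def jar_coordinates(requirements)
  pattern = /\Ajar\s+([^,\s]+)\s*,\s*([^,\s]+)\s*,\s*([^,\s]+)\s*\z/
  requirements.map do |req|
    match = pattern.match(req)
    JarCoordinate.new(*match.captures) if match
  end.compact
end

coords = jar_coordinates([
  'jar xalan, serializer, 2.7.2',
  'jar xerces, xercesImpl, 2.12.0',
  'some unrelated requirement'
])
coords.each { |c| puts "#{c.group_id}:#{c.artifact_id}:#{c.version}" }
```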

I also had to make the following change for compatibility with JRuby 9.2.9, which added ThreadContext to the RubyBasicObject implementation of to_a. Without that, this will not compile (because it tries to specify a more general return type):

diff --git a/ext/java/nokogiri/XmlNodeSet.java b/ext/java/nokogiri/XmlNodeSet.java
index ea8b031a..931ab3d7 100644
--- a/ext/java/nokogiri/XmlNodeSet.java
+++ b/ext/java/nokogiri/XmlNodeSet.java
@@ -39,6 +39,7 @@ import static nokogiri.internals.NokogiriHelpers.nodeListToRubyArray;
 import java.util.Arrays;
 
 import org.jruby.Ruby;
+import org.jruby.RubyArray;
 import org.jruby.RubyClass;
 import org.jruby.RubyFixnum;
 import org.jruby.RubyObject;
@@ -392,7 +393,7 @@ outer:
     }
 
     @JRubyMethod(name = {"to_a", "to_ary"})
-    public IRubyObject to_a(ThreadContext context) {
+    public RubyArray to_a(ThreadContext context) {
         return context.runtime.newArrayNoCopy(nodes);
     }
 

I will discuss with @enebo if there's any compatibility issue here. I'm unsure if this affects runtime; the JVM tends not to care about such typing issues. The original method should probably return IRubyObject though, since it's possible to get a nil from some classes.

These changes at least get the dependencies into the jar, and subsequently into the installed gem:

$ ls ../jruby/lib/ruby/gems/shared/gems/nokogiri-1.10.4-java/lib/
com              isorelax.jar     nekodtd.jar      nekohtml.jar     nokogiri         nokogiri_jars.rb xalan            xerces           xml-apis         xsd
isorelax         jing.jar         nekohtml         net              nokogiri.rb      serializer.jar   xalan.jar        xercesImpl.jar   xml-apis.jar

The non-jar elements here are directories based on the maven coordinates of each library. The original jars are still in place because I'm not sure how to use jar-dependencies at dev time. Help, @mkristian?

@headius
Contributor

headius commented Jan 14, 2020

The patch to correct this return type problem on JRuby is here: https://gist.github.com/headius/511632d8f3b6aeb2348462b42556b410

However I would prefer to keep the more specific type and modify Nokogiri to return the same type.

Note that this was not really an incompatible API change in JRuby...Nokogiri just happened to already override with a different return type a method we added in 9.2.9.

@mkristian
Contributor Author

@headius great to revive this PR. I have to see what is needed, but using the jars we can get from maven central is already a start. the modified ones we can require manually.

@headius
Contributor

headius commented Jan 14, 2020

@mkristian I could push the jar-deps stuff to a branch to get this going. We can leave out the jars that we don't have proper versions for. Next step would be getting dev-time working with the jar-deps dependencies.

@mkristian
Contributor Author

@headius a branch would be great

headius added a commit to headius/nokogiri that referenced this issue Jan 14, 2020
See sparklemotion#1253. The commented versions can't be found on Maven Central.
@headius
Contributor

headius commented Jan 14, 2020

@mkristian PR is at #1967.

@jvshahid
Member

  • nekodtd.jar and nekohtml.jar: No version information in the jar, but there are pushed artifacts under the nekohtml organization. However both jars have been updated with forked versions from @jvshahid that do not appear to have ever been pushed to Maven Central.

Any ideas about what I should use for the groupId? If we decide to publish those under a new groupId, e.g., org.nokogiri, I believe we will have to maintain some Maven credentials and a signing key. @flavorjones Are you ok with maintaining an extra set of creds? I am guessing you already use some credentials to publish to rubygems.

@headius
Contributor

headius commented Jan 15, 2020

@jvshahid If you haven't seen it already, here's the guide I use for all my Maven projects: https://central.sonatype.org/pages/ossrh-guide.html

Once you've got it bootstrapped it's not too bad to maintain. Your local PGP/GPG key should be sufficient for signing...it's really just to make sure that you're the one who pushed that resource.

As for groupId, the Sonatype "rules" really want you to use a domain name. Since these are your changes to those libraries, I'd recommend you use your own domain name, e.g. com.jvshahid:nekohtml.

I have not looked into the neko projects to see if they're still maintained; if they are, it would be best to get your changes upstreamed.

@jvshahid
Member

Cool, I will look into it this weekend.

mkristian pushed a commit to mkristian/nokogiri that referenced this issue Feb 3, 2020
See sparklemotion#1253. The commented versions can't be found on Maven Central.
flavorjones modified the milestones: v1.12.0, v1.13.0 Aug 2, 2021
flavorjones modified the milestones: v1.13.0, v1.14.0 Jan 6, 2022
flavorjones added a commit that referenced this issue Jan 19, 2022
Addresses part of #1253

These deps are managed using jar-dependencies:

- isorelax
- jing
- serializer
- xalan
- xercesImpl
- xml-apis

Note that the following versions were changed:

- jing: com.thaiopensource:jing:20091111 → nu.validator:jing:20200702VNU
- xercesImpl: xerces:xercesImpl:2.12.0 → xerces:xercesImpl:2.12.1

Note that the following new dependencies were introduced:

- Saxon-HE 9.6.0-4 (via nu.validator:jing:20200702VNU)

Finally, note that we still use a local jar for nekohtml and nekodtd, which is not the ideal end
state.
@flavorjones
Member

#2432 introduced jar-dependencies. ✔

4 participants