Skip to content
This repository has been archived by the owner on Apr 14, 2021. It is now read-only.

installing github-sourced gems with multiple jobs breaks under JRuby #3174

Closed
jjb opened this issue Sep 14, 2014 · 46 comments
Closed

installing github-sourced gems with multiple jobs breaks under JRuby #3174

jjb opened this issue Sep 14, 2014 · 46 comments

Comments

@jjb
Copy link
Contributor

jjb commented Sep 14, 2014

I've been experiencing this problem for several weeks with both bundler 1.6 and 1.7.

I've observed it happening under JRuby 1.7.13, 1.7.14, and 1.7.15. I haven't tried other versions. Anecdotally I think it started happening more often with 1.7.14/15. But that might be totally wrong.

The problem: when using -j1, bundle install always succeeds. When using a higher value, it will sometimes fail and sometimes succeed. On Heroku where I believe -j4 is used, bundle install succeeds about 1/6 times. On my dev machine experimenting with -j16 and -j32, it seems to break all the time (I have not done a large volume of experiments).

The failure is not always with the same gem. But it is always with a github-sourced gem. I haven't experimented to see if it will happen with any git-sourced gem, or only github-sourced gems.

The variance in failure includes on Heroku where AFAIK the starting state is always identical (I don't think their cache is updated unless a push succeeds).

Here is one example of a failure:

...many lines of successful gem installs...
Using rails_autolink 1.1.6
Using rolify 3.4.0 from git://github.com/jjb/rolify.git (at do-not-require-db-when-precompiling)
Using dullard 0.1.0 from git://github.com/jjb/dullard.git (at jruby-fixes)
Using metainspector 2.3.0 from git://github.com/jjb/metainspector.git (at typhoeus)
Using activerecord-session_store 0.1.0 from git://github.com/medstro/activerecord-session_store.git (at 2c1c092)

Gem::Package::TarWriter::FileOverflow: You tried to feed more data than fits in the file.

Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension.

    /Users/john/.rbenv/versions/jruby-1.7.13/bin/jruby extconf.rb 
Error opening script file: /Users/john/.rbenv/versions/jruby-1.7.13/lib/ruby/gems/shared/bundler/gems/devise-f0738098f043/extconf.rb (No such file or directory)


Gem files will remain installed in /Users/john/.rbenv/versions/jruby-1.7.13/lib/ruby/gems/shared/gems/capybara-webkit-1.3.0 for inspection.
Results logged to /Users/john/.rbenv/versions/jruby-1.7.13/lib/ruby/gems/shared/gems/capybara-webkit-1.3.0/./gem_make.out

Errno::ENOENT: No such file or directory - No such file or directory - .travis.yml
Using sass-rails 4.0.1 from git://github.com/rails/sass-rails.git (at master)
An error occurred while installing action_dispatch-gz_static (0.0.3), and
Bundler cannot continue.
Make sure that `gem install action_dispatch-gz_static -v '0.0.3'` succeeds
before bundling.

and another

...many lines of successful gem installs...
Using metamagic 3.0.3
Using informant-rails 0.7.0
Using rails_autolink 1.1.6
Using carmen-rails 1.0.1
Using rails-env-favicon 0.0.5
Installing dotenv 0.10.0

Errno::ENOENT: No such file or directory - No such file or directory - spec/fixtures/guardian.co.uk.response
Using rolify 3.4.0 from git://github.com/jjb/rolify.git (at do-not-require-db-when-precompiling)

Errno::ENOENT: No such file or directory - No such file or directory - lib/active_record/session_store/session.rb
Using action_dispatch-gz_static 0.0.3 from git://github.com/medstro/action_dispatch-gz_static.git (at master)
Installing dotenv-rails 0.10.0
Using devise 3.2.4 from git://github.com/medstro/devise.git (at f073809)
Using sass-rails 4.0.1 from git://github.com/rails/sass-rails.git (at master)
An error occurred while installing metainspector (2.3.0), and Bundler cannot
continue.
Make sure that `gem install metainspector -v '2.3.0'` succeeds before bundling.

Let me know if you want more info. If you would like my Gemfile and Gemfile.lock, I can email them to you.

@jjb jjb changed the title using bundler to install github-sourced gems with multiple jobs (-j[N]) breaks under JRuby installing github-sourced gems with multiple jobs breaks under JRuby Sep 14, 2014
@jjb
Copy link
Contributor Author

jjb commented Sep 26, 2014

@indirect any thoughts on this? ever heard of similar problem from other jruby users?

@indirect
Copy link
Member

nope, never seen another report like this one. those look like file access errors or disk errors, not installation errors. does it always work without multiple jobs? does it always fail with the latest version of Rubygems?

@mkristian
Copy link
Contributor

@jjb could you try jruby-1.7.16 which has some regression fixes regarding 1.7.13, 1.7.14, and 1.7.15

@jjb
Copy link
Contributor Author

jjb commented Sep 29, 2014

@indirect yes it always works without multiple jobs. i am using rubygems 2.1.9

@mkristian tried it with 1.7.16, same problem

@indirect
Copy link
Member

That makes it sound like some kind of weird threaded race condition to me, but I've never seen in reported anywhere else. Maybe the downloaded gem files are somehow getting their IO crossed?

@jjb
Copy link
Contributor Author

jjb commented Sep 29, 2014

@indirect

Yep, I imagine it's something like that. Or maybe something as simple as bundler using the same directory to download and/or build gems sources from git, so that there are simply files being overwritten and such?

Stupid question: what does bundler use for git? The local OS git binary, or a library? (if the former, then maybe the installed version makes a difference)

Would you like my Gemfile so you can recreate the problem? If so give me an email address.

@indirect
Copy link
Member

Sure, a Gemfile would be helpful. Please send it to [email protected].

On Mon, Sep 29, 2014 at 8:24 AM, John Bachir [email protected]
wrote:

@indirect
Yep, I imagine it's something like that. Or maybe something as simple as bundler using the same directory to download and/or build gems sources from git, so that there are simply files being overwritten and such?
Stupid question: what does bundler use for git? The local OS git binary, or a library? (if the former, then maybe the installed version makes a difference)

Would you like my Gemfile so you can recreate the problem? If so give me an email address.

Reply to this email directly or view it on GitHub:
#3174 (comment)

@jjb
Copy link
Contributor Author

jjb commented Sep 29, 2014

@indirect sent. Thanks for your attention. Hopefully it's a real bug and not something I've overlooked.

@splattael
Copy link

Hi,

I'm also experiencing a similar problem bundling Rails with bundle -j4 on jruby-1.7.16.
Bundling with -j1 always works!

Reproduction:

  • git clone https://github.com/rails/rails
  • cd rails
  • bundler -j4

The error message change per run.

I hope this helps.

@TimMoore
Copy link
Contributor

TimMoore commented Oct 3, 2014

Hi all,

Can those of you experiencing this problem try again with Rubygems v2.4.2 please? (Run gem update --system to install it.)

There was a thread-safety issue around gem installation fixed in that release rubygems/rubygems#1005

It would be awesome if you could report back whether that fixes the problem or not. Thanks!

@splattael
Copy link

@TimMoore

I've upgraded to Rubygems v2.4.2 but it still fails:

$ cd rails
$ gem -v
2.4.2
$ bundle -j4
...
Using activesupport 4.2.0.beta2 from source at .
Using rails-deprecated_sanitizer 1.0.3
Using globalid 0.3.0
Using delayed_job 4.0.4
Using qu-redis 0.2.0 from git://github.com/bkeepers/qu.git (at master)
Using activejob 4.2.0.beta2 from source at .


activemodel at /home/peter/devel/_private/rails/activemodel did not have a valid gemspec.
This prevents bundler from installing bins or native extensions, but that may not affect its functionality.
Errno::ENOENT: No such file or directory - No such file or directory
The validation message from Rubygems was:
  ["CHANGELOG.md", "MIT-LICENSE", "README.rdoc", "lib/active_model.rb", "lib/active_model/attribute_methods.rb", "lib/active_model/callbacks.rb", "lib/active_model/conversion.rb", "lib/active_model/dirty.rb", "lib/active_model/errors.rb", "lib/active_model/forbidden_attributes_protection.rb", "lib/active_model/gem_version.rb", "lib/active_model/lint.rb", "lib/active_model/locale/en.yml", "lib/active_model/model.rb", "lib/active_model/naming.rb", "lib/active_model/railtie.rb", "lib/active_model/secure_password.rb", "lib/active_model/serialization.rb", "lib/active_model/serializers/json.rb", "lib/active_model/serializers/xml.rb", "lib/active_model/test_case.rb", "lib/active_model/translation.rb", "lib/active_model/validations.rb", "lib/active_model/validations/absence.rb", "lib/active_model/validations/acceptance.rb", "lib/active_model/validations/callbacks.rb", "lib/active_model/validations/clusivity.rb", "lib/active_model/validations/confirmation.rb", "lib/active_model/validations/exclusion.rb", "lib/active_model/validations/format.rb", "lib/active_model/validations/inclusion.rb", "lib/active_model/validations/length.rb", "lib/active_model/validations/numericality.rb", "lib/active_model/validations/presence.rb", "lib/active_model/validations/validates.rb", "lib/active_model/validations/with.rb", "lib/active_model/validator.rb", "lib/active_model/version.rb"] are not files
Using activemodel 4.2.0.beta2 from source at .
Using activerecord 4.2.0.beta2 from source at .
An error occurred while installing rails-dom-testing (1.0.3), and Bundler cannot
continue.
Make sure that `gem install rails-dom-testing -v '1.0.3'` succeeds before
bundling.

@jjb
Copy link
Contributor Author

jjb commented Oct 3, 2014

@TimMoore same here. let me know if you want me to email you my Gemfile and Gemfile.lock.

@TimMoore
Copy link
Contributor

TimMoore commented Oct 4, 2014

Thanks for confirming... I'll try to reproduce it myself.

@splattael
Copy link

After 8ff36b8 bundle -j4 also fails on MRI.

Reproduction: #3174 (comment)

@indirect
Copy link
Member

indirect commented Oct 7, 2014

You need to update rubygems to a version that is threadsafe.

On Mon, Oct 6, 2014 at 5:31 AM, Peter Suschlik [email protected]
wrote:

After 8ff36b8 bundle -j4 also fails on MRI.

Reproduction: #3174 (comment)

Reply to this email directly or view it on GitHub:
#3174 (comment)

@jjb
Copy link
Contributor Author

jjb commented Oct 7, 2014

@indirect

Both @splattael and I reproduced the problem under rubygems 2.4.2 (see above)

@splattael
Copy link

@indirect I've just verified that even with Rubygems 2.4.2 Rails fails to bundle on MRI using bundle -j4 after 8ff36b8.

@jjb
Copy link
Contributor Author

jjb commented Oct 8, 2014

@splattael if you are on heroku, maybe you'd like to chime in on heroku/heroku-buildpack-ruby#306

@splattael
Copy link

@jjb Sadly, I don't use heroku but I like to idea to configure bundler jobs via envvars!

@schneems
Copy link
Contributor

I was able to reproduce this issue (i believe) when install using multiple jobs. Using this Gemfile: https://gist.github.com/schneems/98f2d73692cc7422a3db

Here is the verbose output: https://gist.github.com/schneems/cad1866a5fcd65a22f79

$ bundle install -j4 --verbose

# ...

Using travis-core 0.0.1 from git://github.com/travis-ci/travis-core.git (at master)
2: travis-core (0.0.1) from /Users/richardschneeman/.gem/jruby/1.9.3/bundler/gems/travis-core-e57fe2ccd925/travis-core.gemspec
An error occurred while installing travis-sidekiqs (0.0.1), and Bundler cannot
continue.
Make sure that `gem install travis-sidekiqs -v '0.0.1'` succeeds before
bundling.

Here's the line in the backtrace:

0:  travis-support (0.0.1) from /Users/richardschneeman/.gem/jruby/1.9.3/bundler/gems/travis-support-40365216662f/travis-support.gemspec
Errno::ENOENT: No such file or directory - No such file or directory
org/jruby/RubyFile.java:848:in `stat'
/Users/richardschneeman/.rubies/jruby-1.7.16/lib/ruby/shared/rubygems/specification.rb:2535:in `validate_permissions'
org/jruby/RubyArray.java:1613:in `each'

I'm able to get rid of the problem $ bundle install -j1 --verbose so this definitely looks like a race condition.

Here is my script to delete all gems and re-install

Reproduce

Download Gemfile

$ cd /tmp
$ mkdir jruby-race-condition; cd jruby-race-condition
$ curl https://gist.githubusercontent.com/schneems/98f2d73692cc7422a3db/raw/913bc1dc670e6e711a88ddcfa7d3ba0f218a0323/jruby-issues.rb > Gemfile

Make sure you're using jruby

$ chruby jruby

Run this

$ bundle install -j4 --verbose

If the issue doesn't reproduce clear your gems and retry

$ ruby -e "
  list = \`gem list\`
  list.each_line do |line|
    next unless line.match(/\(/)
    name = line.split('(').first
    cmd  = %Q{ gem uninstall #{name} --force --executables }
    puts cmd
    \`#{cmd}\`
  end
"

$ gem install bundler
$ bundle install -j4 --verbose

@indirect
Copy link
Member

Thanks for reporting this! I’ve known for a while that we have a theoretical race condition, but not really been able to figure out how to reproduce it.

On Nov 12, 2014, at 11:21 AM, Richard Schneeman [email protected] wrote:

I was able to reproduce this issue (i believe) when install using multiple jobs. Using this Gemfile: https://gist.github.com/schneems/98f2d73692cc7422a3db https://gist.github.com/schneems/98f2d73692cc7422a3db
Here is the verbose output: https://gist.github.com/schneems/cad1866a5fcd65a22f79 https://gist.github.com/schneems/cad1866a5fcd65a22f79
$ bundle install -j4 --verbose

...

Using travis-core 0.0.1 from git://github.com/travis-ci/travis-core.git (at master)
2: travis-core (0.0.1) from /Users/richardschneeman/.gem/jruby/1.9.3/bundler/gems/travis-core-e57fe2ccd925/travis-core.gemspec
An error occurred while installing travis-sidekiqs (0.0.1), and Bundler cannot
continue.
Make sure that gem install travis-sidekiqs -v '0.0.1' succeeds before
bundling.
Here's the line in the backtrace:

0: travis-support (0.0.1) from /Users/richardschneeman/.gem/jruby/1.9.3/bundler/gems/travis-support-40365216662f/travis-support.gemspec
Errno::ENOENT: No such file or directory - No such file or directory
org/jruby/RubyFile.java:848:in stat' /Users/richardschneeman/.rubies/jruby-1.7.16/lib/ruby/shared/rubygems/specification.rb:2535:invalidate_permissions'
org/jruby/RubyArray.java:1613:in `each'

I'm able to get rid of the problem $ bundle install -j1 --verbose so this definitely looks like a race condition.

Here is my script to delete all gems and re-install

Reproduce

Download Gemfile

$ cd /tmp
$ mkdir jruby-race-condition; cd jruby-race-condition
$ curl https://gist.githubusercontent.com/schneems/98f2d73692cc7422a3db/raw/913bc1dc670e6e711a88ddcfa7d3ba0f218a0323/jruby-issues.rb > Gemfile
Make sure you're using jruby

$ chruby jruby
Run this

$ bundle install -j4 --verbose
If the issue doesn't reproduce clear your gems and retry

$ ruby -e "
list = gem list
list.each_line do |line|
next unless line.match(/(/)
name = line.split('('').first
cmd = %Q{ gem uninstall #{name} --force --executables }
puts cmd
result = #{cmd}
puts result unless $?.success?
end
"

$ gem install bundler
$ bundle install -j4 --verbose

Reply to this email directly or view it on GitHub #3174 (comment).

@headius
Copy link
Contributor

headius commented Nov 12, 2014

Does this reproduce on e.g. Rubinius, since they have native threading too?

@indirect
Copy link
Member

There is at least one race that is very easily reproducible on Rubinius but hasn't been on JRuby; I'm not sure if this is the same race.

On Wed, Nov 12, 2014 at 11:29 AM, Charles Oliver Nutter
[email protected] wrote:

Does this reproduce on e.g. Rubinius, since they have native threading too?

Reply to this email directly or view it on GitHub:
#3174 (comment)

@brixen
Copy link
Contributor

brixen commented Nov 12, 2014

@indirect what race is that? Is it documented in ticket?

@headius
Copy link
Contributor

headius commented Nov 12, 2014

I will be at RubyConf and would love to sit down and sort out remaining threading issues in RG and Bundler.

@indirect
Copy link
Member

@brixen it's a race in Bundler and Rubygems, not a race in Rubinius, and IIRC we fixed it by patching Rubygems.

@headius that would be great. I'll be there too.

@jjb
Copy link
Contributor Author

jjb commented Nov 12, 2014

@schneems

If the issue doesn't reproduce clear your gems and retry

You can also use a higher number of jobs (j16 instead of j4) to increase likelihood of reproduction (at least I conjectured that this was the case, and have never seen it not reproduce with j16)

@Who828
Copy link
Contributor

Who828 commented Nov 17, 2014

@headius I am game for fixing this issue, let me know if you want to sit down and fix the issue.

indirect added a commit that referenced this issue Nov 18, 2014
This wraps the generation of git gem binstubs in the same exclusive lock
that is used by installing threads. Since generating binstubs _also_
generates the entire gem, it seems like there were install threads
stepping on each other while writing the gem itself and while writing
the gem to add the binstubs.

Ultimately, I think we should stop writing out all the files for each
gem twice, which seems to be what is happening here. It’s more important
to fix the exceptions first, though. :)
indirect added a commit that referenced this issue Nov 18, 2014
This wraps the generation of git gem binstubs in the same exclusive lock
that is used by installing threads. Since generating binstubs _also_
generates the entire gem, it seems like there were install threads
stepping on each other while writing the gem itself and while writing
the gem to add the binstubs.

Ultimately, I think we should stop writing out all the files for each
gem twice, which seems to be what is happening here. It’s more important
to fix the exceptions first, though. :)
@indirect
Copy link
Member

@jjb @schneems @splattael together with @Who828, we added a tentative fix for this issue to the 1-7-stable branch. You can try out the fix by downloading this test gem or running Bundler from master. Please let us know if that fixes the issue for you!

@Who828
Copy link
Contributor

Who828 commented Nov 18, 2014

Just wanted add that you might want to uninstall all the git gems before trying out the fix.

@jjb
Copy link
Contributor Author

jjb commented Nov 18, 2014

My test shows that 1.8.0.pre.e212e4 does not fix the problem.

@Who828 should I uninstall all the git gems in order to ensure that the problem is reproduced, or because previous state will make the test fail?

@Who828
Copy link
Contributor

Who828 commented Nov 18, 2014

@jjb The previous state will have an effect.

@jjb
Copy link
Contributor Author

jjb commented Nov 18, 2014

For good measure I blew away my git gems and still have the problem.

If anyone would like me to email you my Gemfile/Gemfile.lock, let me know!

@Who828
Copy link
Contributor

Who828 commented Nov 18, 2014

@jjb Can you send it to my email id? (It's on my profile)

@jjb
Copy link
Contributor Author

jjb commented Nov 18, 2014

@Who828 ah okay, I wasn't expecting that.

To uninstall them I deleted everything in this directory:

~/.rbenv/versions/jruby-1.7.16/lib/ruby/gems/shared/bundler/gems

Is that sufficient or is there some metadata or working files elsewhere that is the culprit?

@Who828
Copy link
Contributor

Who828 commented Nov 18, 2014

delete everything here

~/.rbenv/versions/jruby-1.7.16/lib/ruby/gems

@headius
Copy link
Contributor

headius commented Nov 18, 2014

I patched JRuby's jruby-1_7 branch (jruby/jruby@b74fe44) to include the filename in that ENOENT error. Perhaps that will help a bit.

@jjb
Copy link
Contributor Author

jjb commented Nov 18, 2014

Okay, blew away all gems. Still seeing this

Errno::ENOENT: No such file or directory - No such file or directory
/Users/john/.rbenv/versions/jruby-1.7.16/lib/ruby/shared/rubygems/ext/ext_conf_builder.rb:43 warning: Tempfile#unlink or delete called on open file; ignoring

but am also getting other gem install problems when building native extensions for capybara-webkit. It's nothing I've seen recently, so I would hypothesize that it is related.

@jjb
Copy link
Contributor Author

jjb commented Nov 18, 2014

possibly more data points: on a whim i updated my rubygems from 2.1 to 2.4, uninstalled all gems again (gem uninstall --all -Ix), installed bundler (gem install bundler-1.8.0.pre.e212e4.gem), installed my project's gems (bundle install -j32) and got a failure installing sass, although my sass isn't git-sourced. but my sass-rails is.

@jjb
Copy link
Contributor Author

jjb commented Nov 18, 2014

Can you send it to my email id? (It's on my profile)

@Who828 sent

@Who828
Copy link
Contributor

Who828 commented Nov 18, 2014

@schneems @headius @indirect @jjb I think I have fixed the issue with multiple git source gems by ensuring that directory remains the same for throughout the installation.

Please use the bundler from this branch https://github.com/bundler/bundler/tree/race_condition to test it out.

@jjb the capybara-webkit is an native extension built error and it's not related to this issue. Also, I didn't get to try out this new fix on your Gemfile (I tested with @schneems Gemfile) but I am sure it will work. (just clear old gems once).

@splattael
Copy link

@Who828 I can confirm your fix.

Bundling the Rails repo using race_condition branch works like a charm!

Thank you 💚

@schneems
Copy link
Contributor

I couldn't reproduce the issue with the latest master. So....works for me ❤️ 👍 ❤️

@Who828 Who828 closed this as completed Nov 18, 2014
@jjb
Copy link
Contributor Author

jjb commented Nov 19, 2014

It appears there is something wrong with my environment, so I'm unable to test this.

With both 1.7.6 and master, I keep getting this error for seemingly random gems:

Gem::RemoteFetcher::FetchError: Received fatal alert: bad_record_mac (https://rubygems.org/gems/tilt-1.4.1.gem)

If anyone happens to know what is causing this then I can test bundler master.

Regarding 16b0e16, my $0.02: seems like a project like bundler should have tests for such significant changes. Bundler has a lot of tests elsewhere — perhaps there is a tendency to not test the concurrency features? I've seen this before in projects, where concurrency features are introduced with enthusiasm but the testing methods don't exist yet, and then going forward there is a sort of tacit understanding that those features don't need to be tested.

If there is interest in improved tests in this area I could try to contribute. Let me know what you think.

@indirect
Copy link
Member

If you can contribute tests that deterministicly check whether the concurrent code works, that sounds great! I have no idea how to do that. :)

On Tue, Nov 18, 2014 at 9:15 PM, John Bachir [email protected]
wrote:

It appears there is something wrong with my environment, so I'm unable to test this.
I keep getting this error for seemingly random gems:
Gem::RemoteFetcher::FetchError: Received fatal alert: bad_record_mac (https://rubygems.org/gems/tilt-1.4.1.gem)
If anyone happens to know what is causing this then I can test bundler master.
Regarding 16b0e16, my $0.02: seems like a project like bundler should have tests for such significant changes. Bundler has pretty a lot of tests elsewhere — perhaps there is a tendency to not test the concurrency features? I've seen this before in projects, where concurrency features are introduced with enthusiasm but the testing methods don't exist yet, and then going forward there is a sort of tacit understanding that those features don't need to be tested.

If there is interest in improved tests in this area I could try to contribute. Let me know what you think.

Reply to this email directly or view it on GitHub:
#3174 (comment)

@jjb
Copy link
Contributor Author

jjb commented Nov 19, 2014

it's true that for code like @chdir_monitor ||= Monitor.new the only way to test it is by firing up a few dozen threads to brute force it.

For other types of tests, one can can write a driver which simulates race conditions. Admittedly, one can then go down the path of endlessly refactoring one's code with dependency injection until it's impossible to discern its purpose anymore...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants