This repository has been archived by the owner on Apr 14, 2021. It is now read-only.

Regex parsing of YAML in config leads to 2^n growth of config file size #3261

Closed
rpdillon opened this issue Nov 19, 2014 · 7 comments

Comments

@rpdillon

Bundler writes the -j4 setting to a config file. If the config file does not already contain that setting, Bundler reads the file using a regex but writes it back using the YAML module.

For long, multi-line environment variables, such as those written by the Heroku RGeo buildpack, this causes the number of single quotes delimiting the variable to double on each write. This arises in situations such as a Heroku deploy.
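To make the doubling concrete, here is a minimal sketch of the mechanism. It does not use Bundler's actual regex or the YAML module: `write_setting`, `naive_read`, `unescaping_read`, and the `BUNDLE_EXAMPLE` key are all hypothetical stand-ins. The writer follows the YAML rule that a single quote inside a single-quoted scalar is escaped as two single quotes, and the naive reader strips the outer quotes without undoing that escaping.

```ruby
# Hypothetical stand-ins for illustration; NOT Bundler's real reader or writer.

# Writer: emits a YAML-style single-quoted scalar, escaping each ' as ''.
def write_setting(key, value)
  "#{key}: '#{value.gsub("'", "''")}'\n"
end

# Naive regex reader: strips the outer quotes but never collapses '' back to '.
def naive_read(line)
  match = line.match(/\A(\w+): '(.*)'\n\z/m)
  match && match[2]
end

# Reader that undoes the escaping, for comparison.
def unescaping_read(line)
  raw = naive_read(line)
  raw && raw.gsub("''", "'")
end

# Round-trips a value through write + read and records the quote count each time.
def quote_counts(rounds, reader)
  value = "it's"
  (1..rounds).map do
    value = send(reader, write_setting("BUNDLE_EXAMPLE", value))
    value.count("'")
  end
end

naive_counts = quote_counts(5, :naive_read)
fixed_counts = quote_counts(5, :unescaping_read)
puts naive_counts.inspect
puts fixed_counts.inspect
```

With the naive reader the quote count doubles on every round trip, while the unescaping reader keeps it constant. That is the shape of the growth described above, under the assumptions stated.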

After 27 such deploys, the size of the resulting .bundle/config file is 128 MB, which consumes ~2 GB of RAM, wedging a 512 MB instance and preventing startup of the app.
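As a rough sanity check on those numbers, the arithmetic can be sketched as follows. It assumes, purely for illustration, a single quote character that doubles on each of the 27 writes; the real file starts with more than one quote and contains other content.

```ruby
# Assumption for illustration: one quote character that doubles on every write.
writes = 27
quote_bytes = 2**writes
mebibytes = quote_bytes / (1024 * 1024)
puts "#{quote_bytes} bytes (#{mebibytes} MiB) after #{writes} writes"
```

Under that assumption the quotes alone account for 128 MiB, which is consistent with the file size reported above.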

This gist demonstrates the issue in isolation, using code pulled directly from Bundler.

Presumably, YAML is parsed using a regex to improve startup time in the common read-only case. The change was made in this commit.

The issue is not present when the YAML in the config file is parsed using the YAML module.

@indirect
Member

Sadly, the regex is used because it is impossible to load the YAML gem. :( We'll need to fix the regex rather than use the YAML library to parse the file. Thanks for reporting this!

rpdillon added a commit to apartmentlist/heroku-buildpack-ruby that referenced this issue Nov 19, 2014
This is a medium-term fix that will resolve a bug in the way Bundler
reads config data.

rubygems/bundler#3261
@indirect
Member

Unrelated to this bug, the RGeo buildpack overwrites the config file. How is this bug causing the config file to grow?

@rpdillon
Author

It does appear that RGeo overwrites the config file! When used in a multi-buildpack, along with the Heroku Ruby buildpack, the overwrite is not respected, since the Ruby buildpack loads from a cache instead. When the cache is purged, the RGeo settings get written as expected, and on subsequent deploys are ignored in favor of the cached config file.

The work-around I'm testing right now is to prevent the Ruby buildpack from loading the config from cache. The change I made is here, though it's obviously not ideal. I've just tested it on Heroku and it resolves the issue, at least in our configuration.

@rpdillon
Author

I was under the impression that YAML was in the Ruby standard library (though it does need to be required). Out of curiosity, what prevented its use in Bundler when reading the config?

@indirect
Member

The YAML standard library is provided by a vendored copy of the Psych gem. Since Gemfiles may contain versions of Psych other than the version that is vendored with Ruby, Bundler is unable to require ‘YAML’ until after the Gemfile has been set up, which means we aren’t able to use YAML to parse the config file.


@rpdillon
Author

That makes sense. Fairly subtle.

@schneems
Contributor

Thanks for reporting
