Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keg_relocate: cache files rewritten during brew bottle #1253

Merged
merged 7 commits into from
Oct 24, 2016
Merged

keg_relocate: cache files rewritten during brew bottle #1253

merged 7 commits into from
Oct 24, 2016

Conversation

jawshooah
Copy link
Contributor

@jawshooah jawshooah commented Oct 10, 2016

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew tests with your changes locally?

brew bottle replaces instances of the Homebrew prefix, cellar, and repository with placeholders in all text files. Cache these files in INSTALL_RECEIPT.json so that we don't have to check every single text file for placeholders on install.

Also add a couple of other optimizations to speed up building bottles.

Fixes #1250.

@BrewTestBot BrewTestBot added the in progress Maintainers are working on this label Oct 10, 2016
@jawshooah
Copy link
Contributor Author

@zmwangx This cuts the install time for mysql on my machine down from ~100s to ~10s.

@@ -777,13 +777,18 @@ def pour
end

keg = Keg.new(formula.prefix)
tab_file = formula.prefix.join(Tab::FILENAME)
# Skip the cache since the receipt will be rewritten
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this comment, can you elaborate a bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Tab.from_file and Tab.for_keg both cache the resulting Tab instance so that when either is called with the same path, the same instance is returned. Tab.from_file_content bypasses the cache. This is desirable here because keg.relocate_text_files will rewrite the tab file, and we want to load in the rewritten file before modifying it and writing it out again a few lines down.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clearer to nuke the cache before rereading rather than use a method that's not using the cache (and should probably be private as a result).

def relocate_text_files(old_prefix, new_prefix, old_cellar, new_cellar,
old_repository, new_repository)
files = text_files | libtool_files
def relocate_text_files(old_prefix, new_prefix, old_cellar, new_cellar, # rubocop:disable Metrics/ParameterLists
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth passing in a hash instead of disabling this cop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I'll clean this up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -75,8 +78,8 @@ def text_files
path.find do |pn|
next if pn.symlink? || pn.directory?
next if Metafiles::EXTENSIONS.include? pn.extname
if Utils.popen_read("/usr/bin/file", "--brief", pn).include?("text") ||
pn.text_executable?
if pn.text_executable? ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea on flipping these 👍

class Metafiles
# https://github.com/github/markup#markups
EXTENSIONS = %w[
.adoc .asc .asciidoc .creole .html .markdown .md .mdown .mediawiki .mkdn
.org .pod .rdoc .rst .rtf .textile .txt .wiki
].freeze
].to_set.freeze
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a frozen array I'm not sure I see the advantages of making it a set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O(1) lookup versus O(n). If order doesn't matter and there are no duplicates, why use an array when you could use a set?

Copy link
Contributor Author

@jawshooah jawshooah Oct 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that Keg#text_files queries this collection once for every file, so bringing the complexity of that search down is worthwhile if we care about brew bottle runtime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithmic speedup is always nice but given Sets aren't strictly interchangeable with arrays I figure it's usually worth using arrays unless profiling shows using sets produces a measurable speedup. Based on your previous comments it sounds like it does so 👍

@@ -205,6 +206,7 @@ def bottle_formula(f)
tab.poured_from_bottle = false
tab.HEAD = nil
tab.time = nil
tab.changed_files = changed_files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern with putting it in the tab is that this may be a very long list. How many values are there for e.g. mysql?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my INSTALL_RECEIPT.json for mysql:

{
  "used_options": [],
  "unused_options": [
    "--with-test",
    "--with-embedded",
    "--with-archive-storage-engine",
    "--with-blackhole-storage-engine",
    "--with-local-infile",
    "--with-debug"
  ],
  "built_as_bottle": true,
  "poured_from_bottle": true,
  "changed_files": [
    "INSTALL_RECEIPT.json",
    "bin/mysql_config",
    "bin/mysqld_multi",
    "bin/mysqld_safe",
    "homebrew.mxcl.mysql.plist",
    "include/mysql/my_config.h",
    "lib/pkgconfig/mysqlclient.pc",
    "mysql-test/Makefile",
    "mysql-test/suite/ndb/include/have_java.inc",
    "share/doc/mysql/INFO_BIN",
    "share/man/man1/myisam_ftdump.1",
    "share/man/man1/mysql.server.1",
    "share/man/man1/mysql_config.1",
    "share/man/man1/mysql_install_db.1",
    "share/man/man1/mysqld_multi.1",
    "share/man/man1/mysqld_safe.1",
    "share/man/man1/mysqldumpslow.1",
    "support-files/mysql-log-rotate",
    "support-files/mysql.server",
    "support-files/mysqld_multi.server"
  ],
  "time": 1476101424,
  "source_modified_time": 1472126415,
  "HEAD": null,
  "stdlib": "libcxx",
  "compiler": "clang",
  "runtime_dependencies": [],
  "source": {
    "path": "/usr/local/Library/Taps/homebrew/homebrew-core/Formula/mysql.rb",
    "tap": "homebrew/core",
    "spec": "stable",
    "versions": {
      "stable": "5.7.15",
      "devel": null,
      "head": null,
      "version_scheme": 0
    }
  }
}

Not too bad, I think. Would you rather output to a separate file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah that seems fine, thanks 👍

@MikeMcQuaid
Copy link
Member

Loving the work here so far, great job @jawshooah!

if Utils.popen_read("/usr/bin/file", "--brief", pn).include?("text") ||
pn.text_executable?
if pn.text_executable? ||
Utils.popen_read("/usr/bin/file", "--brief", pn).include?("text")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be faster running this only once for all files?

files = path.find.reject { |pn| 
  pn.symlink? || pn.directory || Metafiles::EXTENSIONS.include?(pn.extname)
}

text_file_indices = Utils.popen_read("/usr/bin/file", "--no-dereference", "--brief", *files)
                         .lines
                         .map.with_index { |line, i| line.include?("text") ? i : nil }
                         .compact

text_files = files.values_at(*text_file_indices)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reitermarkus Some local testing looks like it should be, good catch. There's probably a maximum number of arguments we can specify but batching them may well produce a speedup 👍

Copy link
Member

@reitermarkus reitermarkus Oct 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to get rid of

once before by using getconf ARG_MAX, but somehow got sidetracked. 😄

@ilovezfs ilovezfs added cask Homebrew Cask and removed cask Homebrew Cask labels Oct 11, 2016
@jawshooah
Copy link
Contributor Author

jawshooah commented Oct 14, 2016

@MikeMcQuaid are you still interested in this now that #1258 and friends have been merged? It still offers a significant boost in install time, ~4x from my rough testing.

@MikeMcQuaid
Copy link
Member

It still offers a significant boost in install time, ~4x from my rough testing.

This was going to be my next question! In that case: yes, definitely! Please also check a bottle with/without this data can still be poured/relocated correctly both without/with this change (i.e. that all four combinations work fine).

Great work here!

@MikeMcQuaid
Copy link
Member

Any news here @jawshooah? Shout if I can help at all.

old_repository, new_repository)
files = text_files | libtool_files
def replace_locations_with_placeholders
relocate_dynamic_linkage(HOMEBREW_PREFIX.to_s, PREFIX_PLACEHOLDER,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit: might want to make the two prefix args a key/value of a hash or something? Doesn't really matter but I always get a little 😨 about things that rely on arg ordering like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact: like you did below with replacements 😆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a Struct? f798ba4

@jawshooah
Copy link
Contributor Author

@MikeMcQuaid commented on Oct 23, 2016, 4:34 AM EDT:

Any news here @jawshooah? Shout if I can help at all.

Just need to get around to re-verifying that everything works in the scenarios you mentioned. Nothing blocking me but time 😄

jawshooah and others added 7 commits October 24, 2016 16:21
`brew bottle` replaces instances of the Homebrew prefix, cellar, and
repository with placeholders in all text files. Cache these files in
INSTALL_RECEIPT.json so that we don't have to check every single text
file for placeholders on install.
Replace relocate_text_files with three methods that clarify intent:
replace_locations_with_placeholders, replace_placeholders_with_locations
and replace_text_in_files, the first two calling the third.
@jawshooah
Copy link
Contributor Author

@MikeMcQuaid I've verified that there are no problems installing with/without the data and with/without the change. Merging now as CI is 💚 and you've already approved the changes.

@jawshooah jawshooah merged commit 79e8cdd into Homebrew:master Oct 24, 2016
@jawshooah jawshooah deleted the perf/relocate-text-files branch October 24, 2016 20:44
@BrewTestBot BrewTestBot removed the in progress Maintainers are working on this label Oct 24, 2016
@scpeters
Copy link
Member

this may have broken the mercurial bottle for me, still investigating...

@scpeters
Copy link
Member

$ head -1 /usr/local/opt/mercurial/bin/hg
#!@@HOMEBREW_PREFIX@@/opt/python/bin/python2.7

@scpeters
Copy link
Member

I reverted to e6bce5e, and the problem went away, so I think it's an issue with this PR.

Here's what I see when I install mercurial:

brew install mercurial
==> Downloading https://homebrew.bintray.com/bottles/mercurial-3.9.2.yosemite.bottle.tar.gz
Already downloaded: /Users/jenkins/Library/Caches/Homebrew/mercurial-3.9.2.yosemite.bottle.tar.gz
==> Pouring mercurial-3.9.2.yosemite.bottle.tar.gz
Error: No such file or directory - /usr/local/opt/mercurial/bin/hg

There is a file at that location though:

$ head /usr/local/opt/mercurial/bin/hg
#!@@HOMEBREW_PREFIX@@/opt/python/bin/python2.7
#
# mercurial - scalable distributed SCM
#
# Copyright 2005-2007 Matt Mackall <[email protected]>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

import os

@zmwangx
Copy link
Contributor

zmwangx commented Oct 25, 2016

There's a bug here with deleting pyc files in the wrong order. Causing a failure: https://bot.brew.sh/job/Homebrew%20Core/10080/version=el_capitan/testReport/junit/brew-test-bot/el_capitan/bottle_buku/. Fix in #1375.

ilovezfs added a commit that referenced this pull request Oct 25, 2016
@Homebrew Homebrew locked and limited conversation to collaborators May 3, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Keg.relocate_text_files performance is awful
7 participants