Remove ModTime check during build (#5125) #5351

Merged
merged 4 commits into from
Aug 6, 2020

aschmois
Contributor

@aschmois aschmois commented Jul 27, 2020

As the issue (#5125) suggests, modified time makes using a stack cache in CI very difficult (if possible at all). This PR proposes a change to how the build process marks a file as dirty.

Summary of changes:

  • When checking the build cache, always create a digest and use that to check against the cache if it exists.
    • Note: let's discuss here whether to also check the size.
  • When adding unlisted files to the build cache don't check for pre build time.
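
A minimal sketch of the digest-only check described above, written in Python for illustration rather than in Stack's own Haskell. The names `file_digest`, `is_dirty`, and `record`, the dictionary-based cache, and the choice of SHA-256 are assumptions made for this example, not the PR's actual code.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_dirty(path: Path, cache: dict) -> bool:
    """A file is dirty if it has no cache entry or its digest changed.

    The modification time is never consulted, so restoring a cache in CI
    (which resets timestamps) does not mark unchanged files as dirty.
    """
    cached = cache.get(str(path))
    return cached is None or cached != file_digest(path)


def record(path: Path, cache: dict) -> None:
    """Store the file's current digest in the (hypothetical) cache."""
    cache[str(path)] = file_digest(path)
```

With this shape, touching a file without changing its contents leaves it clean, while any content change marks it dirty regardless of timestamps.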

Questions and comments:

  • I wasn't able to figure out the styling; hindent seems to change too many lines. I tried to keep it as stable as possible.
  • After this change mod time becomes obsolete (no uses). Should we remove it? I wasn't sure how it would affect the JSON instance supplied to FileCacheInfo.
  • If checking against size is not required, removing size would also clean up the process a lot more (no need for composed tuples).
  • In terms of performance, I built our team's current library with about 1500 modules and there is a negligible time difference between the runs. Creating the digest is pretty fast! I do, however, realize that I run this on a beasty machine, so it would be nice to get some insight here.

@aschmois
Contributor Author

I don't think the Windows integration test failure is legitimate; it looks like it failed in the dependencies section.

@snoyberg
Contributor

snoyberg commented Aug 2, 2020

I would prefer to augment rather than remove the modtime check. I would want the heuristics to be something like:

  • If the file modification time is older, assume that the file is the same
  • If the file modification time is newer, then check the digest and make a determination

That should avoid most of the potential slowdown, since the digest checks would only occur (1) when creating the cache and (2) when a file's modification time is newer
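
The augmented heuristic proposed here could be sketched roughly as follows. This is an illustrative Python version, not Stack's Haskell code; `CacheEntry`, `digest_of`, and `is_dirty` are hypothetical names, SHA-256 is an assumed hash, and treating an equal timestamp the same as an older one is also an assumption.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class CacheEntry:
    mod_time: float  # modification time recorded when the entry was cached
    digest: str      # content digest recorded at the same moment


def digest_of(path: str) -> str:
    """Return the SHA-256 hex digest of the file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def is_dirty(path: str, entry: CacheEntry) -> bool:
    # Cheap path: a file not modified since caching is assumed unchanged.
    if os.stat(path).st_mtime <= entry.mod_time:
        return False
    # Expensive path: the timestamp moved forward, so compare contents.
    return digest_of(path) != entry.digest
```

The trade-off in this sketch is that a file whose contents changed but whose timestamp is older than the cached one is still reported as clean, which is the kind of metadata dependence the digest-only approach avoids.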

@tfausak
Contributor

tfausak commented Aug 4, 2020

I worked with @aschmois on this. I expected there to be a slowdown from replacing the modification time check with a digest comparison. I was surprised to find that if there was a difference, it was lost in the noise (at least for our project). Is there a large project we could test the relative performance with?

Conceptually I like the idea of only relying on the file's contents rather than its metadata. But also of course a nice concept isn't any good if it's too slow.

@snoyberg
Contributor

snoyberg commented Aug 4, 2020

I would test on a monorepo like yesodweb/yesod. But even if testing shows no major performance impact, I'd still be worried that some users, based on type of hard drive, file system, cache settings, etc, would end up having a negative impact.

@aschmois
Contributor Author

aschmois commented Aug 4, 2020

I can try to test on different configurations and post the findings; I have a few different devices I can try to build on.

@aschmois
Contributor Author

aschmois commented Aug 4, 2020

With the help of @gera-cameron I ran the tests below. Please let me know if something looks off about the testing methods.

AWS EC2 Testing

Ran on EC2 m5.large instances, specifically avoiding burstable instance types in order to get stable build times. One test was run on spinning disks and another on SSD.

Up to 3.1 GHz Intel Xeon® Platinum 8175M processors with new Intel Advanced Vector Extension (AVX-512) instruction set.

https://aws.amazon.com/ec2/instance-types/

Setup:

$ curl -sSL https://get.haskellstack.org/ | sh # install 2.3.1
$ git clone --recurse-submodules http://github.com/yesodweb/yesod
$ cd yesod && stack test && cd - # download ghc and build dependencies
$ git clone http://github.com/aschmois/stack # clone PR
$ cd stack && stack test --copy-bins --local-bin-path bin --ghc-options '-O2' && cd - # build stack and copy binary into ./bin/
$ cp stack/bin/stack yesod/stackx && chmod u+x yesod/stackx # copy modified stack binary as stackx and make it executable
$ cd stack && git checkout b5d30906ebee25df1f2532255e245d329083b623 && stack test --copy-bins --local-bin-path bin --ghc-options '-O2' && cd - # build unmodified stack
$ cp stack/bin/stack yesod/stackz && chmod u+x yesod/stackz # copy unmodified stack binary as stackz and make it executable

Test 1

Each stack binary test was started from a cold boot and then run sequentially. We don't expect numbers to change wildly here, since they all start from a clean install.

Yesod Build Stack unmodified:

$ stack clean --full && TIMEFORMAT='%6R'; time ./stackz build

Yesod Build Stack modified:

$ stack clean --full && TIMEFORMAT='%6R'; time ./stackx build

Results

All results are in seconds

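| OS           | EC2      | vCPU | RAM (GiB) | EBS                 | version    | x1 (no io cache) | x2      | x3      | avg     |
|--------------|----------|------|-----------|---------------------|------------|------------------|---------|---------|---------|
| Ubuntu 20.04 | m5.large | 2    | 8         | standard (magnetic) | unmodified | 130.105          | 107.735 | 108.894 | 115.578 |
| Ubuntu 20.04 | m5.large | 2    | 8         | standard (magnetic) | modified   | 128.374          | 109.875 | 105.994 | 114.738 |
| Ubuntu 20.04 | m5.large | 2    | 8         | gp2 (ssd) 100 iops  | unmodified | 109.004          | 103.735 | 104.614 | 105.785 |
| Ubuntu 20.04 | m5.large | 2    | 8         | gp2 (ssd) 100 iops  | modified   | 108.405          | 104.605 | 104.024 | 105.678 |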

Summary

All of these numbers are within margin of error of each other, so we can assume that the same process is happening on each build. This is because the digest is calculated every time, since the project was cleaned.

Test 2

This is where we expect things to show differences since no stack clean is done after the first one.

Yesod Build Stack unmodified:

$ stack clean --full && ./stackz build
$ export TIMEFORMAT='%6R'
$ time ./stackz build
$ time ./stackz build
$ time ./stackz build

Yesod Build Stack modified:

$ stack clean --full && ./stackx build
$ export TIMEFORMAT='%6R'
$ time ./stackx build
$ time ./stackx build
$ time ./stackx build

Results

All results are in seconds

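| OS           | EC2      | vCPU | RAM (GiB) | EBS                 | version    | x1    | x2    | x3    | avg   |
|--------------|----------|------|-----------|---------------------|------------|-------|-------|-------|-------|
| Ubuntu 20.04 | m5.large | 2    | 8         | standard (magnetic) | unmodified | 0.614 | 0.614 | 0.604 | 0.611 |
| Ubuntu 20.04 | m5.large | 2    | 8         | standard (magnetic) | modified   | 0.626 | 0.635 | 0.634 | 0.632 |
| Ubuntu 20.04 | m5.large | 2    | 8         | gp2 (ssd) 100 iops  | unmodified | 0.594 | 0.594 | 0.594 | 0.594 |
| Ubuntu 20.04 | m5.large | 2    | 8         | gp2 (ssd) 100 iops  | modified   | 0.606 | 0.604 | 0.604 | 0.605 |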

Summary

We notice a ~21 ms difference on magnetic drives and a ~11 ms difference on SSD drives when calculating digests on a very restricted machine.
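
For reference, the differences quoted in this summary can be reproduced from the Test 2 timings with a few lines of Python. The timings are copied from the results table; the `runs` dictionary and the `overhead` helper are just names chosen for this sketch.

```python
from statistics import mean

# Test 2 timings in seconds, copied from the results table.
runs = {
    ("magnetic", "unmodified"): [0.614, 0.614, 0.604],
    ("magnetic", "modified"): [0.626, 0.635, 0.634],
    ("ssd", "unmodified"): [0.594, 0.594, 0.594],
    ("ssd", "modified"): [0.606, 0.604, 0.604],
}


def overhead(disk):
    """Return (extra milliseconds, extra percent) of modified vs unmodified."""
    base = mean(runs[(disk, "unmodified")])
    changed = mean(runs[(disk, "modified")])
    return (changed - base) * 1000, (changed - base) / base * 100


for disk in ("magnetic", "ssd"):
    extra_ms, percent = overhead(disk)
    print(f"{disk}: +{extra_ms:.0f} ms ({percent:.1f}%)")
# magnetic: +21 ms (3.4%)
# ssd: +11 ms (1.8%)
```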

Based off these results I think we should only use digests since mod times can bring more bugs (such as the CI one I ran into) and performance does not seem to be majorly affected.

@snoyberg
Contributor

snoyberg commented Aug 5, 2020

Based on the massive communication misconnect here, I think there's a fundamental misunderstanding here. I think I understand it, but before merging, let's confirm. I said above:

> I would prefer to augment rather than remove the modtime check.

followed by:

> But even if testing shows no major performance impact, I'd still be worried that some users, based on type of hard drive, file system, cache settings, etc, would end up having a negative impact.

Given that despite these comments, you've moved ahead with a whole bunch of performance testing, I think you're trying to imply:

> You're wrong Michael, augmenting is not an option, so let me try to convince you with overwhelming evidence that the performance impact is minimal.

Am I reading this conversation correctly?

@tfausak
Contributor

tfausak commented Aug 5, 2020

We don't think you're wrong, and we're not trying to overwhelm you with benchmarks.

Like you, we expected the performance of this digest-based approach to be worse than the current approach based on file modification times. After working on this patch we were pleasantly surprised to find that it made effectively no impact whatsoever for our use case. Since your primary concern appeared to be performance, we tried to run a benchmark where we stacked the deck against ourselves: large project, shared hosting, underpowered machine, and spinning disks. Even with those unfavorable conditions we only saw about a 20 ms penalty, which is about 3% of the build time.

Augmenting is very much an option. However we thought that the approach presented in this PR is both conceptually simpler and results in code that's easier to read, so we figured it was worth a shot. If you're saying that this approach is dead in the water even if the performance impact is minimal, fine.

@snoyberg
Contributor

snoyberg commented Aug 5, 2020

Sorry, I didn't mean to imply you're overwhelming me, I just meant that the evidence is overwhelmingly in favor of what you're saying.

I am slightly concerned still, but I'm willing to take this as-is and see if anyone complains about their hard drives thrashing later. Thank you!

@snoyberg
Contributor

snoyberg commented Aug 5, 2020

Sorry, just one more request: can you update the ChangeLog?

@aschmois
Contributor Author

aschmois commented Aug 5, 2020

I'd like to try to smooth out the situation. In no way do I want to attack anyone or do any harm to this code base. I apologise if I came across like that; I can be a little cold sometimes when discussing code. I thought making fancy benchmarks would make a better case for the code written, not a worse one.

With that out of the way: if we are moving forward with this, I'd like to know whether you think removing the mod time and size (if we don't need it as part of the digest check) can also be done. I'd like to do so to clean up the tuple composing.

An afterthought: augmenting the mod time check does seem to be the best option in terms of safety, but after looking at the benchmarks I think being more data safe can avoid some future bugs. I've always seen horror stories around mod time checks for caching and have honestly been battling with this bug for over a year; not directly, but it has been at the back of my mind for that long 😅. I really want what's best for stack, not any random code! Please let me know what you think.

@snoyberg
Contributor

snoyberg commented Aug 5, 2020

Really, honestly, nothing to apologize for. I just wanted to hone in on whether there was a correctness issue I was missing, or if there was a different reason. Taylor clarified the situation to my satisfaction in the comment above.

IIUC, we're not using the filesize or timestamp at all, so please do feel free to update the PR by removing them. I'm also not a fan of anonymous tuples in general, so either removing the multiple-data-returns or creating a custom datatype if they are needed would be great.
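
To illustrate the preference for a named datatype over an anonymous tuple, here is a small Python analogue. It is only a sketch: `FileCheck`, its fields, and `check_file` are hypothetical and do not correspond to types in Stack's code base, where the equivalent would be a Haskell record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileCheck:
    """Hypothetical named result of checking one file against the cache."""
    path: str
    digest: str
    is_dirty: bool


def check_file(path: str, new_digest: str, cached_digest: Optional[str]) -> FileCheck:
    # Returning a named record instead of a bare (path, digest, bool) tuple
    # lets callers read fields by name rather than by position.
    return FileCheck(path=path, digest=new_digest, is_dirty=new_digest != cached_digest)
```

A caller can then write `check_file(...).is_dirty` instead of unpacking a tuple and remembering which position holds which value.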

Thanks!

Contributor

@snoyberg snoyberg left a comment

Thanks!