-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configurable buffer size and flat compression for file entries for SBT #375
Conversation
signed CLA, please revalidate |
@@ -46,7 +46,10 @@ object IO { | |||
val temporaryDirectory = new File(System.getProperty("java.io.tmpdir")) | |||
|
|||
/** The size of the byte or char buffer used in various methods. */ | |||
private val BufferSize = 8192 | |||
private val BufferSize = | |||
Option(System.getProperty("sbt.io.buffer.size")).map(_.toInt).getOrElse(8192) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://github.com/sbt/sbt/blob/bcf5ded572711854b9ba28cec1e30114b428380b/main/src/main/scala/sbt/internal/SysProp.scala#L81-L92, which is a mini style guide for system properties. I'd suggest name sbt.io.bufferbyte
?
In general, I wonder if we should copy-paste SysProp.scala to sbt.internal.io or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for paying attention!
I believe as is it does comply with style guidelines as articulated in SysProp.scala
// System property style:
// 1. use sbt. prefix
// 2. prefer short nouns
// 3. use dot for namespacing, and avoid making dot-separated English phrase
// 4. make active/enable properties, instead of "sbt.disable."
and also existing values such as
def taskTimings: Boolean = getOrFalse("sbt.task.timings")
def taskTimingsOnShutdown: Boolean = getOrFalse("sbt.task.timings.on.shutdown")
def taskTimingsThreshold: Long = long("sbt.task.timings.threshold", 0L)
def taskTimingsOmitPaths: Boolean = getOrFalse("sbt.task.timings.omit.paths")
that being said, it's all cosmetics, it's up to you as maintainer to define final naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy-paste SysProp.scala to sbt.internal.io
looks a good suggestion, however this was far out of scope of my humble hack
While that might be true:
In general, parameterizing only via system property might end up creating untested code paths or worse, a build tool that behaves differently depending on the machine setup. For behavior that only affects the performance or UX it might be ok, but compression level hits a bit different, because creating a JAR is a major job of sbt. |
@jacum Not sure if this is 100% relevant but sbt zinc serializes incremental compilation dependency data called Analysis with its own GZIP compression stream (which can potentially get very large). Just want to confirm that the overhead is indeed on compression done with sbt io instead of Zinc's own Analysis compression.
Would it be possible to share some profiling data? If it takes too much hassle to present the data in a clean way, just the raw data and a very brief description of profiling setup can still be really valuable. |
Yea. This is also a good point, we recently introduce parallel gzipping, so maybe that could be affecting some environments in a negative way. |
Thanks for paying attention! All we wanted to achieve now is to prevent Unlike jars published to repositories, the cached jars are short living anyway - in the scope of the pipeline - and trading tons of cpu and pipeline runtime spent on g(un)zipping isn't definitely worth a few saved megabytes of the cache size. Hope to be able to share some figures with you soon. |
Will look into it. I think maybe we can add a settings key at sbt side either called I definitely see how this can be a pervasive issue. I was experimenting with some CI optimizations as part of my recent SWE internship and the potato CI was absolutely CPU starved, both in terms of single threaded performance and # of physical threads... Being able to avoid any CPU heavy workflow has potential for major speed up in such cases, and compression can definitely play a major role. |
@Friendseeker your insights are much appreciated. |
We noticed that in CI pipelines running sbt, substantial part of the time and resources are spent to compress and uncompress the artefacts.
This is very pronounced for the remote cache feature, which is a great way of sharing the stuff between several jobs in the same CI pipeline - however, actually storing and fetching artefacts is a heavy process.
It is aggravated by the fact that the CI pipelines usually are constrained by CPU and memory, and gzip compression is quite demanding on that. Our CI jobs could easily spend as much as half of the time pulling and pushing stuff from remote cache.
As the storage is cheap, reduced artefact size doesn't really add much of the value.
The proposed minimal change doesn't alter existing behaviour by default, but adds a way to override file copy buffer size and disable compression for the file entries created by SBT.