Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data Inlinability Refinements #21754

Merged
merged 15 commits into from
Jan 14, 2019

Conversation

itaiferber
Copy link
Contributor

What's in this pull request?
Pulls back some of the aggressive inlining we did in #20225 to hopefully improve some code size measurements. Local benchmarks look good speed-wise, but I'll be using CI to measure code size improvements.

Also resolves 46780357 regarding Data.hashValue inlining (which has been replaced with Data.hash(into:)).

Commit-by-commit review is recommended, and I took the liberty of bringing some of the changes I'd made in swift-5.0-branch back into master (e.g. eliminating numericCast in favor of explicit Int conversions as requested by @airspeedswift)

@itaiferber
Copy link
Contributor Author

@swift-ci Please smoke test

@itaiferber
Copy link
Contributor Author

@swift-ci Please benchmark

@@ -300,72 +296,24 @@ internal final class _DataStorage {
_DataStorage.move(_bytes!.advanced(by: origLength), bytes, length)
}

// fast-path for appending directly from another data storage
@inlinable
func append(_ otherData: _DataStorage, startingAt start: Int, endingAt end: Int) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were simply never used.

@@ -17,7 +17,7 @@ import Darwin
#elseif os(Linux)
import Glibc

@inlinable
@inlinable // This is @inlinable as trivially computable.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I audited every usage of @inlinable and @usableFromInline and gave each one a justification for keeping it or changing it. I tried to be consistent with explanations.

@@ -62,14 +62,14 @@ internal func __NSDataIsCompact(_ data: NSData) -> Bool {

#endif

// Underlying storage representation for medium and large data.
// Inlinability strategy: methods from here should not inline into InlineSlice or LargeSlice unless trivial.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every type I audited now has a description of what inlinability we're aiming for.

func copyBytes(to pointer: UnsafeMutableRawPointer, from range: Range<Int>) {
let offsetPointer = UnsafeRawBufferPointer(start: _bytes?.advanced(by: range.lowerBound - _offset), count: Swift.min(range.upperBound - range.lowerBound, _length))
UnsafeMutableRawBufferPointer(start: pointer, count: range.upperBound - range.lowerBound).copyMemory(from: offsetPointer)
}

@inlinable
func replaceBytes(in range: NSRange, with bytes: UnsafeRawPointer?) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also never used

@phausler
Copy link
Contributor

phausler commented Jan 9, 2019

This looks good to me. I think the commentary on the functions really helps clear up why the inlines should be the way they are. +1 for maintainability.

Copy link
Contributor

@phausler phausler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending the results of course.

@swift-ci
Copy link
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
StringToDataEmpty 650 700 +7.7% 0.93x (?)
Improvement
DataSetCountMedium 620 470 -24.2% 1.32x
DataReplaceMedium 4400 4000 -9.1% 1.10x (?)
ObjectiveCBridgeStubDataAppend 5884 5383 -8.5% 1.09x
DataReset 2800 2600 -7.1% 1.08x (?)

Code size: -O

TEST OLD NEW DELTA RATIO
Improvement
DataBenchmarks.o 62543 41914 -33.0% 1.49x
ObjectiveCBridgingStubs.o 27427 23204 -15.4% 1.18x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
StringToDataEmpty 600 700 +16.7% 0.86x
DataAccessBytesSmall 89 97 +9.0% 0.92x (?)
Improvement
DataSetCountMedium 620 470 -24.2% 1.32x

Code size: -Osize

TEST OLD NEW DELTA RATIO
Improvement
DataBenchmarks.o 32955 30418 -7.7% 1.08x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
DictionaryBridgeToObjC_Access 1013 1129 +11.5% 0.90x (?)
Improvement
DataReplaceMediumBuffer 10300 9500 -7.8% 1.08x (?)
DataReplaceSmallBuffer 8400 7800 -7.1% 1.08x (?)
DataReplaceMedium 4300 4000 -7.0% 1.07x (?)

Code size: -swiftlibs

TEST OLD NEW DELTA RATIO
Improvement
libswiftFoundation.dylib 1593344 1572864 -1.3% 1.01x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB
--------------

@itaiferber
Copy link
Contributor Author

Looks like the failing test is one that asserts that Data and NSData have the same hash value, which is no longer a real requirement. I'll take care of the test tomorrow.

@palimondo
Copy link
Contributor

palimondo commented Jan 10, 2019

I think we don't have any benchmarks that cover the hashValue/hasher performance here, right? I'm curious how does hasher.combine(bytes: $0) compare to CFHashBytes. cc @lorentey

@itaiferber Could we add additional benchmark(s?) that cover that (and the Data.init.Sequence.UnderestimatedCount) in separate PR?

@itaiferber
Copy link
Contributor Author

itaiferber commented Jan 10, 2019

Sure, I can put those up separately. I don't expect any significant regressions in hashing performance.

@itaiferber
Copy link
Contributor Author

@swift-ci Please smoke test

@itaiferber
Copy link
Contributor Author

@palimondo As for the 20x speedup — unfortunately, looks the initial numbers I get for the benchmark are also way, way slower than what we're getting here in CI (2 orders of magnitude off), and the improved numbers are significantly slower than what we're seeing in CI as well. Looks like this is just on my machine; unfortunately, less impressive in practice.

@itaiferber
Copy link
Contributor Author

@airspeedswift Regarding lack of use of internal throughout — I personally prefer it, but we don't tend to use it internally. I can go ahead and annotate everything, but right now I don't know how worth it that might be.

// If we've still got bytes left in the buffer (i.e. the loop ended before we filled up the buffer and cleared it out), append them.
if buffer.count > 0 {
buffer.withUnsafeBytes { _representation.append(contentsOf: $0) }
buffer.count = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll be throwing away (releasing) the buffer as you exit this block, so resetting the count is unnecessary.

// If we've still got bytes left in the buffer (i.e. the loop ended before we filled up the buffer and cleared it out), append them.
if buffer.count > 0 {
buffer.withUnsafeBytes { _representation.append(contentsOf: $0) }
buffer.count = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to reset this on exit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I realize — I was hoping to avoid future bugs in case someone tries to use buffer after this last write. I'll leave a comment instead.

var d = Data()

// d should go from .empty representation to .inline.
// Appending a small enough sequence to fit in linline should actually be copied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: linline

Itai Ferber and others added 3 commits January 11, 2019 11:17
@itaiferber
Copy link
Contributor Author

@swift-ci Please smoke test

@itaiferber
Copy link
Contributor Author

@swift-ci Please benchmark

@swift-ci
Copy link
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Improvement
Data.append.Sequence.UnderestimatedCount 5809 1206 -79.2% 4.82x
Data.init.Sequence.UnderestimatedCount 5880 1275 -78.3% 4.61x
Data.hash.Medium 174 48 -72.4% 3.62x
Data.hash.Empty 108 71 -34.3% 1.52x
DataSetCountMedium 690 530 -23.2% 1.30x
Data.hash.Small 307 238 -22.5% 1.29x
DataSetCountSmall 154 137 -11.0% 1.12x
ObjectiveCBridgeStubDataAppend 6631 5949 -10.3% 1.11x
CharacterLiteralsLarge 108 97 -10.2% 1.11x
DictionaryBridgeToObjC_Access 984 894 -9.1% 1.10x (?)
DataReplaceMedium 5000 4600 -8.0% 1.09x (?)
DataReplaceMediumBuffer 10500 9700 -7.6% 1.08x (?)
CharacterLiteralsSmall 348 325 -6.6% 1.07x (?)

Code size: -O

TEST OLD NEW DELTA RATIO
Improvement
DataBenchmarks.o 72247 52162 -27.8% 1.39x
ObjectiveCBridgingStubs.o 27427 23204 -15.4% 1.18x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
StringToDataEmpty 700 800 +14.3% 0.88x
DataAccessBytesSmall 100 108 +8.0% 0.93x (?)
Improvement
Data.init.Sequence.UnderestimatedCount 5195 1247 -76.0% 4.17x
Data.append.Sequence.UnderestimatedCount 5068 1219 -75.9% 4.16x
Data.hash.Medium 174 48 -72.4% 3.62x
Data.hash.Empty 108 71 -34.3% 1.52x
Data.hash.Small 307 238 -22.5% 1.29x
DataSetCountMedium 680 530 -22.1% 1.28x (?)
DataSetCountSmall 154 137 -11.0% 1.12x (?)

Code size: -Osize

TEST OLD NEW DELTA RATIO
Improvement
DataBenchmarks.o 41761 39738 -4.8% 1.05x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
DataAppendDataLargeToLarge 38800 51400 +32.5% 0.75x (?)
SubstringFromLongString 11 12 +9.1% 0.92x (?)
Improvement
Data.hash.Medium 182 56 -69.2% 3.25x
Data.append.Sequence.UnderestimatedCount 8504 4528 -46.8% 1.88x
Data.init.Sequence.UnderestimatedCount 8581 4583 -46.6% 1.87x
DataSetCountMedium 770 560 -27.3% 1.37x (?)
Data.hash.Small 401 342 -14.7% 1.17x
Data.hash.Empty 180 157 -12.8% 1.15x
DataSetCountSmall 220 202 -8.2% 1.09x (?)

Code size: -swiftlibs

TEST OLD NEW DELTA RATIO
Improvement
libswiftFoundation.dylib 1593344 1572864 -1.3% 1.01x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

@itaiferber
Copy link
Contributor Author

@swift-ci Please smoke test and merge

@itaiferber
Copy link
Contributor Author

@swift-ci Please smoke test

@palimondo
Copy link
Contributor

Is this still considered WIP? Will you be cherry-picking this for 5.0 branch?

@itaiferber
Copy link
Contributor Author

@palimondo I'm cherry-picking for swift-5.0-branch right now

@itaiferber itaiferber changed the title [WIP] Data Inlinability Refinements Data Inlinability Refinements Jan 14, 2019
itaiferber added a commit to itaiferber/swift that referenced this pull request Jan 14, 2019
…refinements

Refine Data inlinability

(cherry picked from commit 6c57610)
@palimondo
Copy link
Contributor

palimondo commented Jan 14, 2019

@itaiferber Didn't you forget to push some last commit? The code that landed here still contains unresolved issues around hash(into:) being marked @inlinable and only hashing the 80 bytes. I think @airspeedswift and @lorentey would require that for the 5.0 landing…

@itaiferber
Copy link
Contributor Author

itaiferber commented Jan 14, 2019

No, we don't intend to change that behavior in this PR — if we'd like to make the changes, we can do that separately in a different PR. Given that the Data.hash(into:) implementation is no longer inlined into clients, this is the sort of thing we can also make in the future if need be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants