BRT: Skip getting length in brt_entry_lookup() #15950
Unlike DDT, where ZAP values may have different lengths due to compression, all BRT entries are identical 8-byte counters. It makes no sense to fetch the length first only to assert it. zap_lookup_uint64() is specifically designed to work with counters of different sizes and should return an error if it finds something odd. Calling it directly saves a measurable amount of CPU time.
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
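The change described above can be sketched against a mock store. This is only an illustration: `mock_value_t`, `mock_length()`, and `mock_lookup()` are hypothetical stand-ins rather than ZAP functions, and the error codes they return are an assumption about how a typed lookup reports a size mismatch. The point is that when the lookup validates sizes itself, the separate length query only adds a second traversal.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one stored value; not the real ZAP layout. */
typedef struct mock_value {
	uint64_t mv_integer_size;
	uint64_t mv_num_integers;
	uint8_t mv_data[16];
} mock_value_t;

/* Mock length query: reports how the value was stored. */
static int
mock_length(const mock_value_t *mv, uint64_t *integer_size,
    uint64_t *num_integers)
{
	if (mv == NULL)
		return (ENOENT);
	*integer_size = mv->mv_integer_size;
	*num_integers = mv->mv_num_integers;
	return (0);
}

/* Mock typed lookup: checks the sizes itself (assumed error codes). */
static int
mock_lookup(const mock_value_t *mv, uint64_t integer_size,
    uint64_t num_integers, void *buf)
{
	if (mv == NULL)
		return (ENOENT);
	if (integer_size != mv->mv_integer_size)
		return (EINVAL);
	if (num_integers < mv->mv_num_integers)
		return (EOVERFLOW);
	if (integer_size * mv->mv_num_integers > sizeof (mv->mv_data))
		return (EINVAL);
	memcpy(buf, mv->mv_data, integer_size * mv->mv_num_integers);
	return (0);
}

/* Before: fetch the length, assert it, then look the value up. */
static int
lookup_two_step(const mock_value_t *mv, uint64_t *refcount)
{
	uint64_t integer_size, num_integers;
	int error = mock_length(mv, &integer_size, &num_integers);

	if (error != 0)
		return (error);
	assert(integer_size == 1);
	assert(num_integers == sizeof (*refcount));
	return (mock_lookup(mv, 1, sizeof (*refcount), refcount));
}

/* After: one call; an odd entry surfaces as an error from the lookup. */
static int
lookup_direct(const mock_value_t *mv, uint64_t *refcount)
{
	return (mock_lookup(mv, 1, sizeof (*refcount), refcount));
}
```

For a well-formed entry both paths return the same counter. The direct path turns an unexpected size into an error code, where the two-step path would trip an assertion.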
	    error == 0 ? (u_longlong_t)bre->bre_refcount : 0, error);
	}
	error = zap_lookup_uint64(brt->brt_mos, mos_entries, &bre->bre_offset,
	    BRT_KEY_WORDS, 1, sizeof (bre->bre_refcount), &bre->bre_refcount);
Doesn't bother me too much, but should we keep the BRT_DEBUG in there?
In my previous tests I noticed that excessive per-block logging makes this code very slow in debug builds and also pushes other messages out of the log. I was going to clean it up anyway; the refactoring in this set of patches just gave me an excuse to do it.
While working on this PR, I noticed a more serious issue. The BRT code passes integer_size = 1 and num_integers = 8 (sizeof (bre->bre_refcount)) to the ZAP functions, which, as I understand it, stores and reads the value as binary data in host byte order. At the same time, unlike DDT, which stores the byte order in the header and does an explicit byteswap, BRT does not do that. Unless I am missing something, BRT is currently not endian-safe. The proper solution IMO would be to pass integer_size = 8 and num_integers = 1 to ZAP, which would make ZAP store the counters as big-endian and handle byte swaps, but that would change the on-disk format and would probably require another feature flag. On read we could just retry in the old format if we get EOVERFLOW, but on write we would need to know which format to write for reader compatibility.
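The byte-order concern can be illustrated in isolation. The sketch below is not ZFS code and all helper names are hypothetical. It assumes, as the comment above does, that eight 1-byte integers are copied verbatim in host order while a single 8-byte integer is written in a fixed big-endian layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models integer_size = 1, num_integers = 8: raw bytes in host order. */
static void
write_host_bytes(uint64_t v, uint8_t buf[8])
{
	memcpy(buf, &v, sizeof (v));
}

static uint64_t
read_host_bytes(const uint8_t buf[8])
{
	uint64_t v;

	memcpy(&v, buf, sizeof (v));
	return (v);
}

/* Reverse the byte order of a 64-bit value. */
static uint64_t
swap64(uint64_t v)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return (r);
}

/* What a host of the opposite endianness would read from the same bytes. */
static uint64_t
read_on_opposite_host(const uint8_t buf[8])
{
	return (swap64(read_host_bytes(buf)));
}

/* Models integer_size = 8, num_integers = 1 with a fixed big-endian layout. */
static void
write_be64(uint64_t v, uint8_t buf[8])
{
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(v >> (56 - 8 * i));
}

static uint64_t
read_be64(const uint8_t buf[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | buf[i];
	return (v);
}

/* A big-endian encoding round-trips regardless of the host byte order. */
static void
check_be64_roundtrip(uint64_t v)
{
	uint8_t buf[8];

	write_be64(v, buf);
	assert(read_be64(buf) == v);
}
```

With host-order bytes, a counter written as 0x0102030405060708 reads back as 0x0807060504030201 on a host of the opposite endianness, while the fixed big-endian layout yields the same value on both.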
As an alternative to adding a new feature and changing the ZAP format, we could reuse the bv_need_byteswap logic. That would break compatibility for any big-endian systems that already exist, while keeping little-endian as-is. I am not sure it is a good approach, though.
I agree this doesn't look to be endian-safe unless there's some subtle cleverness here we're overlooking. Maybe @pjd can shed some light on why it was done this way initially. It'd be nice to avoid a second feature flag for this if at all possible.
Unlike DDT, where ZAP values may have different lengths due to compression, all BRT entries are identical 8-byte counters. It makes no sense to fetch the length first only to assert it. zap_lookup_uint64() is specifically designed to work with counters of different sizes and should return an error if it finds something odd. Calling it directly saves a measurable amount of CPU time.
Reviewed-by: Pawel Jakub Dawidek <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by: iXsystems, Inc.
Closes openzfs#15950