-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server,admission: automatically infer disk names for stores #109350
server,admission: automatically infer disk names for stores #109350
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @bananabrick, @irfansharif, and @sumeerbhola)
pkg/server/node.go
line 1065 at r1 (raw file):
// TODO(irfansharif): Is this platform specific? (Seems to work on M1s, // gceworkers, GCP/AWS instances with/without attached storage devices). func getDeviceIDsToNames() (map[uint64]string, error) {
Can you try instead use our upstream dependency gosigar
which has various APIs to do just this (you might need to upgrade to a later version).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @bananabrick, @knz, and @sumeerbhola)
pkg/server/node.go
line 1065 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Can you try instead use our upstream dependency
gosigar
which has various APIs to do just this (you might need to upgrade to a later version).
I'm looking at their release docs: https://pkg.go.dev/github.com/elastic/gosigar, it doesn't expose the device ID specifically needed to do this mapping. Alternatively, it's not exposing anything that obviates the need for such a device ID given we're trying to go from a store ID (where we can deduce the path to some file within the store) => device name. Any other clues? I'll look at the code to see what platform-neutral bits to cargocult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @bananabrick, @irfansharif, and @sumeerbhola)
pkg/server/node.go
line 1065 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I'm looking at their release docs: https://pkg.go.dev/github.com/elastic/gosigar, it doesn't expose the device ID specifically needed to do this mapping. Alternatively, it's not exposing anything that obviates the need for such a device ID given we're trying to go from a store ID (where we can deduce the path to some file within the store) => device name. Any other clues? I'll look at the code to see what platform-neutral bits to cargocult.
I was thinking you would use https://pkg.go.dev/github.com/cloudfoundry/gosigar#FileSystemList
We have done this in another place before. See pkg/storage/store_properties.go.
In fact I wonder if you should not simply extend this latter source file with what you need, instead of adding logic here.
Generally, pkg/server is not the right place to do OS-specific stuff (or anything relating to the OS really). If there's something you need that gosigar (and store_properties) don't help you with, let's discuss this on #server and get Ben's input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just looked at the store spec changes since @knz is reviewing the other parts
Reviewed 4 of 5 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @bananabrick, and @irfansharif)
pkg/base/store_spec.go
line 231 at r1 (raw file):
} if len(spec.DiskName) == 0 { return ProvisionedRateSpec{},
needs updating
pkg/base/store_spec.go
line 310 at r1 (raw file):
fmt.Fprint(&buffer, ",") } if len(ss.ProvisionedRateSpec.DiskName) > 0 {
this needs to change if DiskName is optional, since presumably we can now specify bandwidth and not specify disk-name
pkg/base/store_spec.go
line 362 at r1 (raw file):
// - 0.2 -> 20% of the available space // - attrs=xxx:yyy:zzz A colon separated list of optional attributes. // - provisioned-rate=disk-name=<disk-name>[:bandwidth=<bandwidth-bytes/s>] The
comment needs updating
pkg/cli/cliflags/flags.go
line 1002 at r1 (raw file):
bandwidth saturation, the "provisioned-rate" field can be specified with the optional "disk-name" and "bandwidth" parameters. It's only when the "bandwidth" parameter is specified for a given store that admission control
We could enable this solely if kvadmission.store.provisioned_bandwidth
is positive, if we can infer the disk name. But if we can't infer the disk name, we don't have a good way to tell the user that even though they may be expecting disk bandwidth enforcement, it isn't happening.
e1603e1
to
5a9c520
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @bananabrick, @knz, and @sumeerbhola)
pkg/base/store_spec.go
line 231 at r1 (raw file):
Previously, sumeerbhola wrote…
needs updating
Removed.
pkg/base/store_spec.go
line 310 at r1 (raw file):
Previously, sumeerbhola wrote…
this needs to change if DiskName is optional, since presumably we can now specify bandwidth and not specify disk-name
Done.
pkg/base/store_spec.go
line 362 at r1 (raw file):
Previously, sumeerbhola wrote…
comment needs updating
Done.
pkg/cli/cliflags/flags.go
line 1002 at r1 (raw file):
Updated the doc text to make this more explicit. I actually forgot the the cluster setting can be used to easily enable this functionality - exciting!
But if we can't infer the disk name, we don't have a good way to tell the user that even though they may be expecting disk bandwidth enforcement, it isn't happening.
I'm not sure what to do about it, I think I'll ignore this for now. Perhaps we can make use of some SQL-hinting when mucking with that specific cluster setting. I added TODO next to that setting.
pkg/server/node.go
line 1065 at r1 (raw file):
I was thinking you would use https://pkg.go.dev/github.com/cloudfoundry/gosigar#FileSystemList
These APIs don't seem to pull out the same kinds of device names we're after. DevName
from https://pkg.go.dev/github.com/cloudfoundry/gosigar#FileSystem returns blkdev=tmpfs
when running using cockroach start-single-node --store $(mktemp -d) --insecure --logtostderr
on my GCE worker for example, when what I want is s1 mapped to disk ‹sda1›
that I'm getting currently. Looking through the library code, it's populating the device names using /etc/mtab
. If I've got the vocabulary right, this is the block device name, but I'm after the physical device name. I don't see the right primitives in gosigar to do what I'm doing in this code block.
I'm not married to having this code here. I agree this being OS-specific makes it better placed outside of pkg/server so I've added a TODO saying as much. If ok with you though, I'd really like to defer doing this. Just in the interest of building on top of this PR for some roachtests I want to check in making use of this very-unused AC mechanism.
5a9c520
to
c6ee2b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @bananabrick, @knz, and @sumeerbhola)
pkg/server/node.go
line 1065 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I was thinking you would use https://pkg.go.dev/github.com/cloudfoundry/gosigar#FileSystemList
These APIs don't seem to pull out the same kinds of device names we're after.
DevName
from https://pkg.go.dev/github.com/cloudfoundry/gosigar#FileSystem returnsblkdev=tmpfs
when running usingcockroach start-single-node --store $(mktemp -d) --insecure --logtostderr
on my GCE worker for example, when what I want iss1 mapped to disk ‹sda1›
that I'm getting currently. Looking through the library code, it's populating the device names using/etc/mtab
. If I've got the vocabulary right, this is the block device name, but I'm after the physical device name. I don't see the right primitives in gosigar to do what I'm doing in this code block.I'm not married to having this code here. I agree this being OS-specific makes it better placed outside of pkg/server so I've added a TODO saying as much. If ok with you though, I'd really like to defer doing this. Just in the interest of building on top of this PR for some roachtests I want to check in making use of this very-unused AC mechanism.
Looks like the linter fought me. I put some of this logic in pkg/util/sysutil instead - PTAL.
c6ee2b4
to
e719124
Compare
Part of cockroachdb#86857. This commit eliminate the need to provide the disk-name common environments e.g. linux with store on EBS or GCP PD. To make use of AC's disk bandwidth tokens, users still need to specify the provisioned bandwidth, for now. So in a sense this machinery is still "disabled by default". They can also do this through the kvadmission.store.provisioned_bandwidth cluster setting. Next steps: - add roachtests that make use of these disk bandwidth tokens; - automatically measure provisioned bandwidth, using something like github.com/irfansharif/probe, gate behind envvars or cluster settings; - roll it out in managed environments; - roll it out elsewhere. Release note: None
e719124
to
885abad
Compare
Let's see where this conversation goes before this merges. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 10 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @bananabrick, @irfansharif, and @knz)
-- commits
line 6 at r2:
nit: elminates
pkg/kv/kvserver/kvadmission/kvadmission.go
line 106 at r2 (raw file):
// don't have an easy way to communicate to the user that admission control // enforcement for disk bandwidth is not happening, despite expectations. Maybe // we surface this through SQL hints somehow?
what are "SQL hints"?
pkg/server/node.go
line 1022 at r2 (raw file):
// TODO(irfansharif): Use something like github.com/irfansharif/probe to // measure this empirically on process start, if not explicitly passed in. provisionedRate map[roachpb.StoreID]base.ProvisionedRateSpec
The code on master ensured these two maps were consistent. But if there was not a ProvisionedRateSpec
it fell back to using clusterProvisionedBandwidth
in a second place (in Node.GetPebbleMetrics
).
We need a slightly different solution now that the ProvisionedRateSpec may not provide the disk name, and the auto discovery may fail, so we can't ensure the two maps are consistent.
I think we should have an invariant that provisionedRate
has an entry for each store, even if the spec is empty. diskNameToStoreID
will be a subset, and if something is not in diskNameToStoreID
we won't be able to populate the usage (which will be 0).
pkg/server/node.go
line 1029 at r2 (raw file):
} stats = make(map[roachpb.StoreID]admission.DiskStats) for id, spec := range dsm.provisionedRate {
see my earlier comment on having an entry in provisionedRate
for each store -- in that case this code should not need to change.
pkg/server/node.go
line 1100 at r2 (raw file):
// If users specify disk names manually, compare the automatically // observed data against what they spelled out. dsm.provisionedRate[storeIdent.StoreID] = specs[i].ProvisionedRateSpec
So if len(specs[i].ProvisionedRateSpec.DiskName == 0
we are not even using the ProvisionedRateSpec
?
Can we test this by updating TestDiskStatsMap
? Same question for the next comment.
pkg/server/node.go
line 1105 at r2 (raw file):
if storeID, found := dsm.diskNameToStoreID[diskName]; found { if storeID != storeIdent.StoreID { return errors.Newf("mismatched store ID for disk %s, expected %d got %d",
I think we should log a warning and continue, and use whatever was configured, since it is possible that what is being automatically discovered is not quite right, and which is why the operator has provided an override.
pkg/util/sysutil/device_unix.go
line 27 at r2 (raw file):
// GetDeviceMap returns a map of device IDs to device names, for all physical // devices. func GetDeviceMap() (map[uint64]string, error) {
I see why the map is built once since the uint64
from GetDeviceID
doesn't allow us to narrow down this ReadDir
and related work.
But how about hiding these details behind a struct:
type DeviceNameMapper struct {
...
}
func MakeDeviceNameMapper() DeviceNameMapper {
// Do the map init
...
}
func (mapper DeviceNameMapper) GetDeviceNameForFile(filename string) (name string, ok bool) {
...
}
The stat of the filename can be done here by calling os.Stat
. We don't really want to do that stat using the vfs.FS
of the engine since there is an underlying assumption that we are not dealing with memFS
or something like that for that engine.
getDeviceMap
and geDeviceIDForFilename
would be the OS specific code, and the above struct would be shared.
Hey @sumeerbhola, @aadityasondhi: I think what's here is pretty salvageable to get disk bandwidth tokens a bit easier to run for users. But I'll close this PR - I'll leave it to y'all to resuscitate if it's still the sort of thing we want to do. |
115375: changefeedccl: reduce rebalancing memory usage from O(ranges) to O(spans) r=jayshrivastava a=jayshrivastava ### sql: count ranges per partition in PartitionSpans This change updates span partitioning to count ranges while making partitions. This allows callers to rebalance partitions based on range counts without having to iterate over the spans to count ranges. Release note: None Epic: None ### changefeedccl: reduce rebalancing memory usage from O(ranges) to O(spans) #115375 Previously, the `rebalanceSpanPartitions` would use O(ranges) memory. This change rewrites it to use range iterators, reducing the memory usage to O(spans). This change also adds a randomized test to assert that all spans are accounted for after rebalancing. It also adds one more unit test. Informs: #113898 Epic: None ### changefeedccl: add rebalancing checks This change adds extra test coverage for partition rebalancing in changefeeds. It adds checks which are performed after rebalancing to assert that the output list of spans covers exactly the same keys as the input list of spans. These checks are expensive so they only run if the environment variable `COCKROACH_CHANGEFEED_TESTING_REBALANCING_CHECKS` is true. This variable is true in cdc roachtests and unit tests. Release note: None Epic: None 119885: storage: support per-store IO metrics with fine granularity r=jbowens,abarganier a=CheranMahalingam Currently, timeseries metrics are collected on a 10s interval which hides momentary spikes in IO. This commit introduces a central disk monitoring system that polls for disk stats at a 100ms interval. Additionally, the current system accumulates disk metrics across all block devices which includes noise from unrelated processes. This commit also adds support for exporting per-store IO metrics (i.e. IO stats for block devices that map to stores used by Cockroach). These changes will be followed up by a PR to remove the need for customers to specify disk names when setting the provisioned bandwidth for each store as described in #109350. Fixes: #104114, #112898. Informs: #89786. Epic: None. Release note: None. 120649: changefeedccl: avoid undefined behavior in distribution test r=wenyihu6 a=jayshrivastava The `rangeDistributionTester` would sometimes calculate log(0) when determining the node to move a range too. Most of the time, this would be some garbage value which gets ignored. Sometimes, this may return a valid node id, causing the range distribution to be wrong and failing the test failures. This change updates the tester to handle this edge case. Closes: #120470 Release note: None Co-authored-by: Jayant Shrivastava <[email protected]> Co-authored-by: Cheran Mahalingam <[email protected]>
Part of #86857.
This commit eliminate the need to provide the disk-name common environments e.g. linux with store on EBS or GCP PD. To make use of AC's disk bandwidth tokens, users still need to specify the provisioned bandwidth, for now. So in a sense this machinery is still "disabled by default". Next steps:
Release note: None