-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli: fix the demo licensing code #41029
Conversation
5eec2ce
to
be86dc5
Compare
21af54a
to
8702a72
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good restructure of the demo code -- the setupTransientServers function was for sure getting too large. Making the workload/partitioning setup not asynchronous makes things conceptually simpler, but I remember doing it at your request
Reviewed 2 of 6 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @knz, and @rohany)
pkg/cli/demo.go, line 242 at r1 (raw file):
} if !success { if demoCtx.geoPartitionedReplicas {
I think it would be clearer to make licenseDone a chan err
, and send an error over to the part of the code that is doing things related to geoPartitionedReplicas, so that all that code is in one place.
pkg/cli/demo.go, line 274 at r1 (raw file):
} // If we are requested to prepartition our data spawn a goroutine to do the partitioning.
This comment should be updated now
pkg/cli/demo.go, line 288 at r1 (raw file):
defer db.Close() // Based on validation done in setup, we know that this workload has a partitioning step. if err := gen.(workload.Hookser).Hooks().Partition(db); err != nil {
i wonder if this will take a long time for the shell to startup under the --global
flag
pkg/cli/interactive_tests/test_demo_partitioning.tcl, line 21 at r1 (raw file):
send "SELECT count(*) AS NRPARTS FROM \[SHOW PARTITIONS FROM DATABASE movr\];\r" eexpect nrparts
why doesn't this need to be in quotes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making the workload/partitioning setup not asynchronous makes things conceptually simpler, but I remember doing it at your request
Yes, I vaguely remember this, but for having tried to debug / fix #40734 and #41024 with the async behavior, I realized that this much asynchrony makes things difficult and also not great UX. So I changed my mind.
That said, I'm still convinced license acquisition should be async in the common case. I'm only fine with the reference workload for global tests to make the initialization wait.
Also as you've seen I've added a "please wait" in there.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)
pkg/cli/demo.go, line 242 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
I think it would be clearer to make licenseDone a
chan err
, and send an error over to the part of the code that is doing things related to geoPartitionedReplicas, so that all that code is in one place.
I tried this and it's not much clearer. a chan err
does not help, we'd still need to distinguish success && err == nil
, !success && err == nil
and err != nil
and that means a 3-valued thing (or bool + error, a struct).
Or maybe I was not able to figure out what you mean? If you want to guide my hand, I'll take it.
pkg/cli/demo.go, line 274 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
This comment should be updated now
Done.
pkg/cli/demo.go, line 288 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
i wonder if this will take a long time for the shell to startup under the
--global
flag
Yes it can but it's a feature: we don't want to present a prompt to a user if there's a chance that the first SQL query they're going to type (or copy-paste into it) will fail, just because it's racing with the initialization.
pkg/cli/interactive_tests/test_demo_partitioning.tcl, line 21 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
why doesn't this need to be in quotes?
Changed for clarity.
(tip: TCL has string rules similar to the unix shell, so technically the quotes are not neede here)
Release justification: fixes a flaky test, fixes UX of main new feature Before this patch, there were multiple problems with the code: - if the license acquisition was disabled by the env var config, the error message would not be clear. - the licensing code would deadlock silently on OSS-only builds (because the license failure channel was not written in that control branch). - the error/warning messages would be interleaved on the same line as the input line (missing newline at start of message). - the test code would fail when the license server is not available. - the set up of the example database and workload would be performed asynchronously, with unclear signalling of when the user can expect to use them interactively. After this patch: - it's possible to override the license acquisition URL with COCKROACH_DEMO_LICENSE_URL, this is used in tests. - setting up the example database, partitioning and workload is done before presenting the interactive prompt. - partitioning the example database, if requested by --geo-partitioned-replicas, waits for license acquisition to complete (license acquisition remains asynchronous otherwise). - impossible configurations are reported early(earlier). For example: - OSS-only builds: ``` kena@kenax ~/cockroach % ./cockroach demo --geo-partitioned-replicas * * ERROR: enterprise features are required for this demo, cannot run from OSS-only binary * Failed running "demo" ``` For license acquisition failures: ``` kena@kenax ~/cockroach % ./cockroach demo --geo-partitioned-replicas error while contacting licensing server: Get https://192.168.2.170/api/license?clusterid=5548b310-14b7-46de-8c92-30605bfe95c4&kind=demo&version=v19.2: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) * * ERROR: license acquisition was unsuccessful. * Note: enterprise features are needed for --geo-partitioned-replicas. * Error: unable to acquire a license for this demo Failed running "demo" ``` Additionally, this change fixes test flakiness that arises from an unavailable license server. Release note (cli change): To enable uses of `cockroach demo` with enterprise features in firewalled network environments, it is now possible to redirect the license acquisition with the environment variable COCKROACH_DEMO_LICENSE_URL to a replacement server (for example a suitably configured HTTP proxy).
8702a72
to
66b9550
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @knz, and @rohany)
pkg/cli/demo.go, line 242 at r1 (raw file):
Previously, knz (kena) wrote…
I tried this and it's not much clearer. a
chan err
does not help, we'd still need to distinguishsuccess && err == nil
,!success && err == nil
anderr != nil
and that means a 3-valued thing (or bool + error, a struct).Or maybe I was not able to figure out what you mean? If you want to guide my hand, I'll take it.
I see. But since we are exiting the program here already, is just sending over the value of success on the channel all we need? If license acquisition failed without a "real" error, we would just want to notify the geoPartitionedReplicas code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)
pkg/cli/demo.go, line 242 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
I see. But since we are exiting the program here already, is just sending over the value of success on the channel all we need? If license acquisition failed without a "real" error, we would just want to notify the geoPartitionedReplicas code.
Ok here's a specific UX question for you.
As it stands now, the code will either
- print a warning indicating the license could not be acquired, or
- print an error and exit
How do you suggest to modify the code so that the warning does not get printed, in the case where an error results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @knz, and @rohany)
pkg/cli/demo.go, line 242 at r1 (raw file):
Previously, knz (kena) wrote…
Ok here's a specific UX question for you.
As it stands now, the code will either
- print a warning indicating the license could not be acquired, or
- print an error and exit
How do you suggest to modify the code so that the warning does not get printed, in the case where an error results?
I guess to choose between that you need to check geoPartitionedReplicas anyway, so might as well put that check here.
Thanks for your time and sorry about the abundance of backs-and-forths on this. bors r=rohany |
40997: roachtest: deflake bank/{node-restart,cluster-recovery} r=irfansharif a=irfansharif Fixes #38785. Fixes #35326. Because everything roachprod does, it does through SSH, we're particularly susceptible to network delays, packet drops, etc. We've seen this before, or at least pointed to this being a problem before, over at #37001. Setting timeouts around our calls to roachprod helps to better surface these kind of errors. The underlying issue in #38785 and in #35326 is the fact that we're running roachprod commands that may (reasonably) fail due to connection issues, and we're unable to retry them safely (the underlying commands are non-idempotent). Presently we simply fail the entire test, when really we should be able to retry the commands. This is left unaddressed. Release justification: Category 1: Non-production code changes Release note: None 41029: cli: fix the demo licensing code r=rohany a=knz Fixes #40734. Fixes #41024. Release justification: fixes a flaky test, fixes UX of main new feature Before this patch, there were multiple problems with the code: - if the license acquisition was disabled by the env var config, the error message would not be clear. - the licensing code would deadlock silently on OSS-only builds (because the license failure channel was not written in that control branch). - the error/warning messages would be interleaved on the same line as the input line (missing newline at start of message). - the test code would fail when the license server is not available. - the set up of the example database and workload would be performed asynchronously, with unclear signalling of when the user can expect to use them interactively. After this patch: - it's possible to override the license acquisition URL with COCKROACH_DEMO_LICENSE_URL, this is used in tests. - setting up the example database, partitioning and workload is done before presenting the interactive prompt. - partitioning the example database, if requested by --geo-partitioned-replicas, waits for license acquisition to complete (license acquisition remains asynchronous otherwise). - impossible configurations are reported early(earlier). For example: - OSS-only builds: ``` kena@kenax ~/cockroach % ./cockroach demo --geo-partitioned-replicas * * ERROR: enterprise features are required for this demo, cannot run from OSS-only binary * Failed running "demo" ``` For license acquisition failures: ``` kena@kenax ~/cockroach % ./cockroach demo --geo-partitioned-replicas error while contacting licensing server: Get https://192.168.2.170/api/license?clusterid=5548b310-14b7-46de-8c92-30605bfe95c4&kind=demo&version=v19.2: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) * * ERROR: license acquisition was unsuccessful. * Note: enterprise features are needed for --geo-partitioned-replicas. * Failed running "demo" ``` Additionally, this change fixes test flakiness that arises from an unavailable license server. Release note (cli change): To enable uses of `cockroach demo` with enterprise features in firewalled network environments, it is now possible to redirect the license acquisition with the environment variable COCKROACH_DEMO_LICENSE_URL to a replacement server (for example a suitably configured HTTP proxy). Co-authored-by: irfan sharif <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]>
Build succeeded |
Fixes #40734.
Fixes #41024.
Release justification: fixes a flaky test, fixes UX of main new feature
Before this patch, there were multiple problems with the code:
the error message would not be clear.
builds (because the license failure channel was not written in that
control branch).
the input line (missing newline at start of message).
asynchronously, with unclear signalling of when the user
can expect to use them interactively.
After this patch:
COCKROACH_DEMO_LICENSE_URL, this is used in tests.
before presenting the interactive prompt.
--geo-partitioned-replicas, waits for license acquisition to
complete (license acquisition remains asynchronous otherwise).
For example:
For license acquisition failures:
Additionally, this change fixes test flakiness that arises from an
unavailable license server.
Release note (cli change): To enable uses of
cockroach demo
withenterprise features in firewalled network environments, it is now
possible to redirect the license acquisition with the environment
variable COCKROACH_DEMO_LICENSE_URL to a replacement server (for
example a suitably configured HTTP proxy).