Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: fix the demo licensing code #41029

Merged
merged 1 commit into from
Sep 24, 2019
Merged

Conversation

knz
Copy link
Contributor

@knz knz commented Sep 24, 2019

Fixes #40734.
Fixes #41024.

Release justification: fixes a flaky test, fixes UX of main new feature

Before this patch, there were multiple problems with the code:

  • if the license acquisition was disabled by the env var config,
    the error message would not be clear.
  • the licensing code would deadlock silently on OSS-only
    builds (because the license failure channel was not written in that
    control branch).
  • the error/warning messages would be interleaved on the same line as
    the input line (missing newline at start of message).
  • the test code would fail when the license server is not available.
  • the set up of the example database and workload would be performed
    asynchronously, with unclear signalling of when the user
    can expect to use them interactively.

After this patch:

  • it's possible to override the license acquisition URL with
    COCKROACH_DEMO_LICENSE_URL, this is used in tests.
  • setting up the example database, partitioning and workload is done
    before presenting the interactive prompt.
  • partitioning the example database, if requested by
    --geo-partitioned-replicas, waits for license acquisition to
    complete (license acquisition remains asynchronous otherwise).
  • impossible configurations are reported early(earlier).

For example:

  • OSS-only builds:
kena@kenax ~/cockroach % ./cockroach demo --geo-partitioned-replicas
*
* ERROR: enterprise features are required for this demo, cannot run from OSS-only binary
*
Failed running "demo"

For license acquisition failures:

kena@kenax ~/cockroach % ./cockroach demo --geo-partitioned-replicas
error while contacting licensing server:
Get https://192.168.2.170/api/license?clusterid=5548b310-14b7-46de-8c92-30605bfe95c4&kind=demo&version=v19.2: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
*
* ERROR: license acquisition was unsuccessful.
* Note: enterprise features are needed for --geo-partitioned-replicas.
*
Failed running "demo"

Additionally, this change fixes test flakiness that arises from an
unavailable license server.

Release note (cli change): To enable uses of cockroach demo with
enterprise features in firewalled network environments, it is now
possible to redirect the license acquisition with the environment
variable COCKROACH_DEMO_LICENSE_URL to a replacement server (for
example a suitably configured HTTP proxy).

@knz knz requested review from jordanlewis and rohany September 24, 2019 11:57
@knz knz requested a review from a team as a code owner September 24, 2019 11:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz force-pushed the 20190924-demo-lic-error branch 2 times, most recently from 5eec2ce to be86dc5 Compare September 24, 2019 12:50
@knz knz force-pushed the 20190924-demo-lic-error branch 2 times, most recently from 21af54a to 8702a72 Compare September 24, 2019 14:06
Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good restructure of the demo code -- the setupTransientServers function was for sure getting too large. Making the workload/partitioning setup not asynchronous makes things conceptually simpler, but I remember doing it at your request

Reviewed 2 of 6 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @knz, and @rohany)


pkg/cli/demo.go, line 242 at r1 (raw file):

			}
			if !success {
				if demoCtx.geoPartitionedReplicas {

I think it would be clearer to make licenseDone a chan err, and send an error over to the part of the code that is doing things related to geoPartitionedReplicas, so that all that code is in one place.


pkg/cli/demo.go, line 274 at r1 (raw file):

		}

		// If we are requested to prepartition our data spawn a goroutine to do the partitioning.

This comment should be updated now


pkg/cli/demo.go, line 288 at r1 (raw file):

			defer db.Close()
			// Based on validation done in setup, we know that this workload has a partitioning step.
			if err := gen.(workload.Hookser).Hooks().Partition(db); err != nil {

i wonder if this will take a long time for the shell to startup under the --global flag


pkg/cli/interactive_tests/test_demo_partitioning.tcl, line 21 at r1 (raw file):

send "SELECT count(*) AS NRPARTS FROM \[SHOW PARTITIONS FROM DATABASE movr\];\r"
eexpect nrparts

why doesn't this need to be in quotes?

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the workload/partitioning setup not asynchronous makes things conceptually simpler, but I remember doing it at your request

Yes, I vaguely remember this, but for having tried to debug / fix #40734 and #41024 with the async behavior, I realized that this much asynchrony makes things difficult and also not great UX. So I changed my mind.

That said, I'm still convinced license acquisition should be async in the common case. I'm only fine with the reference workload for global tests to make the initialization wait.

Also as you've seen I've added a "please wait" in there.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)


pkg/cli/demo.go, line 242 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I think it would be clearer to make licenseDone a chan err, and send an error over to the part of the code that is doing things related to geoPartitionedReplicas, so that all that code is in one place.

I tried this and it's not much clearer. a chan err does not help, we'd still need to distinguish success && err == nil, !success && err == nil and err != nil and that means a 3-valued thing (or bool + error, a struct).

Or maybe I was not able to figure out what you mean? If you want to guide my hand, I'll take it.


pkg/cli/demo.go, line 274 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

This comment should be updated now

Done.


pkg/cli/demo.go, line 288 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

i wonder if this will take a long time for the shell to startup under the --global flag

Yes it can but it's a feature: we don't want to present a prompt to a user if there's a chance that the first SQL query they're going to type (or copy-paste into it) will fail, just because it's racing with the initialization.


pkg/cli/interactive_tests/test_demo_partitioning.tcl, line 21 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

why doesn't this need to be in quotes?

Changed for clarity.

(tip: TCL has string rules similar to the unix shell, so technically the quotes are not neede here)

Release justification: fixes a flaky test, fixes UX of main new feature

Before this patch, there were multiple problems with the code:

- if the license acquisition was disabled by the env var config,
  the error message would not be clear.
- the licensing code would deadlock silently on OSS-only
  builds (because the license failure channel was not written in that
  control branch).
- the error/warning messages would be interleaved on the same line as
  the input line (missing newline at start of message).
- the test code would fail when the license server is not available.
- the set up of the example database and workload would be performed
  asynchronously, with unclear signalling of when the user
  can expect to use them interactively.

After this patch:
- it's possible to override the license acquisition URL with
  COCKROACH_DEMO_LICENSE_URL, this is used in tests.
- setting up the example database, partitioning and workload is done
  before presenting the interactive prompt.
- partitioning the example database, if requested by
  --geo-partitioned-replicas, waits for license acquisition to
  complete (license acquisition remains asynchronous otherwise).
- impossible configurations are reported early(earlier).

For example:

- OSS-only builds:

```
kena@kenax ~/cockroach % ./cockroach demo --geo-partitioned-replicas
*
* ERROR: enterprise features are required for this demo, cannot run from OSS-only binary
*
Failed running "demo"
```

For license acquisition failures:

```
kena@kenax ~/cockroach % ./cockroach demo --geo-partitioned-replicas
error while contacting licensing server:
Get https://192.168.2.170/api/license?clusterid=5548b310-14b7-46de-8c92-30605bfe95c4&kind=demo&version=v19.2: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
*
* ERROR: license acquisition was unsuccessful.
* Note: enterprise features are needed for --geo-partitioned-replicas.
*
Error: unable to acquire a license for this demo
Failed running "demo"
```

Additionally, this change fixes test flakiness that arises from an
unavailable license server.

Release note (cli change): To enable uses of `cockroach demo` with
enterprise features in firewalled network environments, it is now
possible to redirect the license acquisition with the environment
variable COCKROACH_DEMO_LICENSE_URL to a replacement server (for
example a suitably configured HTTP proxy).
@knz knz force-pushed the 20190924-demo-lic-error branch from 8702a72 to 66b9550 Compare September 24, 2019 15:34
Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @knz, and @rohany)


pkg/cli/demo.go, line 242 at r1 (raw file):

Previously, knz (kena) wrote…

I tried this and it's not much clearer. a chan err does not help, we'd still need to distinguish success && err == nil, !success && err == nil and err != nil and that means a 3-valued thing (or bool + error, a struct).

Or maybe I was not able to figure out what you mean? If you want to guide my hand, I'll take it.

I see. But since we are exiting the program here already, is just sending over the value of success on the channel all we need? If license acquisition failed without a "real" error, we would just want to notify the geoPartitionedReplicas code.

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)


pkg/cli/demo.go, line 242 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I see. But since we are exiting the program here already, is just sending over the value of success on the channel all we need? If license acquisition failed without a "real" error, we would just want to notify the geoPartitionedReplicas code.

Ok here's a specific UX question for you.

As it stands now, the code will either

  • print a warning indicating the license could not be acquired, or
  • print an error and exit

How do you suggest to modify the code so that the warning does not get printed, in the case where an error results?

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @knz, and @rohany)


pkg/cli/demo.go, line 242 at r1 (raw file):

Previously, knz (kena) wrote…

Ok here's a specific UX question for you.

As it stands now, the code will either

  • print a warning indicating the license could not be acquired, or
  • print an error and exit

How do you suggest to modify the code so that the warning does not get printed, in the case where an error results?

I guess to choose between that you need to check geoPartitionedReplicas anyway, so might as well put that check here.

@knz
Copy link
Contributor Author

knz commented Sep 24, 2019

Thanks for your time and sorry about the abundance of backs-and-forths on this.

bors r=rohany

craig bot pushed a commit that referenced this pull request Sep 24, 2019
40997: roachtest: deflake bank/{node-restart,cluster-recovery} r=irfansharif a=irfansharif

Fixes #38785.
Fixes #35326.

Because everything roachprod does, it does through SSH, we're
particularly susceptible to network delays, packet drops, etc. We've
seen this before, or at least pointed to this being a problem before,
over at #37001. Setting timeouts around our calls to roachprod helps to
better surface these kind of errors.

The underlying issue in #38785 and in #35326 is the fact that we're
running roachprod commands that may (reasonably) fail due to connection
issues, and we're unable to retry them safely (the underlying commands
are non-idempotent). Presently we simply fail the entire test, when
really we should be able to retry the commands. This is left unaddressed.

Release justification: Category 1: Non-production code changes
Release note: None

41029: cli: fix the demo licensing code r=rohany a=knz

Fixes #40734.
Fixes #41024.

Release justification: fixes a flaky test, fixes UX of main new feature

Before this patch, there were multiple problems with the code:

- if the license acquisition was disabled by the env var config,
  the error message would not be clear.
- the licensing code would deadlock silently on OSS-only
  builds (because the license failure channel was not written in that
  control branch).
- the error/warning messages would be interleaved on the same line as
  the input line (missing newline at start of message).
- the test code would fail when the license server is not available.
- the set up of the example database and workload would be performed
  asynchronously, with unclear signalling of when the user
  can expect to use them interactively.

After this patch:
- it's possible to override the license acquisition URL with
  COCKROACH_DEMO_LICENSE_URL, this is used in tests.
- setting up the example database, partitioning and workload is done
  before presenting the interactive prompt.
- partitioning the example database, if requested by
  --geo-partitioned-replicas, waits for license acquisition to
  complete (license acquisition remains asynchronous otherwise).
- impossible configurations are reported early(earlier).

For example:

- OSS-only builds:

```
kena@kenax ~/cockroach % ./cockroach demo --geo-partitioned-replicas
*
* ERROR: enterprise features are required for this demo, cannot run from OSS-only binary
*
Failed running "demo"
```

For license acquisition failures:

```
kena@kenax ~/cockroach % ./cockroach demo --geo-partitioned-replicas
error while contacting licensing server:
Get https://192.168.2.170/api/license?clusterid=5548b310-14b7-46de-8c92-30605bfe95c4&kind=demo&version=v19.2: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
*
* ERROR: license acquisition was unsuccessful.
* Note: enterprise features are needed for --geo-partitioned-replicas.
*
Failed running "demo"
```

Additionally, this change fixes test flakiness that arises from an
unavailable license server.

Release note (cli change): To enable uses of `cockroach demo` with
enterprise features in firewalled network environments, it is now
possible to redirect the license acquisition with the environment
variable COCKROACH_DEMO_LICENSE_URL to a replacement server (for
example a suitably configured HTTP proxy).

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
@craig
Copy link
Contributor

craig bot commented Sep 24, 2019

Build succeeded

@craig craig bot merged commit 66b9550 into cockroachdb:master Sep 24, 2019
@knz knz deleted the 20190924-demo-lic-error branch September 25, 2019 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants