Race condition when executing continual tests with upgrade framework #2026

mgencur · 2021-02-17T07:26:26Z

Expected Behavior

Tests executed via the upgrade framework in https://github.com/knative/pkg/tree/master/test/upgrade properly run the "continual" tests which run in background during an upgrade.

Actual Behavior

The test framework occasionally detects a race condition when tests are executed with -race flag. The example error can be seen here: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift-knative_serverless-operator/804/pull-ci-openshift-knative-serverless-operator-main-4.6-upgrade-tests-aws-ocp-46/1361689216857149440
The stacktrace looks like this:

2021-02-16T15:47:08.236Z	INFO	upgrade/steps.go:57	3) 🔄 Starting continual tests. 3 tests are registered.
2021-02-16T15:47:08.236Z	INFO	upgrade/steps.go:60	3.1) Starting continual tests of "ProbeTest".
    testing.go:954: race detected during execution of test
2021-02-16T15:47:20.489Z	ERROR	upgrade/functions.go:41	💣🤬💔️ Upgrade suite have failed!
knative.dev/pkg/test/upgrade.(*Suite).Execute
	/go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/pkg/test/upgrade/functions.go:41
github.com/openshift-knative/serverless-operator/test/upgrade.TestServerlessUpgrade
	/go/src/github.com/openshift-knative/serverless-operator/test/upgrade/upgrade_test.go:50
testing.tRunner
	/usr/local/go/src/testing/testing.go:1039

Steps to Reproduce the Problem

Define multiple continual tests as defined below
Execute tests with -race flag

suite := pkgupgrade.Suite{
	Tests: pkgupgrade.Tests{
		PreUpgrade:    preUpgradeTests(),
		PostUpgrade:   postUpgradeTests(),
		Continual:     []pkgupgrade.BackgroundOperation{
			servingupgrade.ProbeTest(),
			servingupgrade.AutoscaleSustainingTest(),
			eventingupgrade.ContinualTest(),
		},
	},
  ...
}

Additional Info

The text was updated successfully, but these errors were encountered:

Fixes knative/pkg#2026 The actual issue is that the test context expires between individual stages run by the upgrade framework. This fix passes an external logger that survives the stages.

mgencur · 2021-02-19T12:10:58Z

More details about the root cause from verbose logs:

=== RUN   TestServerlessUpgrade/ContinualTests/SetupProbeTest
    service.go:98: Creating a new Service service upgrade-probe
    crd.go:36:  resource {<nil> <nil> <*>{&TypeMeta{Kind:,APIVersion:,} &ObjectMeta{Name:upgrade-probe,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ClusterName:,ManagedFields:[]ManagedFieldsEntry{},} {{&ObjectMeta{Name:,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ClusterName:,ManagedFields:[]ManagedFieldsEntry{},}} {<nil>}} {{0 <nil> <nil>} { } {<nil> <nil> <nil>}}} <nil>}
    service.go:116: Waiting for Service to transition to Ready. service upgrade-probe
    service.go:121: Checking to ensure Service Status is populated for Ready service
    service.go:127: Getting latest objects Created by Service
    service.go:130: Successfully created Service upgrade-probe
    spoof.go:103: Spoofing upgrade-probe-serving-tests.apps.ci-op-k6xfg5sw-113e4.origin-ci-int-aws.dev.rhcloud.com -> upgrade-probe-serving-tests.apps.ci-op-k6xfg5sw-113e4.origin-ci-int-aws.dev.rhcloud.com
    prober.go:123: Starting Route prober for http://upgrade-probe-serving-tests.apps.ci-op-k6xfg5sw-113e4.origin-ci-int-aws.dev.rhcloud.com.
==================
WARNING: DATA RACE
Read at 0x00c000edc3a3 by goroutine 97:
  testing.(*common).logDepth()
      /usr/local/go/src/testing/testing.go:710 +0xa7
  testing.(*common).log()
      /usr/local/go/src/testing/testing.go:703 +0x8f
  testing.(*common).Logf()
      /usr/local/go/src/testing/testing.go:749 +0x21
  testing.(*common).Logf-fm()
      /usr/local/go/src/testing/testing.go:749 +0x73
  knative.dev/pkg/test/spoof.New()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/pkg/test/spoof/spoof.go:103 +0x206
  knative.dev/pkg/test.NewSpoofingClient()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/pkg/test/clients.go:47 +0x1df
  knative.dev/serving/test.(*manager).Spawn.func1()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/serving/test/prober.go:142 +0x21b
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/golang.org/x/sync/errgroup/errgroup.go:57 +0x85
Previous write at 0x00c000edc3a3 by goroutine 115:
  testing.tRunner.func1()
      /usr/local/go/src/testing/testing.go:1025 +0x467
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1043 +0x20d
  knative.dev/pkg/test/upgrade.(*suiteExecution).startContinualTests.func1.1()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/pkg/test/upgrade/steps.go:67 +0x4f
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1039 +0x1eb
Goroutine 97 (running) created at:
  golang.org/x/sync/errgroup.(*Group).Go()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/golang.org/x/sync/errgroup/errgroup.go:54 +0x73
  knative.dev/serving/test.(*manager).Spawn()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/serving/test/prober.go:141 +0x69f
  knative.dev/serving/test.RunRouteProber()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/serving/test/prober.go:228 +0x198
  knative.dev/serving/test/upgrade.ProbeTest.func1()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/serving/test/upgrade/probe.go:54 +0x71e
  knative.dev/serving/test/v1.CreateServiceReady()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/serving/test/v1/service.go:99 +0x256
  knative.dev/serving/test/upgrade.ProbeTest.func1()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/serving/test/upgrade/probe.go:45 +0x267
  knative.dev/pkg/test/upgrade.(*suiteExecution).startContinualTests.func1.1()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/pkg/test/upgrade/steps.go:67 +0x4f
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1039 +0x1eb
Goroutine 115 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:1090 +0x700
  knative.dev/pkg/test/upgrade.(*suiteExecution).startContinualTests.func1()
      /go/src/github.com/openshift-knative/serverless-operator/vendor/knative.dev/pkg/test/upgrade/steps.go:66 +0x547
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1039 +0x1eb

Fixes knative/pkg#2026 The actual issue is that the test context expires between individual stages run by the upgrade framework. This fix passes an external logger that survives the stages.

* Fix race condition with Prober logger in upgrade tests (#670) Fixes knative/pkg#2026 The actual issue is that the test context expires between individual stages run by the upgrade framework. This fix passes an external logger that survives the stages. * Only use exec probe at startup time (knative#10741) * Only use exec probe at startup time Now that StartupProbe is available, we can avoid using spawning the exec probe other than at startup time. For requests after startup this directly uses the same endpoint as the exec probe in the QP as the target of a HTTP readiness probe. Following on from this I think we may want to rework quite a bit of how our readiness probe stuff works (e.g. it'd be nice to keep the probes on the user container so failures are on the right object, and we currently ignore probes ~entirely after startup if periodSeconds>0), but this is a minimal change that should be entirely backwards-compatible and saves quite a few cpu cycles. * Use ProgressDeadline as failure timeout for startup probe - Also just drop exec probe entirely for periodSeconds > 1 since these can just use the readiness probe now. (Easier than figuring out how to do square ProgressDeadline with a custom period). * See if flag is what's making upgrades unhappy * reorganize comments * Default PeriodSeconds of the readiness probe to 1 if unset (knative#10992) Co-authored-by: Martin Gencur <[email protected]> Co-authored-by: Julian Friedman <[email protected]>

mgencur mentioned this issue Feb 17, 2021

Do not use -race flag in upgrade tests openshift-knative/serverless-operator#829

Merged

mgencur mentioned this issue Feb 19, 2021

Fix race condition with Prober logger in upgrade tests knative/serving#10815

Merged

mgencur mentioned this issue Feb 19, 2021

Fix race condition with Prober logger in upgrade tests openshift/knative-serving#670

Merged

knative-prow-robot closed this as completed in knative/serving#10815 Mar 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition when executing continual tests with upgrade framework #2026

Race condition when executing continual tests with upgrade framework #2026

mgencur commented Feb 17, 2021 •

edited

Loading

mgencur commented Feb 19, 2021

Race condition when executing continual tests with upgrade framework #2026

Race condition when executing continual tests with upgrade framework #2026

Comments

mgencur commented Feb 17, 2021 • edited Loading

Expected Behavior

Actual Behavior

Steps to Reproduce the Problem

Additional Info

mgencur commented Feb 19, 2021

mgencur commented Feb 17, 2021 •

edited

Loading