Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding new Istio routerule causes small window of 503s #348

Closed
loewenstein opened this issue Mar 13, 2018 · 12 comments
Closed

Adding new Istio routerule causes small window of 503s #348

loewenstein opened this issue Mar 13, 2018 · 12 comments
Assignees
Labels
area/networking kind/bug Categorizes issue or PR as related to a bug. tracking This label is applied to Knative issues that track issues external to Knative (e.g. Istio, K8s)

Comments

@loewenstein
Copy link
Contributor

loewenstein commented Mar 13, 2018

Steps:

  1. bazel run sample/helloworld:everything.create
  2. while true; do curl -is --header 'Host:demo.myhost.net' -w "%{http_code}\n" -o /dev/null http://${SERVICE_IP}; done | grep 503
  3. bazel run sample/helloworld:updated_everything.apply

Expected:
App stays available, i.e. command from 3. stays quiet

Actual:
App is partially unavailable.

We should add a readiness check to the configuration.

@mattmoor mattmoor changed the title helloworld sample unavailable during roolout of updated_configuration.yaml helloworld sample unavailable during rollout of updated_configuration.yaml Mar 13, 2018
@mattmoor mattmoor added the kind/doc Something isn't clear label Mar 14, 2018
@loewenstein
Copy link
Contributor Author

I improved the monitoring of status changes with

#!/usr/bin/env bash

LAST_STATUS="Unknown"
CURRENT_STATUS="Unknown"

while true
do
  CURRENT_STATUS=$(curl -is --header 'Host:demo.myhost.net' -w "%{http_code}\n" -o /dev/null  http://${SERVICE_IP})
  if [ $CURRENT_STATUS != $LAST_STATUS ]; then
    echo "$(date): ${LAST_STATUS} -> ${CURRENT_STATUS}"
  fi
  LAST_STATUS=$CURRENT_STATUS
done

I added

readynessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 3
          periodSeconds: 3

to both configuration.yaml and updated_configuration.yaml.

However, I continue to see transitions to HTTP 503 when I switch back and forth between bazel run sample/helloworld:everything.apply and bazel run sample/helloworld:updated_everything.apply.

$ sample/helloworld/status.sh
Fri Mar 16 11:25:47 CET 2018: Unknown -> 404
Fri Mar 16 11:27:56 CET 2018: 404 -> 503
Fri Mar 16 11:28:19 CET 2018: 503 -> 200
Fri Mar 16 11:31:25 CET 2018: 200 -> 503
Fri Mar 16 11:31:25 CET 2018: 503 -> 200
Fri Mar 16 11:32:55 CET 2018: 200 -> 503
Fri Mar 16 11:32:56 CET 2018: 503 -> 200
Fri Mar 16 11:33:18 CET 2018: 200 -> 503
Fri Mar 16 11:34:55 CET 2018: 503 -> 200
Fri Mar 16 12:29:26 CET 2018: 200 -> 503
Fri Mar 16 12:29:26 CET 2018: 503 -> 200
Fri Mar 16 12:29:48 CET 2018: 200 -> 503
Fri Mar 16 12:29:50 CET 2018: 503 -> 200
Fri Mar 16 12:30:42 CET 2018: 200 -> 503
Fri Mar 16 12:30:43 CET 2018: 503 -> 200

@mattmoor Is there anything I miss? Otherwise I suppose there is more than a missing readiness probe at work. I'll dig a bit in the code if I can figure out what might be going on.

@mattmoor
Copy link
Member

That's be great.

Adding @vaikas-google @ian-mi in case they have any ideas what might be causing this.

@loewenstein
Copy link
Contributor Author

Could this simple be an Istio bug, i.e. https://docs.google.com/document/d/1UXJIqfUGkZlw1aZYSkQDYkq1TnYbOaEkohtoTS2mz1g/edit#heading=h.x9snb54sjlu9

In any case I will submit a PR adding readiness probes.

@mattmoor
Copy link
Member

I think that's most likely

@mattmoor
Copy link
Member

Looks like readYnessProbe was a typo (should be: readinessProbe). I started to look into stricter JSON validation yesterday, but Go finally got around to adding an option to disallow unknown fields to encoding/json in 1.10, which K8s isn't yet on (there are issues).

bobcatfish added a commit that referenced this issue Mar 22, 2018
Make these tweaks to the conformance tests:
* Move namespace creation to test fixture setup (it's super slow and it
  was leaving the namespace around as a side effect of the test)
* Support the case where a cluster is configured to use a resolvable
  domain, in which case we don't need to get the IP of the ingress and
  spoof the host (see #443 re. adding docs for how to set this up in a
  cluster)
* Remove readiness probe from container; this wasn't accomplishing what
  it was intended to b/c of #348 anyway

Also tried to make the conformance test docs more succinct.
@mdemirhan mdemirhan self-assigned this Mar 28, 2018
@mattmoor
Copy link
Member

/assign @tcnghia

@tcnghia
Copy link
Contributor

tcnghia commented Apr 18, 2018

This is checked in envoyproxy/envoy#2774 to fix envoyproxy/envoy#1930

@mdemirhan mdemirhan removed their assignment Apr 23, 2018
@tcnghia
Copy link
Contributor

tcnghia commented Apr 24, 2018

Even if when we change the helloworld example and only route traffic after the new revision is already created for a while, we are still hitting the same 503s.

@tcnghia
Copy link
Contributor

tcnghia commented Apr 25, 2018

I tested with Istio 7.0.1 which has envoyproxy/envoy#2774 but the issue isn't fixed. I've sent the relevant log files to Istio team for investigation.

@tcnghia tcnghia added kind/bug Categorizes issue or PR as related to a bug. tracking This label is applied to Knative issues that track issues external to Knative (e.g. Istio, K8s) and removed kind/doc Something isn't clear labels Apr 25, 2018
@tcnghia
Copy link
Contributor

tcnghia commented Apr 25, 2018

I opened istio/istio#5204 , since issue doesn't seem to be fixed in 0.7.1.

@tcnghia tcnghia changed the title helloworld sample unavailable during rollout of updated_configuration.yaml Adding new Istio routerule causes small window of 503s May 8, 2018
@tcnghia
Copy link
Contributor

tcnghia commented May 8, 2018

Reopen to track Istio issue.

@tcnghia tcnghia reopened this May 8, 2018
@josephburnett josephburnett added this to the Serving Sprint #1 milestone Jun 27, 2018
@mdemirhan
Copy link
Contributor

This is now fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/networking kind/bug Categorizes issue or PR as related to a bug. tracking This label is applied to Knative issues that track issues external to Knative (e.g. Istio, K8s)
Projects
None yet
Development

No branches or pull requests

5 participants