
Contour dont start, started to fail after automatic restart #959

Closed
vongohren opened this issue Mar 24, 2019 · 6 comments
Labels
blocked/needs-info Categorizes the issue or PR as blocked because there is insufficient information to advance it.

Comments

@vongohren

What steps did you take and what happened:
I did nothing; it just started to fail in prod after an automatic restart.
Dev was working fine, but when I restarted it as well, it began failing with the same problems. So I have not done anything to cause this.

What did you expect to happen:
I expected the clusters to run and the Contour container to work smoothly.

Anything else you would like to add:
These are the logs I get:

 time="2019-03-24T12:17:31Z" level=info msg="args: [serve --incluster]"
 E 
 time="2019-03-24T12:17:31Z" level=info msg=started context=grpc
 E 
 time="2019-03-24T12:17:31Z" level=info msg="waiting for cache sync" context=coreinformers
 E 
 time="2019-03-24T12:17:31Z" level=info msg=started context=coreinformers
 E 
 time="2019-03-24T12:17:31Z" level=info msg="waiting for cache sync" context=contourinformers
 E 
 time="2019-03-24T12:17:31Z" level=info msg=started context=contourinformers
 E 
 time="2019-03-24T12:17:31Z" level=info msg=started address="127.0.0.1:6060" context=debugsvc
 E 
 time="2019-03-24T12:17:31Z" level=info msg=started address="0.0.0.0:8000" context=metricsvc
 E 
 time="2019-03-24T12:17:31Z" level=info msg="forcing update" context=HoldoffNotifier last update=2562047h47m16.854775807s
 E 
 pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:95: Failed to list *v1beta1.TLSCertificateDelegation: tlscertificatedelegations.contour.heptio.com is forbidden: User "system:serviceaccount:heptio-contour:contour" cannot list tlscertificatedelegations.contour.heptio.com at the cluster scope E 
 pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:95: Failed to list *v1beta1.TLSCertificateDelegation: tlscertificatedelegations.contour.heptio.com is forbidden: User "system:serviceaccount:heptio-contour:contour" cannot list tlscertificatedelegations.contour.heptio.com at the cluster scope E 
 log: exiting because of error: log: cannot create log: open /tmp/contour.contour-74cc4d58c6-gt9f8.unknownuser.log.ERROR.20190324-121731.1: no such file or directory
 E 
 [2019-03-24 12:17:31.892][1][info][main] source/server/server.cc:183] initializing epoch 0 (hot restart version=10.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363 size=2654312)
 E 
 [2019-03-24 12:17:31.892][1][info][main] source/server/server.cc:185] statically linked extensions:
 E 
 [2019-03-24 12:17:31.892][1][info][main] source/server/server.cc:187]   access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
 E 
 [2019-03-24 12:17:31.892][1][info][main] source/server/server.cc:190]   filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
 E 
 [2019-03-24 12:17:31.892][1][info][main] source/server/server.cc:193]   filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
 E 
 [2019-03-24 12:17:31.892][1][info][main] source/server/server.cc:196]   filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.thrift_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
 E 
 [2019-03-24 12:17:31.892][1][info][main] source/server/server.cc:198]   stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.statsd
 E 
 [2019-03-24 12:17:31.892][1][info][main] source/server/server.cc:200]   tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
 E 
 [2019-03-24 12:17:31.892][1][info][main] source/server/server.cc:203]   transport_sockets.downstream: envoy.transport_sockets.capture,raw_buffer,tls
 E 
 [2019-03-24 12:17:31.892][1][info][main] source/server/server.cc:206]   transport_sockets.upstream: envoy.transport_sockets.capture,raw_buffer,tls
 E 
 [2019-03-24 12:17:31.900][1][info][config] source/server/configuration_impl.cc:50] loading 0 static secret(s)
 E 
 [2019-03-24 12:17:31.904][1][critical][main] source/server/server.cc:78] error initializing configuration '/config/contour.yaml': logical_dns clusters must have a single host
 E 
 [2019-03-24 12:17:31.904][1][info][main] source/server/server.cc:437] exiting
 E 

Environment:

  • Contour version:
    contour: gcr.io/heptio-images/contour:master
  • Kubernetes version: (use kubectl version):
    not sure, it's scripted
  • Kubernetes installer & version:
    not sure, it's scripted
  • Cloud provider or hardware configuration:
    GCloud
    Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
    Total: 3 vCPUs, 11.25 GB memory
  • OS (e.g. from /etc/os-release):
    Docker, linux
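
For context on the "forbidden" errors in the logs above: the contour service account lacks RBAC permission to list the TLSCertificateDelegation CRD, which suggests the ClusterRole in the deployed manifests had drifted from what the :master image expects. A rule along these lines would grant the missing access (group and resource names are taken from the error message; the verbs are the usual informer set and are an assumption, as is the ClusterRole name):

 # Excerpt of a ClusterRole for the contour service account. Names come from
 # the error above; get/list/watch is assumed, since that is what informers need.
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
   name: contour
 rules:
 - apiGroups: ["contour.heptio.com"]
   resources: ["tlscertificatedelegations"]
   verbs: ["get", "list", "watch"]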
vongohren changed the title from "Contour have started to fail after a restart" to "Contour dont start, started to fail after automatic restart" on Mar 24, 2019
@vongohren
Author

This was solved by setting the container to a stable version, but this might be an important bug in the master branch that should be fixed before the next release.

@davecheney
Contributor

davecheney commented Mar 24, 2019 via email

@davecheney davecheney added the blocked/needs-info Categorizes the issue or PR as blocked because there is insufficient information to advance it. label Mar 29, 2019
@davecheney
Contributor

@vongohren did my previous comment prove useful? Please update the issue if you were unable to resolve the problem.

@vongohren
Author

@davecheney hi, sorry for the lack of feedback!
Actually, just switching to a stable version fixed it for me, as I mentioned in my follow-up comment #959 (comment).
No troubles after updating to 0.10.

So I have not followed up on your suggestion quite yet.
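
Concretely, "switching to a stable version" means pinning the Deployment image to a released tag instead of the moving :master. A minimal sketch (the exact tag name is an assumption based on the "0.10" above; check the project's releases):

 # contour Deployment spec, abbreviated; tag v0.10.0 is an assumed name
 containers:
 - name: contour
   image: gcr.io/heptio-images/contour:v0.10.0  # instead of :master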

unicell added a commit to unicell/contour that referenced this issue Apr 9, 2019
Related to projectcontour#959

Client-go uses glog even though Contour itself doesn't. In failure scenarios, such as CRD types not being registered, glog in client-go attempts to log to files under /tmp, which may not exist (e.g. in a `scratch` Docker image) or may not be writable (container started as a non-root user, etc.). When that happens it crashes the whole Contour process, which should not happen.

This change overrides the glog flag so that it always dumps to stderr, avoiding logging to files, which is undesirable in a container environment.

Signed-off-by: Qiu Yu <[email protected]>
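
For illustration, a minimal sketch of that kind of override, assuming glog's standard flag names (this is not the exact PR diff):

 // Force glog (pulled in transitively via client-go) to write to stderr
 // instead of files under /tmp, which may not exist in a scratch image
 // or be writable by a non-root user.
 package main

 import (
     "flag"

     _ "github.com/golang/glog" // blank import: registers glog's flags on the default FlagSet
 )

 func main() {
     flag.Parse()
     // Override after parsing so the stderr setting wins regardless of defaults.
     if err := flag.Set("logtostderr", "true"); err != nil {
         panic(err)
     }
     // ... start the rest of the process (e.g. the serve loop) here ...
 }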
@unicell
Contributor

unicell commented Apr 9, 2019

Although the issue as originally reported was caused by a mismatch between the CRDs registered in the cluster and those Contour recognizes, Contour shouldn't crash when that happens. I filed a PR above to fix the crash.

@davecheney
Contributor

I’m going to close this now that #1004 has landed.
