
segfault due to opentracing #2222

Closed
trnl opened this issue Mar 20, 2018 · 15 comments · Fixed by #2676
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@trnl
Contributor

trnl commented Mar 20, 2018

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.):
no

What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.):
nginx, opentracing, zipkin, segfault

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

NGINX Ingress controller version: 0.10.2

Kubernetes version (use kubectl version):

> k version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-15T22:14:38Z", GoVersion:"go1.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.12", GitCommit:"3bda299a6414b4866f179921610d6738206a18fe", GitTreeState:"clean", BuildDate:"2017-12-29T08:39:49Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: 8 CPU, 32GB RAM, VMWare vRealize
  • OS (e.g. from /etc/os-release):
> cat  /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.3 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.3"
  • Kernel (e.g. uname -a):
> uname -a
Linux ... 3.10.0-693.11.1.el7.x86_64 #1 SMP Fri Oct 27 05:39:05 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
    Docker, Kubernetes
  • Others:
    nginx-ingress resource limits
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 1000m
            memory: 2Gi

nginx-ingress configmap values

  enable-opentracing: "true"
  zipkin-collector-host: zipkin.tools
  zipkin-collector-port: '80'
  zipkin-service-name: ingress-controller
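
For context, these keys are set through the controller's ConfigMap. A minimal sketch of the full manifest, assuming a ConfigMap named nginx-configuration in the ingress-nginx namespace (both names are deployment-specific assumptions; the data values are the ones above):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration    # assumed name, match your deployment
  namespace: ingress-nginx     # assumed namespace
data:
  enable-opentracing: "true"
  zipkin-collector-host: zipkin.tools
  zipkin-collector-port: "80"
  zipkin-service-name: ingress-controller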

What happened:
worker process exiting due to segfault

What you expected to happen:
no segfaults, no worker restarts

How to reproduce it (as minimally and precisely as possible):
It happens roughly every 2 seconds:

{"log":"2018/03/20 18:34:54 [alert] 23#23: worker process 11075 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:54.142872722Z"}
{"log":"2018/03/20 18:34:54 [alert] 23#23: worker process 11109 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:54.517182591Z"}
{"log":"2018/03/20 18:34:57 [alert] 23#23: worker process 11143 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:57.923391335Z"}
{"log":"2018/03/20 18:34:58 [alert] 23#23: worker process 11178 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:58.194986234Z"}
{"log":"2018/03/20 18:34:58 [alert] 23#23: worker process 11212 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:58.557907914Z"}
{"log":"2018/03/20 18:34:58 [alert] 23#23: worker process 11246 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:58.816366263Z"}
{"log":"2018/03/20 18:34:59 [alert] 23#23: worker process 11280 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:59.19434845Z"}
{"log":"2018/03/20 18:35:02 [alert] 23#23: worker process 11314 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:02.943432758Z"}
{"log":"2018/03/20 18:35:03 [alert] 23#23: worker process 11349 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:03.206377447Z"}
{"log":"2018/03/20 18:35:03 [alert] 23#23: worker process 11383 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:03.56327926Z"}
{"log":"2018/03/20 18:35:03 [alert] 23#23: worker process 11417 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:03.81832701Z"}
{"log":"2018/03/20 18:35:08 [alert] 23#23: worker process 11451 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:08.92230984Z"}
{"log":"2018/03/20 18:35:08 [alert] 23#23: worker process 11486 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:08.979320448Z"}
{"log":"2018/03/20 18:35:13 [alert] 23#23: worker process 11520 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:13.231301619Z"}
{"log":"2018/03/20 18:35:13 [alert] 23#23: worker process 11555 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:13.791207004Z"}
{"log":"2018/03/20 18:35:14 [alert] 23#23: worker process 11589 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:14.156415018Z"}
{"log":"2018/03/20 18:35:14 [alert] 23#23: worker process 11623 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:14.5254932Z"}
{"log":"2018/03/20 18:35:18 [alert] 23#23: worker process 11657 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:18.272065288Z"}
{"log":"2018/03/20 18:35:18 [alert] 23#23: worker process 11692 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:18.53148589Z"}
{"log":"2018/03/20 18:35:19 [alert] 23#23: worker process 11726 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:19.517368566Z"}

Anything else we need to know:

> tail -f nginx-ingress-controller-3228907275-p158b_kube-system_nginx-ingress-controller-45af43a95ce0ff5989dfe5783d5457214952edc83dc5de1e6745b436c7c5e23b.log | grep 'opentracing' -C3
2018/03/20 17:51:09 [debug] 23719#23719: *680808 http script set $service_name
2018/03/20 17:51:09 [debug] 23719#23719: *680808 post rewrite phase: 4
2018/03/20 17:51:09 [debug] 23719#23719: *680808 generic phase: 5
2018/03/20 17:51:09 [debug] 23719#23719: *680808 extraced opentracing span context from request 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 starting opentracing request span for 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 starting opentracing location span for \"/\"(00007FE651FDDD00) in request 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 injecting opentracing span context from request 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 adding opentracing header \"x-b3-traceid:09e429b7d83c3090\" in request 00007FE65CCFD050
2018/03/20 17:51:09 [alert] 23#23: worker process 23719 exited on signal 11 (core dumped)
2018/03/20 17:51:09 [debug] 23753#23753: epoll add event: fd:13 op:1 ev:00002001
2018/03/20 17:51:09 [debug] 23753#23753: epoll add event: fd:14 op:1 ev:00002001

Analysis of the core dump shows that the issue is in the opentracing insert_header block:

#0  0x0000561139f929ec in ngx_list_push ()
#1  0x00007f7a1adbb6ab in ngx_opentracing::inject_span_context(opentracing::v1::Tracer const&, ngx_http_request_s*, opentracing::v1::SpanContext const&) () from /etc/nginx/modules/ngx_http_opentracing_module.so
#2  0x00007f7a1adbcf81 in ngx_opentracing::OpenTracingRequestInstrumentor::OpenTracingRequestInstrumentor(ngx_http_request_s*, ngx_http_core_loc_conf_s*, ngx_opentracing::opentracing_loc_conf_t*) () from /etc/nginx/modules/ngx_http_opentracing_module.so
#3  0x00007f7a1adbdce6 in ngx_opentracing::on_enter_block(ngx_http_request_s*) () from /etc/nginx/modules/ngx_http_opentracing_module.so
#4  0x000056113a097a33 in ngx_http_core_generic_phase ()
#5  0x000056113a093645 in ngx_http_core_run_phases ()
#6  0x000056113a08f03e in ngx_http_process_request ()
#7  0x000056113a09068c in ngx_http_process_request_line ()
#8  0x000056113a0a6be1 in ngx_epoll_process_events.lto_priv ()
#9  0x000056113a0c45df in ngx_process_events_and_timers ()
#10 0x000056113a0a8405 in ngx_worker_process_cycle ()
#11 0x000056113a0be07f in ngx_spawn_process ()
#12 0x000056113a0a9715 in ngx_master_process_cycle ()
#13 0x0000561139f8a654 in main ()
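
For readers not familiar with the nginx internals in that trace: ngx_list_push is how a header entry is appended to r->headers_in.headers. A minimal sketch of that general pattern in the stock nginx C API (illustrative only; this is not the module's actual source, and the helper name is made up):

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>

/* Hypothetical helper showing the header-injection pattern implied by the
 * backtrace: a new ngx_table_elt_t is pushed onto r->headers_in.headers
 * (the ngx_list_push call is where the segfault is reported). */
static ngx_int_t
add_request_header(ngx_http_request_t *r, ngx_str_t key, ngx_str_t value)
{
    ngx_table_elt_t *h;

    h = ngx_list_push(&r->headers_in.headers);
    if (h == NULL) {
        return NGX_ERROR;
    }

    h->hash = 1;
    h->key = key;
    h->value = value;
    h->lowcase_key = key.data;  /* assumes the key is already lowercase */

    return NGX_OK;
}
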
@aledbf
Member

aledbf commented Mar 20, 2018

@rnburn can you help with this?

@aledbf added the kind/bug label Mar 20, 2018
@rnburn
Contributor

rnburn commented Mar 20, 2018

Yep. Will take a look.

@aledbf
Member

aledbf commented Mar 20, 2018

@trnl thank you for the detailed report.

@trnl
Contributor Author

trnl commented Mar 20, 2018

On another cluster I can't reproduce it, and the only difference I noticed is the kernel version:

> uname -a
Linux ... 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

@trnl
Contributor Author

trnl commented Mar 24, 2018

@rnburn, is there any additional information I can provide?

@rnburn
Contributor

rnburn commented Mar 26, 2018

@trnl - The information is good. I can tell it's crashing here. I haven't had a chance to investigate further yet, but I'll try to fix it this week.

@rnburn
Contributor

rnburn commented Mar 28, 2018

I looked into this and I couldn't find any nginx documentation that says you can't modify headers_in like this line of the code does, but I inquired on nginx-devel to see if there's anything I'm missing.

A smaller, isolated example that reproduces the problem, or a core dump, could help narrow it down.

@aledbf
Member

aledbf commented Mar 28, 2018

@aledbf
Member

aledbf commented Apr 13, 2018

@rnburn any update on this?

@rnburn
Contributor

rnburn commented Apr 13, 2018

Hey @aledbf - I have plans to put in a fix based off of this answer I got on the nginx mailing list:
http://mailman.nginx.org/pipermail/nginx-devel/2018-March/011008.html

@rnburn
Contributor

rnburn commented May 17, 2018

@aledbf - I put in opentracing-contrib/nginx-opentracing#37 with a fix for this.

The fix required some reorganization of the module. I removed the vendor-specific modules in favor of loading the tracers dynamically with opentracing_load_tracer from a JSON representation of their configuration.

I can put in a PR to update ingress-nginx to do this.

I think this approach should work better for you, since you can use an embedded YAML configuration for the tracer without needing any code to handle each tracer's specific options.
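
For illustration, the opentracing_load_tracer directive mentioned above points nginx at a tracer plugin and a JSON config file. A rough sketch using the zipkin settings from this issue (the plugin path, config file path, and key names are assumptions, not confirmed in this thread):

# nginx.conf, http block; both paths are assumptions
opentracing_load_tracer /usr/local/lib/libzipkin_opentracing_plugin.so /etc/nginx/zipkin-config.json;

# /etc/nginx/zipkin-config.json; key names assumed, values from the ConfigMap above
{
  "service_name": "ingress-controller",
  "collector_host": "zipkin.tools",
  "collector_port": 80
}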

@aledbf
Member

aledbf commented May 17, 2018

@rnburn thank you for the update, the new json file makes sense.

> I can put in a PR to update ingress-nginx to do this.

Sure. First let me update the nginx image (I am updating the modules) (#2537)

@aledbf
Member

aledbf commented Jun 21, 2018

@rnburn what's the format of the new JSON configuration for zipkin?

@rnburn
Contributor

rnburn commented Jun 21, 2018

@aledbf
Member

aledbf commented Jun 21, 2018

@trnl can you test quay.io/aledbf/nginx-ingress-controller:0.382 ?
It contains the fix for the bug you reported.
Thanks in advance
