
bug: kafka-logger keeps logging 'not found broker' after the Kafka broker comes back up, until APISIX is reloaded #6033

Closed
guxingke opened this issue Jan 6, 2022 · 21 comments · Fixed by #8762
Labels
checking check first if this issue occurred

Comments

@guxingke

guxingke commented Jan 6, 2022

Issue description

APISIX keeps logging 'not found broker' after the Kafka broker comes back up, until APISIX is reloaded.

Environment

  • apisix version (cmd: apisix version): 2.10.1
  • OS (cmd: uname -a):Linux test 3.10.0-957.el7.x86_64 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • OpenResty / Nginx version (cmd: nginx -V or openresty -V):
    nginx version: openresty/1.19.3.2
    built by gcc 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
    built with OpenSSL 1.1.1l 24 Aug 2021
    TLS SNI support enabled
    configure arguments: --prefix=/duitang/dist/sys/openresty/nginx --with-cc-opt='-O2 -DAPISIX_BASE_VER=0.0.0 -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl111/include' --add-module=../ngx_devel_kit-0.3.1 --add-module=../echo-nginx-module-0.62 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.32 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.08 --add-module=../srcache-nginx-module-0.32 --add-module=../ngx_lua-0.10.19 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.7 --add-module=../ngx_stream_lua-0.0.9 --with-ld-opt='-Wl,-rpath,/duitang/dist/sys/openresty/luajit/lib -Wl,-rpath,-L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl111/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl111/lib' --add-module=/duitang/dist/opt/builder/output/openresty-1.19.3.2/../ngx_multi_upstream_module --add-module=/duitang/dist/opt/builder/output/openresty-1.19.3.2/../apisix-nginx-module --add-module=/duitang/dist/opt/builder/output/openresty-1.19.3.2/../lua-var-nginx-module --with-poll_module --with-pcre-jit --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_v2_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module 
--with-threads --with-compat --with-stream --with-http_ssl_module
  • etcd version, if have (cmd: run curl http://127.0.0.1:9090/v1/server_info to get the info from server-info API):{"version":"2.10.1","boot_time":1641439475,"etcd_version":"3.5.0","up_time":83420,"last_report_time":1641522886,"id":"8d53dda0-5ec2-44b6-872c-70f18b0397a8","hostname":"wvr-ngxweb-0-20.duitang.net"}
  • apisix-dashboard version, if have:
  • the plugin runner version, if the issue is about a plugin runner (cmd: depended on the kind of runner):
  • luarocks version, if the issue is about installation (cmd: luarocks --version):

Steps to reproduce

  1. Use the kafka-logger plugin.
  2. Bring the Kafka broker down.
  3. Wait until APISIX error logs like 'not found broker' appear.
  4. Bring the Kafka broker back up.
  5. APISIX does not recover automatically.

Actual result

After the Kafka broker is back up, APISIX still fails to send logs and keeps reporting the error below.

Error log

buffered messages send to kafka err: not found broker, retryable: true, ....

Expected result

Log delivery should resume automatically, with no errors.

@tzssangglass
Member

Please provide reproduction steps and environment & version information.

@guxingke
Author

guxingke commented Jan 7, 2022

Please provide reproduction steps and environment & version information.

description edited

@tzssangglass
Member

tzssangglass commented Jan 7, 2022

I need a step-by-step description of how you set up Kafka, how you configure the route, the kafka-logger plugin, etc.

@tzssangglass
Member

  1. Bring the Kafka broker down.

How do you do this?

  1. Bring the Kafka broker back up.

Is the Kafka broker definitely available at this point?

In my experience, the Kafka producer is kept in an lrucache, so you could update the kafka-logger configuration with the cluster_name parameter to force a new Kafka producer to be created.
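The workaround suggested above could be sketched as an Admin API call. This is a hypothetical example, not from the thread: the route ID "1", admin address/port, API key, broker address, and topic are all placeholders; the idea is that changing cluster_name causes the plugin to build a fresh producer.

```shell
# Hypothetical sketch: PATCH the route with a changed cluster_name so the
# kafka-logger plugin creates a new Kafka producer. All identifiers below
# (route ID, admin key, broker, topic) are placeholders, not from the thread.
curl -s http://127.0.0.1:9080/apisix/admin/routes/1 -X PATCH \
  -H 'X-API-KEY: <your-admin-key>' -d '
{
  "plugins": {
    "kafka-logger": {
      "broker_list": { "127.0.0.1": 9092 },
      "kafka_topic": "test_topic",
      "cluster_name": 2
    }
  }
}'
```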

@guxingke
Author

guxingke commented Jan 7, 2022

The Kafka brokers went down due to another issue.


When I restarted the Kafka brokers, I waited for APISIX to resume sending messages automatically, but it did not; about 5 minutes later I reloaded APISIX, and then it worked well.


In my experience, the Kafka producer is kept in an lrucache, so you could update the kafka-logger configuration with the cluster_name parameter to force a new Kafka producer to be created.

Maybe next time I will try this solution.

@tzssangglass
Member

When I restarted the Kafka brokers, I waited for APISIX to resume sending messages automatically, but it did not; about 5 minutes later I reloaded APISIX, and then it worked well.

refer to: doujiang24/lua-resty-kafka#70

Maybe we need to add refresh_interval to work around this bug.

@guxingke If I start kafka with docker, shut down the container after starting it and start it again, can I simulate the broker down scenario you mentioned?
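The docker experiment described here could be scripted roughly as follows. This is only a sketch: the image name is a placeholder, and a real single-node setup needs the image's documented listener/cluster settings.

```shell
# Sketch of the broker down/up experiment. <kafka-image> is a placeholder;
# configure the container per the image's docs before relying on it.
docker run -d --name kafka-test -p 9092:9092 <kafka-image>   # start broker

# ... point kafka-logger's broker_list at 127.0.0.1:9092 and send traffic ...

docker stop kafka-test   # broker down: 'not found broker' should appear in error.log
docker start kafka-test  # broker up: check whether log delivery recovers without a reload
```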

@guxingke
Author

guxingke commented Jan 8, 2022

If I start kafka with docker, shut down the container after starting it and start it again, can I simulate the broker down scenario you mentioned?

Yes, that would simulate it.

maybe we need to add refresh_interval to skip this bug.

Yeah, it would be best if it could recover automatically.

@tzssangglass tzssangglass added the checking check first if this issue occurred label Jan 8, 2022
@tzssangglass
Member

If I start kafka with docker, shut down the container after starting it and start it again, can I simulate the broker down scenario you mentioned

@guxingke I was unable to reproduce the scenario with this on the master branch.

Here is my configuration of route:

{
    "plugins": {
        "kafka-logger": {
            "broker_list": {
                "127.0.0.1": 9092
            },
            "kafka_topic": "apisix_traffic_logs",
            "key": "key1",
            "timeout": 1,
            "batch_max_size": 1
        }
    },
    "upstream": {
        "nodes": {
            "127.0.0.1:1980": 1
        },
        "type": "roundrobin"
    },
    "uri": "/hello"
}

Here is my exception log:

2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] producer.lua:272: buffered messages send to kafka err: connection refused, retryable: true, topic: apisix_traffic_logs, partition_id: 0, length: 1, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080

When I restarted the kafka container, it continued to push messages to kafka normally again.

Note: I did not mount a Kafka data volume, so the restarted Kafka is a new cluster.

@guxingke
Author

my kafka logger configuration

{
  "batch_max_size": 100,
  "broker_list": {
    "10.200.32.69": 9092,
    "10.200.32.70": 9092,
    "10.200.32.71": 9092
  },
  "disable": false,
  "kafka_topic": "nginx_www"
}

It is configured as a global plugin, and it uses a user-defined log_format.


Maybe we could not reproduce it because our configurations were different.
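For completeness, a global-plugin setup like the one above might be created via the Admin API roughly as follows. The global rule ID "1", admin address/port, and API key are placeholders; the broker IPs and topic are taken from the config above.

```shell
# Hypothetical reconstruction of the reporter's global-plugin setup.
# Rule ID, admin address, and key are placeholders, not from the thread.
curl -s http://127.0.0.1:9080/apisix/admin/global_rules/1 -X PUT \
  -H 'X-API-KEY: <your-admin-key>' -d '
{
  "plugins": {
    "kafka-logger": {
      "batch_max_size": 100,
      "broker_list": {
        "10.200.32.69": 9092,
        "10.200.32.70": 9092,
        "10.200.32.71": 9092
      },
      "kafka_topic": "nginx_www"
    }
  }
}'
```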

@tzssangglass
Member

Maybe we could not reproduce it because our configurations were different.

I need to be able to simulate the Kafka broker down scenario you mentioned.

@guxingke
Author

Maybe your Kafka recovered too quickly? I cannot provide more information about this. 😂

@tzssangglass
Member

Maybe your Kafka recovered too quickly? I cannot provide more information about this. 😂

I got connection refused; I need to be able to reproduce this exception accurately in order to fix it.

@guxingke
Author

I retrieved the logs and found both errors; they look like the ones below.

2022/01/06 16:05:12 [error] 32298#32298: *518806498 [lua] producer.lua:272: buffered messages send to kafka err: connection refused, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y, server: x
2022/01/06 16:05:12 [error] 32296#32296: *518806551 [lua] producer.lua:272: buffered messages send to kafka err: connection refused, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y, server: x
2022/01/06 16:05:13 [error] 32299#32299: *518806730 [lua] producer.lua:272: buffered messages send to kafka err: not found broker, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y server: x
2022/01/06 16:05:13 [error] 32294#32294: *518806744 [lua] producer.lua:272: buffered messages send to kafka err: not found broker, retryable: true, topic: nginx_www, partition_id: 1, length: 1, context: ngx.timer, client: y, server:x

The first 'connection refused' occurred at '16:04:44', and the other at '16:05:13'.

@tzssangglass
Member

I don't know how to reproduce 'not found broker' 😂

@kennethToyuan

2022/04/11 10:48:19 [error] 46#46: *680816936 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.62, server: 0.0.0.0:9080
2022/04/11 10:48:19 [error] 50#50: *680817019 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.39, server: 0.0.0.0:9080
2022/04/11 10:48:19 [error] 50#50: *680817019 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.39, server: 0.0.0.0:9080
2022/04/11 10:48:19 [error] 50#50: *680817019 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.39, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 44#44: *680817160 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.33, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 44#44: *680817160 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.33, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 44#44: *680817160 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.33, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080
2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080
2022/04/11 10:48:21 [error] 46#46: *680817452 [lua] client.lua:150: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 10.0.5.62, server: 0.0.0.0:9080
2022/04/11 10:48:21 [error] 46#

@jujiale
Contributor

jujiale commented Sep 21, 2022

we also suffered such thing
[lua] producer.lua:272: buffered messages send to kafka err: not found broker, retryable: true,

@tzssangglass
Member

@kennethToyuan @jujiale please provide full reproduction steps; it looks like this bug is consistently reproducible in your environments.

@tokers
Contributor

tokers commented Sep 22, 2022

2022/04/11 10:48:20 [error] 48#48: *680817247 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: paas_api_log_topic, partition_id: -1, length: 1, context: ngx.timer, client: 10.0.5.69, server: 0.0.0.0:9080

Does the topic exist in the kafka cluster?

@jujiale
Contributor

jujiale commented Sep 26, 2022

Thanks for your reply.
I'm sure I added the topic and that it exists in Kafka. What's more, I found at https://github.com/doujiang24/lua-resty-kafka/issues?q=buffered+messages+send+to+kafka+err%3A+not+found+broker that other people have run into the same situation, so I guess the problem is in lua-resty-kafka.

@jujiale
Contributor

jujiale commented Jan 30, 2023

If I start kafka with docker, shut down the container after starting it and start it again, can I simulate the broker down scenario you mentioned

@guxingke I was unable to reproduce the scenario with this on the master branch.

Here is my configuration of route:

{
    "plugins": {
        "kafka-logger": {
            "broker_list": {
                "127.0.0.1": 9092
            },
            "kafka_topic": "apisix_traffic_logs",
            "key": "key1",
            "timeout": 1,
            "batch_max_size": 1
        }
    },
    "upstream": {
        "nodes": {
            "127.0.0.1:1980": 1
        },
        "type": "roundrobin"
    },
    "uri": "/hello"
}

Here is my exception log:

2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] client.lua:150: refresh(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080
2022/01/10 13:36:36 [error] 18553#25478878: *43867 [lua] producer.lua:272: buffered messages send to kafka err: connection refused, retryable: true, topic: apisix_traffic_logs, partition_id: 0, length: 1, context: ngx.timer, client: 127.0.0.1, server: 0.0.0.0:9080

When I restarted the kafka container, it continued to push messages to kafka normally again.

Note: I did not mount a Kafka data volume, so the restarted Kafka is a new cluster.

I find that this problem occurs when using a Kafka cluster (such as 3 nodes), but a single node seems OK.

@Zeeeeta

Zeeeeta commented Jun 26, 2024

2024/06/26 09:43:38 [error] 49#49: *7790154 [lua] client.lua:210: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 172.23.8.161, server: 0.0.0.0:9080
2024/06/26 09:43:39 [error] 49#49: *7790154 [lua] client.lua:210: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 172.23.8.161, server: 0.0.0.0:9080
2024-06-26T17:43:39.074091868+08:00 2024/06/26 09:43:39 [error] 49#49: *7790154 [lua] producer.lua:272: buffered messages send to kafka err: not found topic, retryable: true, topic: apigw.logs.http-access, partition_id: -1, length: 1, context: ngx.timer, client: 172.23.8.161, server: 0.0.0.0:9080

Same issue: at first it works, then after a while (no config changed), when I call the API again, the above errors appear and APISIX is unable to write to Kafka.

APISIX is running on K8s with 1 APISIX instance and 3 etcd nodes.
Kafka is a single-broker testing node that requires SASL/PLAIN auth; the admin super user is used and the topic exists.

After a while it worked again, but the calls made during the error window were not logged.
