lost records, many in_tail resources (kubernetes) #1385

Closed
epcim opened this issue Jun 14, 2019 · 2 comments

epcim commented Jun 14, 2019

Bug Report

Describe the bug

  • using a Fluent Bit build intended for the 1.2 release (i.e. built from current master)
  • Fluent Bit reads ~20 logs from k8s pods (in_tail on /var/log/containers/*.log)
  • log level set to "trace" to see all events
  • OUTPUT
    • forward to Fluentd instances
    • stdout for debugging purposes (a minimal config sketch of this setup follows below)
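For reference, a minimal sketch of the kind of configuration in play (not the exact config from the linked sandbox; the Fluentd host, DB path, tag and parser are assumptions):

[SERVICE]
    Flush        5
    Log_Level    trace
    Parsers_File parsers.conf

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Refresh_Interval  5

# enrich records with kubernetes metadata and merge the JSON log field
[FILTER]
    Name       kubernetes
    Match      kube.*
    Merge_Log  On

# forward to the Fluentd instances
[OUTPUT]
    Name   forward
    Match  *
    Host   fluentd.logging.svc
    Port   24224

# duplicate everything to stdout for debugging
[OUTPUT]
    Name   stdout
    Match  *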

I experience that one particular log (it's from a pod called OBELIX) is not forwarded (and for sure not printed to stdout) after a while, <1 min. I mean, when I start Fluent Bit, or when I restart the OBELIX pod, the Fluent Bit in_tail reads the file and processes it as expected (kubernetes metadata, merge_log, JSON parsing - all OK). Then I can see a few iterations where the processed records are forwarded and printed to STDOUT (Flush is 5s). After that, OBELIX records are still coming at ~3 records/10s, but I no longer see any STDOUT output of the parsed OBELIX log, and I don't see any further out_fw event for the OBELIX log file.

Most of the other apps keep sending their records to forward and stdout.

It's hard to reproduce. It's also hard to tell which tags/inputs stop being processed and which keep going.

If I modify in_tail to read only the OBELIX logs, they do not stop being processed and everything works. The same on my sandbox environment: with just the OBELIX logs it works (hundreds of records). My homework is to test with other logs as well.

I checked in the in_tail .db whether the offset increases for the OBELIX log, and yes, it keeps increasing (so the reading side is quite surely OK).
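For example, something like this (assuming the default in_tail SQLite schema, i.e. an in_tail_files table with name/offset columns, and the DB path from the sketch above):

# print the tracked offset for the OBELIX log file
sqlite3 /var/log/flb_kube.db \
  "SELECT name, offset FROM in_tail_files WHERE name LIKE '%obelix%';"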

To Reproduce
I haven't reproduced it in isolation; I offer @edsiper a hangout session on a live environment.
The setup I use is very similar to: https://github.com/epcim/fluentbit-sandbox

To see what Fluent Bit is doing, I use this to filter the stdout:

kubectl logs $fluentbit0 --tail 3000 -f \
  | grep -v 'timestamp: [0-9]*, value: [0-9]*$' \
  | grep -v 'total: [0-9]*$' \
  | grep -v 'output map size' \
  | grep -v 'could not merge JSON log as requested' \
  | grep -e task -e input -e out_fw -e obelix -e '"app"=>"obelix"' &

Expected behavior

  • read logs are forwarded and printed to stdout
  • we should have some consistency check between in_tail and the outputs/filters to be sure we are not losing records
  • at "debug" log level, it would be nice to have the possibility to print every deliberately dropped message

Your Environment

FROM debian:stretch as builder

# Fluent Bit version
ENV FLB_MAJOR 1
ENV FLB_MINOR 2
ENV FLB_PATCH 0
ENV FLB_VERSION 514f097c5b2be27117f7dff0465a807fa2f3a696

ENV DEBIAN_FRONTEND noninteractive

ENV FLB_TARBALL http://github.com/fluent/fluent-bit/archive/$FLB_VERSION.zip
RUN mkdir -p /fluent-bit/bin /fluent-bit/etc /fluent-bit/log /tmp/fluent-bit-master/

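# Build dependencies and Fluent Bit source tarball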
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
      build-essential \
      cmake \
      make \
      wget \
      unzip \
      flex \
      bison \
      libssl1.0-dev \
      libasl-dev \
      libsasl2-dev \
      pkg-config \
      libsystemd-dev \
      zlib1g-dev \
      ca-certificates \
    && wget -O "/tmp/fluent-bit-${FLB_VERSION}.zip" ${FLB_TARBALL} \
    && cd /tmp && unzip "fluent-bit-$FLB_VERSION.zip" \
    && cd "fluent-bit-$FLB_VERSION"/build/ \
    && rm -rf /tmp/fluent-bit-$FLB_VERSION/build/*

WORKDIR /tmp/fluent-bit-$FLB_VERSION/build/
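# Configure the build (HTTP server, systemd input and Kafka output enabled)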
RUN cmake -DFLB_DEBUG=On \
          -DFLB_TRACE=Off \
          -DFLB_JEMALLOC=On \
          -DFLB_TLS=On \
          -DFLB_SHARED_LIB=Off \
          -DFLB_EXAMPLES=Off \
          -DFLB_HTTP_SERVER=On \
          -DFLB_IN_SYSTEMD=On \
          -DFLB_OUT_KAFKA=On ..

RUN make -j $(getconf _NPROCESSORS_ONLN)
RUN install bin/fluent-bit /fluent-bit/bin/

# Configuration files
COPY conf/fluent-bit.conf \
     conf/parsers.conf \
     conf/parsers_java.conf \
     conf/parsers_extra.conf \
     conf/parsers_openstack.conf \
     conf/parsers_cinder.conf \
     conf/plugins.conf \
     /fluent-bit/etc/

FROM debian:stretch
LABEL Description="Fluent Bit docker image" Vendor="XXX" Version="0.1"

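# Runtime dependencies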
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
      curl ca-certificates \
      liblz4-1 \
      libssl1.0 \
      libasl0 \
      libsasl2-2 \
      libsystemd0 \
      zlib1g \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

COPY --from=builder /fluent-bit /fluent-bit

# HTTP server (monitoring) port
EXPOSE 2020

# Entry point
CMD ["/fluent-bit/bin/fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.conf"]
epcim changed the title from "lost records in complex setup" to "lost records, many in_tail resources (kubernetes)" on Jun 17, 2019

epcim commented Jun 17, 2019

Update: I am seeing the same situation in multiple locations (deployments).


epcim commented Oct 22, 2020

not up to date

epcim closed this as completed on Oct 22, 2020