[s3] [2.31.1+] S3 preserve_data_ordering instability causing crash [under investigation] #552

Closed
PettitWesley opened this issue Feb 21, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@PettitWesley
Contributor

PettitWesley commented Feb 21, 2023

I have confirmed a report that our most recent versions introduced increased instability in the S3 output.

We will add more details when we have them. What we know so far is that this impacts configurations with the following setting:

preserve_data_ordering On

Mitigations

We have released new versions that revert the recent S3 commits; these versions should not be impacted by this issue:

I obtained the following stack trace for this crash:

(gdb) bt
#0  0x00007f2011b38ca0 in raise () from /lib64/libc.so.6
#1  0x00007f2011b3a148 in abort () from /lib64/libc.so.6
#2  0x000000000045599e in flb_signal_handler (signal=11) at /tmp/fluent-bit-1.9.10/src/fluent-bit.c:581
#3  <signal handler called>
#4  0x0000000000855569 in __mk_list_del (prev=0x0, next=0x0) at /tmp/fluent-bit-1.9.10/lib/monkey/include/monkey/mk_core/mk_list.h:87
#5  0x00000000008555a1 in mk_list_del (entry=0x7f200ac056e0) at /tmp/fluent-bit-1.9.10/lib/monkey/include/monkey/mk_core/mk_list.h:93
#6  0x0000000000856153 in chunk_state_sync (ch=0x7f200ac056a0) at /tmp/fluent-bit-1.9.10/lib/chunkio/src/cio_chunk.c:506
#7  0x0000000000856297 in cio_chunk_up_force (ch=0x7f200ac056a0) at /tmp/fluent-bit-1.9.10/lib/chunkio/src/cio_chunk.c:560
#8  0x0000000000858abd in cio_file_content_copy (ch=0x7f200ac056a0, out_buf=0x7f200accd338, out_size=0x7f200accd330) at /tmp/fluent-bit-1.9.10/lib/chunkio/src/cio_file.c:441
#9  0x0000000000855c15 in cio_chunk_get_content_copy (ch=0x7f200ac056a0, out_buf=0x7f200accd338, out_size=0x7f200accd330)
    at /tmp/fluent-bit-1.9.10/lib/chunkio/src/cio_chunk.c:247
#10 0x000000000064303f in flb_fstore_file_content_copy (fs=0x7f2004d593c8, fsf=0x7f200ac0b080, out_buf=0x7f200accd338, out_size=0x7f200accd330)
    at /tmp/fluent-bit-1.9.10/src/flb_fstore.c:263
#11 0x0000000000603c49 in s3_store_file_read (ctx=0x7f2004d60500, s3_file=0x7f200ac20038, out_buf=0x7f200accd338, out_size=0x7f200accd330)
    at /tmp/fluent-bit-1.9.10/plugins/out_s3/s3_store.c:423
#12 0x00000000005ff393 in construct_request_buffer (ctx=0x7f2004d60500, new_data=0x0, chunk=0x7f200ac20038, out_buf=0x7f200accd3b8, out_size=0x7f200accd3b0)
    at /tmp/fluent-bit-1.9.10/plugins/out_s3/s3.c:1366
#13 0x00000000006008b9 in send_upload_request (out_context=0x7f2004d60500, chunk=0x0, upload_file=0x7f200ac20038, m_upload_file=0x0,
    tag=0x7f200ac21ce0 "app2-firelens-b1703f6d-0fb4-47b8-b309-3d5dd0c975a9", tag_len=50) at /tmp/fluent-bit-1.9.10/plugins/out_s3/s3.c:1724
#14 0x0000000000600c4a in s3_upload_queue (config=0x7f200ee19240, out_context=0x7f2004d60500) at /tmp/fluent-bit-1.9.10/plugins/out_s3/s3.c:1790
#15 0x00000000006023ef in cb_s3_flush (event_chunk=0x7f2004e83bf8, out_flush=0x7f200ac0b980, i_ins=0x7f200ee0a280, out_context=0x7f2004d60500, config=0x7f200ee19240)
    at /tmp/fluent-bit-1.9.10/plugins/out_s3/s3.c:2272
#16 0x00000000004e6ee0 in output_pre_cb_flush () at /tmp/fluent-bit-1.9.10/include/fluent-bit/flb_output.h:522
#17 0x0000000000a4f827 in co_init () at /tmp/fluent-bit-1.9.10/lib/monkey/deps/flb_libco/amd64.c:117
#18 0x0000000000000000 in ?? ()
(gdb) quit
@PettitWesley changed the title from "[s3] [2.31.1+] S3 instability causing crash [under investigation]" to "[s3] [2.31.1+] S3 preserve_data_ordering instability causing crash [under investigation]" on Feb 21, 2023
@PettitWesley added the bug (Something isn't working) label on Feb 23, 2023
@PettitWesley
Contributor Author

Please post here if you saw the same issue yourself. We are not yet certain how frequently this occurs.

Current testing suggests that turning preserve_data_ordering Off successfully mitigates/prevents the crash.
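
For anyone applying the mitigation above, a minimal sketch of an S3 output section in Fluent Bit classic config format might look like the following. The bucket name, region, and Match pattern are placeholders assumed for illustration and do not come from the report; only the preserve_data_ordering line reflects the setting discussed in this issue.

```text
[OUTPUT]
    Name                    s3
    Match                   *
    # Hypothetical bucket and region, replace with your own values
    bucket                  my-example-bucket
    region                  us-east-1
    # Mitigation discussed in this issue: disable ordered uploads
    preserve_data_ordering  Off
```

Note that turning this option Off gives up the ordering guarantee it normally provides, so treat it as a temporary workaround on affected versions rather than a permanent setting.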

@PettitWesley
Contributor Author

We suspect the fix here resolved it: https://github.com/aws/aws-for-fluent-bit/releases/tag/v2.31.7
