
[destination-s3] Error uploading file to S3 compatible storage due to ETag mismatch #36035

Closed

anikoo-aka opened this issue Mar 13, 2024 · 9 comments

anikoo-aka commented Mar 13, 2024

Connector Name

destination-s3

Connector Version

0.5.8

What step the error happened?

During the sync

Relevant information

The S3 destination connector fails with an ETag mismatch. I tried uploading a file and verifying its ETag with s3cmd, and that worked as expected.

S3 service used in the test: Linode Object Storage (S3 Compatible)

Relevant log output

2024-03-13 18:43:02 destination > 2024-03-13 18:43:02 INFO a.m.s.StreamTransferManager(complete):367 - [Manager uploading to automated-backup-test/backup/orders/2024_03_130.parquet with id 2~HiWD6tn...fGEYmtDv7]: Uploading leftover stream [Part number 1 containing 3.46 MB]
2024-03-13 18:43:02 destination > 2024-03-13 18:43:02 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to automated-backup-test/backup/orders/2024_03_130.parquet with id 2~HiWD6tn...fGEYmtDv7]: Finished uploading [Part number 1 containing 3.46 MB]
2024-03-13 18:43:02 destination > 2024-03-13 18:43:02 ERROR i.a.c.i.d.s.S3StorageOperations(uploadRecordsToBucket):138 - Failed to upload records into storage backup/orders/2024_03_13
2024-03-13 18:43:02 destination > alex.mojaki.s3upload.IntegrityCheckException: File upload completed, but integrity check failed. Expected ETag: 3a7881329c7ea3c7480b590ac8b21634-1 but actual is 
2024-03-13 18:43:02 destination >         at alex.mojaki.s3upload.StreamTransferManager.checkCompleteFileIntegrity(StreamTransferManager.java:407) ~[s3-stream-upload-2.2.2.jar:?]
2024-03-13 18:43:02 destination >         at alex.mojaki.s3upload.StreamTransferManager.complete(StreamTransferManager.java:392) ~[s3-stream-upload-2.2.2.jar:?]
2024-03-13 18:43:02 destination >         at io.airbyte.cdk.integrations.destination.s3.S3StorageOperations.loadDataIntoBucket(S3StorageOperations.java:210) ~[airbyte-cdk-s3-destinations-0.10.2.jar:?]
2024-03-13 18:43:02 destination >         at io.airbyte.cdk.integrations.destination.s3.S3StorageOperations.uploadRecordsToBucket(S3StorageOperations.java:134) ~[airbyte-cdk-s3-destinations-0.10.2.jar:?]
2024-03-13 18:43:02 destination >         at io.airbyte.cdk.integrations.destination.s3.S3ConsumerFactory.lambda$flushBufferFunction$2(S3ConsumerFactory.java:128) ~[airbyte-cdk-s3-destinations-0.10.2.jar:?]
2024-03-13 18:43:02 destination >         at io.airbyte.cdk.integrations.destination.record_buffer.SerializedBufferingStrategy.flushAllBuffers(SerializedBufferingStrategy.java:137) [airbyte-cdk-core-0.10.2.jar:?]
2024-03-13 18:43:02 destination >         at io.airbyte.cdk.integrations.destination.buffered_stream_consumer.BufferedStreamConsumer.close(BufferedStreamConsumer.java:298) [airbyte-cdk-core-0.10.2.jar:?]
2024-03-13 18:43:02 destination >         at io.airbyte.cdk.integrations.base.FailureTrackingAirbyteMessageConsumer.close(FailureTrackingAirbyteMessageConsumer.java:82) [airbyte-cdk-core-0.10.2.jar:?]
2024-03-13 18:43:02 destination >         at io.airbyte.cdk.integrations.base.Destination$ShimToSerializedAirbyteMessageConsumer.close(Destination.java:96) [airbyte-cdk-core-0.10.2.jar:?]
2024-03-13 18:43:02 destination >         at io.airbyte.cdk.integrations.base.IntegrationRunner.runInternal(IntegrationRunner.java:191) [airbyte-cdk-core-0.10.2.jar:?]
2024-03-13 18:43:02 destination >         at io.airbyte.cdk.integrations.base.IntegrationRunner.run(IntegrationRunner.java:125) [airbyte-cdk-core-0.10.2.jar:?]
2024-03-13 18:43:02 destination >         at io.airbyte.cdk.integrations.base.adaptive.AdaptiveDestinationRunner$Runner.run(AdaptiveDestinationRunner.java:102) [airbyte-cdk-core-0.10.2.jar:?]
2024-03-13 18:43:02 destination >         at io.airbyte.integrations.destination.s3.S3DestinationRunner.main(S3DestinationRunner.java:15) [io.airbyte.airbyte-integrations.connectors-destination-s3-0.50.41.jar:?]
2024-03-13 18:43:02 destination > 2024-03-13 18:43:02 INFO i.a.c.i.d.s.S3StorageOperations(uploadRecordsToBucket):128 - Retrying to upload records into storage backup/orders/2024_03_13 (1/3})
2024-03-13 18:43:02 destination > 2024-03-13 18:43:02 INFO i.a.c.i.d.s.S3DestinationConfig(createS3Client):239 - Creating S3 client...
2024-03-13 18:43:02 destination > 2024-03-13 18:43:02 INFO a.m.s.StreamTransferManager(getMultiPartOutputStreams):329 - Initiated multipart upload to automated-backup-test/backup/orders/2024_03_131.parquet with full ID 2~ZiM--1N2ORdvVY8KFj1N_XHKokesbJt
2024-03-13 18:43:02 destination > 2024-03-13 18:43:02 INFO a.m.s.MultiPartOutputStream(close):158 - Called close() on [MultipartOutputStream for parts 1 - 10000]
2024-03-13 18:43:02 destination > 2024-03-13 18:43:02 ERROR i.a.c.i.d.s.S3StorageOperations(loadDataIntoBucket):204 - Failed to load data into storage backup/orders/2024_03_13
2024-03-13 18:43:02 destination > java.io.IOException: Stream Closed
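For context on the error message: the `-1` suffix in the expected ETag (`3a7881329c7ea3c7480b590ac8b21634-1`) is the standard S3 multipart-upload ETag format, namely the MD5 of the concatenated per-part MD5 digests followed by the number of parts. A minimal sketch of that computation (standard-library Python; the helper name is hypothetical, not connector code):

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute the ETag S3 assigns to a multipart upload:
    the MD5 of the concatenated per-part MD5 digests,
    suffixed with '-<number of parts>'."""
    part_md5s = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(part_md5s)).hexdigest()
    return f"{combined}-{len(part_md5s)}"
```

A single-part multipart upload (as in the log above, "Part number 1") still gets a `-1` suffix, so its ETag differs from the plain MD5 a simple PUT would produce; S3-compatible backends that report the plain MD5 instead will fail this kind of integrity check.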

Contribute

  • Yes, I want to contribute
anikoo-aka (Author) commented

Worth mentioning that this issue happens regardless of the output format. I tried Apache Avro, Parquet, and JSON and got the same error.

@marcosmarxm marcosmarxm changed the title Error uploading file to S3 compatible storage due to ETag mismatch [destination-s3] Error uploading file to S3 compatible storage due to ETag mismatch Mar 14, 2024

nitso commented Mar 21, 2024

I encountered the same error with another S3-compatible storage.
@anikoo-aka, do you use encrypted storage? I'm not familiar with Linode, but I've found a couple of reports of ETag problems with encrypted storage (1, 2)

2024-03-20 13:06:00  destination  > 2024-03-20 13:06:00  ERROR  i.a.c.i.d.s.S3StorageOperations(uploadRecordsToBucket):138 - Failed to upload records into storage pdm_export/airbyte_manufacturer_export/2024_03_20_1710939957946_
2024-03-20 13:06:00  destination  > alex.mojaki.s3upload.IntegrityCheckException: File upload completed, but integrity check failed. Expected ETag: bebd2a6d14e305ffedcd547d9841b4ef-1 but actual is 1aa89dd9c800a0137989290a557c13e0
2024-03-20 13:06:00  destination  > 	at alex.mojaki.s3upload.StreamTransferManager.checkCompleteFileIntegrity(StreamTransferManager.java:407) ~[s3-stream-upload-2.2.2.jar:?]
2024-03-20 13:06:00  destination  > 	at alex.mojaki.s3upload.StreamTransferManager.complete(StreamTransferManager.java:392) ~[s3-stream-upload-2.2.2.jar:?]

marcosmarxm (Member) commented

@airbytehq/destinations can someone take a look at this issue in the next grooming session?


nitso commented Mar 28, 2024

FYI: version 0.3.5 works well.
I downgraded destination-s3 to this version and was able to sync data successfully.

The current version, 0.5.9, does not work either.

@evantahler evantahler added the "frozen" (Not being actively worked on) label Apr 30, 2024

PsycheShaman commented May 15, 2024

Facing a similar issue using Ceph S3 as the destination. The file is created successfully on S3, and the ETag of the object matches the expected ETag.

However, it seems the integrity check is receiving an empty string as the ETag:

Expected ETag: 82ffabb7c40d94fe968b2f4d5ea970ff-1 but actual is <- there is no "actual" etag in the error logs

Not sure how to downgrade only destination-s3 as suggested by @nitso, since I've deployed Airbyte using Helm. I haven't changed the Helm chart version recently, though, so I'm not sure why it's causing a problem now...

evantahler (Contributor) commented

Can you please confirm you are on the latest s3 destination, 0.6.1?

Would anyone here be willing to look into fixing this check, or to add an option to disable the ETag match check when the vendor doesn't report ETags properly?


PsycheShaman commented May 17, 2024

@evantahler I can confirm that 0.6.1 and 0.5.9 both show this issue, and that downgrading to 0.3.5 as @nitso suggested works. I have not tried other versions...

evantahler (Contributor) commented

Linode is not following the S3 spec; closing this issue, as we will not work on an exception at this time.

@evantahler evantahler closed this as not planned Aug 13, 2024

nitso commented Aug 16, 2024

@evantahler could you please identify the exact parts of the S3 spec that are not being followed? These would be good to document.
Linode is not the only storage that does not work with the S3 connector.


7 participants