Manual retry logic is needed for "read: connection reset by peer" #73

Open
techdragon opened this issue Mar 9, 2023 · 1 comment

@techdragon

When syncing a large number of small files, I get failures that are not retried. Retries are configured (I set --s3-retry 8 --s3-retry-sleep 16), but the sync still terminates immediately. The only way I can get it to continue is to set --error-handling 255, which makes it ignore anything that goes wrong and skip to the next file. However, this means I will always end up with an incomplete sync.

ERRO[0328] Sync error: pipeline step: 1 (LoadObjData) failed with error: object: <redacted>.jpeg sync error: read tcp 10.30.9.67:38784->52.95.134.115:443: read: connection reset by peer, terminating 
DEBU[0328] Pipeline step: ListSource finished           
DEBU[0328] Pipeline step: LoadObjData finished          
DEBU[0344] Pipeline step: UploadObj finished            
DEBU[0344] Pipeline step: Terminator finished           
DEBU[0344] All pipeline steps finished                  
DEBU[0344] Pipeline terminated                          
INFO[0344] Pipeline step finished                        ErrorObj=0 InputObj=0 InputObjSpeed=0 OutputObj=606000 OutputObjSpeed=1759.3080175990012 stepName=ListSource stepNum=0
INFO[0344] Pipeline step finished                        ErrorObj=1 InputObj=606000 InputObjSpeed=1759.3080175990012 OutputObj=605509 OutputObjSpeed=1757.8825716639499 stepName=LoadObjData stepNum=1
INFO[0344] Pipeline step finished                        ErrorObj=0 InputObj=605509 InputObjSpeed=1757.8825716639499 OutputObj=605509 OutputObjSpeed=1757.8825716639499 stepName=UploadObj stepNum=2
INFO[0344] Pipeline step finished                        ErrorObj=0 InputObj=605509 InputObjSpeed=1757.8825716639499 OutputObj=0 OutputObjSpeed=0 stepName=Terminator stepNum=3
INFO[0344] Duration: 5m44.453708798s                     durationSec=344.453707715
ERRO[0344] Sync Failed                                   status=1

I'm not actively using Go for anything, so apologies if I have missed something while reading through the code and got the situation wrong, but I believe you need to manually handle the "read: connection reset by peer" errors from the AWS Go SDK. See aws/aws-sdk-go#2926, specifically aws/aws-sdk-go#2926 (comment), which outlines their legitimate reasons for changing the behaviour and explains what can be done to get these errors retried like they used to be. In a synchronisation tool like this one, retrying these failures should be idempotent, so the custom retry handler should hopefully be pretty simple: since we want the other bucket to look like the first one, there's no risk in performing LoadObjData again and simply trying again, as we can for the other S3 errors.

@jsomeara

Bump. Error handling as a whole could be improved a lot.
It just skips instead of retrying, even though I configured retries.
