s3fs IO errors (was: Error received during crawl) #772
Comments
We haven't had a chance to test with s3fs, so we can't really help with that specifically. However, Browsertrix Crawler actually has native support for uploading to S3-compatible storage. For security, the S3 settings are only provided via environment variables.
This will upload the WACZ to the configured S3 storage. We have some docs on this, but they should be extended to include this example; a working example can also be found in the tests. You can set --sizeLimit on the crawl so that it uploads to S3 and exits, and run it from a script that restarts the crawler each time (we use it this way in the Browsertrix app with Kubernetes). At this time, only uploading WACZs is supported, and the WACZ upload should stream directly to S3 without requiring any additional local disk space. Hope this helps!
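For concreteness, here is a minimal sketch of what such an invocation could look like. The STORE_* environment variable names and all values (endpoint, bucket, size limit) are assumptions based on reading the crawler's storage code, not taken from this thread, so verify them against the docs and tests before relying on them:

```sh
# Sketch only: STORE_* variable names and all values are assumptions,
# check the Browsertrix Crawler docs/tests for the exact names.
# --sizeLimit makes the crawl stop after roughly the given number of bytes,
# upload the WACZ, and exit; a wrapper script can then restart the crawler.
docker run -it --rm \
  -v "$PWD/crawls:/crawls" \
  -e STORE_ENDPOINT_URL="https://s3.example.com/my-bucket/" \
  -e STORE_ACCESS_KEY="..." \
  -e STORE_SECRET_KEY="..." \
  webrecorder/browsertrix-crawler crawl \
  --url https://example.com/ \
  --generateWACZ \
  --sizeLimit 10000000000
```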
@ikreymer The problem is that only the S3 bucket contains the previous crawls; the crawler server has no previous data. As far as I am aware, only the latest crawl would be included in the WACZ file if we did it this way. Is that correct? I did find a solution for us with rclone (with …).
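The exact rclone setup isn't given in the comment above, but a mount along these lines is the usual approach. The remote name, bucket, and mount point below are illustrative placeholders, not the poster's actual command:

```sh
# Illustrative only: remote name, bucket, and mount point are placeholders.
# --vfs-cache-mode writes buffers file writes on local disk first, which
# avoids many of the partial/random-write problems plain S3 mounts have.
rclone mount s3remote:crawl-bucket /mnt/crawls \
  --vfs-cache-mode writes \
  --daemon
```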
Hi, I have an issue that I am hoping you can help me with.
I am trying to archive a rather large site. Because the resulting archives quickly filled up my VPS's storage, I mounted an S3 space with s3fs, but when I run the crawler I get this error.
I have tried it many times and it always produces the same error. The s3fs mount is stable during that time, so it isn't just a network disconnect. Do you have any idea what could be causing this issue?
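For context, an s3fs mount along the lines described above typically looks something like the following; the bucket name, mount point, and endpoint are placeholders, not the reporter's actual setup:

```sh
# Placeholder sketch of an s3fs mount like the one described above;
# bucket, mount point, and endpoint are not the reporter's real values.
s3fs crawl-bucket /mnt/crawls \
  -o passwd_file="$HOME/.passwd-s3fs" \
  -o url=https://s3.example.com \
  -o use_path_request_style
```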