Allow selection of AWS vs GCP as a source #102
Comments
@ahmedhosny Thank you for creating this issue. Yes, we can certainly provide this option to the user. When I implemented this, I did not think very hard about the AWS-to-GCS bucket mapping. Having now written a simple query to check that mapping, I can confirm that we can predict which bucket the data will be in from any given AWS or GCS URL. Currently our index contains only AWS URLs. That was a deliberate choice, since it lets us track how much data is downloaded from AWS, whereas we have no way to track GCS downloads. Regardless, with the bucket mapping below, we can predict the GCS URLs from the AWS URLs. We also mirror the data, i.e. the AWS and GCS buckets are identical clones. This is a relatively easy feature to implement, and we'll keep you posted.
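As a rough illustration of the idea, predicting a GCS URL from an AWS URL could look like the sketch below. This is only a sketch under assumptions: the bucket names in `AWS_TO_GCS_BUCKET` are hypothetical placeholders and not the real mapping, and `aws_to_gcs_url` and `resolve_url` are illustrative names, not functions from this project. It also assumes object keys are identical in both buckets, which follows from the buckets being mirrored clones.

```python
# Hypothetical bucket names for illustration only; the real
# AWS-to-GCS mapping is not reproduced here.
AWS_TO_GCS_BUCKET = {
    "example-aws-open-data": "example-gcs-open-data",
    "example-aws-restricted": "example-gcs-restricted",
}


def aws_to_gcs_url(aws_url: str) -> str:
    """Translate an s3:// URL into the gs:// URL of its mirrored copy."""
    prefix = "s3://"
    if not aws_url.startswith(prefix):
        raise ValueError(f"expected an s3:// URL, got {aws_url!r}")
    # Split "bucket/key..." into the bucket name and the object key.
    bucket, sep, key = aws_url[len(prefix):].partition("/")
    if bucket not in AWS_TO_GCS_BUCKET:
        raise ValueError(f"no GCS mirror known for bucket {bucket!r}")
    # The object key is kept unchanged because the buckets are clones.
    return f"gs://{AWS_TO_GCS_BUCKET[bucket]}{sep}{key}"


def resolve_url(aws_url: str, source: str = "aws") -> str:
    """Return the URL to download from, given the user's preferred source."""
    if source == "aws":
        return aws_url
    if source == "gcp":
        return aws_to_gcs_url(aws_url)
    raise ValueError(f"source must be 'aws' or 'gcp', got {source!r}")
```

With a helper like this, the index could keep storing only AWS URLs, and the GCS URL would be derived on demand when the user asks for GCP as the source.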
@ahmedhosny could you elaborate on the background for your suggestion? I am interested in whether the current AWS-only download support has any significant consequences. I assumed it doesn't, since data egress is free and I would expect download performance to be similar from AWS and GCP. The only potential downside I can think of is if one wants to use GCP-native tools for the download. Is there anything else we missed?
@fedorov We are running @vkt1414 Thank you for the background there. We added the P.S. Very impressed with IDC and the maturity it has reached. Great work 👏
@ahmedhosny thank you for the clarification! Note, however, that you can use It is as simple as the following:
Behind the scenes, Happy to jump on a call to guide you and get your feedback! Sorry, documentation is behind ....
Thanks @fedorov! Yes, we plan to switch to
I see that there is embedded logic that selects between them here. I am curious if one, e.g. AWS, can be enforced (similar to what can be done on the IDC website when you hit "Download Images").
Re the distribution of data across AWS and GCP, it seems like the data is not mirrored across them?