Update md5sum of the s1 data file #550

Merged 1 commit on Jun 13, 2022


mehmetgunturkun
Contributor

The MD5 value for the file BigEarthNet-S1-v1.0.tar.gz appears to be incorrect, which causes the code to download the whole dataset from scratch even though it already exists.

$ cat BigEarthNet-S1-v1.0.tar.gz.md5sum 
94ced73440dea8c7b9645ee738c5a172  BigEarthNet-S1-v1.0.tar.gz
$ cat BigEarthNet-S2-v1.0.tar.gz.md5sum 
5a64e9ce38deb036a435a7b59494924c  BigEarthNet-S2-v1.0.tar.gz
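
For readers unfamiliar with the failure mode described above, here is a minimal sketch of a checksum-gated download decision. It is not torchgeo's actual code: `md5_of_file` and `needs_download` are hypothetical helpers written only for illustration. It shows why a wrong expected MD5 makes an existing, intact archive look invalid, so the download runs again.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_md5):
    """Hypothetical helper: True if the file is missing or its MD5 differs."""
    path = Path(path)
    if not path.is_file():
        return True
    # If expected_md5 is itself wrong, this is always True for an intact file,
    # which is the repeated-download behavior reported in this PR.
    return md5_of_file(path) != expected_md5.lower()
```

With the corrected value, a call such as `needs_download("BigEarthNet-S1-v1.0.tar.gz", "94ced73440dea8c7b9645ee738c5a172")` would return `False` for an intact local copy, assuming the archive sits in the current directory.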

@github-actions github-actions bot added the datasets Geospatial or benchmark datasets label Jun 3, 2022
@adamjstewart
Collaborator

I'm uncomfortable with the fact that we are downloading from http (not https), that the checksum has changed for no known reason, and that wget can't download this file without an error about expired certificates. Let me reach out to the BigEarthNet devs and see what is going on here.

@calebrob6
Member

@adamjstewart, did you send an email? If not, I can do it.

@adamjstewart
Collaborator

I did, no response yet.

adamjstewart
adamjstewart previously approved these changes Jun 11, 2022
Collaborator

@adamjstewart adamjstewart left a comment

Never got a response from the dataset authors, so I guess we'll never know why the checksum changed. Thanks for the PR! I'll switch most of our URLs from http to https in a separate PR.

@adamjstewart adamjstewart enabled auto-merge (squash) June 11, 2022 04:05
@adamjstewart adamjstewart disabled auto-merge June 11, 2022 04:10
@adamjstewart
Collaborator

@calebrob6 can you download the dataset from the new https link and make sure you get the same checksum? I'm on wifi and it would take years.

@adamjstewart adamjstewart added this to the 0.2.2 milestone Jun 12, 2022
@calebrob6
Member

Checking now. Of note, wget doesn't like the download:

ERROR: cannot verify bigearth.net's certificate, issued by ‘CN=DFN-Verein Global Issuing CA,OU=DFN-PKI,O=Verein zur Foerderung eines Deutschen Forschungsnetzes e. V.,C=DE’:
  Unable to locally verify the issuer's authority.
To connect to bigearth.net insecurely, use `--no-check-certificate'

@calebrob6
Member

calebrob6 commented Jun 13, 2022

The checksum of the previous download I had sitting around (Nov 9th, 2021) is 94ced73440dea8c7b9645ee738c5a172. I just downloaded again from the link and also got 94ced73440dea8c7b9645ee738c5a172.

Seems like the initial md5sum we had was just wrong.
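
The comparison above can be sketched in a few lines, assuming the published `.md5sum` file uses the usual `md5sum` layout of digest, separator, filename. The helper names `parse_md5sum_line` and `all_match` are hypothetical and exist only for this illustration; the digests are the ones quoted in this thread.

```python
def parse_md5sum_line(line):
    """Split one line of md5sum-style output into (digest, filename)."""
    digest, _, name = line.strip().partition(" ")
    # md5sum separates fields with two spaces, or " *" in binary mode.
    return digest.lower(), name.lstrip(" *")


def all_match(published_line, local_digests):
    """Hypothetical helper: True if every local digest equals the published one."""
    published, _ = parse_md5sum_line(published_line)
    return all(d.lower() == published for d in local_digests)


published = "94ced73440dea8c7b9645ee738c5a172  BigEarthNet-S1-v1.0.tar.gz"
downloads = [
    "94ced73440dea8c7b9645ee738c5a172",  # copy from Nov 9th, 2021
    "94ced73440dea8c7b9645ee738c5a172",  # fresh download
]
print(all_match(published, downloads))  # prints True
```

Both copies agreeing with the published file is what supports the conclusion that the previously hardcoded value, rather than the archive, was at fault.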

@adamjstewart adamjstewart merged commit 4b874df into microsoft:main Jun 13, 2022
@adamjstewart adamjstewart modified the milestones: 0.2.2, 0.3.0 Jul 2, 2022
@adamjstewart adamjstewart mentioned this pull request Jul 11, 2022
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023