eta.core.web.download_file() failing #620

Closed
rohis06-aws opened this issue Feb 16, 2024 · 6 comments

Comments

@rohis06-aws
Contributor

rohis06-aws commented Feb 16, 2024

Recently, I have noticed that `eta.core.web.download_file()` fails when downloading large files. For example: `eta.core.web.download_file("http://data.csail.mit.edu/places/places365/test_256.tar", path=<my-path>)`

This happens especially when the file is downloaded in multiple chunks.

Output from `eta.core.web.download_file()`:

```
Downloading test split from http://data.csail.mit.edu/places/places365/test_256.tar to /Users/<user>/fiftyone/places/test/data
  23% |█████████████████████████/--------------------------------------------------------------------------------------|    8.0Gb/35.3Gb [6.1m elapsed, 20.3m remaining, 23.9Mb/s]
```

Here's the `wget` output for the same file:

```
wget http://data.csail.mit.edu/places/places365/test_256.tar
--2024-02-15 00:09:49--  http://data.csail.mit.edu/places/places365/test_256.tar
Resolving data.csail.mit.edu (data.csail.mit.edu)... 128.52.131.233
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://data.csail.mit.edu/places/places365/test_256.tar [following]
--2024-02-15 00:09:49--  https://data.csail.mit.edu/places/places365/test_256.tar
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4736829440 (4.4G) [application/x-tar]
Saving to: ‘test_256.tar’
test_256.tar                                    22%[=====================>                                                                              ]   1.01G  11.4MB/s    in 90s
2024-02-15 00:11:20 (11.5 MB/s) - Connection closed at byte 1085026092. Retrying.
--2024-02-15 00:11:21--  (try: 2)  https://data.csail.mit.edu/places/places365/test_256.tar
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4736829440 (4.4G), 3651803348 (3.4G) remaining [application/x-tar]
Saving to: ‘test_256.tar’
test_256.tar                                    45%[++++++++++++++++++++++======================>                                                       ]   2.02G  14.0MB/s    in 90s
2024-02-15 00:12:52 (11.4 MB/s) - Connection closed at byte 2170783988. Retrying.
--2024-02-15 00:12:54--  (try: 3)  https://data.csail.mit.edu/places/places365/test_256.tar
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4736829440 (4.4G), 2566045452 (2.4G) remaining [application/x-tar]
Saving to: ‘test_256.tar’
test_256.tar                                    68%[+++++++++++++++++++++++++++++++++++++++++++++======================>                                ]   3.03G  10.9MB/s    in 96s
2024-02-15 00:14:31 (10.8 MB/s) - Connection closed at byte 3256345900. Retrying.
--2024-02-15 00:14:34--  (try: 4)  https://data.csail.mit.edu/places/places365/test_256.tar
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4736829440 (4.4G), 1480483540 (1.4G) remaining [application/x-tar]
Saving to: ‘test_256.tar’
test_256.tar                                    91%[++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++======================>         ]   4.04G  14.1MB/s    in 94s
2024-02-15 00:16:08 (11.0 MB/s) - Connection closed at byte 4342093292. Retrying.
--2024-02-15 00:16:12--  (try: 5)  https://data.csail.mit.edu/places/places365/test_256.tar
Connecting to data.csail.mit.edu (data.csail.mit.edu)|128.52.131.233|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 4736829440 (4.4G), 394736148 (376M) remaining [application/x-tar]
Saving to: ‘test_256.tar’
test_256.tar                                   100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++========>]   4.41G  13.4MB/s    in 31s
2024-02-15 00:16:44 (12.2 MB/s) - ‘test_256.tar’ saved [4736829440/4736829440]
```
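The `wget` log shows the mechanism a resumable client relies on: after each dropped connection, `wget` asks for the rest of the file starting at the byte where it stopped, and the server answers `206 Partial Content` with only the remaining bytes (after the first drop, 4736829440 - 1085026092 = 3651803348 bytes). A minimal sketch of the two HTTP headers involved, following the range-request rules in RFC 9110, might look like this. `range_header` and `parse_content_range` are hypothetical helpers, and the sample `Content-Range` value is illustrative because `wget` does not print it.

```python
import re

# Content-Range value for a satisfied byte range: "bytes <first>-<last>/<total>".
# The total may be "*" when the server does not know the full length.
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def range_header(offset):
    """Request header asking for everything from byte `offset` to the end."""
    return {"Range": f"bytes={offset}-"}


def parse_content_range(value):
    """Return (first, last, total) parsed from a Content-Range header value."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total


# Byte offsets taken from the first retry in the wget log above; the header
# value itself is an assumed example of what such a server would send.
first, last, total = parse_content_range("bytes 1085026092-4736829439/4736829440")
remaining = last - first + 1  # ranges are inclusive, hence the +1
```

The `remaining` value computed this way matches the `3651803348 (3.4G) remaining` line that `wget` reports for its second try.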

Any help here would be appreciated.

@rohis06-aws rohis06-aws changed the title from `eta.core.web_download_file() failing` to `eta.core.web.download_file() failing` Feb 16, 2024
@swheaton
Contributor

Sorry, can you explain the failure? Does it get stuck at 23%? From your screenshot it just looks like it's in progress.

@rohis06-aws
Contributor Author

Yes, that's correct! It effectively terminates at 23%, and control passes to the next statement in the code.
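A download that stops partway while execution simply carries on is consistent with a client treating an early connection close as an ordinary end of stream. A minimal sketch of guarding against that, assuming the server sends a `Content-Length`, is to compare the bytes actually written with the advertised length and raise on a mismatch. `copy_and_verify` and `IncompleteDownloadError` are hypothetical names, not part of `eta.core.web`.

```python
class IncompleteDownloadError(IOError):
    """Raised when fewer bytes arrive than the server advertised."""


def copy_and_verify(src, dst, expected_length, chunk_size=1 << 20):
    """Copy a response body `src` into `dst`, failing loudly on truncation.

    `expected_length` would normally come from the Content-Length header;
    pass None when the server did not send one (no check is possible then).
    """
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # EOF, or a connection the server closed early
            break
        dst.write(chunk)
        written += len(chunk)
    if expected_length is not None and written != expected_length:
        raise IncompleteDownloadError(
            f"received {written} of {expected_length} bytes"
        )
    return written
```

Raising here turns a silent 23% file into an explicit error that a caller can catch and retry.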

@rohis06-aws
Contributor Author

@swheaton, please let me know if you need any other details to debug the issue.

@swheaton
Contributor

@rohis06, it seems that we just don't handle a `206 Partial Content` response. I don't know whether there is a simple fix for it. If you'd like to look into supporting this mode of operation within `eta.core.web.download_file()`, that would be awesome and the fastest path to resolution!
Otherwise, unfortunately, it seems too niche a use case for the core team to prioritize.
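A minimal sketch of what handling `206 Partial Content` could involve, assuming the server honors single byte-range requests the way the `wget` log in this issue suggests: on a short read, re-request from the last byte written and append the rest. This is not the change made in #621 and not ETA's API. `resumable_download`, `urllib_fetch`, and their parameters are hypothetical, the sketch uses the standard library's `urllib.request` rather than whatever HTTP client ETA uses, and it assumes the total size is known up front.

```python
import http.client
import urllib.request


def urllib_fetch(url, offset, timeout=60):
    """Open `url` starting at byte `offset`; return (status, response)."""
    # Only send a Range header when resuming; byte 0 is a plain GET.
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    request = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(request, timeout=timeout)
    return response.status, response


def resumable_download(url, path, total_size, fetch=urllib_fetch,
                       max_retries=5, chunk_size=1 << 20):
    """Download `url` to `path`, resuming with Range requests after drops.

    Assumes `total_size` is known up front (e.g. from Content-Length) and
    that the server answers range requests with 206 Partial Content.
    """
    written = 0
    retries = 0
    with open(path, "wb") as f:
        while True:
            status, body = fetch(url, written)
            if written and status != 206:
                # The server ignored the Range header and is sending the
                # whole file again (200 OK), so discard the partial copy.
                f.seek(0)
                f.truncate()
                written = 0
            try:
                while True:
                    try:
                        chunk = body.read(chunk_size)
                    except (OSError, http.client.HTTPException):
                        break  # connection dropped mid-stream
                    if not chunk:  # EOF, or the server closed early
                        break
                    f.write(chunk)
                    written += len(chunk)
            finally:
                body.close()
            if written >= total_size:
                return written
            # Short read: count a retry and resume from byte `written`.
            retries += 1
            if retries > max_retries:
                raise IOError(
                    f"gave up after {max_retries} retries at byte {written}"
                )
```

The `fetch` parameter keeps the network layer swappable, which makes the resume logic easy to exercise with a fake server that drops the connection every few thousand bytes. Only read errors trigger a retry; write errors on the local file still propagate.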

I believe you are trying to add the Places dataset to the fiftyone zoo? (Appreciate it!) Perhaps @jacobmarks has ideas for workarounds to this issue with downloading the data?

@rohis06-aws
Contributor Author

@swheaton, that makes sense. I'd be glad to explore how to support this mode of operation within `eta.core.web.download_file()`!

Yes, I'm attempting to add the Places dataset to the fiftyone Zoo! :)
Certainly, I'll reach out to @jacobmarks. One thing I'd like to point out is that the Places download doesn't fail consistently: it only fails when the internet speed is below 150 Mbps. Otherwise, the download proceeds smoothly.

brimoor added a commit that referenced this issue Mar 1, 2024
Added support for handling 206 response in `eta.core.web.download_file()` - Fix for Issue #620
@brimoor
Contributor

brimoor commented Mar 2, 2024

Resolved by #621

@brimoor brimoor closed this as completed Mar 2, 2024

3 participants