Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change root directory for datasets #1740

Merged
merged 1 commit into from
May 25, 2022
Merged

Conversation

parmeet
Copy link
Contributor

@parmeet parmeet commented May 25, 2022

Following conversations in this post internally with @NicolasHug , this is the first step to change the root directory of datasets.

Follow-up:

  • Enable setting root directory programmatically once the APIs are aligned.

Copy link
Contributor

@vcm2114 vcm2114 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parmeet, that's a great direction to take!
LGTM

Copy link
Contributor

@Nayef211 Nayef211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just to clarify, is ~/.cache/torch the base directory used by the core PyTorch team to store downloaded assets? And now each domain will have its own subdirectory within that base directory?

@parmeet
Copy link
Contributor Author

parmeet commented May 25, 2022

LGTM. Just to clarify, is ~/.cache/torch the base directory used by the core PyTorch team to store downloaded assets? And now each domain will have its own subdirectory within that base directory?

That's right. For instance hub base dir is ~/.cache/torch/hub

@parmeet parmeet merged commit 70fc104 into pytorch:main May 25, 2022
@parmeet parmeet deleted the datasets_home branch May 25, 2022 17:45
@@ -4,7 +4,7 @@
from torchtext import _extension # noqa: F401

_TEXT_BUCKET = "https://download.pytorch.org/models/text/"
_CACHE_DIR = os.path.expanduser("~/.torchtext/cache")
_CACHE_DIR = os.path.expanduser("~/.cache/torch/text")
Copy link
Contributor

@mthrok mthrok May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~/.cache/torch is just a default path of TORCH_HOME, which is governed by PyTorch core.
If users set TORCH_HOME to different place, this torchtext cache directory won't be affected. There is a pros and cons on either behavior. If If domain libraries are piggy-backing on it, and the change is being made for the consistency, shouldn't that be accounted? Meaning that did you settle on how this should interact with TORCH_HOME?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mthrok for the catch. I missed to take TORCH_HOME into account in this PR :(. Let me make a quick fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants