Remove compression from .nemo files #3626

Merged: 4 commits, Feb 8, 2022
Changes from 2 commits
13 changes: 11 additions & 2 deletions nemo/core/connectors/save_restore_connector.py
@@ -384,14 +384,23 @@ def _inject_model_parallel_rank_for_ckpt(self, dirname, basename):
     def _make_nemo_file_from_folder(filename, source_dir):
         dirname = os.path.dirname(filename)
         os.makedirs(dirname, exist_ok=True)
-        with tarfile.open(filename, "w:gz") as tar:
+        with tarfile.open(filename, "w:") as tar:
             tar.add(source_dir, arcname=".")
 
     @staticmethod
     def _unpack_nemo_file(path2file: str, out_folder: str) -> str:
         if not os.path.exists(path2file):
             raise FileNotFoundError(f"{path2file} does not exist")
-        tar = tarfile.open(path2file, "r:gz")
+        # we start with an assumption of uncompressed tar,
+        # which should be true for versions 1.7.0 and above
+        tar_header = "r:"
+        try:
+            tar_test = tarfile.open(path2file, tar_header)
+            tar_test.close()
+        except tarfile.ReadError:
+            # can be older checkpoint => try compressed tar
+            tar_header = "r:gz"
+        tar = tarfile.open(path2file, tar_header)
         tar.extractall(path=out_folder)
         tar.close()
         return out_folder
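
To try the fallback outside of NeMo, the behavior in this diff can be reproduced with the standard library alone. The following is a minimal sketch, not the NeMo implementation: `make_archive`, `detect_read_mode`, and `unpack_archive` are hypothetical names chosen for illustration, and the `compress` flag exists only so that an "older" gzip-compressed archive can be created for comparison.

```python
import os
import tarfile
import tempfile


def make_archive(filename: str, source_dir: str, compress: bool = False) -> None:
    """Pack source_dir into a tar archive (hypothetical helper).

    compress=True mimics the older gzip-compressed format; the default
    writes a plain, uncompressed tar as in the change above.
    """
    os.makedirs(os.path.dirname(os.path.abspath(filename)), exist_ok=True)
    mode = "w:gz" if compress else "w:"
    with tarfile.open(filename, mode) as tar:
        tar.add(source_dir, arcname=".")


def detect_read_mode(path: str) -> str:
    """Return "r:" if path opens as a plain tar, otherwise fall back to "r:gz"."""
    try:
        # "r:" means: read, explicitly without compression.
        with tarfile.open(path, "r:"):
            return "r:"
    except tarfile.ReadError:
        # Not a plain tar, so assume an older gzip-compressed archive.
        return "r:gz"


def unpack_archive(path: str, out_folder: str) -> str:
    """Extract the archive at path into out_folder and return out_folder."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"{path} does not exist")
    with tarfile.open(path, detect_read_mode(path)) as tar:
        tar.extractall(path=out_folder)
    return out_folder


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(src)
        with open(os.path.join(src, "weights.txt"), "w") as f:
            f.write("hello")

        new_style = os.path.join(tmp, "new", "model.nemo")
        old_style = os.path.join(tmp, "old", "model.nemo")
        make_archive(new_style, src)
        make_archive(old_style, src, compress=True)

        print(detect_read_mode(new_style))
        print(detect_read_mode(old_style))
        print(os.listdir(unpack_archive(old_style, os.path.join(tmp, "out"))))
```

The sketch probes with `"r:"` first, as the diff does, so the common case (new, uncompressed files) succeeds on the first open. `tarfile.open` also accepts `"r:*"`, which detects the compression transparently. That is an alternative to the explicit try/except, not what this PR uses.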