
IO: Replaced factories.array() with DNDarray #951

Closed · wants to merge 6 commits

Conversation

@shahpratham (Collaborator)

Description

Replaced factories.array() with DNDarray in io.py.
Issue(s) resolved: #797

Changes proposed:

  • factories.array -> dndarray.DNDarray

Type of change

enhancement

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

@mtar (Collaborator) commented Apr 2, 2022

GPU cluster tests are currently disabled on this Pull Request.

@ghost commented Apr 2, 2022

CodeSee Review Map: review these changes using an interactive CodeSee Map.

@shahpratham (Collaborator, Author)

@ClaudiaComito @mtar Please let me know if any changes are required.

@ClaudiaComito (Contributor) left a comment

Hey @shahpratham, thanks a lot for jumping in, and well done: the use cases are correct, and we want to save ourselves the communication overhead that factories.array() incurs when is_split is not None.

However, gshape is the global shape of the DNDarray, and you cannot derive balanced from the process-local torch tensor. More in the comments below.
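For context, a rough sketch of the two construction paths (this is not the io.py code itself; the DNDarray keyword set follows the diff below, the rest is an assumption):

```python
# Rough sketch contrasting factories.array() with direct DNDarray
# construction. Run under MPI; keyword names follow the diff in this PR.
import torch
import heat as ht
from heat.core.dndarray import DNDarray

comm = ht.communication.MPI_WORLD

# Each process holds a local slice of the (conceptual) global array.
local_tensor = torch.arange(4, dtype=torch.int64).reshape(2, 2)

# Path 1: factories.array() with is_split gathers the local shapes from
# all processes to reconstruct the global shape -> communication.
a = ht.array(local_tensor, is_split=0, comm=comm)

# Path 2: direct construction skips that round trip, but only works if
# the caller already knows the global shape, e.g. from file metadata.
b = DNDarray(
    local_tensor,
    gshape=(2 * comm.size, 2),  # assumed known without communication
    dtype=ht.int64,
    split=0,
    device=a.device,
    comm=comm,
    balanced=True,
)
```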

heat/core/io.py (outdated)

```diff
-    local_tensor, dtype=dtype, is_split=0, device=device, comm=comm
+resulting_tensor = DNDarray(
+    local_tensor,
+    gshape=tuple(local_tensor.shape),
```
@ClaudiaComito (Contributor)

gshape is supposed to be the global shape of the memory-distributed array resulting_tensor. Here you're setting it to the shape of local_tensor, which is basically a slice of the global array.

If we have all the information we need to calculate gshape without communication among processes, then we can call DNDarray(...); otherwise we need to use factories.array(), which takes care of the communication.
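As a rough sketch of the "no communication needed" case (load_hdf5_sketch is a hypothetical helper, not heat's actual load_hdf5): for self-describing formats such as HDF5, every process can read the global shape from the file metadata locally.

```python
# Hypothetical sketch: with self-describing file formats the global
# shape comes from metadata, so DNDarray(...) can be built without
# any inter-process communication.
import h5py
import torch
from heat.core.dndarray import DNDarray

def load_hdf5_sketch(path, dataset, dtype, split, device, comm):
    with h5py.File(path, "r") as f:
        gshape = tuple(f[dataset].shape)           # global shape, no MPI traffic
        _, _, indices = comm.chunk(gshape, split)  # this process's slice
        local_tensor = torch.tensor(f[dataset][indices])
    return DNDarray(
        local_tensor,
        gshape=gshape,
        dtype=dtype,
        split=split,
        device=device,
        comm=comm,
        balanced=True,  # assumption: comm.chunk() hands out even chunks
    )
```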

@shahpratham (Collaborator, Author)

Okay, understood. So can I make a copy of that tensor before slicing it and pass gshape=tuple(local_tensor_copy.shape) to it?

heat/core/io.py (outdated)

```diff
+    split=0,
+    device=device,
+    comm=comm,
+    balanced=local_tensor.is_balanced,
```
@ClaudiaComito (Contributor)

local_tensor is a torch.Tensor. It doesn't "know" about being a slice of a larger distributed array.

is_balanced() is a method of the DNDarray class; it indicates whether the memory-distributed DNDarray is distributed evenly among the available processes.

If we cannot assess the load balance of the output DNDarray (I'm not sure that's the case here, I haven't checked), we set balanced = None.
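A quick way to see the distinction (a minimal sketch using heat's public API; larray is the process-local torch tensor):

```python
# Minimal sketch: balance is a property of the distributed DNDarray,
# never of a process-local torch.Tensor.
import heat as ht

x = ht.zeros((10, 4), split=0)  # distributed row-wise
print(x.is_balanced())          # True if chunks are spread evenly
print(type(x.larray))           # torch.Tensor: has no notion of balance
```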

heat/core/io.py (outdated)

```diff
-resulting_tensor = factories.array(data, dtype=dtype, is_split=1, device=device, comm=comm)
+resulting_tensor = DNDarray(
+    data,
+    gshape=tuple(data.shape),
```
@ClaudiaComito (Contributor)

see above

heat/core/io.py (outdated)

```diff
+    split=1,
+    device=device,
+    comm=comm,
+    balanced=data.is_balanced,
```
@ClaudiaComito (Contributor)

see above

@shahpratham (Collaborator, Author)

Hey @ClaudiaComito, I have made some changes, kindly review.

```diff
-    local_tensor, dtype=dtype, is_split=0, device=device, comm=comm
+resulting_tensor = DNDarray(
+    local_tensor,
+    gshape=local_shape,
```
@ClaudiaComito (Contributor)

Hi @shahpratham, gshape is the global shape, not the local shape.
This function reads an array that might be, say, (1 billion x 1000) in size (just making the size up). The data will be distributed across many processes, i.e. each process will read only a specific subset of rows (if split=0) or columns (if split=1) out of that file and store them in local_tensor.

So local_tensor is the process-local slice of the data. Its shape, local_shape, may vary (depending on the number of processes, for example; see communication.chunk()). But the global shape gshape will always be (1 billion x 1000).
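A toy illustration of that chunking (sizes scaled down; a sketch assuming heat's communication.chunk() API, run under MPI with e.g. mpirun -n 4):

```python
# Toy sketch of global vs. local shape: comm.chunk() derives each
# process's offset, local shape and slices from the *global* shape.
import heat as ht

comm = ht.communication.MPI_WORLD
gshape = (1_000, 8)                              # global shape, fixed
offset, lshape, _ = comm.chunk(gshape, split=0)  # row-wise split
# On 4 processes every rank reports lshape == (250, 8),
# while gshape stays (1_000, 8) everywhere.
print(comm.rank, offset, lshape)
```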

@shahpratham (Collaborator, Author)

> But the global shape gshape will always be (1 billion x 1000).

Yes, so I need to get the global shape of the csv file after reading it, like how it's done in the load_hdf5 function (on line 121) and in the load_netcdf function (on line 334), right?
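For the CSV case there is no shape metadata in the file, so the global row count has to be established first. One possible way (a hypothetical sketch, not heat's actual load_csv logic, and csv_gshape is an invented helper) would be a single Allreduce over the process-local row counts:

```python
# Hypothetical sketch: a CSV file carries no shape metadata, so the
# global number of rows is derived from the local row counts. This
# still costs one collective, which is why factories.array() can be
# the simpler choice when gshape is not known up front.
from mpi4py import MPI

def csv_gshape(local_tensor, comm, n_columns):
    # every process contributes the rows parsed from its byte range;
    # comm forwards lowercase calls to the underlying mpi4py communicator
    global_rows = comm.allreduce(local_tensor.shape[0], op=MPI.SUM)
    return (global_rows, n_columns)
```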

@ClaudiaComito (Contributor)

@shahpratham now you know everything about Heat - let's merge this PR, can you update? Thanks!

@shahpratham (Collaborator, Author)

Yes, sorry, I forgot about this.
I have created a new PR, #1089, due to some issue with my previous branch.

@ClaudiaComito added this to the Repo Clean-Up milestone on Jul 31, 2023
@ClaudiaComito (Contributor)

This is addressed in #1089, closing

Successfully merging this pull request may close these issues:

  • Module io: replace factories.array() with DNDarray construct where possible (#797)