IO: Replaced factories.array() with DNDarray #951
Conversation
GPU cluster tests are currently disabled on this Pull Request.
@ClaudiaComito @mtar Please tell me if any changes are required.
Hey @shahpratham, thanks a lot for jumping in, and well done. The use cases are correct: we want to save ourselves some communication time that arises with `factories.array()` when `is_split` is not None. However, `gshape` is the global shape of the DNDarray, and you cannot derive `balanced` from the process-local torch tensor. More in the comments below.
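For context, a minimal sketch contrasting the two construction paths under discussion. It assumes heat is importable and an MPI run; the allreduce-based shape computation is one illustrative way to obtain the global shape, not necessarily what this PR ends up doing:

```python
import torch
import heat as ht
from heat.core import factories, types
from heat.core.dndarray import DNDarray

comm = ht.MPI_WORLD
# pretend each process has already read its own chunk of rows from a file
local_tensor = torch.ones(4, 3)

# Path 1: factories.array() with is_split set works out the global shape
# and the balance itself, at the cost of inter-process communication.
via_factory = factories.array(local_tensor, is_split=0, comm=comm)

# Path 2: constructing the DNDarray directly skips that communication,
# but gshape must then be the *global* shape (here: rows summed over all
# processes), and balanced cannot be read off the local torch tensor.
global_rows = comm.allreduce(local_tensor.shape[0])  # sum over processes
direct = DNDarray(
    local_tensor,
    gshape=(global_rows, local_tensor.shape[1]),
    dtype=types.float32,
    split=0,
    device=ht.cpu,
    comm=comm,
    balanced=None,  # unknown here without further checks
)
```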
heat/core/io.py (outdated)

```python
local_tensor, dtype=dtype, is_split=0, device=device, comm=comm
resulting_tensor = DNDarray(
    local_tensor,
    gshape=tuple(local_tensor.shape),
```
`gshape` is supposed to be the global shape of the memory-distributed array `resulting_tensor`. Here you're setting it to the shape of `local_tensor`, which is basically a slice of the global array.

If we have all the information we need to calculate `gshape` without communication among processes, then we can call `DNDarray(...)`; otherwise we need to use `factories.array()`, and that will take care of the communication.
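A hedged sketch of that decision rule, reusing the names from the diff hunks above; `global_shape` is a hypothetical variable standing in for whatever shape information is available without communication:

```python
if global_shape is not None:
    # gshape is known locally (e.g. from file metadata): no communication needed
    resulting_tensor = DNDarray(
        local_tensor,
        gshape=global_shape,
        dtype=dtype,
        split=0,
        device=device,
        comm=comm,
        balanced=None,
    )
else:
    # otherwise let factories.array() gather shape and balance info itself
    resulting_tensor = factories.array(
        local_tensor, dtype=dtype, is_split=0, device=device, comm=comm
    )
```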
Okay, understood. So can I make a copy of that tensor before slicing it and pass `gshape=tuple(local_tensor_copy.shape)` to it?
heat/core/io.py (outdated)

```python
    split=0,
    device=device,
    comm=comm,
    balanced=local_tensor.is_balanced,
```
`local_tensor` is a `torch.Tensor`. It doesn't "know" about being a slice of a larger distributed array. `is_balanced()` is a method of the `DNDarray` class; it has to do with whether the memory-distributed DNDarray is distributed evenly among the available processes.

If we cannot assess the load balance of the output DNDarray (I'm not sure that's the case here, I haven't checked), we set `balanced=None`.
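To illustrate the distinction, a minimal sketch (assuming heat is importable; `larray` is the DNDarray attribute holding the process-local torch tensor):

```python
import heat as ht

x = ht.arange(10, split=0)   # a DNDarray, distributed along axis 0
print(x.is_balanced())       # DNDarray method: load balance across processes
print(type(x.larray))        # the process-local torch.Tensor has no such method
```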
heat/core/io.py (outdated)

```python
resulting_tensor = factories.array(data, dtype=dtype, is_split=1, device=device, comm=comm)
resulting_tensor = DNDarray(
    data,
    gshape=tuple(data.shape),
```
see above
heat/core/io.py (outdated)

```python
    split=1,
    device=device,
    comm=comm,
    balanced=data.is_balanced,
```
see above
Hey @ClaudiaComito, I have made some changes, kindly review.
```python
local_tensor, dtype=dtype, is_split=0, device=device, comm=comm
resulting_tensor = DNDarray(
    local_tensor,
    gshape=local_shape,
```
Hi @shahpratham, `gshape` is the global shape, not the local shape.

This function reads an array that might be, say, (1 billion x 1000) in size (just making the size up). The data will be distributed over many processes, i.e. each process will read only a specific subset of lines (if split=0) or columns (if split=1) out of that file and store them in `local_tensor`.

So `local_tensor` is the process-local slice of the data. Its shape, `local_shape`, may vary (depending on the number of processes, for example; see `communication.chunk()`). But the global shape `gshape` will always be (1 billion x 1000).
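A sketch of that relationship, assuming `comm` is an `MPICommunication` such as `ht.MPI_WORLD` and that `communication.chunk()` follows its usual `(offset, local_shape, slices)` return convention; the sizes are made up to match the example:

```python
gshape = (1_000_000_000, 1000)  # fixed by the file, identical on every process
offset, local_shape, slices = comm.chunk(gshape, split=0)
# with 4 processes, local_shape is roughly (250_000_000, 1000) on each of them,
# while gshape stays (1_000_000_000, 1000) everywhere
```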
> But the global shape gshape will always be (1 billion x 1000).

Yes, so I need to get the global shape of the CSV file after reading it, like how it's done in the `load_hdf5` function (on line 121) and in the `load_netcdf` function (on line 334), right?
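For comparison, roughly how the HDF5 path can obtain the global shape without communication, since it sits in the file metadata (a sketch using h5py's public API; `path` and `dataset` are placeholders, and the actual `load_hdf5` code may differ):

```python
import h5py

with h5py.File(path, "r") as handle:
    gshape = tuple(handle[dataset].shape)  # global shape straight from metadata
# a CSV file carries no such metadata, so the global row count has to be
# derived after reading, e.g. by combining the process-local counts
```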
@shahpratham now you know everything about Heat - let's merge this PR, can you update? Thanks!
Yes, sorry, I forgot about this.
This is addressed in #1089, closing.
Description
Replaced `factories.array` with `DNDarray` in `io.py`.
Issue/s resolved: #797
Changes proposed:
Type of change
enhancement
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
no