
features/110-split_output_randn: Adding the new code for random #114

Closed · wants to merge 5 commits

4 changes: 3 additions & 1 deletion heat/core/communication.py
@@ -19,7 +19,6 @@
except FileNotFoundError:
CUDA_AWARE_MPI = False


class Communication(metaclass=abc.ABCMeta):
@staticmethod
@abc.abstractmethod
@@ -62,6 +61,9 @@ def __init__(self, handle=MPI.COMM_WORLD):
def is_distributed(self):
return self.size > 1

def get_rank(self):
return self.handle.Get_rank()

def chunk(self, shape, split):
"""
Calculates the chunk of data that will be assigned to this compute node given a global data shape and a split
28 changes: 20 additions & 8 deletions heat/core/random.py
@@ -22,17 +22,20 @@ def uniform(low=0.0, high=1.0, size=None, comm=MPI_WORLD):
return tensor(torch.Tensor(*size).uniform_(low, high), size, types.float32, None, comm)


def randn(*args, split=None, comm=MPI_WORLD):
def randn(*args, seed=None, split=None, comm=MPI_WORLD):
"""
Returns a tensor filled with random numbers from a standard normal distribution with zero mean and variance of one.

The shape of the tensor is defined by the varargs args.

Parameters
----------
d0, d1, …, dn : int, optional
d0, d1, …, dn : ints, optional
Member

The implementation of this function is closely related to issue #54. What is the intended behaviour of this function? Should it output random numbers where each split chunk is random and not identical to the chunks held by the other nodes? What happens if I use the very same seed but a different node count? Will the random tensor differ?

Member Author

It looks like torch's randn will not generate the same values for arrays of different sizes. This means that even though the same seed is used, the split tensor will not be the same as a tensor generated on only one process, i.e.

torch.randn((3, 3, 3))[0] != torch.randn((1, 3, 3))

One solution would be to generate the whole dataset and then split it, but this will not scale. I do not see another way to do this.

It should be noted that the matrix generated by ht.randn with a fixed seed (torch.manual_seed) and a fixed size produces a reproducible result.
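
A quick standalone check of this behaviour (not part of the PR, just illustrating the point with PyTorch's CPU generator):

import torch

# same seed, different shape: the overlapping slice generally differs
torch.manual_seed(42)
full = torch.randn((3, 3, 3))
torch.manual_seed(42)
part = torch.randn((1, 3, 3))
print(torch.equal(full[0:1], part))   # typically False

# same seed, same shape: fully reproducible
torch.manual_seed(42)
again = torch.randn((3, 3, 3))
print(torch.equal(full, again))       # True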

Member

In my opinion, we can leave it as proposed for now; however, for the future I would actually want a different behaviour. Consider the following example:

ht.set_seed(1)
ht.randn(100, 5, 3, split=0)

This should always produce the same set of random numbers independent of the number of utilized nodes (for the reproducibility reasons you mentioned). This is exactly what is requested in issue #54. Obviously, this means that we would have to come up with a pseudo-random generator that allows skipping to arbitrary/fixed positions in the random sequence.

In the proposed fix, we would have a simplified randn() call. It will only provide reproducible results for the exact same node count.
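
Purely as a sketch of the skip-ahead idea (a hypothetical helper, not the heat API): a counter-based generator such as NumPy's Philox can jump to an arbitrary position in its stream via advance(), and a Box-Muller transform turns a fixed number of uniforms into a fixed number of normals, so every global element position maps to the same value no matter how the tensor is chunked. The sketch assumes each uniform consumes exactly one 64-bit draw from the bit generator, which would need verifying.

import numpy as np
from numpy.random import Generator, Philox

def chunked_randn(seed, offset, count):
    # Return standard-normal values for global positions
    # offset .. offset + count - 1 of a sequence determined only by `seed`,
    # independent of how the work is split across processes.
    first_pair = offset // 2                  # Box-Muller pair holding the first position
    last_pair = (offset + count + 1) // 2     # one past the pair holding the last position
    bitgen = Philox(seed)
    bitgen.advance(2 * first_pair)            # skip the uniforms owned by earlier positions
    u = Generator(bitgen).random(2 * (last_pair - first_pair))
    u1, u2 = 1.0 - u[0::2], u[1::2]           # 1 - u avoids log(0)
    r = np.sqrt(-2.0 * np.log(u1))
    z = np.empty_like(u)
    z[0::2] = r * np.cos(2.0 * np.pi * u2)
    z[1::2] = r * np.sin(2.0 * np.pi * u2)
    start = offset - 2 * first_pair
    return z[start:start + count]

Each process would call this with its own global offset and local element count and reshape to lshape; the concatenation over all processes is then identical for a given seed regardless of the node count. The built-in normal samplers in torch and NumPy are not guaranteed to consume a fixed number of draws per element, which is exactly why the split tensors above differ.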

Member Author

100% agree. I am trying to figure out a way to do this. I will move this discussion to the issue and close this request.

The dimensions of the returned array, must all be positive.

split : int, optional
axis on which to split the array across processes

Returns
-------
broadcast_shape : tuple of ints
@@ -55,22 +58,31 @@ def randn(*args, split=None, comm=MPI_WORLD):
[-1.8548, -1.2574, 0.2391, -0.3302],
[ 1.3365, -1.5212, 1.4159, -0.1671],
[ 0.1260, 1.2126, -0.0804, 0.0907]])

>>> ht.randn(4, 4, split=0)
(two processes)
tensor([[-1.1261, 0.5971, 0.2851, 0.9998],
[-1.8548, -1.2574, 0.2391, -0.3302]])
tensor([[ 1.3365, -1.5212, 1.4159, -0.1671],
[ 0.1260, 1.2126, -0.0804, 0.0907]])
"""
# check if all positional arguments are integers
if not all(isinstance(_, int) for _ in args):
raise TypeError('dimensions have to be integers')
if not all(_ > 0 for _ in args):
raise ValueError('negative dimensions are not allowed')

gshape = tuple(args) if args else(1,)
gshape = tuple(args) if args else (1,)
split = stride_tricks.sanitize_axis(gshape, split)
_, lshape, _ = comm.chunk(gshape, split)
if seed:
if comm.get_rank() == 0:
torch.manual_seed(seed)
try:
torch.randn(gshape)
data = torch.randn(lshape)
except RuntimeError as exception:
# re-raise the exception to be consistent with numpy's exception interface
raise ValueError(str(exception))

# compose the local tensor
data = torch.randn(args)

return tensor(data, gshape, types.canonical_heat_type(data.dtype), split, MPI_WORLD)
# compose the local tensor/s
return tensor(data, gshape, types.canonical_heat_type(data.dtype), split, comm)
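
For reference, a usage sketch of the signature added in this diff (the exact values depend on the open seeding questions discussed above; multiple processes require launching the script via MPI):

import heat as ht

# global shape (4, 4), split along axis 0 across the participating processes;
# in this proposal `seed` is forwarded to torch.manual_seed on rank 0 only
a = ht.randn(4, 4, seed=1, split=0)
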
2 changes: 1 addition & 1 deletion heat/core/tensor.py
@@ -720,7 +720,7 @@ def __getitem__(self, key):
# TODO: test me
# TODO: sanitize input
# TODO: make me more numpy API complete
return tensor(self.__array[key], self.shape, self.split, self.__comm)
return tensor(self.__array[key], self.shape, self.dtype, self.split, self.__comm)

def __setitem__(self, key, value):
# TODO: document me