
features/110-split_output_randn: Adding the new code for random #114

Closed
wants to merge 5 commits into from

Conversation

coquelin77 (Member) commented:

See issue #110.

Heads up @ClaudiaComito: I mentioned this in the issue as well, but this will change the tests for argmin, min, and max. The random numbers generated on separate processes are not equal to each other, but they are the same on every run.

codecov-io commented Feb 13, 2019

Codecov Report

Merging #114 into master will decrease coverage by 0.1%.
The diff coverage is 85.18%.


@@            Coverage Diff             @@
##           master     #114      +/-   ##
==========================================
- Coverage   90.34%   90.23%   -0.11%     
==========================================
  Files          20       20              
  Lines        2735     2735              
==========================================
- Hits         2471     2468       -3     
- Misses        264      267       +3
Impacted Files                        Coverage Δ
heat/core/tensor.py                   85.98% <0%> (ø) ⬆️
heat/core/tests/test_operations.py    100% <100%> (ø) ⬆️
heat/core/communication.py            78.46% <50%> (-0.91%) ⬇️
heat/core/random.py                   81.48% <75%> (-6.02%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8bcf571...797cd94.

@coquelin77 coquelin77 changed the title Adding the new code for random features/110-split_output_randn: Adding the new code for random Feb 13, 2019
ClaudiaComito (Contributor) left a comment:

Looks good to me.

self.assertTrue((ht.argmin(random_data, axis=0)._tensor__array == random_data_split.argmin(axis=0)._tensor__array).all())
self.assertTrue((ht.argmin(random_data, axis=1)._tensor__array == random_data_split.argmin(axis=1)._tensor__array).all())
self.assertIsInstance(ht.argmin(random_data_split, axis=1), ht.tensor)
# todo: these tests now fail if random_data is split.
Remove or fix test


self.assertTrue((ht.max(random_data, axis=0)._tensor__array[0] == random_data_split.max(axis=0)._tensor__array[0]).all())
self.assertTrue((ht.max(random_data, axis=1)._tensor__array[0] == random_data_split.max(axis=1)._tensor__array[0]).all())
# todo: these tests now fail if random_data is split.
Remove or fix test

@@ -30,7 +30,7 @@ def randn(*args, split=None, comm=MPI_WORLD):

     Parameters
     ----------
-    d0, d1, …, dn : int, optional
+    d0, d1, …, dn : ints, optional

The implementation of this function is closely related to issue #54. What is the intended behaviour of this function? To output random numbers where each split chunk is random and not identical to the chunks on the other nodes? What will happen if I use the very same seed but a different node count? Will the random tensor differ?

coquelin77 (Member, Author) commented:

It looks like torch's randn will not generate the same values for tensors of different sizes. This means that even though the same seed is used, the split tensor will not match a tensor generated on a single process, i.e.

torch.randn((3, 3, 3))[0] != torch.randn((1, 3, 3))

One solution to this would be to generate the whole dataset and then split it, but that will not scale. I do not see another way to do this.

It must be noted that the matrix generated by ht.randn with a fixed seed (torch.manual_seed) and a fixed size produces a reproducible result.
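The "generate the whole dataset, then split it" idea mentioned above can be sketched as follows, with NumPy standing in for torch purely for illustration (the function name `naive_split_randn` is hypothetical, not heat's API). Every rank re-generates the full tensor from the same seed and keeps only its own slice, which is reproducible regardless of node count but does not scale, since each rank pays the memory and compute cost of the global tensor:

```python
import numpy as np

# Hypothetical sketch: every rank generates the FULL tensor from the same
# seed, then keeps only its own chunk. Reproducible across node counts,
# but each rank does the work of the whole global tensor.
def naive_split_randn(seed, shape, rank, nprocs):
    rng = np.random.RandomState(seed)
    full = rng.randn(*shape)                       # full global tensor on every rank
    chunks = np.array_split(full, nprocs, axis=0)  # split along axis 0
    return chunks[rank]                            # keep only this rank's slice

# The concatenation of all local chunks equals the single-process result.
seed, shape, nprocs = 1, (6, 3), 3
full = np.random.RandomState(seed).randn(*shape)
parts = [naive_split_randn(seed, shape, r, nprocs) for r in range(nprocs)]
```

Concatenating `parts` along axis 0 reproduces `full` exactly, which is the reproducibility property being discussed; the scaling problem is that each of the `nprocs` calls materializes the whole `(6, 3)` array.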


In my opinion, we can leave it as proposed for now; for the future, however, I would want a different behaviour. Consider the following example:

ht.set_seed(1)
ht.randn(100, 5, 3, split=0)

This should always produce the same set of random numbers independent of the number of nodes used (for the reproducibility reasons you mentioned). This is exactly what is requested in issue #54. Obviously, it means we would have to come up with a pseudo-random generator that allows skipping ahead to arbitrary positions in the random sequence.

With the proposed fix, we would have a simplified randn() call that only provides reproducible results for the exact same node count.

coquelin77 (Member, Author) commented:

100% agree. I am trying to figure out a way to do this. I will move this discussion to the issue and close this request.


self.assertTrue((ht.min(random_data, axis=0)._tensor__array[0] == random_data_split.min(axis=0)._tensor__array[0]).all())
self.assertTrue((ht.min(random_data, axis=1)._tensor__array[0] == random_data_split.min(axis=1)._tensor__array[0]).all())
# todo: these tests now fail if random_data is split.
Remove or fix test

@coquelin77 coquelin77 closed this Feb 20, 2019