Runtime Error in cdist #601
Comments
Uh, interesting. I will have a look at it, thanks for raising this.
OK, I had a look into it. It is arguable whether this is a bug. However, we should discuss whether the distance functions should allow for unbalanced arrays. This would require additional communication overhead, though. @Markus-Goetz, what do you say?
I think calling balance is the way to go here, because we also want to ensure that computations are more or less equally distributed. The create folds function could be improved here. Instead of taking a global slice, which leaves the last processes empty-handed, we could simply take a local slice and stitch the results back together into a global DNDarray. This way we also avoid heavy communication.
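The difference between a global slice and per-process local slices can be illustrated with a small pure-Python sketch. It does not use heat or MPI. The helper names (`chunk_counts`, `global_slice_counts`, `local_slice_counts`) are hypothetical, and the even block split, where the first ranks receive the remainder rows, is an assumption about how the data is distributed.

```python
def chunk_counts(n, nprocs):
    """Even block split (assumed layout): the first n % nprocs ranks get one extra row."""
    base, extra = divmod(n, nprocs)
    return [base + (1 if rank < extra else 0) for rank in range(nprocs)]


def chunk_bounds(n, nprocs):
    """Global (start, stop) row range owned by each rank under the split above."""
    bounds, start = [], 0
    for count in chunk_counts(n, nprocs):
        bounds.append((start, start + count))
        start += count
    return bounds


def global_slice_counts(n, nprocs, lo, hi):
    """Rows each rank holds after a global slice [lo:hi) without rebalancing."""
    return [max(0, min(hi, stop) - max(lo, start))
            for start, stop in chunk_bounds(n, nprocs)]


def local_slice_counts(n, nprocs, folds):
    """Rows each rank holds if it takes one fold out of its own local chunk."""
    return [count // folds for count in chunk_counts(n, nprocs)]


if __name__ == "__main__":
    # 30 rows on 4 processes, taking roughly one third of the data.
    print(global_slice_counts(30, 4, 0, 10))  # last two ranks end up with no rows
    print(local_slice_counts(30, 4, 3))       # every rank keeps some rows
    print(chunk_counts(10, 4))                # layout of 10 rows after rebalancing
```

With a global slice, the rows stay where they were, so only the first ranks hold data. The local variant keeps every process busy without moving rows between processes.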
That is what @Inzlinger is doing actually (calling the
Sorry my bad. Did not look into it deep enough. In this case calling Correct about the balanced in, balanced out part.
Brilliant, then I can close this.
Also, your predict function in knn.py has a balancing problem: line 65 causes it, because the slicing leaves some processes without any data. But I didn't look further into that.
Description
When running cdist on distributed tensors, a runtime error occurs.
Traceback (most recent call last):
  File "demo_knn.py", line 131, in <module>
    print(verify_algorithm(X, Y, 5, 30, 5))
  File "demo_knn.py", line 127, in verify_algorithm
    result_y = classifier.predict(verification_x)
  File "/home/jakob/neural/heat/heat/classification/knn.py", line 44, in predict
    distances = ht.spatial.cdist(X, self.x)
  File "/home/jakob/neural/heat/heat/spatial/distance.py", line 128, in cdist
    return _dist(X, Y, _euclidian)
  File "/home/jakob/neural/heat/heat/spatial/distance.py", line 383, in _dist
    d._DNDarray__array[:, cols[0] : cols[1]] = d_ij
RuntimeError: The expanded size of the tensor (8) must match the existing size (11) at non-singleton dimension 1. Target sizes: [30, 8]. Tensor sizes: [27, 11]
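One possible reading of the size mismatch (8 versus 11) is that the target column range is sized for a balanced layout while the incoming block comes from an unbalanced array. The sketch below only illustrates that kind of check in plain Python. It is not heat's implementation. The helpers `balanced_counts` and `find_mismatches` are hypothetical, and the unbalanced counts are made up for illustration rather than taken from the failing run.

```python
def balanced_counts(n, nprocs):
    """Chunk sizes of an evenly split axis (assumed: first ranks take the remainder)."""
    base, extra = divmod(n, nprocs)
    return [base + (1 if rank < extra else 0) for rank in range(nprocs)]


def find_mismatches(actual_counts):
    """List (rank, expected, actual) for every rank whose chunk size deviates
    from the balanced layout with the same total size and process count."""
    expected = balanced_counts(sum(actual_counts), len(actual_counts))
    return [(rank, exp, act)
            for rank, (exp, act) in enumerate(zip(expected, actual_counts))
            if exp != act]


if __name__ == "__main__":
    # A balanced array passes the check.
    print(find_mismatches([8, 8, 7, 7]))
    # Hypothetical unbalanced layout: a block of width 11 cannot be written
    # into a slot that was sized for a balanced chunk of width 8.
    print(find_mismatches([11, 11, 8, 0]))
```

A check of this kind before the distance computation would turn the low-level shape error into a clear message that the input needs to be balanced first.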
To Reproduce
On branch https://github.com/helmholtz-analytics/heat/tree/features/556-assign_label
run "mpirun -n 4 demo_knn.py" (in the folder heat/examples/classification)
Version Info
On branch https://github.com/helmholtz-analytics/heat/tree/features/556-assign_label