Switch to Unified Virtual Address memory copies #555
Cool, looks good
Yep, I think we should only ever call memcpy/cudaMemcpy inside the caffe_copy wrappers to abstract away the couple of details (
^ Agreed. I'll take care of these in this PR and we can benchmark against
@shelhamer vote for moving all
All the Performance as measured by
All the
Will it always be ok to use
There is a need for a CPU-only build and for abstracting whatever sort of device Caffe is executing on (for OpenCL support as you mentioned). I did a What do you think?
@@ -59,6 +59,8 @@ void caffe_gpu_axpby(const int N, const Dtype alpha, const Dtype* X,
 template <typename Dtype>
 void caffe_copy(const int N, const Dtype *X, Dtype *Y);

+void caffe_copy(const size_t N, const void *X, void *Y);
I'm a little worried about this overloading -- it seems a little hard to figure out whether this generic version or the Dtype version is being called since each argument of the function signature is a static_cast away from the Dtype version, and then if you accidentally call this one, the size of the copy will probably be wrong. Google style guide recommends only overloading functions when it's very clear which version will be called [1]. I'd prefer if this had some other name (`caffe_void_copy`? not sure).
[1] http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Function_Overloading
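The concern above can be reproduced in a few lines. The following is only a sketch, not Caffe's code: it mirrors the two signatures from the diff inside a hypothetical `overload_demo` namespace, backs both with plain `std::memcpy`, and returns the number of bytes copied (the real functions return `void`) so that the chosen overload is visible. When one pointer has been cast to `void*`, deduction of `Dtype` fails and the untyped overload is selected without any diagnostic, so the element count is treated as a byte count.

```cpp
#include <cstddef>
#include <cstring>
#include <iostream>

namespace overload_demo {  // hypothetical namespace, for illustration only

// Typed overload: N is an element count.
// Returns the bytes copied (demo-only; the signature in the diff returns void).
template <typename Dtype>
std::size_t caffe_copy(const int N, const Dtype* X, Dtype* Y) {
  const std::size_t bytes = sizeof(Dtype) * static_cast<std::size_t>(N);
  std::memcpy(Y, X, bytes);
  return bytes;
}

// Untyped overload: N is a byte count.
inline std::size_t caffe_copy(const std::size_t N, const void* X, void* Y) {
  std::memcpy(Y, X, N);
  return N;
}

// Prints how many bytes each call copies for the same literal count of 4.
inline void print_overload_demo() {
  float src[4] = {1.f, 2.f, 3.f, 4.f};
  float dst[4] = {0.f, 0.f, 0.f, 0.f};

  // Both pointers are float*, so Dtype is deduced and 4 elements are copied.
  std::cout << "typed call copied " << caffe_copy(4, src, dst) << " bytes\n";

  // One pointer is void*, so Dtype cannot be deduced consistently and the
  // untyped overload is chosen: the 4 is now interpreted as 4 bytes.
  std::cout << "mixed call copied "
            << caffe_copy(4, src, static_cast<void*>(dst)) << " bytes\n";
}

}  // namespace overload_demo
```

Giving the untyped version a different name turns the second call into a compile error at the `caffe_copy` call site instead of a silently short copy.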
At one point I had the void version called "caffe_transfer" since it is
only used by SyncedMem to go from CPU to GPU and back. Perhaps that's still
not a good name, since it is a copy and the source stays where it is.
I'll try to think of a name, or if you come up with one you like then
commit it.
On Saturday, 28 June 2014, Jeff Donahue [email protected] wrote:
In include/caffe/util/math_functions.hpp:
@@ -59,6 +59,8 @@ void caffe_gpu_axpby(const int N, const Dtype alpha, const Dtype* X,
 template <typename Dtype>
 void caffe_copy(const int N, const Dtype *X, Dtype *Y);
+void caffe_copy(const size_t N, const void *X, void *Y);
I'm a little worried about this overloading -- it seems a little hard to
figure out whether this generic version or the Dtype version is being
called since each argument of the function signature is a static_cast away
from the Dtype version, and then if you accidentally call this one, the
size of the copy will probably be wrong. Google style guide recommends only
overloading functions when it's very clear which version will be called
[1]. I'd prefer if this had some other name (caffe_void_copy? not sure).
[1] http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Function_Overloading
Decided on `caffe_memcpy` to distinguish it from `caffe_copy` and because it has exactly the same purpose as normal `memcpy`.
K, I agree the cudaMemcpy seems reasonable
Alright, this is done. @jeffdonahue @sguada please take a look for merge.
nice, I like the name
In the newly created branch
The implementations of
Host / device copies are distinguished by the virtual address of the pointers instead of explicit memcpy modes.
Do all memory copies by `cudaMemcpy` in UVA mode so that the same `caffe_copy()` interface works for all transfers. `cudaMemcpy()` is used in lieu of BLAS copies because they do not understand UVA. Drop the now unnecessary `caffe_gpu_copy()` since location of the pointers is now irrelevant to the interface.
...except for `SyncedMem` since it has no type.
Switch to Unified Virtual Address memory copies
CUDA Unified Virtual Addressing makes host-device, device-host, and device-device communication transparent by distinguishing the cases through virtual addresses of the pointers. Switching to this mode is intended as a useful abstraction for parallelism so that blob data can be transferred by the same interface regardless of source and destination.
Here all `cudaMemcpy` calls are switched to the `cudaMemcpyDefault` mode used by virtual addressing.

A counter-argument is that this makes host / device communication less explicit and perhaps confusing. However, the pointers and our practice of `{cpu,gpu}` prefixes keep this clear. Provided we continue in the direction of device abstraction, needing to explicitly reference cpu or gpu operation should go away on its own with the exception of data layers and future host / device parallelism.

To standardize the interface all `memcpy` are replaced by `caffe_copy` and all `memset` by `caffe_set` and `caffe_gpu_set` except for `SyncedMem` where it's awkward. Note this melds `caffe_gpu_copy` into `caffe_copy` now that addressing is virtual.
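
As a rough illustration of the single copy interface described above, here is a minimal sketch. It is not the code in this PR: the `uva_sketch` namespace, the `copy_bytes` and `typed_copy` names, the `bool` return value, and the `SKETCH_USE_CUDA` macro are all assumptions made for the example. With the macro defined (and the CUDA runtime linked), every transfer goes through `cudaMemcpy` with `cudaMemcpyDefault`, which infers the direction from the pointer values and is only allowed on systems that support unified virtual addressing. Without the macro the sketch falls back to `std::memcpy`, which also hints at how a CPU-only build could share the same interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#ifdef SKETCH_USE_CUDA  // hypothetical build flag, not a Caffe macro
#include <cuda_runtime.h>
#endif

namespace uva_sketch {  // hypothetical namespace, for illustration only

// Untyped copy of `bytes` bytes from X to Y. Returns true on success.
inline bool copy_bytes(const std::size_t bytes, const void* X, void* Y) {
  if (X == Y || bytes == 0) return true;  // nothing to do
#ifdef SKETCH_USE_CUDA
  // The runtime works out host/device direction from the pointers (UVA).
  return cudaMemcpy(Y, X, bytes, cudaMemcpyDefault) == cudaSuccess;
#else
  // CPU-only fallback: both pointers must be host pointers.
  std::memcpy(Y, X, bytes);
  return true;
#endif
}

// Typed copy of N elements; the caller never states where X and Y live.
template <typename Dtype>
bool typed_copy(const int N, const Dtype* X, Dtype* Y) {
  assert(N >= 0);
  return copy_bytes(sizeof(Dtype) * static_cast<std::size_t>(N), X, Y);
}

}  // namespace uva_sketch
```

A call such as `uva_sketch::typed_copy(count, src, dst)` then reads the same whether `src` and `dst` are host or device pointers, which is the property the PR description is after; the separate untyped name keeps the byte-count version for cases like `SyncedMem` that have no element type.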