Force-pushed e81b673 to 91a8e3b (Compare)
Everybody, the CI test is OK.
ci/build_windows.py (Outdated)

    @@ -115,6 +117,7 @@ class BuildFlavour(Enum):
             '-DCUDA_ARCH_NAME=Manual '
             '-DCUDA_ARCH_BIN=52 '
             '-DCUDA_ARCH_PTX=52 '
    +        '-DCUDA_ARCH_LIST=5.2 '
Why add CUDA_ARCH_LIST?
The first-class CUDA support uses CUDA_ARCH_LIST to choose or set the arch.
I don't think it works currently. I got this warning when testing it:

    Manually-specified variables were not used by the project:
      CUDA_ARCH_LIST
It is fixed as part of #17031
I just tried it again. It only works when using the Makefile generator (and apparently the Windows generator as well), but it doesn't work when using the Ninja generator. It's quite strange.
Where do you expect the variable to be used? Only at https://github.com/apache/incubator-mxnet/blob/8645b9a4a220940bdccba3aeee577eb645a86b34/cmake/FirstClassLangCuda.cmake#L109 ?
Could you please elaborate on the motivation for this change?
Right now the huge image size causes many troubles, e.g. a too-large image can't be loaded, and there are compile errors.
What do you mean?
Right now the PIP build uses a compressed fatbin, produced by hacking nvcc.
How would this change address the compressed fatbin? The compression was used to make the library size smaller. However, this PR will split the DLL into multiple DLLs. The total size that needs to be distributed on PyPI doesn't change, or does it?
...but it costs a lot of memory, and general users have difficulty compiling on their own.
What do you mean?
Thank you!
@leezu The image size is now too large, and Windows cannot load a too-large DLL. So we must try to reduce the image size.
@leezu The pip package size will be solved by other methods.
@yajiedesign do you mean on Windows 32-bit? Do we have any need to support a 32-bit build? Unfortunately Microsoft doesn't seem to publish statistics about the number of users running Windows 32-bit, but Steam does (https://store.steampowered.com/hwsurvey/), and as of November 2019 less than 1% of users run Windows 7 32-bit or Windows 10 32-bit. So I think building only for Windows 64-bit may be an easier solution. Regarding the pip size: yes, I agree your PR doesn't address it. That's why I wonder why you cite pip size as the motivation for this PR. You can edit the motivation and remove it, given that this PR is not related.
Regarding the memory size: in my understanding, if you use multiple processes that all load the mxnet DLL, the memory among them will be shared.
@leezu The previous size-reduction method broke DLL sharing.
@leezu Not 32-bit. 64-bit DLLs also have an image size limit; this is determined by the PE file structure. Reference: https://stackoverflow.com/questions/6976693/what-is-the-maximum-size-of-a-pe-file-on-64-bit-windows
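To make the limit concrete, here is a back-of-the-envelope sketch. The per-arch sizes are hypothetical illustrative numbers, not measurements from this build; the point is only that bundling device code for every SM into one DLL can push the image past the PE limit, while any single per-arch DLL stays under it.

```python
# Illustrative arithmetic only: per-arch sizes below are made up, not measured.
PE_IMAGE_LIMIT = 2**31  # ~2 GiB, implied by 32-bit relative addressing in PE

# Hypothetical size (in MiB) of the device code compiled for each SM version.
per_arch_mib = {30: 350, 35: 350, 50: 360, 52: 360, 60: 380, 61: 380, 70: 400, 75: 400}

combined = sum(per_arch_mib.values()) * 2**20        # one DLL with all arches
largest_single = max(per_arch_mib.values()) * 2**20  # the biggest per-arch DLL

print(combined > PE_IMAGE_LIMIT)        # True: the all-in-one DLL crosses the limit
print(largest_single > PE_IMAGE_LIMIT)  # False: per-arch DLLs stay under it
```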
@leezu Pip is mentioned here because the old method is used for the PIP build, and it solved the pip package size problem.
Thanks @yajiedesign. It seems one of the root causes is that Windows does not support position-independent code, and the PE format must always use 32-bit relative addressing. The reason I ask you so many questions is that a) I'm concerned that not many people are familiar with Windows, so we need to document the reasoning that went into this patch, and b) fundamentally the problem you attempt to solve here also affects Unix systems, and it would be good to have a cross-platform solution. (Thanks to ELF's superiority over Windows PE, I understand that the problem is more pressing on Windows and we may need a Windows-specific intermediate solution at first.) What do you think? Maybe @samskalicky's dynamic loading of operators effort can solve this in a cross-platform way?
Yes, it is urgent on Windows. The current scheme is private and has many problems.
@yajiedesign Can you provide a link for the current scheme, or is the code not available at all? Where does it run? Which scheme do you mean? Do you mean the code used to build the Windows binary at https://pypi.org/project/mxnet-cu101/1.6.0b20190926/#files? How large is the binary without the current private scheme? What happens without the private scheme?
@leezu There is no link for the current scheme; it only exists on my build machine. It's complicated and dirty.
Thanks for providing more details. I think it would be good to move the files introduced here. You can also consider making the warp feature optional in the Windows build, as it adds overhead and is not needed apart from the pip build, which needs to include all CUDA arches.
Force-pushed d7611b3 to ab1e279 (Compare)
OK.
Force-pushed 93b4c16 to 5a31b19 (Compare)
CI test OK.
tools/windowsbuild/warp_dll.cpp (Outdated)

    int find_version()
    {
        int known_sm[] = { 30, 35, 37, 50, 52, 60, 61, 70, 75 };
Can you make that dynamic so we don't have to update it in the future?
@marcoabreu done
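For reference, one way to make the SM list dynamic is to derive it from nvcc instead of hard-coding it. This is only a sketch of the idea in Python, not the PR's actual C++ change: it assumes an nvcc recent enough to support the --list-gpu-code flag, and parse_gpu_codes / known_sm_from_nvcc are hypothetical helper names.

```python
import subprocess

def parse_gpu_codes(nvcc_output):
    """Turn lines like 'sm_52' into sorted integers like 52."""
    sms = []
    for line in nvcc_output.splitlines():
        line = line.strip()
        if line.startswith("sm_"):
            sms.append(int(line[len("sm_"):]))
    return sorted(sms)

def known_sm_from_nvcc():
    # Assumes a recent nvcc that understands --list-gpu-code.
    out = subprocess.run(["nvcc", "--list-gpu-code"],
                         capture_output=True, text=True, check=True).stdout
    return parse_gpu_codes(out)

if __name__ == "__main__":
    sample = "sm_30\nsm_35\nsm_50\nsm_52\nsm_70\nsm_75\n"
    print(parse_gpu_codes(sample))  # [30, 35, 50, 52, 70, 75]
```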
Can we merge?
add gen_warp cpp version; add add_custom_command to run warp_gen; add download cmake; add option; change option; add dynamic read mxnet dll
Force-pushed 8c7043f to fc27ce9 (Compare)
Great job, thanks!
CI test OK.
ping
@yajiedesign Could you make a PR with the backported changes needed for the Windows build against the 1.6 branch, as you mentioned in the mail to dev@?
@ptrendx OK
Fix CUDNN detection for CMake build (#17019)

Replace mxnet_option macro with standard CMAKE_DEPENDENT_OPTION (#17018)

Switch to modern CMake CUDA handling (#17031): introduce a unified MXNET_CUDA_ARCH option to specify CUDA architectures. Previously, the CUDA architecture setting was partially broken and different options were applied to different parts of the build (CUDA_ARCH_NAME, CUDA_ARCH_BIN, CUDA_ARCH_PTX, and CUDA_ARCH_LIST). Include FindCUDAToolkit from CMake 3.17, which replaces the deprecated FindCUDA functionality for finding the CUDA toolkit include directories and libraries.

Workaround for DLL size limitation on Windows (#16980): change the Windows build system; add gen_warp cpp version; add add_custom_command to run warp_gen; add download cmake; add option; change option; add dynamic read mxnet dll.

Co-authored-by: Leonard Lausen <[email protected]>
This PR causes OpWrapperGenerator.py to fail to generate op.h with error code 127 when building with cpp_package. The reason is the use of a relative DLL path at line 119 of the newly added file warp_dll.cpp: HMODULE hm = LoadLibraryW(dll_name);. The DLL search path when calling OpWrapperGenerator.py is not libmxnet.dll's directory, so this code cannot find the dependent DLL, and the sub-DLL load failure inside libmxnet.dll surfaces as error 127. Would it be better to use an absolute path here, or some other strategy? DLL search path reference: https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-search-order
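The fix suggested above can be sketched as resolving each sub-DLL path against libmxnet.dll's own directory, so loading no longer depends on the calling process's DLL search order. This is a Python sketch of the idea only; the real change would live in warp_dll.cpp, and resolve_sub_dll is a hypothetical helper name.

```python
import os

def resolve_sub_dll(libmxnet_path, sub_dll_name):
    """Build an absolute path for a sub-DLL sitting next to libmxnet.dll,
    so loading does not depend on the caller's working directory."""
    base_dir = os.path.dirname(os.path.abspath(libmxnet_path))
    return os.path.join(base_dir, sub_dll_name)

if __name__ == "__main__":
    # The absolute sub-DLL path stays correct regardless of the CWD.
    print(resolve_sub_dll("/opt/mxnet/libmxnet.dll", "mxnet_52.dll"))
```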
Description
Right now the huge image size causes many troubles, e.g. a too-large image can't be loaded, and there are compile errors.
Right now the PIP build uses a compressed fatbin, produced by hacking nvcc, but it costs a lot of memory, and general users have difficulty compiling on their own.
This PR changes the build system: it splits the different CUDA arches into different DLLs, e.g. mxnet_30, mxnet_50. A warp DLL then automatically identifies the GPU's SM version, loads the corresponding DLL, and forwards all calls.
This PR also contains tools to automatically generate the warp DLL; they are executed automatically during compilation.
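The selection step described above, choosing which per-arch DLL to load for the detected GPU, can be sketched like this. It is a simplified Python model of what the warp DLL does in C++; the function name and the fall-back-to-the-newest-not-newer-arch policy are assumptions for illustration, not the PR's exact code.

```python
def select_arch_dll(detected_sm, built_sms):
    """Pick the newest built arch that is not newer than the detected GPU,
    mirroring how a per-arch DLL (e.g. mxnet_52.dll) would be chosen."""
    compatible = [sm for sm in built_sms if sm <= detected_sm]
    if not compatible:
        raise RuntimeError("no compatible mxnet_<sm>.dll for SM %d" % detected_sm)
    return "mxnet_%d.dll" % max(compatible)

if __name__ == "__main__":
    built = [30, 35, 37, 50, 52, 60, 61, 70, 75]
    print(select_arch_dll(61, built))  # mxnet_61.dll
    print(select_arch_dll(86, built))  # newer GPU falls back to mxnet_75.dll
```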