Build fails for -DUSE_CUDA=1 #5785

Closed
jmakov opened this issue Mar 15, 2023 · 10 comments

Comments

@jmakov

jmakov commented Mar 15, 2023

Description

#5089 is marked as resolved, but the build still fails when building in a RAPIDS Docker container:

#0 153.8 /usr/include/c++/11/bits/std_function.h:435:145: note:         '_ArgTypes'
#0 153.8 /usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with '...':
#0 153.8   530 |         operator=(_Functor&& __f)
#0 153.8       |                                                                                                                                                  ^ 
#0 153.8 /usr/include/c++/11/bits/std_function.h:530:146: note:         '_ArgTypes'
#0 154.6 make[2]: *** [CMakeFiles/lightgbm_objs.dir/build.make:734: CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_best_split_finder.cu.o] Error 1
#0 154.6 make[1]: *** [CMakeFiles/Makefile2:257: CMakeFiles/lightgbm_objs.dir/all] Error 2
#0 154.6 make: *** [Makefile:136: all] Error 2

Reproducible example

Environment info

LightGBM version or commit hash:

Command(s) you used to install LightGBM

mkdir /tmp/lib && cd /tmp/lib  \
    && git clone --recursive https://github.com/microsoft/LightGBM \
    && mkdir /tmp/lib/LightGBM/build && cd /tmp/lib/LightGBM/build \
    && cmake -DUSE_CUDA=1 .. && make -j \
    && pip uninstall -y lightgbm \
    && cd ../python-package/ && python setup.py install --precompile

Build in docker FROM rapidsai/rapidsai-core:23.02-cuda11.8-runtime-ubuntu22.04-py3.10
GCC 11.3

Additional Comments

@shiyu1994
Collaborator

@jmakov Is it possible to see more of the error message? For example, why does the compilation of cuda_best_split_finder.cu fail?

@jmakov
Author

jmakov commented Mar 16, 2023

@shiyu1994 there seems to be only 1 type of error:

/tmp/lib/LightGBM/include/LightGBM/utils/../../../external_libs/fmt/include/fmt/format-inl.h(85): here

/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with '...':
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^
/usr/include/c++/11/bits/std_function.h:435:145: note:         '_ArgTypes'
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with '...':
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^
/usr/include/c++/11/bits/std_function.h:530:146: note:         '_ArgTypes'

whole log:
build_fail.txt

@jmakov
Author

jmakov commented Mar 31, 2023

This is kinda a blocker for me. Would be great to have some more insight into what can be done about it.

@jameslamb jameslamb added the bug label Jun 26, 2023
@domtisdell

domtisdell commented Jul 19, 2023

I've been having similar problems I think when trying to install v4.0. Builds were failing until I switched gcc (and g++ for good measure) to version 10 for compiling.

Found solution from this reference: NVIDIA/nccl#650
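
The workaround described above (compiling with GCC 10 instead of GCC 11) can be sketched as follows. This is only an illustration, assuming gcc-10 and g++-10 are already installed and on PATH. The helper name host_compiler_flags is hypothetical and not part of LightGBM; the three CMake variables it prints are standard CMake options.

```shell
# Sketch: print CMake flags that select GCC/G++ <version> as the C, C++,
# and CUDA host compilers. Assumes gcc-<version> and g++-<version> exist on PATH.
# host_compiler_flags is a hypothetical helper, not part of LightGBM.
host_compiler_flags() {
    ver="$1"
    printf '%s %s %s\n' \
        "-DCMAKE_C_COMPILER=gcc-${ver}" \
        "-DCMAKE_CXX_COMPILER=g++-${ver}" \
        "-DCMAKE_CUDA_HOST_COMPILER=g++-${ver}"
}

# Usage, from a fresh build directory inside the LightGBM checkout:
#   cmake -DUSE_CUDA=1 $(host_compiler_flags 10) ..
#   make -j
```

The compiler variables only take effect on the first configure of a build directory, so an existing build directory should be deleted before retrying.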

@jameslamb
Collaborator

jameslamb commented Apr 24, 2024

Sorry for the long delay in response. I believe recent changes in LightGBM have fixed this.

I was able to build latest LightGBM (1443548) in the latest stable rapidsai/base image.

(rapidsai/rapidsai-core images were removed as part of rapidsai/docker#539)

docker run \
    --rm \
    --user root \
    -it rapidsai/base:24.04-cuda12.0-py3.10 \
    bash

mkdir /tmp/lib
cd /tmp/lib 

# install build tools (rapidsai/base doesn't ship these)
apt-get update
apt-get install -y \
    build-essential \
    cmake \
    git

# build LightGBM
git clone --recursive https://github.com/microsoft/LightGBM

cd ./LightGBM
cmake -B build -S . -DUSE_CUDA=1
cmake --build build --target _lightgbm -j2
sh build-python.sh install --precompile

That built successfully for me.
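
A quick sanity check after these steps might look like the sketch below. It assumes the shared library is written to the root of the LightGBM checkout, as the build log's "Linking CXX shared library ../lib_lightgbm.so" line suggests. The helper name check_lightgbm_lib is hypothetical.

```shell
# Sketch: confirm the compiled library exists before building the Python package.
# check_lightgbm_lib is a hypothetical helper; pass the LightGBM checkout directory.
check_lightgbm_lib() {
    repo_dir="${1:-.}"
    if [ -f "${repo_dir}/lib_lightgbm.so" ]; then
        echo "found ${repo_dir}/lib_lightgbm.so"
    else
        echo "missing ${repo_dir}/lib_lightgbm.so" >&2
        return 1
    fi
}

# Usage, from the LightGBM checkout:
#   check_lightgbm_lib . && sh build-python.sh install --precompile
#   python -c 'import lightgbm; print(lightgbm.__version__)'
```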

full logs

Configure step:

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The CUDA compiler identification is NVIDIA 12.0.76
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/conda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDA: /opt/conda/targets/sbsa-linux (found suitable version "12.0", minimum required is "11.0")
-- CMAKE_CUDA_FLAGS:  -Xcompiler=-fopenmp -Xcompiler=-fPIC -Xcompiler=-Wall -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90,code=compute_90 -O3 -lineinfo
-- ALLFEATS_DEFINES: -DPOWER_FEATURE_WORKGROUPS=12;-DUSE_CONSTANT_BUF=0;-DENABLE_ALL_FEATURES
-- FULLDATA_DEFINES: -DPOWER_FEATURE_WORKGROUPS=12;-DUSE_CONSTANT_BUF=0;-DENABLE_ALL_FEATURES;-DIGNORE_INDICES
-- Performing Test MM_PREFETCH
-- Performing Test MM_PREFETCH - Failed
-- Performing Test MM_MALLOC
-- Performing Test MM_MALLOC - Failed
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/lib/LightGBM/build

Build step:

[  1%] Building CUDA object CMakeFiles/histo_16_64_256_sp.dir/src/treelearner/kernels/histogram_16_64_256.cu.o
[  2%] Building CUDA object CMakeFiles/histo_16_64_256-fulldata_sp.dir/src/treelearner/kernels/histogram_16_64_256.cu.o
[  2%] Built target histo_16_64_256-fulldata_sp
[  2%] Built target histo_16_64_256_sp
[  4%] Building CXX object CMakeFiles/lightgbm_capi_objs.dir/src/c_api.cpp.o
[  5%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/boosting.cpp.o
[  6%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/gbdt.cpp.o
[  8%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/gbdt_model_text.cpp.o
[  8%] Built target lightgbm_capi_objs
[  9%] Building CUDA object CMakeFiles/histo_16_64_256_sp_const.dir/src/treelearner/kernels/histogram_16_64_256.cu.o
[ 10%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/gbdt_prediction.cpp.o
[ 12%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/prediction_early_stop.cpp.o
[ 13%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/sample_strategy.cpp.o
[ 13%] Built target histo_16_64_256_sp_const
[ 14%] Building CUDA object CMakeFiles/histo_16_64_256-fulldata_sp_const.dir/src/treelearner/kernels/histogram_16_64_256.cu.o
[ 16%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/bin.cpp.o
[ 16%] Built target histo_16_64_256-fulldata_sp_const
[ 17%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/config.cpp.o
[ 18%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/config_auto.cpp.o
[ 20%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/dataset.cpp.o
[ 21%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/dataset_loader.cpp.o
[ 22%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/file_io.cpp.o
[ 24%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/json11.cpp.o
[ 25%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/metadata.cpp.o
[ 27%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/parser.cpp.o
[ 28%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/train_share_states.cpp.o
[ 29%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/tree.cpp.o
[ 31%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/dcg_calculator.cpp.o
[ 32%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/metric.cpp.o
[ 33%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/linker_topo.cpp.o
[ 35%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/linkers_mpi.cpp.o
[ 36%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/linkers_socket.cpp.o
[ 37%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/network/network.cpp.o
[ 39%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/objective/objective_function.cpp.o
[ 40%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/data_parallel_tree_learner.cpp.o
[ 41%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/feature_histogram.cpp.o
[ 43%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/feature_parallel_tree_learner.cpp.o
[ 44%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/gpu_tree_learner.cpp.o
[ 45%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/gradient_discretizer.cpp.o
[ 47%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/linear_tree_learner.cpp.o
In file included from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Core:214,
                 from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Dense:1,
                 from /tmp/lib/LightGBM/src/treelearner/linear_tree_learner.cpp:7:
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/arch/NEON/PacketMath.h: In function 'Packet Eigen::internal::pload(const typename Eigen::internal::unpacket_traits<T>::type*) [with Packet = Eigen::internal::eigen_packet_wrapper<int, 2>; typename Eigen::internal::unpacket_traits<T>::type = signed char]':
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/arch/NEON/PacketMath.h:1671:9: warning: 'void* memcpy(void*, const void*, size_t)' copying an object of non-trivial type 'Eigen::internal::Packet4c' {aka 'struct Eigen::internal::eigen_packet_wrapper<int, 2>'} from an array of 'const int8_t' {aka 'const signed char'} [-Wclass-memaccess]
 1671 |   memcpy(&res, from, sizeof(Packet4c));
      |   ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Core:172,
                 from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Dense:1,
                 from /tmp/lib/LightGBM/src/treelearner/linear_tree_learner.cpp:7:
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/GenericPacketMath.h:159:8: note: 'Eigen::internal::Packet4c' {aka 'struct Eigen::internal::eigen_packet_wrapper<int, 2>'} declared here
  159 | struct eigen_packet_wrapper
      |        ^~~~~~~~~~~~~~~~~~~~
In file included from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Core:214,
                 from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Dense:1,
                 from /tmp/lib/LightGBM/src/treelearner/linear_tree_learner.cpp:7:
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/arch/NEON/PacketMath.h: In function 'Packet Eigen::internal::ploadu(const typename Eigen::internal::unpacket_traits<T>::type*) [with Packet = Eigen::internal::eigen_packet_wrapper<int, 2>; typename Eigen::internal::unpacket_traits<T>::type = signed char]':
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/arch/NEON/PacketMath.h:1716:9: warning: 'void* memcpy(void*, const void*, size_t)' copying an object of non-trivial type 'Eigen::internal::Packet4c' {aka 'struct Eigen::internal::eigen_packet_wrapper<int, 2>'} from an array of 'const int8_t' {aka 'const signed char'} [-Wclass-memaccess]
 1716 |   memcpy(&res, from, sizeof(Packet4c));
      |   ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Core:172,
                 from /tmp/lib/LightGBM/external_libs/eigen/Eigen/Dense:1,
                 from /tmp/lib/LightGBM/src/treelearner/linear_tree_learner.cpp:7:
/tmp/lib/LightGBM/external_libs/eigen/Eigen/src/Core/GenericPacketMath.h:159:8: note: 'Eigen::internal::Packet4c' {aka 'struct Eigen::internal::eigen_packet_wrapper<int, 2>'} declared here
  159 | struct eigen_packet_wrapper
      |        ^~~~~~~~~~~~~~~~~~~~
[ 48%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/serial_tree_learner.cpp.o
[ 50%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/tree_learner.cpp.o
[ 51%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/voting_parallel_tree_learner.cpp.o
[ 52%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/utils/openmp_wrapper.cpp.o
[ 54%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/boosting/cuda/cuda_score_updater.cpp.o
[ 55%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/boosting/cuda/cuda_score_updater.cu.o
[ 56%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/cuda/cuda_algorithms.cu.o
[ 58%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/cuda/cuda_utils.cpp.o
[ 59%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_column_data.cpp.o
[ 60%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_column_data.cu.o
[ 62%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_metadata.cpp.o
[ 63%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_row_data.cpp.o
[ 64%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_tree.cpp.o
[ 66%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/io/cuda/cuda_tree.cu.o
[ 67%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/cuda/cuda_binary_metric.cpp.o
[ 68%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/cuda/cuda_pointwise_metric.cpp.o
[ 70%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/metric/cuda/cuda_pointwise_metric.cu.o
[ 71%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/metric/cuda/cuda_regression_metric.cpp.o
[ 72%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_binary_objective.cpp.o
[ 74%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_binary_objective.cu.o
[ 75%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_multiclass_objective.cpp.o
[ 77%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_multiclass_objective.cu.o
[ 78%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_rank_objective.cpp.o
[ 79%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_rank_objective.cu.o
[ 81%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_regression_objective.cpp.o
[ 82%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/objective/cuda/cuda_regression_objective.cu.o
[ 83%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_best_split_finder.cpp.o
[ 85%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_best_split_finder.cu.o
[ 86%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_data_partition.cpp.o
[ 87%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_data_partition.cu.o
[ 89%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_gradient_discretizer.cu.o
[ 90%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_histogram_constructor.cpp.o
[ 91%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_histogram_constructor.cu.o
[ 93%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_leaf_splits.cpp.o
[ 94%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_leaf_splits.cu.o
[ 95%] Building CXX object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_single_gpu_tree_learner.cpp.o
[ 97%] Building CUDA object CMakeFiles/lightgbm_objs.dir/src/treelearner/cuda/cuda_single_gpu_tree_learner.cu.o
[ 97%] Built target lightgbm_objs
[ 98%] Linking CUDA device code CMakeFiles/_lightgbm.dir/cmake_device_link.o
[100%] Linking CXX shared library ../lib_lightgbm.so
[100%] Built target _lightgb

Python build + install logs.

building lightgbm
Collecting build>=0.10.0
  Downloading build-1.2.1-py3-none-any.whl.metadata (4.3 kB)
Requirement already satisfied: packaging>=19.1 in /opt/conda/lib/python3.10/site-packages (from build>=0.10.0) (24.0)
Collecting pyproject_hooks (from build>=0.10.0)
  Downloading pyproject_hooks-1.0.0-py3-none-any.whl.metadata (1.3 kB)
Collecting tomli>=1.1.0 (from build>=0.10.0)
  Downloading tomli-2.0.1-py3-none-any.whl.metadata (8.9 kB)
Downloading build-1.2.1-py3-none-any.whl (21 kB)
Downloading tomli-2.0.1-py3-none-any.whl (12 kB)
Downloading pyproject_hooks-1.0.0-py3-none-any.whl (9.3 kB)
Installing collected packages: tomli, pyproject_hooks, build
Successfully installed build-1.2.1 pyproject_hooks-1.0.0 tomli-2.0.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
found pre-compiled lib_lightgbm.so
--- building sdist ---
* Creating isolated environment: venv+pip...
* Installing packages in isolated environment:
  - setuptools
* Getting build dependencies for sdist...
running egg_info
creating lightgbm.egg-info
writing lightgbm.egg-info/PKG-INFO
writing dependency_links to lightgbm.egg-info/dependency_links.txt
writing requirements to lightgbm.egg-info/requires.txt
writing top-level names to lightgbm.egg-info/top_level.txt
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.dll' under directory 'lightgbm'
warning: no files found matching '*.dylib' under directory 'lightgbm'
adding license file 'LICENSE'
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
* Building sdist...
running sdist
running egg_info
writing lightgbm.egg-info/PKG-INFO
writing dependency_links to lightgbm.egg-info/dependency_links.txt
writing requirements to lightgbm.egg-info/requires.txt
writing top-level names to lightgbm.egg-info/top_level.txt
reading manifest file 'lightgbm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.dll' under directory 'lightgbm'
warning: no files found matching '*.dylib' under directory 'lightgbm'
adding license file 'LICENSE'
writing manifest file 'lightgbm.egg-info/SOURCES.txt'
running check
creating lightgbm-4.3.0.99
creating lightgbm-4.3.0.99/lightgbm
creating lightgbm-4.3.0.99/lightgbm.egg-info
creating lightgbm-4.3.0.99/lightgbm/lib
copying files to lightgbm-4.3.0.99...
copying LICENSE -> lightgbm-4.3.0.99
copying MANIFEST.in -> lightgbm-4.3.0.99
copying README.rst -> lightgbm-4.3.0.99
copying pyproject.toml -> lightgbm-4.3.0.99
copying setup.cfg -> lightgbm-4.3.0.99
copying lightgbm/__init__.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/basic.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/callback.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/compat.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/dask.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/engine.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/libpath.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/plotting.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/py.typed -> lightgbm-4.3.0.99/lightgbm
copying lightgbm/sklearn.py -> lightgbm-4.3.0.99/lightgbm
copying lightgbm.egg-info/PKG-INFO -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/SOURCES.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/dependency_links.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/requires.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm.egg-info/top_level.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
copying lightgbm/lib/lib_lightgbm.so -> lightgbm-4.3.0.99/lightgbm/lib
copying lightgbm.egg-info/SOURCES.txt -> lightgbm-4.3.0.99/lightgbm.egg-info
Writing lightgbm-4.3.0.99/setup.cfg
Creating tar archive
removing 'lightgbm-4.3.0.99' (and everything under it)
Successfully built lightgbm-4.3.0.99.tar.gz
--- installing lightgbm ---
WARNING: Skipping lightgbm as it is not installed.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in links: .
Processing ./lightgbm-4.3.0.99.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from lightgbm) (1.26.4)
Requirement already satisfied: scipy in /opt/conda/lib/python3.10/site-packages (from lightgbm) (1.13.0)
Building wheels for collected packages: lightgbm
  Building wheel for lightgbm (pyproject.toml) ... done
  Created wheel for lightgbm: filename=lightgbm-4.3.0.99-py3-none-any.whl size=62203670 sha256=ea5fe085de440887522cfa4a3b9f9ee1b076bc93be325cd1a3f068471d73bdf8
  Stored in directory: /tmp/pip-ephem-wheel-cache-_be0h8ev/wheels/97/06/d4/842e2ab3fea42d639f11ba3250fbe19b540afb7108b58b2cfc
Successfully built lightgbm
Installing collected packages: lightgbm
Successfully installed lightgbm-4.3.0.99
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
cleaning up


This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@fingoldo

fingoldo commented Jul 9, 2024

Wondering if it's possible to enforce architecture somehow. Trying to reproduce your commands on NVIDIA RTX 6000 Ada (SM 8.9) & CUDA Version: 12.4, Ubuntu 20.04.6 LTS leads to

#$ "/usr/bin"/c++ -D__CUDA_ARCH__=300 -E -x c++
-DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__
-D__CUDACC_VER_MAJOR__=10 -D__CUDACC_VER_MINOR__=1
-D__CUDACC_VER_BUILD__=243 -include "cuda_runtime.h" -m64
"CMakeCUDACompilerId.cu" > "tmp/CMakeCUDACompilerId.cpp1.ii"

#$ cicc --c++14 --gnu_version=80400 --allow_managed -arch compute_30 -m64
-ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name
"CMakeCUDACompilerId.fatbin.c" -tused -nvvmir-library
"/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc"
--gen_module_id_file --module_id_file_name
"tmp/CMakeCUDACompilerId.module_id" --orig_src_file_name
"CMakeCUDACompilerId.cu" --gen_c_file_name
"tmp/CMakeCUDACompilerId.cudafe1.c" --stub_file_name
"tmp/CMakeCUDACompilerId.cudafe1.stub.c" --gen_device_file_name
"tmp/CMakeCUDACompilerId.cudafe1.gpu" "tmp/CMakeCUDACompilerId.cpp1.ii" -o
"tmp/CMakeCUDACompilerId.ptx"

#$ ptxas -arch=sm_30 -m64 "tmp/CMakeCUDACompilerId.ptx" -o
"tmp/CMakeCUDACompilerId.sm_30.cubin"

ptxas fatal : Value 'sm_30' is not defined for option 'gpu-name'

@fingoldo

fingoldo commented Jul 9, 2024

Never mind. I had to remove nvidia-cuda-toolkit (which I had installed because it allowed the OpenCL version of lightgbm to work, only to find out it's buggy on big datasets and overall an abandoned branch).
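
The earlier log reports __CUDACC_VER_MAJOR__=10 and paths under /usr/lib/nvidia-cuda-toolkit, which suggests CMake picked up the distribution's older nvcc rather than the CUDA 12.4 toolkit. A rough way to check which nvcc is being used is sketched below. The helper name nvcc_release is hypothetical, and the toolkit path in the usage comment is an assumption that will differ between systems.

```shell
# Sketch: extract the "release X.Y" number from `nvcc --version` output on stdin.
# nvcc_release is a hypothetical helper, not part of CUDA or LightGBM.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage:
#   command -v nvcc                 # which nvcc is first on PATH
#   nvcc --version | nvcc_release   # toolkit release of that nvcc
#
# To point CMake at a specific toolkit instead of the first nvcc on PATH
# (the path below is an assumption; adjust it to the actual install location):
#   cmake -B build -S . -DUSE_CUDA=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
```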

Currently stuck at

found pre-compiled lib_lightgbm.so
--- building sdist ---
build-python.sh: 347: python: not found

Why is it so hard to get lightgbm working with GPU? Catboost & Xgboost teams somehow managed to solve it with single "pip install" command ;-)

@jameslamb
Collaborator

build-python.sh: 347: python: not found

You have to have Python installed and a python executable available on PATH to build LightGBM's Python package.
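
One way to check for this before running the build script is sketched below. The helper name first_on_path is hypothetical. The error above shows build-python.sh invoking an executable named exactly python, so finding only python3 still means a python command has to be provided, for example by activating a conda or virtual environment, or on Ubuntu by installing the python-is-python3 package.

```shell
# Sketch: print the first of the given command names that exists on PATH.
# first_on_path is a hypothetical helper, not part of LightGBM.
first_on_path() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage:
#   first_on_path python python3 || echo "no Python interpreter on PATH" >&2
```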

I strongly suspect that you aren't using the exact example I provided in #5785 (comment), but you haven't described your setup here so it's not possible to help much more.

Why is it so hard to get lightgbm working with GPU? Catboost & Xgboost teams somehow managed to solve it with single "pip install" command

We're doing the best we can with a much smaller amount of maintainer availability. Those projects both have multiple maintainers being paid to work on them full-time... LightGBM does not.

You're welcome to come contribute here any time.

@fingoldo

fingoldo commented Jul 9, 2024

Yeah, I know. Thanks a lot for your hard work, guys. I hope easier access to GPU training is on the roadmap. I'm not experienced in that myself, otherwise I would contribute for sure.
