Document build problem on CentOS 7 #2

Closed
kaigai opened this issue May 10, 2017 · 7 comments

kaigai commented May 10, 2017

When I tried to build the documentation on CentOS 7 using the distribution's python-sphinx package, the build failed because the sphinx-build shipped in the RPM is too old (python-sphinx-1.1.3-11.el7.noarch.rpm does not support the -M option).
It would be helpful if the documentation stated the minimum required Sphinx version.

[kaigai@namazu docs]$ make html
Sphinx v1.1.3
Usage: /usr/bin/sphinx-build [options] sourcedir outdir [filenames...]
Options: -b <builder> -- builder to use; default is html
         -a        -- write all files; default is to only write new and changed files
         -E        -- don't use a saved environment, always read all files
         -t <tag>  -- include "only" blocks with <tag>
         -d <path> -- path for the cached environment and doctree files
                      (default: outdir/.doctrees)
         -c <path> -- path where configuration file (conf.py) is located
                      (default: same as sourcedir)
         -C        -- use no config file at all, only -D options
         -D <setting=value> -- override a setting in configuration
         -A <name=value>    -- pass a value into the templates, for HTML builder
         -n        -- nit-picky mode, warn about all missing references
         -N        -- do not do colored output
         -q        -- no output on stdout, just warnings on stderr
         -Q        -- no output at all, not even warnings
         -w <file> -- write warnings (and errors) to given file
         -W        -- turn warnings into errors
         -P        -- run Pdb on exception
Modi:
* without -a and without filenames, write new and changed files.
* with -a, write all files.
* with filenames, write these.
make: *** [html] Error 1

The documentation could be built after downloading the latest Sphinx with pip and overwriting the packaged version; however, this is a little inconvenient in a CentOS/RHEL environment.
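
A version check along these lines could make the requirement explicit before `make html` is run. This is only a sketch: the minimum of 1.2.1 is an assumption (the logs in this thread only establish that 1.1.3 lacks `-M` and that 1.5.5 accepts it), and `parse_version` and `is_new_enough` are hypothetical helper names.

```python
# Sketch: compare an installed Sphinx version string against an assumed minimum.
ASSUMED_MINIMUM = (1, 2, 1)  # assumption, not confirmed by the project


def parse_version(text):
    """Turn a dotted version such as '1.1.3' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev' or 'b1'
        parts.append(int(piece))
    return tuple(parts)


def is_new_enough(installed, minimum=ASSUMED_MINIMUM):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    for version in ("1.1.3", "1.5.5"):
        print(version, "ok" if is_new_enough(version) else "too old")
```

The version string could be taken from the `Sphinx v1.1.3` banner that sphinx-build prints, as seen in the output above.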

Thanks,

Author

kaigai commented May 10, 2017

Documentation could be built with the latest version download and overwritten with pip command,

No, the documentation build still fails.

The following messages are shown. I am not sure which package provides /usr/lib/accelerate_radixsort.so, although the output reports these only as warnings.

[kaigai@namazu docs]$ make html
sphinx-build -M html source build
Running Sphinx v1.5.5
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 0 source files that are out of date
updating environment: 0 added, 1 changed, 0 removed
reading sources... [100%] api
/home/kaigai/repo/pygdf/docs/source/api.rst:10: WARNING: autodoc: failed to import class u'DataFrame' from module u'pygdf.dataframe'; the following exception was raised:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/sphinx/ext/autodoc.py", line 551, in import_object
    __import__(self.modname)
  File "/home/kaigai/repo/pygdf/pygdf/dataframe.py", line 9, in <module>
    from . import cudautils, utils
  File "/home/kaigai/repo/pygdf/pygdf/cudautils.py", line 8, in <module>
    from .sorting import RadixSort
  File "/home/kaigai/repo/pygdf/pygdf/sorting/__init__.py", line 1, in <module>
    from .radixsort import RadixSort
  File "/home/kaigai/repo/pygdf/pygdf/sorting/radixsort.py", line 38, in <module>
    lib = load_lib('radixsort')
  File "/home/kaigai/repo/pygdf/pygdf/sorting/common.py", line 26, in load_lib
    return ctypes.CDLL(libpath)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/accelerate_radixsort.so: cannot open shared object file: No such file or directory
/home/kaigai/repo/pygdf/docs/source/api.rst:17: WARNING: autodoc: failed to import class u'Series' from module u'pygdf.dataframe'; the following exception was raised:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/sphinx/ext/autodoc.py", line 551, in import_object
    __import__(self.modname)
  File "/home/kaigai/repo/pygdf/pygdf/dataframe.py", line 9, in <module>
    from . import cudautils, utils
  File "/home/kaigai/repo/pygdf/pygdf/cudautils.py", line 8, in <module>
    from .sorting import RadixSort
  File "/home/kaigai/repo/pygdf/pygdf/sorting/__init__.py", line 1, in <module>
    from .radixsort import RadixSort
  File "/home/kaigai/repo/pygdf/pygdf/sorting/radixsort.py", line 38, in <module>
    lib = load_lib('radixsort')
  File "/home/kaigai/repo/pygdf/pygdf/sorting/common.py", line 26, in load_lib
    return ctypes.CDLL(libpath)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/accelerate_radixsort.so: cannot open shared object file: No such file or directory
/home/kaigai/repo/pygdf/docs/source/api.rst:26: WARNING: autodoc: failed to import class u'GpuArrowReader' from module u'pygdf.gpuarrow'; the following exception was raised:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/sphinx/ext/autodoc.py", line 551, in import_object
    __import__(self.modname)
  File "/home/kaigai/repo/pygdf/pygdf/gpuarrow.py", line 13, in <module>
    from .dataframe import Series
  File "/home/kaigai/repo/pygdf/pygdf/dataframe.py", line 9, in <module>
    from . import cudautils, utils
  File "/home/kaigai/repo/pygdf/pygdf/cudautils.py", line 8, in <module>
    from .sorting import RadixSort
  File "/home/kaigai/repo/pygdf/pygdf/sorting/__init__.py", line 1, in <module>
    from .radixsort import RadixSort
  File "/home/kaigai/repo/pygdf/pygdf/sorting/radixsort.py", line 38, in <module>
    lib = load_lib('radixsort')
  File "/home/kaigai/repo/pygdf/pygdf/sorting/common.py", line 26, in load_lib
    return ctypes.CDLL(libpath)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/accelerate_radixsort.so: cannot open shared object file: No such file or directory
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index
generating indices... genindex
writing additional pages... search
copying static files... done
copying extra files... done
dumping search index in English (code: en) ... done
dumping object inventory... done
build succeeded, 3 warnings.

Build finished. The HTML pages are in build/html.

The build/html/*.html files were generated; however, the API section is empty.

Any ideas?

Author

kaigai commented May 17, 2017

Could somebody take a look at this issue?

Contributor

sklam commented May 17, 2017

Sorry for the late reply. Building the docs actually requires all runtime dependencies to be present. Some of the dependencies are currently available only through conda. See https://github.com/gpuopenanalytics/pygdf/blob/master/SETUP.md#conda-environments for instructions on setting up the testing environment.

We should be able to avoid the need for the runtime dependencies by moving the import statements around.
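
One way this could look, as a minimal sketch rather than the actual pygdf change: defer the `ctypes.CDLL` call until the library is first used, so that importing the module (which is all autodoc needs) succeeds even when the shared object is missing. `LazyLibrary` is a hypothetical name; the path is the one from the traceback above.

```python
import ctypes


class LazyLibrary:
    """Defer ctypes.CDLL until the library is first needed."""

    def __init__(self, path):
        self.path = path
        self._handle = None

    @property
    def loaded(self):
        return self._handle is not None

    @property
    def handle(self):
        # The shared object is opened here, not at module import time.
        if self._handle is None:
            self._handle = ctypes.CDLL(self.path)
        return self._handle


# Creating the wrapper at import time does not touch the file system.
radixsort_lib = LazyLibrary("/usr/lib/accelerate_radixsort.so")
```

With this shape, a missing library only raises `OSError` when `radixsort_lib.handle` is first accessed, that is, when sorting is actually requested.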

Contributor

sklam commented May 17, 2017

Btw, we are planning to put the docs online soon. We are likely to publish it on readthedocs.io.

Author

kaigai commented May 19, 2017

Thanks for your help; however, /usr/lib/accelerate_radixsort.so is still not resolved in my CentOS 7 environment.
According to the error message, Sphinx fails to open the shared library above and therefore skips the modules it was supposed to import. This is likely why the generated documents have empty chapters.
Which package provides /usr/lib/accelerate_radixsort.so?

[kaigai@namazu docs]$ source activate pycudf_testing_py35
(pycudf_testing_py35) [kaigai@namazu docs]$
(pycudf_testing_py35) [kaigai@namazu docs]$
(pycudf_testing_py35) [kaigai@namazu docs]$ make html
sphinx-build -M html source build
Running Sphinx v1.5.5
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 0 source files that are out of date
updating environment: 0 added, 1 changed, 0 removed
reading sources... [100%] api
/home/kaigai/repo/pygdf/docs/source/api.rst:10: WARNING: autodoc: failed to import class u'DataFrame' from module u'pygdf.dataframe'; the following exception was raised:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/sphinx/ext/autodoc.py", line 551, in import_object
    __import__(self.modname)
  File "/home/kaigai/repo/pygdf/pygdf/dataframe.py", line 9, in <module>
    from . import cudautils, utils
  File "/home/kaigai/repo/pygdf/pygdf/cudautils.py", line 8, in <module>
    from .sorting import RadixSort
  File "/home/kaigai/repo/pygdf/pygdf/sorting/__init__.py", line 1, in <module>
    from .radixsort import RadixSort
  File "/home/kaigai/repo/pygdf/pygdf/sorting/radixsort.py", line 38, in <module>
    lib = load_lib('radixsort')
  File "/home/kaigai/repo/pygdf/pygdf/sorting/common.py", line 26, in load_lib
    return ctypes.CDLL(libpath)
  File "/usr/lib64/python2.7/ctypes/__init__.py", line 360, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/accelerate_radixsort.so: cannot open shared object file: No such file or directory
      :
  <snip>
      :
OSError: /usr/lib/accelerate_radixsort.so: cannot open shared object file: No such file or directory
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index
generating indices... genindex
writing additional pages... search
copying static files... done
copying extra files... done
dumping search index in English (code: en) ... done
dumping object inventory... done
build succeeded, 3 warnings.

Build finished. The HTML pages are in build/html.
(pycudf_testing_py35) [kaigai@namazu docs]$
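
One thing that may be worth checking: the traceback above still references /usr/lib64/python2.7 although the pycudf_testing_py35 environment is active, which suggests (but does not prove) that the system-wide sphinx-build is being picked up. A minimal Python 3 sketch for seeing which interpreter and which sphinx-build are in use; `describe_toolchain` is a hypothetical helper name.

```python
import shutil
import sys


def describe_toolchain(tool="sphinx-build"):
    """Report the running interpreter and where `tool` resolves on PATH."""
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        "tool": shutil.which(tool),  # None when the tool is not on PATH
    }


if __name__ == "__main__":
    for key, value in describe_toolchain().items():
        print(key, "=", value)
```

If the reported `tool` path lies outside the environment's `prefix`, the build is probably not using the environment's Sphinx.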

Author

kaigai commented May 19, 2017

Btw, we are planning to put the docs online soon. We are likely to publish it on readthedocs.io.

+1
That seems better to me.

In the PostgreSQL community (where I usually work), developers write and modify the documentation in SGML and run the documentation build chain themselves:
https://github.com/postgres/postgres/tree/master/doc/src
However, most people simply refer to the prebuilt static documentation on the web:
https://www.postgresql.org/docs/current/static/index.html
https://www.postgresql.org/docs/current/static/index.html

Best regards,

Contributor

sklam commented May 19, 2017

The online docs are now available at http://pygdf.readthedocs.io/
=)

@sklam sklam closed this as completed May 19, 2017
kkraus14 pushed a commit that referenced this issue Aug 23, 2018
* test binary_operator

* test one line

* essentially use _binaryop with a line flipped

* expand to all non commutative reflected ops

* revert rmul
kkraus14 pushed a commit that referenced this issue Nov 27, 2018
* adding eq datetime ops for pygdf

* flake8 fixes

* Drop Python 2.7, Add Python 3.7

* removing int coercion for datetime

* Remove Python 3.7 build

* bumping numba

* forgot to commit meta.yaml changes

* flake8

* commutative addition

* commutative subtraction and multiplication

* reflected floordiv and truediv

* cleanup

* stray comment

* change rsub method

* further testing rsub

* rsub docstring

* revert back

* type coercion

* revert to pseudo-commutative implementation

* commutative ops tests

* test comment cleanup

* Feature/reflected ops noncommutative testing (#1)

* np array solution

* cleanup

* np solution for division

* full reflected ops tests

* cleanup

* switching lambda scalar to 2

* Update README.md

Conda installation instruction needed changes with pygdf version.

* Feature/reflected ops update (#2)

* test binary_operator

* test one line

* essentially use _binaryop with a line flipped

* expand to all non commutative reflected ops

* revert rmul

* Feature/reflected ops update (#3)

* test binary_operator

* test one line

* essentially use _binaryop with a line flipped

* expand to all non commutative reflected ops

* revert rmul

* rbinaryop function for clarity

* add scalar to array generation to avoid division by zero behavior

* remove integer division test due to libgdf bug

* Fix timezone issue when converting from datetime object into datetime64

* Remove unused import to fix flake8

* Initial modifications for new join API
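
The reflected-operator bullets in the commit message above ("essentially use _binaryop with a line flipped", "rbinaryop function for clarity") can be illustrated with a toy class. This is a sketch of the general Python technique only, not the pygdf code: `Column` is a hypothetical stand-in, it handles scalar operands only, and only the method names `_binaryop` and `_rbinaryop` are borrowed from the bullets.

```python
import operator


class Column:
    """Toy stand-in for a column type; not the pygdf implementation."""

    def __init__(self, values):
        self.values = list(values)

    def _binaryop(self, other, fn):
        # self is the left operand: self <op> other
        return Column(fn(v, other) for v in self.values)

    def _rbinaryop(self, other, fn):
        # Reflected form: same loop with the operands flipped, other <op> self
        return Column(fn(other, v) for v in self.values)

    def __sub__(self, other):
        return self._binaryop(other, operator.sub)

    def __rsub__(self, other):
        return self._rbinaryop(other, operator.sub)

    def __truediv__(self, other):
        return self._binaryop(other, operator.truediv)

    def __rtruediv__(self, other):
        return self._rbinaryop(other, operator.truediv)
```

For non-commutative operations the operand order matters, which is why `10 - column` has to go through the flipped variant instead of reusing `column - 10`.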
kkraus14 pushed a commit that referenced this issue Jan 7, 2020
Modifications to build with external library support.
rapids-bot bot pushed a commit that referenced this issue Aug 3, 2022
isort fix for strings_udf
rapids-bot bot pushed a commit that referenced this issue May 4, 2023
…_counts

Remove UNKNOWN_NULL_COUNT from timestamp and duration factories
rapids-bot bot pushed a commit that referenced this issue Jun 9, 2023
This implements stacktrace capture and attaches a stacktrace string to every exception thrown by cudf. The exception thus carries information about where it originated, so a downstream application can trace it back with much less effort.

Closes #12422.

### Example:
```
#0: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::sorted_order<false>(cudf::table_view, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x446
#1: cudf/cpp/build/libcudf.so : cudf::detail::sorted_order(cudf::table_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x113
#2: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::segmented_sorted_order_common<(cudf::detail::sort_method)1>(cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x66e
#3: cudf/cpp/build/libcudf.so : cudf::detail::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x88
#4: cudf/cpp/build/libcudf.so : cudf::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::mr::device_memory_resource*)+0xb9
#5: cudf/cpp/build/gtests/SORT_TEST : ()+0xe3027
#6: cudf/cpp/build/lib/libgtest.so.1.13.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x8f
#7: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::Test::Run()+0xd6
#8: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestInfo::Run()+0x195
#9: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestSuite::Run()+0x109
#10: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::internal::UnitTestImpl::RunAllTests()+0x44f
#11: cudf/cpp/build/lib/libgtest.so.1.13.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x87
#12: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::UnitTest::Run()+0x95
#13: cudf/cpp/build/gtests/SORT_TEST : ()+0xdb08c
#14: /lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90
#15: /lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80
#16: cudf/cpp/build/gtests/SORT_TEST : ()+0xdf3d5
```

### Usage

To retrieve a stacktrace with fully human-readable symbols, some compile options must be adjusted. To make this adjustment convenient, a new cmake option (`CUDF_BUILD_STACKTRACE_DEBUG`) has been added. Set this option to `ON` before building cudf and it will be ready to use.

Whenever a cudf exception is thrown, a downstream application can retrieve the stored stacktrace and use it as needed. For example:
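```
try {
  // cudf API calls
} catch (cudf::logic_error const& e) {
  std::cout << e.what() << std::endl;        // error message
  std::cout << e.stacktrace() << std::endl;  // stacktrace stored in the exception
  throw;  // rethrow the original exception (`throw e;` would copy and may slice it)
}
// other exception types can be caught in the same way
```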

### Follow-up work

The next step would be to patch `rmm` so that a stacktrace is attached to `rmm::` exceptions. This would make it possible to debug the various memory exceptions thrown from libcudf using their stacktraces.


### Note:
 * This feature doesn't require libcudf to be built in Debug mode.
 * The flag `CUDF_BUILD_STACKTRACE_DEBUG` should not be turned on in production, as it may affect code optimization. Instead, a libcudf build with this flag turned on should be used only when needed, that is, when debugging exceptions thrown by cudf.
 * This flag removes the current optimization flag from the compile options (such as `-O2` or `-O3` in Release mode) and replaces it with `-Og` (optimize for debugging).
 * If this option is not set to `ON`, the stacktrace will not be available. This avoids expensive stacktrace retrieval when the thrown exception is expected.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Robert Maynard (https://github.com/robertmaynard)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Jason Lowe (https://github.com/jlowe)

URL: #13298
rapids-bot bot pushed a commit that referenced this issue Sep 22, 2023
Pin conda packages to `aws-sdk-cpp<1.11`. The recent upgrade to version `1.11.*` has caused several issues with cleanup (more details on the changes can be found at [this link](https://github.com/aws/aws-sdk-cpp#version-111-is-now-available)), causing Distributed and Dask-CUDA processes to segfault. The stack for one of those crashes looks like the following:

```
(gdb) bt
#0  0x00007f5125359a0c in Aws::Utils::Logging::s_aws_logger_redirect_get_log_level(aws_logger*, unsigned int) () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../.././libaws-cpp-sdk-core.so
#1  0x00007f5124968f83 in aws_event_loop_thread () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-io.so.1.0.0
#2  0x00007f5124ad9359 in thread_fn () from /opt/conda/envs/dask/lib/python3.9/site-packages/pyarrow/../../../././libaws-c-common.so.1
#3  0x00007f519958f6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007f5198b1361f in clone () from /lib/x86_64-linux-gnu/libc.so.6
```

Such segfaults now manifest frequently in CI, and in some cases are reproducible with a hit rate of ~30%. Given the approaching release, the safest option is probably to pin to an older version of the package until we pinpoint the exact cause of the issue and a patched build is released upstream.

`aws-sdk-cpp` is statically linked into the `pyarrow` pip package, which prevents us from using the same pinning technique there. cuDF is currently pinned to `pyarrow=12.0.1`, which appears to be built against `aws-sdk-cpp=1.10.*`, as per [recent build logs](https://github.com/apache/arrow/actions/runs/6276453828/job/17046177335?pr=37792#step:6:1372).

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Ray Douglass (https://github.com/raydouglass)

URL: #14173