Branch 22.12 #6

Merged
merged 15 commits into from
Sep 30, 2022
Owner

@etseidl etseidl commented Sep 30, 2022

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

galipremsagar and others added 15 commits September 29, 2022 13:24
Root cause:
```python
In [1]: import numpy as np

In [2]: x = np.uint8(1)

In [3]: y = np.float64(1.0)

In [4]: x.__ge__(y)
Out[4]: NotImplemented

In [8]: x >= y
Out[8]: True
```
This is leading to the following error whenever there is a Scalar binary operation involved:
```python
python/cudf/cudf/tests/test_series.py:449: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../envs/cudfdev/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
../envs/cudfdev/lib/python3.9/site-packages/cudf/core/series.py:2988: in describe
    data = _describe_categorical(self, percentiles)
../envs/cudfdev/lib/python3.9/site-packages/cudf/core/series.py:152: in _describe_categorical
    val_counts = obj.value_counts(ascending=False)
../envs/cudfdev/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
../envs/cudfdev/lib/python3.9/site-packages/cudf/core/series.py:2862: in value_counts
    res = res.sort_values(ascending=ascending)
../envs/cudfdev/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
../envs/cudfdev/lib/python3.9/site-packages/cudf/core/series.py:1910: in sort_values
    return super().sort_values(
../envs/cudfdev/lib/python3.9/site-packages/cudf/core/indexed_frame.py:1916: in sort_values
    out = self._gather(
../envs/cudfdev/lib/python3.9/site-packages/cudf/core/indexed_frame.py:1523: in _gather
    if not libcudf.copying._gather_map_is_valid(
copying.pyx:67: in cudf._lib.copying._gather_map_is_valid
    ???
../envs/cudfdev/lib/python3.9/site-packages/cudf/core/mixins/mixin_factory.py:11: in wrapper
    return method(self, *args1, *args2, **kwargs1, **kwargs2)
../envs/cudfdev/lib/python3.9/site-packages/cudf/core/scalar.py:350: in _binaryop
    return Scalar(result, dtype=out_dtype)
../envs/cudfdev/lib/python3.9/site-packages/cudf/core/scalar.py:56: in __call__
    obj = super().__call__(value, dtype=dtype)
../envs/cudfdev/lib/python3.9/site-packages/cudf/core/scalar.py:128: in __init__
    self._host_value, self._host_dtype = self._preprocess_host_value(
../envs/cudfdev/lib/python3.9/site-packages/cudf/core/scalar.py:222: in _preprocess_host_value
    value = to_cudf_compatible_scalar(value, dtype=dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

val = NotImplemented, dtype = <class 'numpy.bool_'>

    def to_cudf_compatible_scalar(val, dtype=None):
        """
        Converts the value `val` to a numpy/Pandas scalar,
        optionally casting to `dtype`.
    
        If `val` is None, returns None.
        """
    
        if cudf._lib.scalar._is_null_host_scalar(val) or isinstance(
            val, cudf.Scalar
        ):
            return val
    
        if not cudf.api.types._is_scalar_or_zero_d_array(val):
>           raise ValueError(
                f"Cannot convert value of type {type(val).__name__} "
                "to cudf scalar"
            )
E           ValueError: Cannot convert value of type NotImplementedType to cudf scalar

../envs/cudfdev/lib/python3.9/site-packages/cudf/utils/dtypes.py:248: ValueError
```
This PR fixes the issue by first looking up `op` in the standard-library `operator` module and calling that function, and falling back to `getattr` only if `op` is not found in the `operator` module.
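
A minimal sketch of that lookup order, using toy classes rather than NumPy scalars so it runs without dependencies. `Narrow`, `Wide`, and `apply_binop` are hypothetical names made up for illustration; this is not the cudf implementation, only the general idea that the `operator` function goes through Python's full dispatch (including the reflected method) while a direct dunder call does not.

```python
import operator


class Narrow:
    """Toy scalar that only knows how to compare with its own type."""

    def __init__(self, value):
        self.value = value

    def __ge__(self, other):
        if isinstance(other, Narrow):
            return self.value >= other.value
        return NotImplemented  # defer to the other operand


class Wide:
    """Toy scalar that can compare with both types via the reflected method."""

    def __init__(self, value):
        self.value = value

    def __le__(self, other):
        # Python tries this as the reflection of `Narrow >= Wide`.
        if isinstance(other, (Narrow, Wide)):
            return self.value <= other.value
        return NotImplemented


def apply_binop(lhs, rhs, op):
    """Prefer operator.<op>; fall back to the dunder method on lhs (hypothetical helper)."""
    fn = getattr(operator, op, None)
    if fn is not None:
        return fn(lhs, rhs)  # full dispatch, including reflected methods
    return getattr(lhs, op)(rhs)  # e.g. "__rsub__", which operator lacks


x, y = Narrow(1), Wide(1.0)
direct = x.__ge__(y)                    # NotImplemented, as in the root cause
via_operator = apply_binop(x, y, "__ge__")  # True
fallback = apply_binop(3, 10, "__rsub__")   # 10 - 3 = 7
```

The direct dunder call never gives the right-hand operand a chance to answer, which is why the result leaked through as `NotImplemented`.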

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - https://github.com/brandon-b-miller

URL: #11816
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
A compile warning was introduced in #11652 in `bgzip_data_chunk_source.cu`. The warning can be seen at https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-cpu-cuda-build/CUDA=11.5/12417/consoleFull (search for `177-D`):
```
/cudf/cpp/src/io/text/bgzip_data_chunk_source.cu(362): warning #177-D: variable "nvtx3_range__" was declared but never referenced
```
The `nvtx3_range__` variable is part of the `CUDF_FUNC_RANGE()` macro. The warning is incorrect and likely a compiler bug. The workaround in this PR is to add `[[maybe_unused]]` to the variable declaration.

I was not able to create a small reproducer for filing a compiler bug.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Tobias Ribizel (https://github.com/upsj)
  - MithunR (https://github.com/mythrocks)

URL: #11798
We need to actually call the method; otherwise we get false positives when checking the validity of the operands.

Fortunately, this seems to have been a benign bug, since the host pandas `NAType` handles all of the operations appropriately. The code was "working" before, but the logic was incorrect.
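
As a hedged illustration of why the call matters (the names `Value`, `looks_valid_without_call`, and `is_valid` are hypothetical and not taken from cudf): a bound method object is never `NotImplemented`, so a check that only looks up the method always reports the operands as valid.

```python
class Value:
    """Toy operand that supports addition only with other Value objects."""

    def __init__(self, v):
        self.v = v

    def __add__(self, other):
        if isinstance(other, Value):
            return Value(self.v + other.v)
        return NotImplemented


def looks_valid_without_call(lhs, rhs, op):
    # Only looks the method up: a bound method is never NotImplemented,
    # so this reports "valid" for any rhs (false positive).
    return getattr(lhs, op) is not NotImplemented


def is_valid(lhs, rhs, op):
    # Actually calls the method and inspects the result.
    return getattr(lhs, op)(rhs) is not NotImplemented


false_positive = looks_valid_without_call(Value(1), "text", "__add__")  # True
correct = is_valid(Value(1), "text", "__add__")                         # False
```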

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Bradley Dice (https://github.com/bdice)

URL: #11818
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
Adds some examples that show off reading nested-type JSON.

Authors:
  - Gregory Kimball (https://github.com/GregoryKimball)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #11814
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
## Description
Disable the use of nvCOMP DEFLATE because of issues with nvCOMP 2.4.
Also fix a Python test (it did not block CI because the comparison in the test is only done with `LIBCUDF_NVCOMP_POLICY="ALWAYS"`).

## Checklist
- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

Authors:
   - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
   - Nghia Truong (https://github.com/ttnghia)
   - Jim Brennan (https://github.com/jbrennan333)
   - GALI PREM SAGAR (https://github.com/galipremsagar)
   - Robert Maynard (https://github.com/robertmaynard)
   - Vyas Ramasubramani (https://github.com/vyasr)
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
This PR resolves #10323 and phases out the `gitutils.py` module in favor of a dependency on GitPython that is managed by pre-commit. It fixes the pre-commit check for copyright years so that only modifications between the target branch (`branch-X.Y`) and the current git stage will trigger copyright changes (years will not be updated for unmodified files, or for changes that have not been staged). Additionally, it changes the return code to `1` if changes are requested and applied (if modifications were required, that should be considered a failure).

This is the last step to making our entire style check pipeline friendly to pre-commit.
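
A rough sketch of the return-code behavior described above, under stated assumptions: the helper names `update_year` and `check_files`, the `Copyright (c) YYYY[-YYYY]` header pattern, and the in-memory `dict` of files are all hypothetical. The real check uses GitPython to find files modified between the target branch and the current stage, which is not reproduced here.

```python
import re

# Assumed header format, e.g. "Copyright (c) 2020-2021, NVIDIA CORPORATION."
COPYRIGHT_RE = re.compile(r"Copyright \(c\) (\d{4})(?:-(\d{4}))?")


def update_year(line, current_year):
    """Extend the copyright year range on one line if it is out of date."""
    m = COPYRIGHT_RE.search(line)
    if m is None:
        return line
    first = int(m.group(1))
    last = int(m.group(2) or m.group(1))
    if last >= current_year:
        return line  # already up to date
    fixed = f"Copyright (c) {first}-{current_year}"
    return line[: m.start()] + fixed + line[m.end():]


def check_files(modified_files, current_year):
    """Return (exit_code, updated_texts) for files already known to be modified.

    `modified_files` maps path -> text and is assumed to contain only files
    changed relative to the target branch; unmodified files are never passed in.
    """
    updated = {}
    for path, text in modified_files.items():
        new_text = "\n".join(
            update_year(line, current_year) for line in text.split("\n")
        )
        if new_text != text:
            updated[path] = new_text
    # Applying a change counts as a failure, so the hook exits non-zero.
    return (1 if updated else 0), updated


stale = {"a.py": "# Copyright (c) 2020, NVIDIA CORPORATION.\nx = 1"}
fresh = {"b.py": "# Copyright (c) 2019-2022, NVIDIA CORPORATION.\ny = 2"}
stale_code, stale_updates = check_files(stale, 2022)
fresh_code, fresh_updates = check_files(fresh, 2022)
```

Returning `1` after applying a fix matches the usual pre-commit convention: the hook fails so the author can review and stage the rewritten files.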

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Jordan Jacobelli (https://github.com/Ethyling)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #11711
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
## Description

This switches from PTXCompiler to CubinLinker (which uses PTXCompiler internally) for Minor Version Compatibility (MVC). This enables support for all Numba features except linking archives with MVC, in support of use cases such as String UDFs (#11319) with MVC.

## Checklist
- [X] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
- [X] New or existing tests cover these changes.
- [X] The documentation is up to date with these changes.

Authors:
   - Graham Markall (https://github.com/gmarkall)
   - https://github.com/brandon-b-miller
   - Ashwin Srinath (https://github.com/shwina)

Approvers:
   - Ray Douglass (https://github.com/raydouglass)
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
## Description
The docstring for `cudf.read_text` did not include the `byte_range` argument.

## Checklist
- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

Authors:
   - Gregory Kimball ([email protected])

Approvers:
   - Ashwin Srinath (https://github.com/shwina)
   - Lawrence Mitchell (https://github.com/wence-)
[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci]
@etseidl etseidl merged commit edf66a7 into etseidl:feature/skip_pages Sep 30, 2022
etseidl pushed a commit that referenced this pull request Jun 9, 2023
This implements stacktrace capture and adds a stacktrace string to any exception thrown by cudf. As a result, the exception carries information about where it originated, allowing the downstream application to trace it back with much less effort.

Closes rapidsai#12422.

### Example:
```
#0: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::sorted_order<false>(cudf::table_view, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x446
#1: cudf/cpp/build/libcudf.so : cudf::detail::sorted_order(cudf::table_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x113
#2: cudf/cpp/build/libcudf.so : std::unique_ptr<cudf::column, std::default_delete<cudf::column> > cudf::detail::segmented_sorted_order_common<(cudf::detail::sort_method)1>(cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x66e
#3: cudf/cpp/build/libcudf.so : cudf::detail::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::cuda_stream_view, rmm::mr::device_memory_resource*)+0x88
#4: cudf/cpp/build/libcudf.so : cudf::segmented_sort_by_key(cudf::table_view const&, cudf::table_view const&, cudf::column_view const&, std::vector<cudf::order, std::allocator<cudf::order> > const&, std::vector<cudf::null_order, std::allocator<cudf::null_order> > const&, rmm::mr::device_memory_resource*)+0xb9
#5: cudf/cpp/build/gtests/SORT_TEST : ()+0xe3027
#6: cudf/cpp/build/lib/libgtest.so.1.13.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x8f
#7: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::Test::Run()+0xd6
#8: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestInfo::Run()+0x195
#9: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::TestSuite::Run()+0x109
#10: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::internal::UnitTestImpl::RunAllTests()+0x44f
#11: cudf/cpp/build/lib/libgtest.so.1.13.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x87
#12: cudf/cpp/build/lib/libgtest.so.1.13.0 : testing::UnitTest::Run()+0x95
#13: cudf/cpp/build/gtests/SORT_TEST : ()+0xdb08c
#14: /lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90
#15: /lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80
#16: cudf/cpp/build/gtests/SORT_TEST : ()+0xdf3d5
```

### Usage

To retrieve a stacktrace with fully human-readable symbols, some compile options must be adjusted. To make this adjustment convenient, a new CMake option (`CUDF_BUILD_STACKTRACE_DEBUG`) has been added. Set this option to `ON` before building cudf and it will be ready to use.

In a downstream application, whenever a cudf exception is thrown, the application can retrieve the stored stacktrace and do whatever it wants with it. For example:
```
try {
  // cudf API calls
} catch (cudf::logic_error const& e) {
  std::cout << e.what() << std::endl;
  std::cout << e.stacktrace() << std::endl;
  throw;  // rethrow the original exception without copying it
}
// other cudf exception types can be caught the same way
```

### Follow-up work

The next step would be patching `rmm` to attach a stacktrace to `rmm::` exceptions. Doing so will allow debugging various memory exceptions thrown from libcudf using their stacktrace.


### Note:
 * This feature doesn't require libcudf to be built in Debug mode.
 * The flag `CUDF_BUILD_STACKTRACE_DEBUG` should not be turned on in production because it may affect code optimization. Instead, a libcudf build with the flag turned on should be used only when needed, to debug exceptions thrown by cudf.
 * This flag removes the current optimization flag (such as `-O2` or `-O3` in Release mode) and replaces it with `-Og` (optimize for debugging).
 * If this option is not set to `ON`, the stacktrace will not be available. This avoids expensive stacktrace retrieval when the thrown exception is expected.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Robert Maynard (https://github.com/robertmaynard)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Jason Lowe (https://github.com/jlowe)

URL: rapidsai#13298
etseidl pushed a commit that referenced this pull request Nov 8, 2023
Fix to_datetime with format allowing out-of-range values