Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and add validation to IntervalIndex.__init__ #14778

Merged
merged 14 commits into from
Jan 23, 2024

Conversation

mroeschke
Copy link
Contributor

Description

  • Adding validation to closed, dtype arguments in ItervalIndex.__init__
  • Ensure closed attribute always maps to IntervalDtype.closed
  • build_interval_column was no longer necessary by using IntervalColumn directly

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added bug Something isn't working Python Affects Python cuDF API. non-breaking Non-breaking change labels Jan 18, 2024
@mroeschke mroeschke requested a review from a team as a code owner January 18, 2024 03:44
@mroeschke mroeschke mentioned this pull request Jan 18, 2024
3 tasks
@bdice
Copy link
Contributor

bdice commented Jan 19, 2024

@mroeschke Looks like there are some test failures to work out.

=========================== short test summary info ============================
FAILED tests/test_series.py::test_series_value_counts_bins[1] - AssertionError: Series.index are different

Series.index values are different (100.0 %)
[left]:  IntervalIndex([(0.997, 3.0]], dtype='interval[float64, right]')
[right]: IntervalIndex([(0.997, 0.997]], dtype='interval[float64, right]')
FAILED tests/test_series.py::test_series_value_counts_bins[2] - AssertionError: Series.index are different

Series.index values are different (100.0 %)
[left]:  IntervalIndex([(0.997, 2.0], (2.0, 3.0]], dtype='interval[float64, right]')
[right]: IntervalIndex([(0.997, 0.997], (2.0, 2.0]], dtype='interval[float64, right]')
FAILED tests/test_series.py::test_series_value_counts_bins[3] - AssertionError: Series.index are different

Series.index values are different (100.0 %)
[left]:  IntervalIndex([(0.997, 1.667], (1.667, 2.333], (2.333, 3.0]], dtype='interval[float64, right]')
[right]: IntervalIndex([(0.997, 0.997], (1.667, 1.667], (2.333, 2.333]], dtype='interval[float64, right]')
FAILED tests/test_series.py::test_series_value_counts_bins_dropna[True-1] - AssertionError: Series.index are different

Series.index values are different (100.0 %)
[left]:  IntervalIndex([(0.997, 3.0]], dtype='interval[float64, right]')
[right]: IntervalIndex([(0.997, 0.997]], dtype='interval[float64, right]')
FAILED tests/test_series.py::test_series_value_counts_bins_dropna[True-2] - AssertionError: Series.index are different

Series.index values are different (100.0 %)
[left]:  IntervalIndex([(0.997, 2.0], (2.0, 3.0]], dtype='interval[float64, right]')
[right]: IntervalIndex([(0.997, 0.997], (2.0, 2.0]], dtype='interval[float64, right]')
FAILED tests/test_series.py::test_series_value_counts_bins_dropna[True-3] - AssertionError: Series.index are different

Series.index values are different (100.0 %)
[left]:  IntervalIndex([(0.997, 1.667], (1.667, 2.333], (2.333, 3.0]], dtype='interval[float64, right]')
[right]: IntervalIndex([(0.997, 0.997], (1.667, 1.667], (2.333, 2.333]], dtype='interval[float64, right]')
FAILED tests/test_series.py::test_series_value_counts_bins_dropna[False-1] - AssertionError: Series.index are different

Series.index values are different (100.0 %)
[left]:  IntervalIndex([(0.997, 3.0]], dtype='interval[float64, right]')
[right]: IntervalIndex([(0.997, 0.997]], dtype='interval[float64, right]')
FAILED tests/test_series.py::test_series_value_counts_bins_dropna[False-2] - AssertionError: Series.index are different

Series.index values are different (100.0 %)
[left]:  IntervalIndex([(0.997, 2.0], (2.0, 3.0]], dtype='interval[float64, right]')
[right]: IntervalIndex([(0.997, 0.997], (2.0, 2.0]], dtype='interval[float64, right]')
FAILED tests/test_series.py::test_series_value_counts_bins_dropna[False-3] - AssertionError: Series.index are different

Series.index values are different (100.0 %)
[left]:  IntervalIndex([(0.997, 1.667], (1.667, 2.333], (2.333, 3.0]], dtype='interval[float64, right]')
[right]: IntervalIndex([(0.997, 0.997], (1.667, 1.667], (2.333, 2.333]], dtype='interval[float64, right]')
FAILED tests/test_indexing.py::test_series_iloc_scalar_interval_return_pd_scalar[iloc-0] - AssertionError: assert Interval(1, 1, closed='right') == Interval(1, 2, closed='right')
FAILED tests/test_indexing.py::test_series_iloc_scalar_interval_return_pd_scalar[loc-a] - AssertionError: assert Interval(1, 1, closed='right') == Interval(1, 2, closed='right')
FAILED tests/test_indexing.py::test_dataframe_iloc_scalar_interval_return_pd_scalar[iloc-0-0] - AssertionError: assert Interval(1, 1, closed='right') == Interval(1, 2, closed='right')
FAILED tests/test_indexing.py::test_dataframe_iloc_scalar_interval_return_pd_scalar[loc-a-a] - AssertionError: assert Interval(1, 1, closed='right') == Interval(1, 2, closed='right')
FAILED tests/test_doctests.py::TestDoctests::test_docstring[Series.value_counts] - AssertionError: 1 of 11 doctests failed for Series.value_counts:
  **********************************************************************
  File "/opt/conda/envs/test/lib/python3.10/site-packages/cudf/core/series.py", line 212, in Series.value_counts
  Failed example:
      s.value_counts(bins=3)
  Expected:
      (2.0, 3.0]      2
      (0.996, 2.0]    2
      (3.0, 4.0]      1
      dtype: int64
  Got:
      (2.0, 2.0]        2
      (0.996, 0.996]    2
      (3.0, 3.0]        1
      dtype: int64
  **********************************************************************
  1 items had failures:
     1 of  11 in Series.value_counts
  ***Test Failed*** 1 failures.
  
assert not 1
 +  where 1 = TestResults(failed=1, attempted=11).failed
=== 14 failed, 101786 passed, 2061 skipped, 916 xfailed in 875.80s (0:14:35) ===

@mroeschke
Copy link
Contributor Author

Thanks yeah it appears I have an off-by-one error somewhere I'm trying to track down

@vyasr
Copy link
Contributor

vyasr commented Jan 23, 2024

/merge

@rapids-bot rapids-bot bot merged commit c83b9fd into rapidsai:branch-24.02 Jan 23, 2024
68 checks passed
@mroeschke mroeschke deleted the ref/ii/init branch January 23, 2024 17:34
rapids-bot bot pushed a commit that referenced this pull request Feb 23, 2024
IMO these do not provide much value compared to constructing with `ListColumn` or `StructColumn` cc #14778 (comment)

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Michael Wang (https://github.com/isVoid)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #14786
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants