Add first and last method to IndexedFrame #9710

Merged
merged 13 commits into from
Dec 24, 2021

Contributor

@isVoid isVoid commented Nov 17, 2021

closes #9600

This PR adds the first and last methods to IndexedFrame. These methods apply only to an IndexedFrame with a DatetimeIndex, and they select the first or last rows that fall within the time range specified by the offset argument.

@github-actions github-actions bot added the Python (Affects Python cuDF API) label Nov 17, 2021
@isVoid isVoid added the feature request (New feature or request) and non-breaking (Non-breaking change) labels Nov 17, 2021
@isVoid isVoid self-assigned this Nov 17, 2021

codecov bot commented Nov 17, 2021

Codecov Report

Merging #9710 (1076a7c) into branch-22.02 (967a333) will decrease coverage by 0.09%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-22.02    #9710      +/-   ##
================================================
- Coverage         10.49%   10.39%   -0.10%     
================================================
  Files               119      119              
  Lines             20305    20523     +218     
================================================
+ Hits               2130     2134       +4     
- Misses            18175    18389     +214     
Impacted Files Coverage Δ
python/dask_cudf/dask_cudf/sorting.py 92.30% <0.00%> (-0.61%) ⬇️
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/parquet.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/series.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/utils.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/dtypes.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/ioutils.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/dataframe.py 0.00% <0.00%> (ø)
... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 56430b4...1076a7c.

@isVoid isVoid marked this pull request as ready for review December 4, 2021 05:15
@isVoid isVoid requested a review from a team as a code owner December 4, 2021 05:15
@galipremsagar galipremsagar self-requested a review December 7, 2021 16:32
return self.copy()

pd_offset = pd.tseries.frequencies.to_offset(offset)
to_search = op(pd.Timestamp(self._index._column[idx]), pd_offset)
Contributor

What does the op callable do? And what operation is being performed on this line, i.e. what does to_search represent? Just wondering.

Contributor Author

@isVoid isVoid Dec 10, 2021

If first is called, op is add, which computes the cut-off date counting forward from the first date in the column; if last is called, op is sub, which computes the cut-off date counting back from the last date. to_search is that cut-off date. Feel free to request changes if the naming or logic isn't readable enough for your taste.
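The add/sub explanation above can be sketched with plain pandas. This is only an illustration, not the code from this PR: cutoff_date is a hypothetical helper, the sample index is made up, and the boundary handling in the two searchsorted calls is an assumption chosen for the example.

```python
import operator

import pandas as pd


def cutoff_date(index, offset, idx, op):
    """Hypothetical helper: apply op (add or sub) to index[idx] and the offset."""
    pd_offset = pd.tseries.frequencies.to_offset(offset)
    return op(index[idx], pd_offset)


# Made-up sample data: four consecutive days.
index = pd.DatetimeIndex(["2021-11-17", "2021-11-18", "2021-11-19", "2021-11-20"])
step = pd.offsets.Hour(36)  # a fixed-width offset

# "first": count forward from the first date.
first_cutoff = cutoff_date(index, step, 0, operator.add)   # 2021-11-18 12:00
# "last": count back from the last date.
last_cutoff = cutoff_date(index, step, -1, operator.sub)   # 2021-11-18 12:00

# Assumed boundary handling: keep rows strictly before / strictly after the cut-off.
first_rows = index[: index.searchsorted(first_cutoff, side="left")]
last_rows = index[index.searchsorted(last_cutoff, side="right"):]

print(list(first_rows.strftime("%Y-%m-%d")))  # ['2021-11-17', '2021-11-18']
print(list(last_rows.strftime("%Y-%m-%d")))   # ['2021-11-19', '2021-11-20']
```

Passing the operator in as a callable lets one code path serve both methods; only idx and op differ between the two calls.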

if (
    idx == 0
    and not isinstance(pd_offset, pd.tseries.offsets.Tick)
    and pd_offset.is_on_offset(pd.Timestamp(self._index[0]))
Contributor

Does is_on_offset check only for datetime values at the end of the offset?

Contributor Author

I think a pandas DateOffset can be MonthBegin or MonthEnd, depending on whether the string is MS or M.

In [3]: pd.tseries.frequencies.to_offset('M')
Out[3]: <MonthEnd>

In [4]: pd.tseries.frequencies.to_offset('MS')
Out[4]: <MonthBegin>

I believe is_on_offset checks if the given datetime falls on the given offset. There isn't much documentation about this but it seems like every kind of offset has this function.

For example, Nano always returns True:
https://github.com/pandas-dev/pandas/blob/878a0225c648cb145949f78085a8ff3f902a1c20/pandas/_libs/tslibs/offsets.pyx#L834-L835

BusinessDay checks whether the weekday of the given date is Monday to Friday:
https://github.com/pandas-dev/pandas/blob/878a0225c648cb145949f78085a8ff3f902a1c20/pandas/_libs/tslibs/offsets.pyx#L1445-L1448

and for MonthEnd I believe it resorts to the base method:
https://github.com/pandas-dev/pandas/blob/878a0225c648cb145949f78085a8ff3f902a1c20/pandas/_libs/tslibs/offsets.pyx#L650-L659
Note that + and - with MonthEnd basically set the date to the end of a month (the current or next one for +, the previous one for -); the same applies to other offsets.
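The behaviour described in this reply can be checked with a few pandas calls. The dates below are made up for illustration, and the offsets are built directly from pd.offsets rather than from alias strings so the snippet does not depend on which aliases a given pandas version accepts.

```python
import pandas as pd

month_end = pd.offsets.MonthEnd()
month_begin = pd.offsets.MonthBegin()
business_day = pd.offsets.BusinessDay()

mid = pd.Timestamp("2021-11-17")  # a Wednesday in the middle of the month
end = pd.Timestamp("2021-11-30")  # last day of November 2021
sat = pd.Timestamp("2021-11-20")  # a Saturday

# is_on_offset asks whether the timestamp falls exactly on the offset's anchor.
on_end_mid = month_end.is_on_offset(mid)                            # False
on_end_end = month_end.is_on_offset(end)                            # True
on_begin = month_begin.is_on_offset(pd.Timestamp("2021-11-01"))     # True
on_bday_mid = business_day.is_on_offset(mid)                        # True
on_bday_sat = business_day.is_on_offset(sat)                        # False

# Adding or subtracting MonthEnd snaps to a month end.
plus_from_mid = mid + month_end   # 2021-11-30
plus_from_end = end + month_end   # 2021-12-31, a full month later
minus_from_mid = mid - month_end  # 2021-10-31

print(on_end_mid, on_end_end, on_begin, on_bday_mid, on_bday_sat)
print(plus_from_mid.date(), plus_from_end.date(), minus_from_mid.date())
```

The two additions show why a first timestamp that is already on the offset may need separate handling: from mid-month the sum stays within November, while from the month end it jumps to the end of December.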

@karthikeyann karthikeyann changed the title from "Add first and last method" to "Add first and last method to IndexedFrame" Dec 9, 2021
@isVoid isVoid added the 5 - Ready to Merge (Testing and reviews complete, ready to merge) label Dec 18, 2021
@galipremsagar
Contributor

@gpucibot merge

@rapids-bot rapids-bot bot merged commit e432d01 into rapidsai:branch-22.02 Dec 24, 2021
Development

Successfully merging this pull request may close these issues.

[FEA] Series and and DataFrame.first and last
3 participants