-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added test test_datetimeField_after_setitem for issue #6942 #28790
Added test test_datetimeField_after_setitem for issue #6942 #28790
Conversation
pandas/tests/generic/test_frame.py
Outdated
@@ -287,3 +289,36 @@ def test_deepcopy_empty(self): | |||
empty_frame_copy = deepcopy(empty_frame) | |||
|
|||
self._compare(empty_frame_copy, empty_frame) | |||
|
|||
def test_datetimeField_after_setitem(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention the use of at
in the title: test_datetime_setitem_with_at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
pandas/tests/generic/test_frame.py
Outdated
df.at[start, "timenow"] = datetime.today() # initial time. | ||
time1 = df.at[start, "timenow"] | ||
|
||
time.sleep(1) # sleep time of 1 second in between assignments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These sleep
calls are not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some tests were failing with AssertionError, so added those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why were AssertionError
s being raised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the same code on my computer and got the following results:
time1: 2019-10-04 23:23:31.911241
time2: 2019-10-04 23:23:31.911534
time3: 2019-10-04 23:23:31.912060
that is a time difference of 10^(-3) between the time1 and time2.Now if both the statements are executed within a time frame of <= 10^(-6) then the two will match. So that's why I thought of adding the sleep in between time1, time2 and time3 to ascertain this doesn't happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests should generally be
result = pd.DataFrame(index=pd.date_range(start,periods=1), columns=['timenow','Live'])
var = datetime.today()
result.at[start,'timenow'] = var
result.Live = True
expected = pd.DataFrame([[var, True]],index=pd.date_range(start,periods=1), columns=['timenow','Live'])
tm.assert_frame_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I misread the problem and was writing the test to check if the timenow column changes value after setting the Live column with "at".
So, what the test should check is that the dtype of timenow column remains unchanged after setting the Live column to True with at. For that I could use the assert_series_equal and check the dtype of timenow before and after.
Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should check that the timenow
column changes once df.Live = True
i.e. the example in #6942 (comment) is the fixed behavior that we want to test for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some points:
-
For the general template of tests that is used in pandas/tests, most of them assert that either two DataFrames or Series are equal and call the predefined functions
tm.assert_something_equal
. However in this case we assert the column actually changes value and I couldn't find a function for such a case intm
and therefore wrote the individual assert statements for the different values in thetimenow
column(asserting that they are not equal). -
Without the sleep statements inserted, some checks were failing. When I viewed the log of the failed tests I observed that the error was because
assert not time1 == time2
was evaluating to
assert not datetime.datetime(2019, 10, 4, 16, 12, 38, 392372) == datetime.datetime(2019, 10, 4, 16, 12, 38, 392372)
So I realized that this could only occur if both the statements are run within a time-frame < 10^(-6). When I added the sleep statements the checks passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will still use tm.assert_frame_equal
. You will have to construct the expected dataframe from scratch
In [4]: result = pd.DataFrame(index=pd.date_range(start,periods=1), columns=['timenow','Live'])
In [5]: result.at[start,'timenow'] = datetime.today() # initial value
In [7]: new_date = datetime.today()
In [9]: result.Live = True
In [10]: result.at[start,'timenow'] = new_date
In [11]: expected = pd.DataFrame([[new_date, True]], index=pd.date_range(start,periods=1), columns=['timenow','Live'])
In [12]: tm.assert_frame_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, couldn't come up with that! Will make the necessary changes and resubmit the PR.
pandas/tests/generic/test_frame.py
Outdated
|
||
df.Live = True # setting the 'Live' column to True. | ||
|
||
time.sleep(1) # sleep time of 1 second in between assignments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These sleep
calls are not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some checks were failing with AssertionError, so I added those
pandas/tests/generic/test_frame.py
Outdated
] = datetime.today() # modified time after 'Live' column is set. | ||
time3 = df.at[start, "timenow"] | ||
|
||
assert not time1 == time2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would like to construct 2 dataframes here and use assert_frame_equal
to compare them
result = ...
expected = ...
tm.assert_frame_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of the same but the whole point of the test is to make sure that the datetimeField changes with each assignment and as such would require to assert that frames are not equal, that's went with this. Couldn't find a more elegant way round this.
Hello @anirudnits! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-01-04 04:19:00 UTC |
pandas/tests/generic/test_frame.py
Outdated
columns=["timenow", "Live"], | ||
) | ||
|
||
assert_frame_equal(result, expected, check_dtype=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_dtype=True
should be set here.
pandas/tests/generic/test_frame.py
Outdated
index=pd.date_range(start, periods=1), columns=["timenow", "Live"] | ||
) | ||
|
||
result.at[start, "timenow"] = datetime.today() # initial datetime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove this comment.
pandas/tests/generic/test_frame.py
Outdated
@@ -287,3 +288,30 @@ def test_deepcopy_empty(self): | |||
empty_frame_copy = deepcopy(empty_frame) | |||
|
|||
self._compare(empty_frame_copy, empty_frame) | |||
|
|||
def test_datetimeField_after_setitem_with_at(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: test_datetime_field_after_setitem_with_at
@anirudnits is this still active? Can you merge master and repush? |
@WillAyd sure will do that. |
…d_test_in_test_frame.py_for_issue_#6942 Need to merge with the master branch to make a pull request to the main repository
@anirudnits looks like this is failing - can you get CI green? |
@WillAyd sorry for the late reply, I am little caught up with some things and will work on the same as soon as possible. Thanks |
…d_test_in_test_frame.py_for_issue_#6942 Just updating the branch to the latest version from GITHUB
@anirudnits is this still active? |
I guess the reason that CI is failing is because that the dtypes of the DataFrames are not equal (I don't know if that is by design or an error). I can make the |
Yea I don't think the original issue is fixed then - are you interested in trying to debug further? |
@WillAyd yeah sure. I will work on it. |
@@ -280,3 +281,30 @@ def test_deepcopy_empty(self): | |||
empty_frame_copy = deepcopy(empty_frame) | |||
|
|||
self._compare(empty_frame_copy, empty_frame) | |||
|
|||
def test_datetime_after_setitem_with_at(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should go in tests.indexing.test_scalar
is "at" the function being tested? if so, the name should be something like "test_at_with..."
# This test covers the unexpected behaviour of datetimeField when using | ||
# setitem on another column as reported in issue #6942 | ||
|
||
start = pd.to_datetime("20140401") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use pd.Timestamp here
@@ -280,3 +281,30 @@ def test_deepcopy_empty(self): | |||
empty_frame_copy = deepcopy(empty_frame) | |||
|
|||
self._compare(empty_frame_copy, empty_frame) | |||
|
|||
def test_datetime_after_setitem_with_at(self): | |||
# This test covers the unexpected behaviour of datetimeField when using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datetimeField -> datetime64 column ?
|
||
new_datetime = datetime.today() | ||
|
||
result.Live = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the test rely on this being result.Live instead of result["Live"]? if not, pls use the latter
|
||
result.Live = True | ||
|
||
# Changing the value in "timenow" column after "Live" colunn is set to True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
colunn -> column
|
||
result.at[start, "timenow"] = datetime.today() | ||
|
||
new_datetime = datetime.today() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it matter that this datetime.today() is called separately from the one two lines up? if not, pls use just one datetime object
index=pd.date_range(start, periods=1), columns=["timenow", "Live"] | ||
) | ||
|
||
result.at[start, "timenow"] = datetime.today() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the "at" call being tested this one or the one below? if not this one, can you set this with a better-tested indexer (or just directly in the constructor call above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel yeah that could be done. I did that just to replicate the situation in the original issue thread as it mentioned that the "at" worked before setting an entire column(i.e the "Live" column) but not afterwards.
I guess this behavior with at is causing the issue: With
With the
The same thing with
|
@anirudnits do you have time to merge in master and address the review comments? |
@mroeschke but wouldn't the CI still fail? |
Are you interested in solving the original issue still? |
@mroeschke sure, I'll look over the code and get back to you by this weekend. That's okay? |
I found another peculiar behavior with dtypes and believe it to be unintended shouldn't in the first snippet the dtype of column "a" be int64 just like in the second snippet? |
Correct. I think both should be int64 |
@mroeschke Sorry for the delay in responding I had been stuck with my school work. So, along with the dtype issues mentioned in this thread, some other dtype issues especially with .loc have been mentioned in issues #11617, #14205, #14337 and #14361, some of these issues are closed without proper solution of the said problem. To add to these issues I found something interesting:
But
So, I believe it has something to do with how loc sets an entire row versus a single column in a row(even though the row may have only one column). Using at in the above scenario gives the expected dtype of int64 both times. I tried to find the section in pandas/pandas/core/indexing.py Line 900 in 6258397
|
is this still active? this was supposed to be a simple test, but has morphed; let's keep the scope down here. |
@jreback I got stuck with the whole indexer thing and couldn't find any real solutions to the |
@anirudnits can you merge master and fix conflicts? |
@anirudnits thanks for the PR. Going to close as stale but please let us know if you would like to pick up this issue again. |
Added test
test_datetimeField_after_setitem
intests/generic/test_frame.py
.black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff