Added test test_datetimeField_after_setitem for issue #6942 #28790

Closed
wants to merge 8 commits into from

anirudnits
Contributor

Added test test_datetimeField_after_setitem in tests/generic/test_frame.py.

@@ -287,3 +289,36 @@ def test_deepcopy_empty(self):
empty_frame_copy = deepcopy(empty_frame)

self._compare(empty_frame_copy, empty_frame)

def test_datetimeField_after_setitem(self):
Member

Mention the use of at in the title: test_datetime_setitem_with_at

Contributor Author

Sure

df.at[start, "timenow"] = datetime.today() # initial time.
time1 = df.at[start, "timenow"]

time.sleep(1) # sleep time of 1 second in between assignments.
Member

These sleep calls are not needed.

Contributor Author

Some tests were failing with AssertionError, so I added those.

Member

Why were AssertionErrors being raised?

Contributor Author

I tried the same code on my computer and got the following results:
time1: 2019-10-04 23:23:31.911241
time2: 2019-10-04 23:23:31.911534
time3: 2019-10-04 23:23:31.912060
That is a difference of about 3 × 10^(-4) seconds between time1 and time2. If both statements execute within the same microsecond (10^(-6) seconds, the resolution of datetime), the two values will match. That's why I added the sleep calls between time1, time2 and time3: to make sure this doesn't happen.
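
The collision described here follows from datetime's one-microsecond resolution: two clock reads that land in the same microsecond compare equal. A minimal sketch of a sleep-free alternative, assuming the test only needs two distinct values rather than the real wall-clock time (the fixed date is simply the value quoted above):

```python
from datetime import datetime, timedelta

# Hand-picked value instead of a clock read, so the result is deterministic.
first = datetime(2019, 10, 4, 23, 23, 31, 911241)

# datetime.resolution is the smallest difference datetime can represent.
second = first + datetime.resolution

print(datetime.resolution)
print(first == second)
```

Because neither value comes from the clock, no sleep call is needed to keep them apart.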

Member

The tests should generally be

result = pd.DataFrame(index=pd.date_range(start,periods=1), columns=['timenow','Live'])
var = datetime.today()
result.at[start,'timenow'] = var
result.Live = True
expected = pd.DataFrame([[var, True]],index=pd.date_range(start,periods=1), columns=['timenow','Live'])
tm.assert_frame_equal(result, expected) 

Contributor Author

Sorry I misread the problem and was writing the test to check if the timenow column changes value after setting the Live column with "at".
So, what the test should check is that the dtype of timenow column remains unchanged after setting the Live column to True with at. For that I could use the assert_series_equal and check the dtype of timenow before and after.
Is that right?

Member

@mroeschke mroeschke Oct 6, 2019

This test should check that the timenow column changes once df.Live = True

i.e. the example in #6942 (comment) is the fixed behavior that we want to test for.

Contributor Author

Some points:

  1. Most tests in pandas/tests assert that two DataFrames or Series are equal by calling one of the predefined tm.assert_something_equal functions. In this case, however, we assert that the column actually changes value. I couldn't find a function for that in tm, so I wrote individual assert statements for the different values in the timenow column (asserting that they are not equal).

  2. Without the sleep statements inserted, some checks were failing. When I viewed the log of the failed tests I observed that the error was because
    assert not time1 == time2
    was evaluating to
    assert not datetime.datetime(2019, 10, 4, 16, 12, 38, 392372) == datetime.datetime(2019, 10, 4, 16, 12, 38, 392372)
    So I realized that this could only occur if both statements run within less than 10^(-6) seconds of each other. When I added the sleep statements the checks passed.

Member

You will still use tm.assert_frame_equal. You will have to construct the expected dataframe from scratch

In [4]: result = pd.DataFrame(index=pd.date_range(start,periods=1), columns=['timenow','Live'])

In [5]: result.at[start,'timenow'] = datetime.today() # initial value


In [7]: new_date =  datetime.today() 

In [9]: result.Live = True

In [10]: result.at[start,'timenow'] = new_date 

In [11]: expected = pd.DataFrame([[new_date, True]], index=pd.date_range(start,periods=1), columns=['timenow','Live'])

In [12]: tm.assert_frame_equal(result, expected)
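
A runnable sketch of the structure suggested above, with fixed datetimes in place of datetime.today() (the two values are arbitrary assumptions). It checks the stored value only and prints the dtypes rather than asserting them, since the resulting dtypes may differ between pandas versions:

```python
from datetime import datetime

import pandas as pd

start = pd.Timestamp("2014-04-01")
initial = datetime(2019, 10, 4, 16, 12, 38)   # arbitrary fixed value
new_date = datetime(2019, 10, 4, 16, 12, 39)  # arbitrary, distinct from initial

result = pd.DataFrame(
    index=pd.date_range(start, periods=1), columns=["timenow", "Live"]
)
result.at[start, "timenow"] = initial
result["Live"] = True
result.at[start, "timenow"] = new_date  # the assignment issue #6942 is about

expected = pd.DataFrame(
    [[new_date, True]],
    index=pd.date_range(start, periods=1),
    columns=["timenow", "Live"],
)

# Value-level comparison only; dtypes are shown, not asserted.
print(result.at[start, "timenow"] == expected.at[start, "timenow"])
print(result.dtypes)
print(expected.dtypes)
```

In an actual test, the value comparison would be replaced by tm.assert_frame_equal(result, expected), which also compares dtypes by default.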

Contributor Author

Thanks, couldn't come up with that! Will make the necessary changes and resubmit the PR.


df.Live = True # setting the 'Live' column to True.

time.sleep(1) # sleep time of 1 second in between assignments.
Member

These sleep calls are not needed.

Contributor Author

Some checks were failing with AssertionError, so I added those

] = datetime.today() # modified time after 'Live' column is set.
time3 = df.at[start, "timenow"]

assert not time1 == time2
Member

We would like to construct 2 dataframes here and use assert_frame_equal to compare them

result = ...
expected = ...
tm.assert_frame_equal(result, expected)

Contributor Author

I thought of the same, but the whole point of the test is to make sure that the datetimeField changes with each assignment, which would require asserting that the frames are not equal. That's why I went with this; I couldn't find a more elegant way around it.

@pep8speaks

pep8speaks commented Oct 8, 2019

Hello @anirudnits! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-04 04:19:00 UTC

columns=["timenow", "Live"],
)

assert_frame_equal(result, expected, check_dtype=False)
Member

check_dtype=True should be set here.

index=pd.date_range(start, periods=1), columns=["timenow", "Live"]
)

result.at[start, "timenow"] = datetime.today() # initial datetime.
Member

you can remove this comment.

@@ -287,3 +288,30 @@ def test_deepcopy_empty(self):
empty_frame_copy = deepcopy(empty_frame)

self._compare(empty_frame_copy, empty_frame)

def test_datetimeField_after_setitem_with_at(self):
Member

Nit: test_datetime_field_after_setitem_with_at

@WillAyd
Member

WillAyd commented Nov 7, 2019

@anirudnits is this still active? Can you merge master and repush?

@anirudnits
Contributor Author

@WillAyd sure will do that.

…d_test_in_test_frame.py_for_issue_#6942

Need to merge with the master branch to make a pull request to the main repository
@WillAyd
Member

WillAyd commented Dec 9, 2019

@anirudnits looks like this is failing - can you get CI green?

@anirudnits
Contributor Author

@WillAyd sorry for the late reply. I am a little caught up with some things and will work on this as soon as possible. Thanks

…d_test_in_test_frame.py_for_issue_#6942

Just updating the branch to the latest version from GITHUB
@WillAyd
Member

WillAyd commented Feb 2, 2020

@anirudnits is this still active?

@anirudnits
Contributor Author

I guess the CI is failing because the dtypes of the DataFrames are not equal (I don't know if that is by design or an error). I can set check_dtype=False in the assert_frame_equal call, and I believe that will pass the CI. Sorry for keeping this issue idle for so long; I appreciate your patience.
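
A minimal sketch of what check_dtype controls, using pandas.testing.assert_frame_equal on two made-up frames (the column name "a" and the values are illustrative and not taken from this PR). With check_dtype=False, frames whose values match pass even when their dtypes differ, so the flag would hide a dtype mismatch rather than resolve it:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({"a": [1, 2]})       # integer column
right = pd.DataFrame({"a": [1.0, 2.0]})  # float column holding the same values

# Values match, so this passes once dtype checking is switched off.
assert_frame_equal(left, right, check_dtype=False)

# With the default check_dtype=True the same comparison fails.
try:
    assert_frame_equal(left, right)
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True

print(dtype_mismatch_detected)
```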

@WillAyd
Member

WillAyd commented Mar 14, 2020

Yea I don't think the original issue is fixed then - are you interested in trying to debug further?

@anirudnits
Contributor Author

@WillAyd yeah sure. I will work on it.

@anirudnits
Contributor Author

@WillAyd I will also mention the issue in the original thread #6942.

@jbrockmendel jbrockmendel added the Testing pandas testing functions or related to the test suite label Mar 19, 2020
@@ -280,3 +281,30 @@ def test_deepcopy_empty(self):
empty_frame_copy = deepcopy(empty_frame)

self._compare(empty_frame_copy, empty_frame)

def test_datetime_after_setitem_with_at(self):
Member

this should go in tests.indexing.test_scalar

is "at" the function being tested? if so, the name should be something like "test_at_with..."

# This test covers the unexpected behaviour of datetimeField when using
# setitem on another column as reported in issue #6942

start = pd.to_datetime("20140401")
Member

can you use pd.Timestamp here

@@ -280,3 +281,30 @@ def test_deepcopy_empty(self):
empty_frame_copy = deepcopy(empty_frame)

self._compare(empty_frame_copy, empty_frame)

def test_datetime_after_setitem_with_at(self):
# This test covers the unexpected behaviour of datetimeField when using
Member

datetimeField -> datetime64 column ?


new_datetime = datetime.today()

result.Live = True
Member

does the test rely on this being result.Live instead of result["Live"]? if not, pls use the latter


result.Live = True

# Changing the value in "timenow" column after "Live" colunn is set to True.
Member

colunn -> column


result.at[start, "timenow"] = datetime.today()

new_datetime = datetime.today()
Member

does it matter that this datetime.today() is called separately from the one two lines up? if not, pls use just one datetime object

index=pd.date_range(start, periods=1), columns=["timenow", "Live"]
)

result.at[start, "timenow"] = datetime.today()
Member

is the "at" call being tested this one or the one below? if not this one, can you set this with a better-tested indexer (or just directly in the constructor call above)

Contributor Author

@jbrockmendel yeah, that could be done. I did that just to replicate the situation in the original issue thread, which mentioned that "at" worked before setting an entire column (i.e. the "Live" column) but not afterwards.

@anirudnits
Contributor Author

I guess this behavior with at is causing the issue:

With loc

f1 = pd.DataFrame(columns=["A"])
f1.loc[1] = [dt.now()]
f1.dtypes
A    datetime64[ns]
dtype: object

With the constructor

f1 = pd.DataFrame([[dt.now()]], columns=["A"])
f1.dtypes
A    datetime64[ns]
dtype: object

The same thing with at:

f1 = pd.DataFrame(columns=["A"])
f1.at[1, "A"] = dt.now()
f1.dtypes
A    object
dtype: object
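
The three snippets above can be combined into one script for comparing the paths on a given pandas version. This is a sketch, assuming dt stands for datetime.datetime and substituting a fixed value for dt.now(). The dtypes are printed rather than asserted because the results for loc and at may depend on the installed version:

```python
from datetime import datetime

import pandas as pd

now = datetime(2020, 5, 1, 12, 0, 0)  # fixed value so runs are comparable

via_loc = pd.DataFrame(columns=["A"])
via_loc.loc[1] = [now]

via_constructor = pd.DataFrame([[now]], columns=["A"])

via_at = pd.DataFrame(columns=["A"])
via_at.at[1, "A"] = now

dtypes = {
    "loc": via_loc["A"].dtype,
    "constructor": via_constructor["A"].dtype,
    "at": via_at["A"].dtype,
}

print(pd.__version__)
for path, dtype in dtypes.items():
    print(f"{path}: {dtype}")
```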

@mroeschke
Member

@anirudnits do you have time to merge in master and address the review comments?

@anirudnits
Contributor Author

@mroeschke but wouldn't the CI still fail?

@mroeschke
Member

Are you interested in solving the original issue still?

@anirudnits
Contributor Author

anirudnits commented May 25, 2020

@mroeschke sure, I'll look over the code and get back to you by this weekend. That's okay?

@anirudnits
Contributor Author

I found another peculiar behavior with dtypes and believe it to be unintended
df = pd.DataFrame(columns=["a", "b"])
df.loc[0, "a"] = 1
df.dtypes:
a: **object**
b: object
dtype: object
but
df = pd.DataFrame(columns=["a"])
df.loc[0, "a"] = 1
df.dtypes:
a: **int64**
dtype: object

Shouldn't the dtype of column "a" in the first snippet be int64, just like in the second snippet?

@mroeschke
Member

Correct. I think both should be int64

@anirudnits
Contributor Author

anirudnits commented Jun 6, 2020

@mroeschke Sorry for the delay in responding; I had been stuck with my school work. Along with the dtype issues mentioned in this thread, other dtype issues, especially with .loc, have been reported in #11617, #14205, #14337 and #14361. Some of these were closed without a proper solution to the problem. To add to these, I found something interesting:

df = pd.DataFrame(columns=['A'])
df.loc[0, "A"] = 7
df.dtypes
A int64
dtype: object

But

df = pd.DataFrame(columns=['A'])
df.loc[0] = 7
df.dtypes
A object
dtype: object

So I believe it has something to do with how loc sets an entire row versus a single column in a row (even though the row may have only one column). Using at in the above scenario gives the expected dtype of int64 both times. I tried to find the section in

class _LocIndexer(_LocationIndexer):
, where the difference in how loc handles the two cases originates. But however deep I went, I couldn't find the exact code I was looking for. So that's where I am stuck now, and I would be thankful if you could point me to the file(s) where I would find the relevant code.
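
One way to locate the relevant source files without guessing at paths is to ask Python where the indexer classes are defined. This is a sketch using the standard inspect module. The classes behind df.loc and df.at are private pandas internals, so their names and locations are not guaranteed to stay the same between versions:

```python
import inspect

import pandas as pd

df = pd.DataFrame(columns=["A"])

# df.loc and df.at return indexer objects; report the class of each
# and the file that class is defined in.
for accessor_name in ("loc", "at"):
    indexer_cls = type(getattr(df, accessor_name))
    print(accessor_name, indexer_cls.__name__, inspect.getsourcefile(indexer_cls))

loc_source_file = inspect.getsourcefile(type(df.loc))
```

From there, inspect.getsource can be pointed at individual methods of those classes to read the assignment path step by step.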

@jreback
Contributor

jreback commented Aug 7, 2020

is this still active? this was supposed to be a simple test, but has morphed; let's keep the scope down here.

@anirudnits
Contributor Author

@jreback I got stuck with the whole indexer thing and couldn't find any real solutions to the dtype problem :( I can get the CI to pass by adding check_dtype=False when I check the equality of frames.

@WillAyd
Member

WillAyd commented Sep 10, 2020

@anirudnits can you merge master and fix conflicts?

@mroeschke
Member

@anirudnits thanks for the PR. Going to close as stale but please let us know if you would like to pick up this issue again.

@mroeschke mroeschke closed this Sep 12, 2020
Labels
Testing pandas testing functions or related to the test suite
Development

Successfully merging this pull request may close these issues.

BUG: setitem with at not inferring dtype correctly
6 participants