Issue 20927 fix resolves read_sas error for dates/datetimes greater than 2262-04-11 #28047
Couple of thoughts:
Thanks for the thoughts @mroeschke. I'll add the UserWarning about returning different types. As the code is at the moment, it will return datetime.date/datetime for all rows if there is at least one SAS date/time that is > 2262-04-11. I get some odd results for Periods with dates larger than 2262-04-11, e.g.
import pandas as pd
g = pd.Period(year=2262, month=4, day=11, freq='D')
print(g.start_time)
g2 = pd.Period(year=2262, month=4, day=12, freq='D')
print(g2.start_time)
gives
This is maybe a bug in Period (?), but for the time being returning datetime.date/time gives a better representation of the original SAS dataset.
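For context on the 2262-04-11 cutoff in this thread: it is what you get when timestamps are stored as nanoseconds since 1970-01-01 in a signed 64-bit integer. A minimal stdlib-only sketch of that arithmetic follows; it is not pandas' own code, and `NS_MAX` and `limit` are illustrative names.

```python
from datetime import datetime, timedelta

# Largest value of a signed 64-bit integer, read here as a nanosecond count.
NS_MAX = 2**63 - 1

# datetime only resolves microseconds, so the last three digits are dropped.
limit = datetime(1970, 1, 1) + timedelta(microseconds=NS_MAX // 1000)

print(limit.date())  # 2262-04-11
```

Any SAS date or datetime later than this instant cannot be held in a nanosecond-resolution np.datetime64, which is why a fallback type is needed.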
Could you open up a new issue for the Period overflowing? That looks like a bug.
Ok, I'll raise a new issue for the Period overflow. I'll also alter read_sas to return datetime.datetime for both dates and datetimes > 2262-04-11. There will still be the possibility of a mix of np.datetime64 and datetime.datetime types in the final dataset if read_sas is called as an iterator.
…s > pd.Timestamp.max
Also added a test for iterator behaviour: chunks in which all dates/datetimes are < pd.Timestamp.max return np.datetime64 as usual, while chunks with any date/datetime > pd.Timestamp.max return datetime.datetime.
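The per-chunk behaviour described above can be sketched roughly as follows. This is a stdlib-only illustration, not the code in this PR: `convert_chunk` is a hypothetical helper, and the returned label only names the type a chunk would end up with. It relies on SAS storing dates as days since 1960-01-01.

```python
from datetime import datetime, timedelta

SAS_EPOCH = datetime(1960, 1, 1)  # origin for SAS date values (counted in days)
# Upper bound of a nanosecond-resolution timestamp, truncated to microseconds.
NS_LIMIT = datetime(1970, 1, 1) + timedelta(microseconds=(2**63 - 1) // 1000)


def convert_chunk(sas_days):
    """Hypothetical helper: convert one chunk of SAS date values."""
    converted = [SAS_EPOCH + timedelta(days=d) for d in sas_days]
    if all(dt <= NS_LIMIT for dt in converted):
        # Every value fits, so the chunk could stay np.datetime64.
        return "datetime64[ns]", converted
    # At least one value is out of bounds, so the whole chunk falls back.
    return "datetime.datetime", converted
```

Because the decision is made per chunk, an iterator over a file with only a few far-future dates can yield chunks of both kinds.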
Merged upstream master
Hello @paul-lilley! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-05-25 20:25:35 UTC
What do our other parsers do for out of bounds datetimes?
Can you merge master and update per the comments?
Hi, apologies for the delayed response. I'll try to address Tom's points, then merge master and update the comments.
Looks like there are some linting errors @paul-lilley: https://github.com/pandas-dev/pandas/pull/28047/checks?check_run_id=275107697
Our default behavior in to_datetime is to convert to datetime when OOB, so I think it would be OK to just make this change w/o a keyword arg.
Looks like this one fell through but is close - @paul-lilley can you fix the merge conflict?
Hi @WillAyd - I've merged master (it went through without merge conflicts)
Noticed that the docstring does not quite align with the code.
2 minor comments, ping on green
Thanks @paul-lilley, very nice!
@paul-lilley I'm trying to understand the comment on test_max_sas_date
Does this mean that the test here is still wrong and it really should be coming back with
@bashtage do you have convenient access to SAS and willingness to help me track down #28047 (comment)
Hi @jbrockmendel
I don't have access to SAS easily, but @paullilley answered.
I do have access to SAS if needed.
Thanks @paul-lilley. When I read the file with pyreadstat it comes back as
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff