-
-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: support writing NAT in datetime column #146
Merged
brendan-ward
merged 9 commits into
geopandas:main
from
theroggy:fix-writing-datetime-nat
Sep 13, 2022
+36
−26
Merged
Changes from 5 commits
Commits
Show all changes
9 commits
Select commit
Hold shift + click to select a range
bf396fb
Trigger bug in test
theroggy a40ed8e
Add fix to write NAT values to datetime column
theroggy d93db18
Specify datatype in test
theroggy 6762069
Enable compare of nat values
theroggy 82e5404
Rename field
theroggy ad2895c
Update CHANGES.md
theroggy 36906c6
Remove check on None
theroggy 2cb5bf0
Merge remote-tracking branch 'upstream/main' into fix-writing-datetim…
theroggy d63e7e7
Update CHANGES.md
theroggy File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it ever be None?
I am also wondering if we can do
np.isnat
more efficiently (knowing that this is basically the smallest int64 value). Although it might not matter much for performance, since below we convert to a Python datetime object, which will probably be much slower and dominate the time anyway (so if we want to improve performance here, if it mattered, we should probably first look there).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. I was also wondering whether a check for nan would be useful or not. The only case I could imagine is if you are saving a string column into an existing datetime columns (when append is supported) or something like that, but it sounds a bit far fetched.
Possibly the same with None: never gonna happen in a numpy datetime column?
Indeed. When I implemented the datetime write support I noticed the performance is indeed bad. I experimented a bit back then trying to speed it up by implementing the parsing of the datetime using only c, functions, but I didn't get it working (immediately). So the python datetime is the big bottleneck, normally np.isnat even should be fast I think because the cdef version of numpy is imported (as well)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the pandas cython libraries, we have a bunch of code to convert a datime64 numpy int to the different fields, but that's quite a lot of code that we can't just copy/paste (and it's not public, so we also can't import it from pandas).
One possible avenue would be to convert the full array up front to the different fields (for that there are public features in pandas). That would help performance, but will increase memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some more testing regarding the "fieldvalue is None" check, and apparently
So, I removed is "fieldvalue is None" check and added a None case in the unittest, so the test demonstrates this behaviour.