Add `datetime[ns]` dtype #365
Adds the missing `datetime[ns]` dtype, which is needed to import datasets that contain data of this type.
I believe we can merge this, but we will probably need other changes in functions like `Series.to_list` for it to work properly.
Maybe it is worth adding a test in the style of #362 for Parquet files?
To convert nanoseconds correctly into microseconds (the standard unit we use), we have added `0.001` as its multiplication factor. This implies some `i64` to `f64` conversions and vice versa.
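As a rough illustration of that trade-off, here is a minimal Rust sketch comparing a float-based conversion (multiplying by `0.001`) with plain integer division. The function names are hypothetical and this is not the code from the PR; it only shows where the `i64` to `f64` round trip comes in. For large nanosecond values the two paths can disagree, because the cast to `f64` is only exact up to 2^53 and `0.001` has no exact binary representation.

```rust
/// Hypothetical helper: nanoseconds -> microseconds via an f64 multiplication,
/// as described in the comment above (i64 -> f64 -> i64).
fn ns_to_us_via_float(ns: i64) -> i64 {
    // The cast can round once |ns| exceeds 2^53, and 0.001 is itself
    // only an approximation in binary floating point.
    (ns as f64 * 0.001) as i64
}

/// Hypothetical alternative that never leaves integer arithmetic.
fn ns_to_us_via_int(ns: i64) -> i64 {
    // Integer division truncates toward zero, like the `as i64` cast above.
    ns / 1_000
}

fn main() {
    // An arbitrary nanosecond timestamp (roughly April 2022).
    let ns: i64 = 1_650_000_000_123_456_789;
    println!("float path:   {}", ns_to_us_via_float(ns));
    println!("integer path: {}", ns_to_us_via_int(ns));
}
```

The integer path is exact for every `i64` input; the float path is only guaranteed exact for inputs small enough to fit in the 53-bit significand.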
Adding a comment here to track progress. As you said, @josevalim, we had to change more than what the first commit updated. I was trying to add some tests for Parquet files, but I realised that I made some others fail. Let me give some context:
For some reason I cannot identify yet, this cast is (in some cases) changing the incoming integer:

```rust
fn main() {
    let int_var: i64 = 12091941575702789;
    let converted: f64 = int_var as f64;
    println!("NUM: {:.1}", converted);
}
// Prints
// NUM: 12091941575702788.0
```

I haven't worked with Rust before, so maybe someone with more experience could help with this!
Hi @xaviRodri 👋
This is due to the way floats work. They cannot represent every integer exactly, and sometimes an approximation is necessary. You can see the same behavior in Elixir itself:

```elixir
:erlang.float_to_binary(12091941575702789.0, decimals: 2)
#=> "12091941575702788.00"
```

You can play with this tool to see how the float is stored: https://www.h-schmidt.net/FloatConverter/IEEE754.html
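To make this concrete: an `f64` has a 53-bit significand, so every integer with magnitude up to 2^53 converts exactly, and beyond that only some integers are representable. A minimal Rust sketch (the helper `round_trips` is hypothetical) that checks whether an `i64` survives a round trip through `f64`:

```rust
/// Returns true if `x` is unchanged after converting to f64 and back.
fn round_trips(x: i64) -> bool {
    (x as f64) as i64 == x
}

fn main() {
    let limit: i64 = 1 << 53; // 9_007_199_254_740_992
    // Every integer up to 2^53 is exactly representable as f64.
    println!("{}", round_trips(limit)); // true
    // Between 2^53 and 2^54 adjacent f64 values are 2 apart, so odd integers round.
    println!("{}", round_trips(limit + 1)); // false
    // The value from the snippet above is a tie between ...788 and ...790;
    // round-to-nearest-even picks ...788.
    let v: i64 = 12091941575702789;
    println!("{}", (v as f64) as i64); // 12091941575702788
}
```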
To be able to parse datetimes with nanoseconds, we now convert the timestamp value first, depending on its time unit. Treating the nanoseconds case separately minimizes the precision loss we would otherwise have.
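A unit-dependent conversion along these lines could look roughly like the following Rust sketch. The `TimeUnit` enum and `to_microseconds` function are hypothetical stand-ins, not the project's actual types; the idea shown is that each unit is scaled with integer arithmetic, so no `f64` round trip is involved.

```rust
/// Hypothetical time units for this sketch.
#[derive(Clone, Copy, Debug)]
enum TimeUnit {
    Nanoseconds,
    Microseconds,
    Milliseconds,
}

/// Converts a raw timestamp expressed in `unit` to microseconds, integer math only.
fn to_microseconds(value: i64, unit: TimeUnit) -> i64 {
    match unit {
        // div_euclid rounds toward negative infinity, so pre-1970 instants are
        // floored (-1500 ns -> -2 µs) instead of truncated toward zero (-1 µs).
        TimeUnit::Nanoseconds => value.div_euclid(1_000),
        TimeUnit::Microseconds => value,
        // Plain multiplication; may overflow for extreme values
        // (checked_mul would report that as None instead).
        TimeUnit::Milliseconds => value * 1_000,
    }
}

fn main() {
    let samples = [
        (12091941575702789, TimeUnit::Nanoseconds),
        (12091941575702, TimeUnit::Microseconds),
        (12091941575, TimeUnit::Milliseconds),
    ];
    for (value, unit) in samples {
        println!("{:?} {} -> {} µs", unit, value, to_microseconds(value, unit));
    }
}
```

Flooring versus truncating negative timestamps is a design choice here; Rust's plain `/` would truncate toward zero.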
Added tests to read and parse Parquet files.
Just a small detail about naming, but looks good to me! 👍
@xaviRodri running |
@philss looks like I'm having this "unreachable" warning here. Edit: I have run both |
@xaviRodri It's because you are covering all values of the |
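The reply above is cut off, so this is an assumption about what it refers to: in Rust, when a `match` already lists every variant of an enum, an extra catch-all `_` arm can never run, and rustc reports it as an unreachable pattern. A minimal sketch with a hypothetical `TimeUnit` enum, where the exhaustive match needs no wildcard:

```rust
/// Hypothetical enum for this sketch.
#[derive(Clone, Copy, Debug)]
enum TimeUnit {
    Nanoseconds,
    Microseconds,
    Milliseconds,
}

/// Multiplication factor that converts a value in `unit` to microseconds.
fn factor_to_microseconds(unit: TimeUnit) -> f64 {
    match unit {
        TimeUnit::Nanoseconds => 0.001,
        TimeUnit::Microseconds => 1.0,
        TimeUnit::Milliseconds => 1_000.0,
        // A trailing `_ => 1.0` arm here would be flagged as an unreachable
        // pattern: the three arms above already cover every variant.
    }
}

fn main() {
    for unit in [TimeUnit::Nanoseconds, TimeUnit::Microseconds, TimeUnit::Milliseconds] {
        println!("{:?}: x{}", unit, factor_to_microseconds(unit));
    }
}
```

Leaving the wildcard out also means the compiler will point at this `match` if a new variant is added later.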
@philss if you are OK with it, I could rebase and squash all these commits, because it looks like I ended up with many small fix commits here.
@xaviRodri this is not necessary. We usually do a "squash and merge" to collapse everything into one commit :)
awesome job! 💚 💙 💜 💛 ❤️
Closes #364