Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add datetime[ns] dtype #365

Conversation

xaviRodri
Copy link
Contributor

Added here a missing dtype needed to import datasets that contain data of this type.

Closes #364

Added here a missing dtype needed to import datasets that contain data of this type.
@josevalim
Copy link
Member

I believe we can merge this but we will probably need other changes on functions like Series.to_list for it to work properly

@josevalim
Copy link
Member

Maybe it is worth adding a test in the style of #362 for Parquet files?

In order to convert correctly nanoseconds into microseconds (the standard unit we use),
we have added `0.001` as its multiplication factor.

This implies some i64 into f64 conversions and vice versa.
@xaviRodri
Copy link
Contributor Author

Adding here a comment to track the process..

As you said @josevalim , we had to change more than the update from the first commit.
I finally tested the from_parquet/2 function with real data and works fine 👍

I was trying to add some tests for parquet files but I realised that I made some others fail...

Let me put some context:

When we encode datetimes, we use a multiplication factor (milliseconds -> 1000, microseconds -> 1, and now nanoseconds -> 0.001). In order to add this new one, I had to convert also the incoming microseconds (i64) first into a float (f64) using v as f64. Also tried f64::from(v) but "the trait From<i64> is not implemented for f64".

For some reason I cannot recognise yet, this casting is (in some cases) changing the incoming integer.
See an example

fn main() {
    let int_var:i64 = 12091941575702789;

    let converted:f64 = int_var as f64;

    println!("NUM: {:.1}", converted);
}

// Prints
NUM: 12091941575702788.0

I haven't worked with Rust before, so maybe someone more expert on it could help on this!
Thanks in advance!

@philss
Copy link
Member

philss commented Oct 14, 2022

Hi @xaviRodri 👋

For some reason I cannot recognise yet, this casting is (in some cases) changing the incoming integer.

This is due to the way float works. They cannot represent every number, and sometimes an approximation is necessary. You can find the same behavior in Elixir itself:

:erlang.float_to_binary(12091941575702789.0, decimals: 2)
#=> "12091941575702788.00"

You can play with this tool to see how the float is stored: https://www.h-schmidt.net/FloatConverter/IEEE754.html

In order to be able to parse datetimes with nanoseconds, we are now converting first the timestamp value depending on the time unit.
This minimizes the precision loss we could have as we treat the nanoseconds case separately.
Added tests to read and parse Parquet files.
Copy link
Member

@philss philss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small detail about naming, but looks good to me! 👍

test/explorer/data_frame/parquet_test.ex Outdated Show resolved Hide resolved
test/explorer/data_frame/parquet_test.ex Outdated Show resolved Hide resolved
test/explorer/data_frame/parquet_test.ex Outdated Show resolved Hide resolved
@philss
Copy link
Member

philss commented Oct 25, 2022

@xaviRodri running MIX_ENV=test mix ci should fix the build. You could also jump into native/explorer and run cargo fmt :)

@xaviRodri
Copy link
Contributor Author

xaviRodri commented Oct 26, 2022

@philss looks like I'm having this "unreachable" warning here..
I will investigate it if you don't know what it can be 👍

Edit: I have run both mix ci & cargo fmt before pushing and I'm having the problem locally too..

@philss
Copy link
Member

philss commented Oct 26, 2022

@xaviRodri It's because you are covering all values of the TimeUnit enum. So the solution is just to remove that line :)

@xaviRodri
Copy link
Contributor Author

@philss if you are ok with it, I could rebase + squash all these commits, because looks like I ended up with many little fix commits here..

@philss
Copy link
Member

philss commented Oct 26, 2022

@xaviRodri this not necessary. We usually do a "squash and merge" to collapse into one commit :)

Copy link
Member

@philss philss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@josevalim josevalim merged commit c10bd57 into elixir-explorer:main Oct 27, 2022
@josevalim
Copy link
Member

awesome job! 💚 💙 💜 💛 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing normalisation for "datetime[ns]" dtype
3 participants