Add `datetime[ns]` dtype #365

xaviRodri · 2022-10-08T17:41:38Z

This adds the missing `datetime[ns]` dtype, which is needed to import datasets that contain data of this type.

Closes #364

josevalim · 2022-10-08T17:45:52Z

I believe we can merge this, but we will probably need other changes in functions like `Series.to_list` for it to work properly.

josevalim · 2022-10-08T17:48:22Z

Maybe it is worth adding a test in the style of #362 for Parquet files?

To correctly convert nanoseconds into microseconds (the standard unit we use), we have added `0.001` as the multiplication factor. This implies some i64-to-f64 conversions and vice versa.

xaviRodri · 2022-10-14T09:12:58Z

Adding a comment here to track progress.

As you said, @josevalim, more had to change than the update in the first commit.
I finally tested the `from_parquet/2` function with real data and it works fine 👍

I was trying to add some tests for Parquet files, but I realised that I made some other tests fail...

Let me give some context:

When we encode datetimes, we use a multiplication factor (milliseconds -> 1000, microseconds -> 1, and now nanoseconds -> 0.001). To add this new one, I also had to convert the incoming microseconds (i64) into a float (f64) first, using `v as f64`. I also tried `f64::from(v)`, but "the trait From<i64> is not implemented for f64".

For some reason I cannot identify yet, this cast is (in some cases) changing the incoming integer.
See this example:

fn main() {
    let int_var: i64 = 12091941575702789;

    let converted: f64 = int_var as f64;

    println!("NUM: {:.1}", converted);
}

// Prints
// NUM: 12091941575702788.0

I haven't worked with Rust before, so maybe someone more expert on it could help on this!
Thanks in advance!

philss · 2022-10-14T18:14:23Z

Hi @xaviRodri 👋

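> For some reason I cannot recognise yet, this casting is (in some cases) changing the incoming integer.

This is due to the way floats work. They cannot represent every number exactly, so sometimes an approximation is necessary. An f64 has a 53-bit significand, so above 2^53 not every integer can be stored and the value is rounded to the nearest representable one. You can find the same behavior in Elixir itself: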

:erlang.float_to_binary(12091941575702789.0, decimals: 2)
#=> "12091941575702788.00"

You can play with this tool to see how the float is stored: https://www.h-schmidt.net/FloatConverter/IEEE754.html
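To make the limit concrete, here is a minimal Rust sketch of the round trip described above. The helper names (`F64_EXACT_INT_LIMIT`, `is_within_exact_range`, `survives_f64_round_trip`) are made up for illustration and are not part of Explorer or Polars.

```rust
/// Every integer with magnitude up to 2^53 is exactly representable as an f64.
/// (Hypothetical constant, introduced only for this illustration.)
const F64_EXACT_INT_LIMIT: u64 = 1 << 53;

/// Returns true when `v` lies in the range where an `as f64` cast is always exact.
fn is_within_exact_range(v: i64) -> bool {
    v.unsigned_abs() <= F64_EXACT_INT_LIMIT
}

/// Returns true when `v` survives an i64 -> f64 round trip unchanged.
/// The comparison is done in i128 so that the saturating float-to-int cast
/// near i64::MAX cannot hide a rounding error.
fn survives_f64_round_trip(v: i64) -> bool {
    (v as f64) as i128 == v as i128
}
```

With these helpers, `is_within_exact_range(12091941575702789)` is false and `survives_f64_round_trip(12091941575702789)` is also false, because the nearest representable f64 is 12091941575702788. Values outside the exact range can still survive when they happen to be representable, as 12091941575702788 does.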

native/explorer/src/encoding.rs

To parse datetimes with nanoseconds, we now first convert the timestamp value to microseconds depending on its time unit. Treating the nanoseconds case separately minimizes the precision loss we could have.
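As a rough illustration of that idea, the sketch below converts a raw timestamp to microseconds with integer arithmetic only, so no f64 cast is involved. The `TimeUnit` enum and `to_microseconds` function are hypothetical stand-ins written for this example; the actual code in `native/explorer/src/encoding.rs` may be structured differently, and the choice of `div_euclid` for rounding is an assumption.

```rust
/// Hypothetical stand-in for the time unit attached to a datetime column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeUnit {
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

/// Converts a raw timestamp to microseconds without going through f64.
fn to_microseconds(timestamp: i64, unit: TimeUnit) -> i64 {
    match unit {
        // Plain multiplication; this overflows for extremely large inputs
        // (a panic in debug builds), which a real implementation may want to guard.
        TimeUnit::Milliseconds => timestamp * 1_000,
        // Already in the target unit, so the value passes through untouched.
        TimeUnit::Microseconds => timestamp,
        // Integer division drops the sub-microsecond part. `div_euclid` floors,
        // so timestamps before the epoch round toward the earlier microsecond.
        TimeUnit::Nanoseconds => timestamp.div_euclid(1_000),
    }
}
```

For example, `to_microseconds(12091941575702789, TimeUnit::Microseconds)` returns the input unchanged, whereas multiplying by a float factor would have altered it. The only loss left is the intentional truncation of nanoseconds, e.g. `to_microseconds(1_500, TimeUnit::Nanoseconds)` gives `1`.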

Added tests to read and parse Parquet files.

philss

Just a small detail about naming, but looks good to me! 👍

test/explorer/data_frame/parquet_test.ex

philss · 2022-10-25T19:06:16Z

@xaviRodri running `MIX_ENV=test mix ci` should fix the build. You could also jump into `native/explorer` and run `cargo fmt` :)

xaviRodri · 2022-10-26T15:41:17Z

@philss it looks like I'm getting this "unreachable" warning here...
I will investigate it if you don't know what it could be 👍

Edit: I ran both `mix ci` and `cargo fmt` before pushing, and I'm seeing the problem locally too.

philss · 2022-10-26T15:47:19Z

@xaviRodri It's because you are already covering all values of the `TimeUnit` enum, so the extra match arm can never be reached. The solution is just to remove that line :)
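A small sketch of what triggers that warning, using a made-up `TimeUnit` enum and `unit_suffix` function rather than the real code from this PR:

```rust
/// Hypothetical three-variant enum, standing in for the real `TimeUnit`.
#[derive(Debug, Clone, Copy)]
enum TimeUnit {
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

/// Maps each unit to a short suffix. The match is exhaustive without a wildcard.
fn unit_suffix(unit: TimeUnit) -> &'static str {
    match unit {
        TimeUnit::Milliseconds => "ms",
        TimeUnit::Microseconds => "us",
        TimeUnit::Nanoseconds => "ns",
        // Adding a trailing `_ => ...` arm here would compile, but rustc would
        // report it through the `unreachable_patterns` lint, because the three
        // arms above already cover every variant.
    }
}
```

Leaving the wildcard out has a side benefit: if a new variant is ever added to the enum, the compiler reports the non-exhaustive match instead of silently routing the new case to a catch-all arm.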

xaviRodri · 2022-10-26T16:02:03Z

@philss if you are OK with it, I could rebase and squash all these commits, because it looks like I ended up with many small fix commits here.

philss · 2022-10-26T16:03:58Z

@xaviRodri this is not necessary. We usually do a "squash and merge" to collapse everything into one commit :)

josevalim · 2022-10-27T11:17:10Z

awesome job! 💚 💙 💜 💛 ❤️

Add datetime[ns] dtype

570b917

Add multiplication factor for nanoseconds

0db6787

philss reviewed Oct 21, 2022

native/explorer/src/encoding.rs

xaviRodri added 2 commits October 25, 2022 20:49

Added time to microseconds conversion

27a8942

Add Parquet tests

a90628c

philss reviewed Oct 25, 2022

test/explorer/data_frame/parquet_test.ex

xaviRodri added 2 commits October 26, 2022 17:23

Updated tests with suggestions

c103ad9

Format native code

68d30cd

Delete unreachable match

6b3a153

philss approved these changes Oct 26, 2022

josevalim merged commit c10bd57 into elixir-explorer:main Oct 27, 2022
