[BUG] fastparquet tests fail on Databricks 13.3 due to NaNs becoming nulls when converting from pandas #9778

jlowe · 2023-11-17T18:24:28Z

On Databricks 13.3, converting a Pandas DataFrame to a Spark DataFrame turns NaNs into nulls. Pandas treats the NaN values as nulls, and those nulls propagate to the resulting Spark DataFrame.

fastparquet loads the NaNs properly, but when the data is converted to pandas, pandas treats the NaN values as null. This, in turn, causes spark.createDataFrame to produce corresponding nulls. The test then compares this result against the GPU's direct load of the data, which contains NaNs (not nulls), and fails. The problem is not in the way the GPU loads the data; it is that the NaNs are converted into nulls because the data passes through pandas before becoming a Spark DataFrame.
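To make the mismatch described above concrete, here is a minimal stdlib-only sketch. It is a model of the behavior, not the actual pandas, Spark, or test-harness code path: `nan_to_null` and `same_value` are hypothetical helpers, and the sample values are made up for illustration.

```python
import math

def nan_to_null(values):
    """Hypothetical stand-in for a conversion that treats NaN as missing."""
    return [None if isinstance(v, float) and math.isnan(v) else v for v in values]

def same_value(a, b):
    """Compare two cells: NaN matches NaN, null matches null, NaN never matches null."""
    if a is None or b is None:
        return a is None and b is None
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b

# Assumed sample column: the file holds a real NaN and no nulls.
file_values = [1.5, float("nan"), 3.0]

direct_load = list(file_values)        # models the direct (GPU) read: NaN stays NaN
via_pandas = nan_to_null(file_values)  # models the pandas route: NaN becomes null

# Row indices where the two results disagree.
mismatches = [
    i for i, (a, b) in enumerate(zip(direct_load, via_pandas))
    if not same_value(a, b)
]
print(mismatches)
```

The only disagreeing row is the one that held NaN, which matches the diagnosis that the direct load is correct and the difference comes from the conversion step.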

sameerz · 2023-11-21T21:57:24Z

Duplicate of #9776

mythrocks · 2023-11-21T22:52:43Z

Duplicate of #9776

Sorry, no, it's not. #9776 is for timestamps. This failure is for floating point types.

This is a dupe of #9767, though. And it's been xfailed as part of #9677.

jlowe added the bug, ? - Needs Triage, and test labels on Nov 17, 2023

jlowe mentioned this issue on Nov 20, 2023, in #9767 (open): [BUG] fastparquet test fails with DATAGEN_SEED=1700171382 on Databricks (Spark 3.4.1)

sameerz added the duplicate label and removed the ? - Needs Triage label on Nov 21, 2023

sameerz marked this as a duplicate of #9776 Nov 21, 2023

sameerz closed this as completed Nov 21, 2023

mythrocks reopened this Nov 21, 2023

mythrocks closed this as completed Nov 21, 2023
