Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datetimes are different after gs4 write/read round trip #273

Closed
allenbaron opened this issue Oct 24, 2022 · 10 comments
Closed

datetimes are different after gs4 write/read round trip #273

allenbaron opened this issue Oct 24, 2022 · 10 comments
Milestone

Comments

@allenbaron
Copy link

Writing datetimes to a Google Sheet with googlesheets4 and then re-reading it with the column formatted as a datetime alters the value. The change is very small and is not always noticeable with R's default print options.

I've included only a single datetime in the reprex because it shows a difference after printing.

llibrary(googledrive)
library(googlesheets4)
library(readr)

# google drive auth here

ss <- googlesheets4::gs4_create("gs4_dt_roundtrip")
#> ✔ Creating new Sheet: "gs4_dt_roundtrip".


df <- data.frame(
  dt = structure(1660163842, tzone = "UTC", class = c("POSIXct","POSIXt"))
)
df
#>                    dt
#> 1 2022-08-10 20:37:22

googlesheets4::sheet_write(df, ss, sheet = 1)
#> ✔ Writing to "gs4_dt_roundtrip".
#> ✔ Writing to sheet 'Sheet1'.


# Re-read as datetime = DIFFERENT VALUE -----------------------------------
df2 <- googlesheets4::read_sheet(ss, sheet = 1, col_types = "T")
#> ✔ Reading from "gs4_dt_roundtrip".
#> ✔ Range ''Sheet1''.

df2
#> # A tibble: 1 × 1
#>   dt                 
#>   <dttm>             
#> 1 2022-08-10 20:37:21

df$dt - df2$dt
#> Time difference of 4.768372e-07 secs
print(as.numeric(df2$dt), digits = 20)
#> [1] 1660163841.9999995232


# Re-read as character = SAME VALUE ---------------------------------------
df_chr <- googlesheets4::read_sheet(ss, sheet = 1, col_types = "c")
#> ✔ Reading from "gs4_dt_roundtrip".
#> ✔ Range ''Sheet1''.

df_chr$dt <- readr::parse_guess(df_chr$dt)
df_chr
#> # A tibble: 1 × 1
#>   dt                 
#>   <dttm>             
#> 1 2022-08-10 20:37:22

df$dt - df_chr$dt
#> Time difference of 0 secs
print(as.numeric(df_chr$dt), digits = 20)
#> [1] 1660163842

googledrive::drive_trash(ss)
#> File trashed:
#> • 'gs4_dt_roundtrip' <id: 1bJX_GYI1eAx11tC1TgKlfOfpSaTFywPhQqcafKB8MG4>
Created on 2022-10-24 with [reprex v2.0.2](https://reprex.tidyverse.org/)
@jennybc
Copy link
Member

jennybc commented Mar 23, 2023

I'm doing issue triage prior to a release that's under deadline.

But from a quick read, I think this may be a combination of R FAQ 7.31 (https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f) with a light sprinkling of (insert something about POSIXct formatting here). I feel like there's something I can point to about POSIXct formatting and hopefully I'll come back to add it.

This is likely a #wontfix.

@jennybc
Copy link
Member

jennybc commented Mar 23, 2023

Coming back just to park a small example from @DavisVaughan about some of the joys of POSIXct. Again, I'm not really digging it to this just yet, but want to leave myself some notes.

> options(digits.secs = 20)
> Sys.time()
[1] "2023-03-23 15:47:08.864708 EDT"
> unclass(Sys.time())
[1] 1679600841
> options(digits = 20)
> unclass(Sys.time())
[1] 1679600848.7778339386

@DavisVaughan
Copy link
Member

A little investigation of my own.

The date-time we used above is:

1660163842

converting that to google sheet form using the algorithm in as_CellData.POSIXct gives us:

# (1660163842 / 86400) + 25569
44783.859282407407591

now, deep in the stack we eventually call httr::POST() to send the data over the wire to google sheets. This ends up calling jsonlite::toJSON(body, auto_unbox = TRUE, digits = 22) which gives us:

44783.8592824074

So we lost some precision. And indeed if you go to the google sheet that was created and convert the cell to Number format you'll see 44783.8592824074 in there.

If you reverse the algorithm with this you get:

# (44783.8592824074 - 25569) * 86400
1660163841.9999992847

# displays as "2022-08-10 20:37:21 UTC"

It is also worth noting that googlesheets4 doesn't "reverse the algorithm" in exactly this way, but i think it should. Instead inside googlesheets4:::as_datetime() it ends up doing the multiplication first, then the subtraction, like:

# (44783.8592824074 * 86400) - (25569 * 86400) 
1660163841.9999995232

Note that doing the math that way gave a slightly different answer than above, and I think it is slightly less correct, but this does reproduce the time difference of 4.768372e-07 reported in the original comment above.

@DavisVaughan
Copy link
Member

DavisVaughan commented Mar 23, 2023

And that jsonlite failure seems to come from an internal helper called num_to_char() which is a C algorithm that seems to differ from format()

Browse[14]> jsonlite:::num_to_char(44783.859282407407591, digits = 22)
[1] "44783.8592824074"
Browse[14]> format(44783.859282407407591, digits = 22)
[1] "44783.85928240740759065"

@DavisVaughan
Copy link
Member

It seems to limit digits to 17 at some point
https://github.com/jeroen/jsonlite/blob/467923567ce2c2d535714cc3908a6912000a29ae/src/num_to_char.c#L65

@jeroen, any idea why that is?

@DavisVaughan
Copy link
Member

Ha oh shoot if you just update to dev jsonlite then the jsonlite part of this is actually fixed:

> unclass(df$dt)/86400 + 25569
[1] 44783.85928240740759065
attr(,"tzone")
[1] "UTC"
> jsonlite::fromJSON(jsonlite::toJSON(unclass(df$dt)/86400 + 25569, digits = 22))
[1] 44783.85928240740759065

Fixed by jeroen/jsonlite#413, which raised the cap from 15 to 17 digits (and 17 actually works)

It was argued for here jeroen/jsonlite#350, since IEEE754 doubles converted to string with 17 significant digits must be able to roundtrip themselves

@jennybc jennybc added this to the v1.1.1 milestone Jun 2, 2023
@jennybc
Copy link
Member

jennybc commented Jun 4, 2023

I did a little study of this and, at least with this example, the jsonlite fix is the only thing that touches it. Changing the order of operations in googlesheets4 when converting from a spreadsheet date time to POSIXct didn't help (in fact the time difference got worse). So I think answer here is that users who need this level of precision right now should install dev jsonlite. And presumably a released version will appear in due course.

@jeroen Can you note this down as a reason in favor of a jsonlite release? TL;DR the 17 significant digits are necessary for roundtripping datetimes between R and Google Sheets.

@jennybc jennybc closed this as completed Jun 4, 2023
@jeroen
Copy link

jeroen commented Jun 5, 2023

OK I'll do that today.

@jeroen
Copy link

jeroen commented Jun 26, 2023

Where exactly does jsonlite get invoked in this stack? We might have to backpaddle a little bit on the default in jsonlite because of jeroen/jsonlite#420, but if you explicitly set digits = 17 then you should be fine.

@jennybc
Copy link
Member

jennybc commented Jun 26, 2023

I can't re-trace this right this moment, but I have no reason to doubt @DavisVaughan's research described above in #273 (comment).

now, deep in the stack we eventually call httr::POST() to send the data over the wire to google sheets. This ends up calling jsonlite::toJSON(body, auto_unbox = TRUE, digits = 22)

Also based on my memory, I think the jsonlite usage, on the "write" side, is likely to all be going through httr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants