Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix recursive gc error that could occur with POSIXct columns #390

Merged
merged 5 commits into from
Dec 1, 2021
Merged

Fix recursive gc error that could occur with POSIXct columns #390

merged 5 commits into from
Dec 1, 2021

Conversation

DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Dec 1, 2021

Closes #389

The analysis in this comment was correct #389 (comment), the output_column() method for POSIXct was creating another character ALTREP vector after vroom_convert() was called.

I didn't add a test, because this is hard to test for, but I confirmed that the reprex here no longer errors #389 (comment)


POSIXct isn't necessarily the only type where this could happen.

output_column.POSIXt() calls format(), and what is happening in format() is:

  • There is some C code that converts POSIXct -> character and returns a non-altrep character vector called y
  • At the R level, there is some code that adds names onto y
  • names<- at the C level will create an “ALTREP wrapper” around y. It allows you to wrap y without duplicating it, and then add new attributes on top of it (like names) cheaply
# part of format.POSIXlt
y <- .Internal(format.POSIXlt(x, format, usetz))
names(y) <- names(x$year)
  • There is additionally a structure() call in format.POSIXct() that tries to add names on again to the result of format.POSIXlt(). This is probably also creating another ALTREP wrapper layer on top of the first one.
# part of format.POSIXct
structure(format.POSIXlt(as.POSIXlt(x, tz), format, usetz,  ...), names = names(x))

So it is entirely possible that other output_column() methods could do this, but I don’t think any of the currently implemented ones do.

Since we want to ensure that this is the last manipulation done to `x` before writing, and putting it close to the write call will help with that
Copy link
Member

@romainfrancois romainfrancois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DavisVaughan DavisVaughan requested a review from jennybc December 1, 2021 15:13
Copy link
Member

@jennybc jennybc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's too bad we can't add a test, but I agree that it's impractical.

I've tried to look at this with a Chesterton's Fence perspective and it seems likely that the initial order (materialize, convert to character) seemed most natural and it's easy to overlook that character conversion can re-introduce ALTREP vectors (especially since POSIXt may be the only current example).

NEWS.md Outdated Show resolved Hide resolved
Co-authored-by: Jennifer (Jenny) Bryan <[email protected]>
@DavisVaughan DavisVaughan merged commit cfc1824 into tidyverse:main Dec 1, 2021
@DavisVaughan DavisVaughan deleted the fix/altrep-vroom-write branch December 1, 2021 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recursive gc error
3 participants