Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect copycols when building DataFrame from Tables.CopiedColumns #2656

Merged
merged 2 commits into from
Mar 12, 2021

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Mar 12, 2021

Fixes apache/arrow-julia#146. The issue here
was we were just ignoring copycols entirely when building DataFrame
from Tables.CopiedColumns. Which is mostly fine because
CopiedColumns implies DataFrames can assume ownership of the columns.
The wrinkle comes in when you consider cases like Arrow.Table, which
wraps columns in Tables.CopiedColumns, not necessarily because copies
have been made, but because the data is immutable, so it should be safe
for other tables to assume ownership. The problem is when users want to
mutate those columns, they need to make copies of them, so naturally
users try DataFrame(::Arrow.Table; copycols=true), but then that
doesn't actually make copies.

The proposal here just intercepts the top-level
DataFrame(::CopiedColumns) constructor and passes copycols=false as
the default, so if users pass copycols=true, it will still be
respected.

Fixes apache/arrow-julia#146. The issue here
was we were just ignoring `copycols` entirely when building DataFrame
from `Tables.CopiedColumns`. Which is mostly fine because
`CopiedColumns` implies DataFrames can assume ownership of the columns.
The wrinkle comes in when you consider cases like `Arrow.Table`, which
wraps columns in `Tables.CopiedColumns`, not necessarily because copies
have been made, but because the data is immutable, so it should be safe
for other tables to assume ownership. The problem is when users want to
_mutate_ those columns, they need to make copies of them, so naturally
users try `DataFrame(::Arrow.Table; copycols=true)`, but then that
doesn't actually make copies.

The proposal here just intercepts the top-level
`DataFrame(::CopiedColumns)` constructor and passes `copycols=false` as
the default, so if users pass `copycols=true`, it will still be
respected.
@bkamins bkamins added the ecosystem Issues in DataFrames.jl ecosystem label Mar 12, 2021
@bkamins bkamins added this to the 1.0 milestone Mar 12, 2021
src/other/tables.jl Outdated Show resolved Hide resolved
@quinnj quinnj merged commit 140a3e0 into main Mar 12, 2021
@quinnj quinnj deleted the jq/copiedcolumns branch March 12, 2021 17:35
bkamins added a commit that referenced this pull request Mar 31, 2021
…2656)

* Respect copycols when building DataFrame from Tables.CopiedColumns

Fixes apache/arrow-julia#146. The issue here
was we were just ignoring `copycols` entirely when building DataFrame
from `Tables.CopiedColumns`. Which is mostly fine because
`CopiedColumns` implies DataFrames can assume ownership of the columns.
The wrinkle comes in when you consider cases like `Arrow.Table`, which
wraps columns in `Tables.CopiedColumns`, not necessarily because copies
have been made, but because the data is immutable, so it should be safe
for other tables to assume ownership. The problem is when users want to
_mutate_ those columns, they need to make copies of them, so naturally
users try `DataFrame(::Arrow.Table; copycols=true)`, but then that
doesn't actually make copies.

The proposal here just intercepts the top-level
`DataFrame(::CopiedColumns)` constructor and passes `copycols=false` as
the default, so if users pass `copycols=true`, it will still be
respected.

* Update src/other/tables.jl

Co-authored-by: Bogumił Kamiński <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ecosystem Issues in DataFrames.jl ecosystem
Projects
None yet
Development

Successfully merging this pull request may close these issues.

copying Arrow.Table does not always copy columns
2 participants