Add a function for combine/vcat with source names #659

Closed
johnmyleswhite opened this issue Jul 27, 2014 · 7 comments · Fixed by #2649
Labels
feature non-breaking The proposed change is not breaking
Comments

@johnmyleswhite
Contributor

Should we add a function like vcat(df1, df2, df3, source = ["X", "Y", "Z"]) which concatenates DataFrames while also labeling their origins? In this example, the output would be equivalent to doing:

df1_alt = copy(df1)
df1_alt[:source] = "X"
df2_alt = copy(df2)
df2_alt[:source] = "Y"
df3_alt = copy(df3)
df3_alt[:source] = "Z"

vcatted = vcat(df1_alt, df2_alt, df3_alt)

The idea is to preserve the semantics of Base.vcat, but allow the introduction of a custom column (whose name comes from a keyword arg) that denotes the origin of each subset of data.
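
For readers on a recent DataFrames.jl, the manual recipe above can be sketched as a small helper. This is only an illustration, not the library's implementation: the name `vcat_with_source` and its `name` keyword are hypothetical, and the sketch assumes a DataFrames.jl version where a new column is created with `df[!, :col] .= value` rather than the 2014-era `df[:col] = value`.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): tag a copy of each data
# frame with its origin label, then concatenate the tagged copies.
function vcat_with_source(dfs::AbstractVector{<:AbstractDataFrame},
                          labels::AbstractVector;
                          name::Symbol = :source)
    length(dfs) == length(labels) ||
        throw(ArgumentError("need exactly one label per data frame"))
    tagged = map(dfs, labels) do df, label
        out = copy(df)          # leave the caller's data frame untouched
        out[!, name] .= label   # broadcast the label into a new column
        out
    end
    return reduce(vcat, tagged)
end

# Made-up example data.
df1 = DataFrame(a = [1, 2])
df2 = DataFrame(a = [3])
df3 = DataFrame(a = [4, 5])

vcatted = vcat_with_source([df1, df2, df3], ["X", "Y", "Z"])
```

Copying each input first mirrors the `df1_alt = copy(df1)` step in the recipe, so the originals never gain the extra column.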

@HarlanH
Contributor

HarlanH commented Jul 27, 2014

I like the idea.

Could also/instead be vcat(["X" => df1, "Y" => df2, "Z" => df3]), maybe...?

It might also be framable in the split-apply-combine framework as some variation on just a combine step.


@johnmyleswhite
Contributor Author

If you do vcat(["X" => df1, "Y" => df2, "Z" => df3]), how do you figure out what the extra column's name should be?

@HarlanH
Contributor

HarlanH commented Jul 27, 2014

A name="source" default argument? I'd mildly prefer that, as separating the names from the values seems less elegant than putting them next to each other. Or maybe support both, to preserve the Base.vcat semantics? Or what about vcat(X = df1, Y = df2, Z = df3)? All-optional arguments aren't great for multiple dispatch, though...


@jwmerrill

I was just looking at the tidy data paper, and it sounds like this is similar to ldply from plyr.

@quinnj
Member

quinnj commented Sep 8, 2017

Anyone still interested in this? Could be useful if someone wants to take a stab at it. @cjprybol ?

@bkamins
Member

bkamins commented Feb 12, 2020

@oxinabox seems to also want it in join 😄.

@bkamins
Member

bkamins commented Mar 9, 2021

Anyone still interested in this?

I have added it in #2649
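
For anyone landing here later, a minimal usage sketch follows. It assumes DataFrames.jl 1.0 or later, where `vcat` accepts a `source` keyword. The thread itself does not spell out the final interface, so treat the exact form as something to check against the `vcat` docstring. The example data is made up.

```julia
using DataFrames

# Made-up example data.
df1 = DataFrame(a = [1, 2])
df2 = DataFrame(a = [3])
df3 = DataFrame(a = [4, 5])

# A column name paired with one identifier per input data frame.
labelled = vcat(df1, df2, df3; source = :source => ["X", "Y", "Z"])

# Passing only a column name identifies each input by its integer position.
numbered = vcat(df1, df2, df3; source = :source)
```

The `name => identifiers` form keeps the positional `Base.vcat` call shape while still letting the caller choose the column name, which addresses both concerns raised earlier in the thread.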
