Add a function for combine/vcat with source names #659

johnmyleswhite · 2014-07-27T15:49:06Z

Should we add a function like vcat(df1, df2, df3, source = ["X", "Y", "Z"]) which concatenates DataFrames while also labeling their origins? In this example, the output would be equivalent to doing:

df1_alt = copy(df1)
df1_alt[:source] = "X"
df2_alt = copy(df2)
df2_alt[:source] = "Y"
df3_alt = copy(df3)
df3_alt[:source] = "Z"

vcatted = vcat(df1_alt, df2_alt, df3_alt)

The idea is to preserve the semantics of Base.vcat, but allow the introduction of a custom column (whose name comes from a keyword arg) that denotes the origin of each subset of data.
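
For readers on a current DataFrames.jl, the equivalence described above can be sketched as a small helper. This is only an illustration under assumptions: `vcat_with_source` and its `name` keyword are hypothetical names, not part of the package, and the column assignment uses the present-day `df[!, col] .= value` syntax rather than the 2014 syntax quoted above.

```julia
using DataFrames

# Hypothetical helper (not a DataFrames.jl function): tag every input with its
# label, then delegate to the ordinary vcat so its semantics are preserved.
function vcat_with_source(dfs::AbstractDataFrame...;
                          source::AbstractVector, name::Symbol = :source)
    length(dfs) == length(source) ||
        throw(ArgumentError("need exactly one source label per data frame"))
    labeled = DataFrame[]
    for (df, label) in zip(dfs, source)
        tagged = copy(df)          # leave the caller's data frame untouched
        tagged[!, name] .= label   # new column: the label repeated for each row
        push!(labeled, tagged)
    end
    return vcat(labeled...)
end

df1 = DataFrame(a = [1, 2])
df2 = DataFrame(a = [3])
df3 = DataFrame(a = [4, 5])
vcatted = vcat_with_source(df1, df2, df3, source = ["X", "Y", "Z"])
```

Because the helper copies each input before adding the column, `df1`, `df2`, and `df3` keep their original columns.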

HarlanH · 2014-07-27T16:16:02Z

I like the idea.

Could also/instead be vcat(["X" => df1, "Y" => df2, "Z" => df3])
maybe...?

It might also be framable in the split-apply-combine framework as some
variation on just a combine step.

johnmyleswhite · 2014-07-27T17:53:18Z

If you do vcat(["X" => df1, "Y" => df2, "Z" => df3]), how do you figure out what the extra column's name should be?

HarlanH · 2014-07-27T19:34:37Z

name="source" default argument? I'd mildly prefer that, as separating the
names from the values seems less elegant than putting them next to each
other. Or maybe support both, to preserve the Base.vcat semantics? Or
what about vcat(X = df1, Y = df2, Z = df3)? Making everything a keyword
argument isn't great for multiple dispatch, though...
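
To make the pairs-based variant concrete, here is a rough sketch of how it might look, with the column name supplied through a defaulted keyword. The name `vcat_pairs` is hypothetical and chosen only for illustration. Note also that in current Julia `["X" => df1, ...]` is a vector of pairs rather than a dictionary literal, which is what this sketch assumes.

```julia
using DataFrames

# Hypothetical helper: each label sits next to its data frame, and the name of
# the extra column comes from a keyword with a default.
function vcat_pairs(pairs::AbstractVector{<:Pair}; name::Symbol = :source)
    labels = first.(pairs)
    dfs = last.(pairs)
    out = vcat(dfs...)   # ordinary concatenation first
    # repeat each label once per row of the data frame it came from
    out[!, name] = reduce(vcat, [fill(l, nrow(d)) for (l, d) in zip(labels, dfs)])
    return out
end

df1 = DataFrame(a = [1, 2])
df2 = DataFrame(a = [3])
res = vcat_pairs(["X" => df1, "Y" => df2])
renamed = vcat_pairs(["X" => df1, "Y" => df2]; name = :origin)
```

Keeping the labels in the positional argument and the column name in a keyword leaves room to dispatch on the first argument, which is the concern raised above about an all-keyword form.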

jwmerrill · 2014-08-19T15:06:50Z

I was just looking at the tidy data paper, and it sounds like this is similar to ldply from plyr.

quinnj · 2017-09-08T03:42:58Z

Anyone still interested in this? Could be useful if someone wants to take a stab at it. @cjprybol ?

bkamins · 2020-02-12T12:40:29Z

@oxinabox seems to also want it in join 😄.

bkamins · 2021-03-09T19:00:45Z

Anyone still interested in this?

I have added it in #2649
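
For anyone landing here later, a short usage sketch of the keyword as I understand it from DataFrames.jl 1.0 and newer, where `source` accepts either a column name or a `name => identifiers` pair. Treat the details as an assumption and check the `vcat` docstring of your installed version. The data frames below are made up for illustration.

```julia
using DataFrames

df1 = DataFrame(a = [1, 2])
df2 = DataFrame(a = [3])
df3 = DataFrame(a = [4, 5])

# A column name paired with one identifier per data frame
vcatted = vcat(df1, df2, df3; source = :source => ["X", "Y", "Z"])

# A bare column name labels rows with the position of their data frame
numbered = vcat(df1, df2, df3; source = :source)
```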

johnmyleswhite added decision labels Jul 27, 2014

bramtayl mentioned this issue Jun 6, 2016

vcat for a single dataframe #989

Closed

nalimilan mentioned this issue Sep 20, 2018

Append column of all one value without knowing length #1339

Closed

nalimilan added intro issue and removed decision labels Sep 20, 2018

nalimilan added the Hacktoberfest label Oct 2, 2018

bkamins mentioned this issue Jan 15, 2019

DataFrames.jl roadmap #1678

Closed

31 tasks

bkamins added non-breaking The proposed change is not breaking and removed Hacktoberfest intro issue labels Feb 12, 2020

bkamins modified the milestones: 1.x, 1.0 Mar 7, 2021

bkamins mentioned this issue Mar 9, 2021

add vcat with source; deprecate indicator in joins in favor of source #2649

Merged

bkamins closed this as completed in #2649 Mar 15, 2021
