
[ArrayManager] REF: Implement concat with reindexing #39612

Merged
34 commits merged into pandas-dev:master from am-concat on Apr 12, 2021

Conversation

@jorisvandenbossche (Member) commented Feb 5, 2021

xref #39146 (the concat work item)

This PR implements concat for DataFrames using the ArrayManager, and all tests in pandas/tests/reshape/ already pass.

Summary of the changes:

  • In internals/concat.py, I added a concatenate_block_managers equivalent for ArrayManagers. This function simply reindexes the managers and then either concatenates the arrays (axis=0) or combines them into a single manager (axis=1).
  • The main problem with that simple approach is that, when reindexing the ArrayManager with the given indexer, we do not yet know the resulting dtype when a new all-NA column needs to be added (i.e. the common dtype we would find when actually concatting). And we can't make this, e.g., an all-NaN object array, as that would influence the find_common_type logic.
  • The way I solve this at the moment is by adding a dummy "proxy" object to signal such an all-NA array (the NullArrayProxy class), which only tracks its length. When actually concatting, we can replace it with an actual all-NA array once the resulting dtype has been determined (see the sketch after this list). This proxy object only lives inside the concatting code (it only gets created when we will call concat_compat, and inside this function the proxies always get replaced), so the user will never see it.
  • An alternative would be to first determine the resulting dtype in internals/concat.py, so we can pass it to the reindex functionality, and do the actual concatenation after reindexing. But IMO that would be more complex: the concat / find_common_type logic gets involved twice (before and after reindexing), and getting the dtypes for which we want to know the common type is also quite tricky before actually reindexing the manager.
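To make the proxy idea concrete, here is a minimal sketch (my own simplification, not the code from the PR: the real NullArrayProxy and _array_from_proxy in dtypes/concat.py also handle extension and datetime-like dtypes, while this version only covers float dtypes):

    import numpy as np

    class NullArrayProxy:
        """Placeholder for an all-NA column whose dtype is not yet known."""

        ndim = 1

        def __init__(self, n: int) -> None:
            self.n = n  # only the length is tracked

    def to_array(proxy: NullArrayProxy, dtype: np.dtype) -> np.ndarray:
        # once the concat logic has determined the common dtype,
        # materialize the proxy as an actual all-NA array of that dtype
        arr = np.empty(proxy.n, dtype=dtype)
        arr.fill(np.nan)
        return arr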

Basically, this replaces the 600 lines of code of internals/concat.py with 25 lines of concatenate_array_managers, plus the NullArrayProxy and _array_from_proxy code in dtypes/concat.py (but I am quite likely still missing some corner cases!).
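For a feel of how small that path is, here is a toy, runnable version of the row-wise case (an illustration under simplifying assumptions, not the actual pandas code: reindexing and proxy handling are elided, and each "manager" is just a list of 1D column arrays):

    import numpy as np

    def concat_managers_rowwise(mgrs):
        # one concatenate call per column, instead of one call per 2D block
        n_columns = len(mgrs[0])
        return [np.concatenate([mgr[j] for mgr in mgrs]) for j in range(n_columns)]

    left = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]  # 2 columns, 2 rows
    right = [np.array([5.0]), np.array([6.0])]           # 2 columns, 1 row
    print(concat_managers_rowwise([left, right]))
    # [array([1., 2., 5.]), array([3., 4., 6.])]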

Note: I did not explicitly test copy=True/False behaviour for concat along the columns with this new code.

cc @jreback @jbrockmendel

@jorisvandenbossche added the Refactor (internal refactoring of code), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Internals (related to non-user-accessible pandas implementation) labels on Feb 5, 2021
Comment on lines 207 to 214:

    def concat_arrays(to_concat):
        """
        Alternative for concat_compat but specialized for use in the ArrayManager.

        Differences: only deals with 1D arrays (no axis keyword) and does not skip
        empty arrays to determine the dtype.
        In addition ensures that all NullArrayProxies get replaced with actual
        arrays.
        """

jorisvandenbossche (Member Author):

This concat_arrays is basically a copied and adapted version of concat_compat. I initially adapted concat_compat to cover both cases.

But that resulted in quite a few special cases inside concat_compat (checking for the proxy objects, which is not needed when concat_compat is used outside the ArrayManager, plus not skipping empties, for which I added a keyword). So in the end I decided to split the changes into a separate function, so that both concat_compat and concat_arrays are more readable and easier to follow, even though it gives some duplication.
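As an illustration of what the separate function does with the proxies, a minimal sketch (my own simplification based on the description above, reusing the NullArrayProxy/to_array names from the sketch in the PR description; it only covers plain numpy dtypes, while the real concat_arrays also runs the full find_common_type logic):

    import numpy as np

    def concat_arrays_sketch(to_concat):
        # determine the target dtype from the real (non-proxy) arrays only,
        # so the placeholders cannot influence the common-dtype resolution
        to_concat_no_proxy = [x for x in to_concat if not isinstance(x, NullArrayProxy)]
        target_dtype = np.result_type(*(x.dtype for x in to_concat_no_proxy))
        # replace every proxy with an actual all-NA array of the target dtype
        to_concat = [
            to_array(x, target_dtype) if isinstance(x, NullArrayProxy) else x
            for x in to_concat
        ]
        return np.concatenate(to_concat)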

"""
if is_extension_array_dtype(dtype):
return dtype.construct_array_type()._from_sequence(
[dtype.na_value] * arr.n, dtype=dtype
jorisvandenbossche (Member Author):

We should maybe consider adding a method to the ExtensionArray interface to create an empty/all-NA array of the given dtype, to avoid this rather inefficient "construct from list of NA scalars".

Contributor:

+1 on this

Member:

IIRC this can break for dtypes that can't hold NA. The more robust version is:

    cls = dtype.construct_array_type()
    empty = cls._from_sequence([], dtype=dtype)
    indexer = -np.ones(arr.n, dtype=np.intp)
    # taking at -1 with allow_fill=True fills those positions with dtype.na_value
    return empty.take(indexer, allow_fill=True)

(In internals concat we use this in one place, and when I tried to change it to what you have here something broke; I don't remember what off the top of my head.)

jorisvandenbossche (Member Author):

Do we have an example dtype that cannot hold NAs?

Member:

something like Sparse[int]

jorisvandenbossche (Member Author):

Indeed:

In [5]: dtype = pd.SparseDtype("int")

In [6]: dtype
Out[6]: Sparse[int64, 0]

In [7]: dtype.construct_array_type()._from_sequence([dtype.na_value] * 5, dtype=dtype)
...
ValueError: Cannot convert non-finite values (NA or inf) to integer

Will use your suggested code for now.

jorisvandenbossche (Member Author):

I opened #39776 for the idea of adding a method to the EA interface for creating an empty/all-NA array from a given dtype.

@jorisvandenbossche changed the title from "[ArrayManager] Implement concat with reindexing" to "[ArrayManager] REF: Implement concat with reindexing" on Feb 5, 2021
"""
if is_extension_array_dtype(dtype):
return dtype.construct_array_type()._from_sequence(
[dtype.na_value] * arr.n, dtype=dtype
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on this

Comment on pandas/core/dtypes/concat.py (@@ -132,6 +204,75 @@):

        return np.concatenate(to_concat, axis=axis)


    def concat_arrays(to_concat):

Contributor:

Add types if you can.

@jorisvandenbossche (Member Author)

I pushed some updates.

Given that the review comments are relatively minor points (but useful! to be clear), does that mean that you are OK with the general approach (of using the temporary proxy class)?

This should be ready for another review round.

@jorisvandenbossche (Member Author)

I ran the concat/join/merge benchmarks comparing ArrayManager with BlockManager (this runs the benchmarks on this branch with the default of using BlockManager ("before") vs this branch with one additional commit changing the default setting to ArrayManager ("after"), so green is slower and red is faster for the ArrayManager):

$ asv continuous -f 1.01 -b join_merge HEAD~1 HEAD

       before           after         ratio
     [555d7ac1]       [d0c94a51]
     <am-concat~1>       <am-concat>
+        40.4±3ms       93.3±0.3ms     2.31  join_merge.ConcatDataFrames.time_c_ordered(0, True)
+        40.7±3ms       93.2±0.4ms     2.29  join_merge.ConcatDataFrames.time_c_ordered(0, False)
+        44.2±5ms       93.4±0.2ms     2.11  join_merge.ConcatDataFrames.time_f_ordered(0, True)
+        44.6±4ms       93.0±0.2ms     2.09  join_merge.ConcatDataFrames.time_f_ordered(0, False)
+       304±0.8ms        309±0.7ms     1.02  join_merge.MergeCategoricals.time_merge_cat
-         191±2ms        187±0.9ms     0.98  join_merge.MergeAsof.time_by_int('forward', 5)
-       288±0.4ms        282±0.7ms     0.98  join_merge.MergeAsof.time_by_object('nearest', None)
-       289±0.2ms        282±0.1ms     0.98  join_merge.MergeAsof.time_by_object('nearest', 5)
-     18.5±0.06ms      18.0±0.04ms     0.97  join_merge.Merge.time_merge_dataframe_integer_2key(True)
-       206±0.4ms        199±0.4ms     0.97  join_merge.MergeAsof.time_by_object('forward', None)
-       206±0.5ms       199±0.04ms     0.96  join_merge.MergeAsof.time_by_object('forward', 5)
-     6.98±0.09ms      6.60±0.02ms     0.95  join_merge.Merge.time_merge_dataframe_integer_2key(False)
-        90.8±1ms       85.4±0.1ms     0.94  join_merge.MergeAsof.time_by_int('backward', 5)
-         312±3μs          293±1μs     0.94  join_merge.Concat.time_concat_empty_left(0)
-         312±3μs          292±2μs     0.94  join_merge.Concat.time_concat_empty_right(0)
-       102±0.2ms       95.0±0.3ms     0.93  join_merge.MergeAsof.time_by_object('backward', None)
-       102±0.1ms       95.2±0.3ms     0.93  join_merge.MergeAsof.time_by_object('backward', 5)
-      20.2±0.7ms      18.4±0.07ms     0.91  join_merge.Merge.time_merge_2intkey(False)
-         452±3μs        397±0.9μs     0.88  join_merge.Concat.time_concat_mixed_ndims(0)
-      28.2±0.4ms       24.6±0.1ms     0.87  join_merge.Join.time_join_dataframe_index_multi(True)
-         864±3ms          745±5ms     0.86  join_merge.I8Merge.time_i8merge('right')
-         814±3ms         695±10ms     0.85  join_merge.I8Merge.time_i8merge('left')
-      17.9±0.4ms       15.1±0.4ms     0.84  join_merge.Join.time_join_dataframe_index_shuffle_key_bigger_sort(True)
-         193±1μs        161±0.7μs     0.83  join_merge.Concat.time_concat_empty_right(1)
-         194±2μs        161±0.8μs     0.83  join_merge.Concat.time_concat_empty_left(1)
-         796±7μs          662±1μs     0.83  join_merge.Concat.time_concat_mixed_ndims(1)
-        807±10ms          670±5ms     0.83  join_merge.I8Merge.time_i8merge('outer')
-     3.21±0.01ms         2.65±0ms     0.83  join_merge.Merge.time_merge_dataframe_integer_key(True)
-       106±0.3ms       87.5±0.5ms     0.83  join_merge.MergeOrdered.time_merge_ordered
-        810±10ms          667±4ms     0.82  join_merge.I8Merge.time_i8merge('inner')
-      1.54±0.08s          1.26±0s     0.82  join_merge.JoinIndex.time_left_outer_join_index
-      15.1±0.2ms       12.2±0.3ms     0.80  join_merge.Join.time_join_dataframe_index_single_key_small(True)
-        2.74±0ms         2.17±0ms     0.79  join_merge.Merge.time_merge_dataframe_integer_key(False)
-      23.1±0.2ms      17.8±0.08ms     0.77  join_merge.Join.time_join_dataframe_index_multi(False)
-     23.5±0.04ms      18.0±0.03ms     0.77  join_merge.MergeAsof.time_on_uint64('nearest', 5)
-     22.8±0.05ms      17.3±0.07ms     0.76  join_merge.MergeAsof.time_on_uint64('nearest', None)
-     23.2±0.04ms      17.2±0.06ms     0.74  join_merge.MergeAsof.time_on_int('nearest', 5)
-         419±6μs          310±1μs     0.74  join_merge.Append.time_append_homogenous
-     23.0±0.09ms      16.9±0.03ms     0.73  join_merge.MergeAsof.time_on_int32('nearest', 5)
-     22.5±0.04ms      16.5±0.06ms     0.73  join_merge.MergeAsof.time_on_int('nearest', None)
-     22.5±0.07ms      16.4±0.03ms     0.73  join_merge.MergeAsof.time_on_int32('nearest', None)
-     18.8±0.03ms      13.4±0.05ms     0.71  join_merge.MergeAsof.time_on_uint64('forward', 5)
-      15.2±0.1ms      10.8±0.04ms     0.71  join_merge.Join.time_join_dataframe_index_shuffle_key_bigger_sort(False)
-      15.6±0.3ms       11.0±0.1ms     0.71  join_merge.Join.time_join_dataframe_index_single_key_bigger(False)
-     18.5±0.05ms      13.1±0.08ms     0.71  join_merge.MergeAsof.time_on_uint64('forward', None)
-     6.61±0.01ms      4.63±0.02ms     0.70  join_merge.Join.time_join_dataframes_cross(True)
-     17.4±0.04ms      12.0±0.04ms     0.69  join_merge.MergeAsof.time_on_uint64('backward', 5)
-     6.29±0.02ms      4.28±0.01ms     0.68  join_merge.Join.time_join_dataframes_cross(False)
-     17.0±0.07ms      11.5±0.05ms     0.68  join_merge.MergeAsof.time_on_uint64('backward', None)
-     18.7±0.06ms      12.6±0.03ms     0.68  join_merge.MergeAsof.time_on_int('forward', 5)
-      14.5±0.2ms      9.79±0.01ms     0.68  join_merge.Join.time_join_dataframe_index_single_key_small(False)
-      18.4±0.1ms      12.3±0.04ms     0.67  join_merge.MergeAsof.time_on_int32('forward', 5)
-     18.2±0.05ms      12.2±0.04ms     0.67  join_merge.MergeAsof.time_on_int('forward', None)
-     18.1±0.03ms      12.0±0.02ms     0.66  join_merge.MergeAsof.time_on_int32('forward', None)
-     17.6±0.05ms      11.6±0.04ms     0.66  join_merge.MergeAsof.time_on_int32('backward', 5)
-     17.4±0.04ms      11.3±0.03ms     0.65  join_merge.MergeAsof.time_on_int32('backward', None)
-     17.1±0.02ms      11.0±0.06ms     0.64  join_merge.MergeAsof.time_on_int('backward', 5)
-     16.8±0.07ms      10.8±0.04ms     0.64  join_merge.MergeAsof.time_on_int('backward', None)
-       782±0.5ms          491±2ms     0.63  join_merge.MergeCategoricals.time_merge_object
-         977±2μs          596±2μs     0.61  join_merge.Append.time_append_mixed
-     20.9±0.07ms      12.5±0.03ms     0.60  join_merge.Concat.time_concat_small_frames(0)
-     14.5±0.06ms      8.33±0.03ms     0.57  join_merge.Concat.time_concat_small_frames(1)
-         931±3ms          330±3ms     0.35  join_merge.Merge.time_merge_dataframes_cross(True)
-         931±7ms          330±2ms     0.35  join_merge.Merge.time_merge_dataframes_cross(False)
-        58.4±1ms          318±1μs     0.01  join_merge.ConcatDataFrames.time_f_ordered(1, False)
-        55.9±4ms          264±2μs     0.00  join_merge.ConcatDataFrames.time_f_ordered(1, True)
-       91.2±50ms          317±1μs     0.00  join_merge.ConcatDataFrames.time_c_ordered(1, False)
-       90.3±30ms          265±2μs     0.00  join_merge.ConcatDataFrames.time_c_ordered(1, True)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

The results are quite stable on my benchmarking setup (I ran the above twice and got almost the same results, and within a run the different parametrizations are also quite consistent), so I think they give a good picture. Some general observations:

  • All benchmarks are on par or faster using ArrayManager, except for the concat(.., axis=0) cases.
  • Many of the join/merge/merge_asof cases show some speedup, and I expect that #39692 (PERF: optimize algos.take for repeated calls) will give a further improvement.
  • Some of the concat(.., axis=1) benchmarks show a huge difference (a ratio of "0.00" for after), but this is largely explained by copying: the BlockManager-based method copies in this no-reindexing case, while the current ArrayManager version doesn't do that (yet). If I add a copy, it becomes a more modest speedup of 2-3x (instead of >100x).
  • The concat(.., axis=0) case that shows a 2x slowdown with ArrayManager is the expected case, I think. That specific benchmark uses a DataFrame with a single float dtype (a single block), so concatenating 20 2D arrays is always going to be faster than concatenating 20 1D arrays 200 times (the benchmark case has 200 columns and does pd.concat([df] * 20, axis=0)). From profiling the operation, almost all time is spent in the actual numpy concatenation routine (in both cases), so I don't expect there is much room for improvement here; see the illustration after this list.
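To illustrate that last point (this mirrors the shape of the benchmark, not the asv code itself; the sizes are made up):

    import numpy as np

    n_rows, n_cols, n_frames = 1_000, 200, 20

    # BlockManager: one float block per frame -> a single 2D concatenate
    blocks = [np.random.randn(n_rows, n_cols) for _ in range(n_frames)]
    result_2d = np.concatenate(blocks, axis=0)

    # ArrayManager: one array per column -> n_cols separate 1D concatenates
    # (the column slices are non-contiguous views, which adds to the cost)
    columns = [[block[:, j] for block in blocks] for j in range(n_cols)]
    result_1d = [np.concatenate(cols) for cols in columns]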

Generally I am quite positive about the results.

@jorisvandenbossche (Member Author)

More comments on this?

@jorisvandenbossche (Member Author)

If there are no more comments, I am planning to merge this.

@jorisvandenbossche mentioned this pull request on Mar 24, 2021
Comment on pandas/core/internals/concat.py:

        mgrs.append(mgr)

    if concat_axis == 1:
        # concatting along the rows -> concat the reindexed arrays
        # TODO(ArrayManager) doesn't yet preserve the correct dtype
        arrays = [
-           concat_compat([mgrs[i].arrays[j] for i in range(len(mgrs))])
+           concat_arrays([mgrs[i].arrays[j] for i in range(len(mgrs))])
Contributor:

can you now remove concat_compat?

Contributor:

And reading your comment below: why is this not in the array manager if it's only used there?

jorisvandenbossche (Member Author):

> can you now remove concat_compat?

concat_compat is used in several other places as well.

> why is this not in the array manager if it's only used there?

Because this is the code for concatting managers, which for BlockManager also resides in internals/concat.py?

@jbrockmendel (Member)

> If there are no more comments, I am planning to merge this.

Again, please do not. I've given up asking on things that are Sufficiently Benign, but this is not in that category.

@jorisvandenbossche (Member Author)

> Again, please do not. I've given up asking on things that are Sufficiently Benign, but this is not in that category.

Brock, I didn't merge this without asking. I know this is a larger PR that I can't just merge, and that's exactly why I asked above, so you could reply before I would actually merge (which you did now ;)).
(Maybe my formulation was not that good; I just wanted to trigger a "review again or indicate it's good".)

But to the point: can you indicate whether you are still planning to review in more detail, or whether you still have concerns? Your last comments were all minor points, which gives me the impression that you no longer have major concerns.

@jbrockmendel (Member)

Looking at JoinUnit, it does have self.block, but it wouldn't be too hard to have it take block.values instead, at which point it would be BlockManager/ArrayManager-agnostic. _get_mgr_concatenation_plan accesses mgr.blklocs and mgr.blknos twice each, but I think the ArrayManager analogues of those are np.ones(len(self.arrays)) and np.arange(len(self.arrays)).

So I think we can adapt the existing code (and, as a result, its behavior) to be shared, rather than add a bunch of new code.

@jorisvandenbossche (Member Author) commented Mar 26, 2021

> _get_mgr_concatenation_plan accesses mgr.blklocs and mgr.blknos twice each

It indeed accesses those twice, but the rest of the function then further processes them. AFAIK this whole function is fully BlockManager-specific (the return value also includes BlockPlacement, which is BlockManager-specific).

> Looking at JoinUnit, it does have self.block, but it wouldn't be too hard to have it take block.values instead, at which point it would be BlockManager/ArrayManager-agnostic.

It's probably indeed possible to make JoinUnit directly hold the values instead of the Block. But what's your concrete idea here? That it would be used instead of the NullArrayProxy?

Personally, I think the implementation I made for ArrayManager is much cleaner and simpler to follow. And my impression is that @jbrockmendel agrees with that ("If NullArrayProxy can be used as an alternative to JoinUnit/ConcatenationPlan that'd be great").
If we don't want the temporary duplication between ArrayManager and BlockManager, then there are two options:

  • I ditch my new implementation and rewrite it using bits and pieces of the current BlockManager implementation. Personally, I would not like to do that. The new version is simpler (also because the required logic for ArrayManager is inherently less complex, as it doesn't need to deal with possibly unaligned blocks that get concatenated), and adding logic for the ArrayManager to the current implementation will only make an already complex piece of code more complex (and at the point where we move to ArrayManager as the default, we would want this simpler version anyway).

  • We refactor the whole of concatenation_plan / JoinUnit based on my NullArrayProxy idea. In principle, I think this would be a good idea, as it could mean an improvement of the current implementation.
    However, this is quite a big refactor, and I would argue that it is 1) not a short-term priority for getting the ArrayManager feature-complete (to be able to do a full evaluation / comparison with the BlockManager), and 2) given that (assuming we decide to go with ArrayManager) the BlockManager is not intended to be actively developed for a long time (e.g. only 1-2 years), a big refactor might not be worth it, as any refactor risks introducing regressions (for long-lived code that risk is typically worth it, though). Even if we want to do this, I think it can be a follow-up to this PR.

So as a summary, IMO it's worth having this temporarily duplicated implementation for ArrayManager (just as other parts of ArrayManager also duplicate parts of the BlockManager), as it allows keeping the code for ArrayManager simpler (which is the code we intend to keep long-term).

Comment on pandas/core/dtypes/concat.py:

            else:
                target_dtype = to_concat_no_proxy[0].dtype
        elif not single_dtype:
            if any(kind in ["m", "M"] for kind in kinds):
Member:

Why can't we use find_common_type here?

jorisvandenbossche (Member Author):

Good question. So, can we think of any combination of dtypes where np.find_common_type(..) (which would be used by our find_common_type in this case) would not result in object dtype if the input contains datetime or timedelta?

The most obvious case I could think of is mixing in integers, and numpy also gives object dtype there:

In [20]: np.find_common_type([np.dtype("datetime64[ns]"), np.dtype("int64")], [])
Out[20]: dtype('O')

The reason I handled it separately here is that in dtypes/concat.py we also handle datetime-likes separately, and in _concat_datetime we also "hardcode" object dtype in case of multiple input dtypes.
But if find_common_type always gives object dtype for mixed datetime-like input, this can indeed be simplified here. Will try that and see what the tests say.

jorisvandenbossche (Member Author):

@jbrockmendel so I was indeed able to simplify this; the if any_ea check was also not needed, since find_common_type handles all cases.

@jreback (Contributor) commented Mar 29, 2021

I am OK with temporary duplication of things. This is a pretty common pattern: first make things work, then move towards removing the duplication in steps.

@jbrockmendel (Member)

Agreed. I'd like to see the find_common_type thing resolved first, but won't hold this up based on that.

@jorisvandenbossche (Member Author)

Updated this PR, and resolved the find_common_type comment. The py37_32bit failing test seems to be a flaky hypothesis one.

@jorisvandenbossche (Member Author)

Any more comments?

@jbrockmendel (Member)

> Any more comments?

https://github.com/pandas-dev/pandas/pull/39612/files#r602641371, but it doesn't need to be addressed here. LGTM

@jreback (Contributor) commented Apr 2, 2021

lgtm

@jbrockmendel (Member)

LGTM, might do another rebase to be on the safe side

@jorisvandenbossche (Member Author)

> https://github.com/pandas-dev/pandas/pull/39612/files#r602641371, but it doesn't need to be addressed here. LGTM

Sorry, I missed that one earlier (GitHub's UI is not great when replying to an older comment, as it doesn't show up in the recent comments in the timeline...). I opened #40893 to track this inconsistency.

@jorisvandenbossche merged commit f0c4093 into pandas-dev:master on Apr 12, 2021
@jorisvandenbossche deleted the am-concat branch on April 12, 2021 at 10:56
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021