ARROW-3144: [C++/Python] Move "dictionary" member from DictionaryType to ArrayData to allow for variable dictionaries #4316
Conversation
cpp/src/arrow/array-dict-test.cc (outdated)

```diff
 std::shared_ptr<Array> int_array;
 ASSERT_OK(int_builder.Finish(&int_array));

-DictionaryArray expected(dtype, int_array);
+DictionaryArray expected(dictionary(int16(), decimal_type), int_array, fsb_array);
 ASSERT_TRUE(expected.Equals(result));
 }

 // ----------------------------------------------------------------------
 // DictionaryArray tests

 TEST(TestDictionary, Basics) {
```
Should this be moved to type-test.cc?
Yes, will do
done
```diff
@@ -765,20 +763,30 @@ static Status TransposeDictIndices(MemoryPool* pool, const ArrayData& in_data,
 }

 Status DictionaryArray::Transpose(MemoryPool* pool, const std::shared_ptr<DataType>& type,
                                   const std::shared_ptr<Array>& dictionary,
                                   const std::vector<int32_t>& transpose_map,
                                   std::shared_ptr<Array>* out) const {
   DCHECK_EQ(type->id(), Type::DICTIONARY);
   const auto& out_dict_type = checked_cast<const DictionaryType&>(*type);
```
This seems equivalent to Cast(array=Take(indices=transpose_map, values=data_), to=out_index_type). Should we add an explicit output type to TakeOptions?
Would we want to use that here?
Possibly, but out of scope for this patch
I opened https://issues.apache.org/jira/browse/ARROW-5343 which would be a pre-requisite for this
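For readers following the thread, here is a minimal conceptual sketch of what the index transposition does (a plain loop with illustrative names, ignoring nulls and output index width; it is not the actual TransposeDictIndices kernel):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Remap dictionary indices through a transpose map: transpose_map[old_index]
// gives the position of the same dictionary value in the target dictionary.
std::vector<int32_t> TransposeIndices(const std::vector<int32_t>& old_indices,
                                      const std::vector<int32_t>& transpose_map) {
  std::vector<int32_t> new_indices(old_indices.size());
  for (std::size_t i = 0; i < old_indices.size(); ++i) {
    new_indices[i] = transpose_map[old_indices[i]];
  }
  return new_indices;
}
```

Seen this way, the loop is roughly a Take over the transpose map at the old indices followed by a Cast of the result to the output index type, which is the equivalence suggested above.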
```cpp
  // The dictionary for this Array, if any. Only used for dictionary
  // type
  std::shared_ptr<Array> dictionary;
```
Why not use child_data[0]?
This was discussed on the mailing list. I agree with the others (Antoine / Micah) that having an explicit dictionary field is clearer. I added a benchmark to assess whether it causes meaningful overhead, which it does not seem to.
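A rough sketch of the data-structure shape being discussed, with the member from the diff above; the surrounding struct is abbreviated and illustrative, not the full ArrayData definition:

```cpp
#include <cstdint>
#include <memory>
#include <vector>

namespace arrow {

class Array;
class Buffer;
class DataType;

// Abbreviated, illustrative view of ArrayData after this patch: the
// dictionary values travel with the array data instead of the type.
struct ArrayData {
  std::shared_ptr<DataType> type;
  int64_t length = 0;
  std::vector<std::shared_ptr<Buffer>> buffers;
  std::vector<std::shared_ptr<ArrayData>> child_data;

  // The dictionary for this Array, if any. Only used for dictionary type.
  // The alternative debated above was to reuse child_data[0]; the explicit
  // member was preferred for clarity.
  std::shared_ptr<Array> dictionary;
};

}  // namespace arrow
```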
```diff
     break;
   default:
     ctx->SetStatus(
-        Status::Invalid("Invalid index type: ", indices.type()->ToString()));
+        Status::Invalid("Invalid index type: ", type.index_type()->ToString()));
```
Use TypeError here.
I prefer to leave the semantics here as unchanged as possible from master
went ahead and changed
```diff
@@ -292,7 +291,16 @@ struct TypeTraits<ExtensionType> {
 //

 template <typename T>
 using is_number = std::is_base_of<Number, T>;
```
If we're renaming this one, should we do the same for is_signed_integer etc.?
Probably. There's some other cleanup to do but not here, I will open a JIRA
```cpp
  return ConcatenateBuffers(Buffers(1, *fixed), pool_, &out_.buffers[1]);

  // Two cases: all the dictionaries are the same, or unification is
  // required
```
👍
```diff
-/// Concrete type class for dictionary data
+/// \brief Dictionary-encoded value type with data-dependent
+/// dictionary
 class ARROW_EXPORT DictionaryType : public FixedWidthType {
```
After this refactor DictionaryType looks more like a nested type.
It's a synthetic construct in C++ since there is no Dictionary type in the protocol metadata...
I have refrained from looking at the IPC implementation details for now. The one thing I'm worried about is that DictionaryMemo now must be handled by all users of the IPC layer (or their system will be incompatible with dict arrays).
```diff
@@ -364,11 +364,30 @@ static void BM_BuildStringDictionaryArray(
   state.SetBytesProcessed(state.iterations() * fodder_size);
 }

 static void BM_ArrayDataConstructDestruct(
```
Is this actually useful?
Not really. I'll remove it. I found that the roundtrip cost of an empty ArrayData is about 60ns.
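For reference, a micro-benchmark of that round trip might look roughly like the following; this is a hypothetical reconstruction, not the removed BM_ArrayDataConstructDestruct code:

```cpp
#include <benchmark/benchmark.h>

#include <arrow/api.h>

// Construct and destroy an empty ArrayData in a loop to measure the
// shared_ptr round-trip overhead mentioned above.
static void BM_ArrayDataConstructDestruct(benchmark::State& state) {
  for (auto _ : state) {
    auto data = arrow::ArrayData::Make(arrow::int32(), /*length=*/0);
    benchmark::DoNotOptimize(data);
  }
}

BENCHMARK(BM_ArrayDataConstructDestruct);
BENCHMARK_MAIN();
```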
```cpp
  using ArrayType = typename TypeTraits<T>::ArrayType;

  template <typename IndexType>
  Status Unpack(FunctionContext* ctx, const ArrayData& indices,
```
The UnpackHelper implementations could use ArrayDataVisitor from visitor_inline.h.
Yes, they could. This patch removes some prior code duplication but otherwise roughly maintains the status quo
```diff
@@ -145,19 +145,21 @@ struct UnpackValues {

   Status Visit(const DictionaryType& t) {
     std::shared_ptr<Array> taken_indices;
     const auto& values = static_cast<const DictionaryArray&>(*params_.values);
```
Should use checked_cast.
done
cpp/src/arrow/flight/server.cc (outdated)
```diff
@@ -53,6 +55,9 @@ using ServerWriter = grpc::ServerWriter<T>;
 namespace pb = arrow::flight::protocol;

 namespace arrow {

 using internal::make_unique;
```
AFAIR, our make_unique implementation is buggy on MSVC. @bkietz
Accurate; MSVC will occasionally get confused between internal::make_unique and std::make_unique when it is using-ed like this. Referring to it with internal::make_unique prevents the issue: https://issues.apache.org/jira/browse/ARROW-5121
Removed using statement.
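A small self-contained illustration of the lookup pattern in question; the Widget type is made up, and only the shape of the calls mirrors the server.cc discussion:

```cpp
#include <memory>
#include <utility>

namespace arrow {
namespace internal {

// Stand-in for Arrow's pre-C++14 make_unique helper.
template <typename T, typename... Args>
std::unique_ptr<T> make_unique(Args&&... args) {
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

}  // namespace internal

struct Widget {
  explicit Widget(int v) : value(v) {}
  int value;
};

void WithUsingDeclaration() {
  using internal::make_unique;
  // On MSVC, this unqualified call can reportedly be confused with
  // std::make_unique (see ARROW-5121 referenced above).
  auto w = make_unique<Widget>(1);
  (void)w;
}

void FullyQualified() {
  // Spelling out internal::make_unique avoids the ambiguity.
  auto w = internal::make_unique<Widget>(2);
  (void)w;
}

}  // namespace arrow
```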
```cpp
 private:
  Status GetNextDictionary(FlightPayload* payload) {
    const auto& it = dictionaries_[dictionary_index_++];
    return ipc::internal::GetDictionaryPayload(it.first, it.second, pool_,
```
I'm worried about the details of the IPC stream format leaking into all IPC users. Can we implement an IpcPayloadWriter instead and rely on OpenRecordBatchWriter?
Agreed. Let's address shortly after this is merged
```cpp
  virtual std::shared_ptr<Schema> schema() = 0;

  /// \brief Compute FlightPayload containing serialized RecordBatch schema
  virtual Status GetSchemaPayload(FlightPayload* payload) = 0;
```
I think the user shouldn't have to implement this and deal with the specifics of dictionary transmission over the wire.
cc @lidavidm for opinions.
(or is it @lihalite ?)
Either will ping me!
I think in this case, it's OK, in that Flight explicitly states we transmit the schema first, then data; also, if we have a set of reasonable implementations of this interface, users should hopefully not feel a need to implement it themselves unless they did actually want control over the specifics.
Right, but if you look at the RecordBatchStream implementation, it's doing non-trivial stuff with dictionaries.
I think we'll have to develop some utility code to assist with these details, but it doesn't seem urgent for the moment. If you are creating a custom stream I am not sure right now how to protect the developer from the details of the stream protocol
I did the necessary work to hide the details in IpcPayloadWriter, and the converse is available for reading in ipc::MessageReader together with RecordBatchStreamReader. There shouldn't be a need to expose this at all.
OK, how would you like to address this, sequence-wise -- is only the implementation here a problem, or is the public API also a problem? I think it would be much easier to fix both after getting this patch merged since we aren't releasing anytime soon
Ok, let's fix this afterwards.
cpp/src/arrow/ipc/read-write-test.cc (outdated)
```diff
   io::BufferReader buf_reader(serialized_schema);
-  return ReadSchema(&buf_reader, result);
+  return ReadSchema(&buf_reader, &in_memo, result);
```
Should we test something about in_memo?
I added an assertion that in_memo and out_memo agree about the number of dictionaries.
```cpp
std::shared_ptr<Schema> PyGeneratorFlightDataStream::schema() { return schema_; }

Status PyGeneratorFlightDataStream::GetSchemaPayload(FlightPayload* payload) {
  return ipc::internal::GetSchemaPayload(*schema_, &dictionary_memo_,
```
I'm not sure I understand. This seems to populate a DictionaryMemo but it's not used afterwards?
We don't have a test for dictionary arrays in test_flight.py...
Last I tried, cross-language Flight with dictionaries still didn't work, or even from C++ to C++, so it wouldn't have worked before in Python. https://issues.apache.org/jira/browse/ARROW-5143
Ah... They work with DoGet but perhaps not with DoPut then?
@lihalite I'm fixing dict transfer with DoPut as part of ARROW-5113. It may produce conflicts for both you and @wesm :-)
OK, you may want to abort the dict transfer work until this is merged
```diff
@@ -64,9 +69,11 @@ class FlightMessageReaderImpl : public FlightMessageReader {
  public:
  FlightMessageReaderImpl(const FlightDescriptor& descriptor,
                          std::shared_ptr<Schema> schema,
                          std::unique_ptr<ipc::DictionaryMemo> dict_memo,
```
See #4319 for a clean, dictionary-compatible re-implementation of FlightMessageReader based on ipc::MessageReader.
that's excellent, thank you
I spoke with @romainfrancois and he may not be able to help fix the R bindings until next week, so if it doesn't offend anyone greatly I would add R to allowed failures in Travis CI until R can be fixed.
Looks like there are still some doxygen issues, and the integration tests are broken (I was hoping I would get lucky there...) so I will address those issues tomorrow (Thursday).
I'll work on this in a few days.
OK.
This may be an out-of-scope topic for this pull request but I want to share. Can we use […]? With this change, can we use […]? If this is out of scope of this pull request, I'll open a JIRA issue.
@wesm I'm currently trying to rebase this, and also fixing CUDA compile failures. Edit: done.
I think the integration failure is because the dictionary integration test reuses the same dictionary array for two different fields (with different index types):

```json
{
  "schema": {
    "fields": [
      {
        "name": "dict1_0",
        "type": {
          "name": "utf8"
        },
        "nullable": true,
        "children": [],
        "dictionary": {
          "id": 0,
          "indexType": {
            "name": "int",
            "isSigned": true,
            "bitWidth": 8
          },
          "isOrdered": false
        }
      },
      {
        "name": "dict1_1",
        "type": {
          "name": "utf8"
        },
        "nullable": true,
        "children": [],
        "dictionary": {
          "id": 0,
          "indexType": {
            "name": "int",
            "isSigned": true,
            "bitWidth": 32
          },
          "isOrdered": false
        }
      },
      {
        "name": "dict2_0",
        "type": {
          "name": "int",
          "isSigned": true,
          "bitWidth": 64
        },
        "nullable": true,
        "children": [],
        "dictionary": {
          "id": 1,
          "indexType": {
            "name": "int",
            "isSigned": true,
            "bitWidth": 16
          },
          "isOrdered": false
        }
      }
    ]
  },
  "dictionaries": [
    {
      ...
```

A complication is with how the dictionary types are serialized into JSON. The "dictionaries" key doesn't allow unserializing the dictionary arrays by themselves, as you have to parse the "schema" key to get the value type...
@kou the troublesome thing with that is the […]
@pitrou I can look at the integration test failure today -- thank you for rebasing!
So the problem is that […]. Either we change the JSON integration tests, or we need to change the IPC layer to accommodate this non-bijection.
Yes, indeed. I'm on it, I will fix.
OK, I have the integration tests fixed locally. Now multiple fields can reference the same dictionary with no problem. I'll add a unit test for the change I made to […]
Done. I also added rudimentary arguments to toggle the JS and Java integration testers off; it might be worth looking a bit more holistically at integration test CLI options per ARROW-5066.
Integration tests are failing with this error: […]
With these changes it is now more difficult to refer to dictionaries multiple times in IPC streams because ids are assigned to fields prior to becoming aware of the dictionaries themselves. I opened ARROW-5340 to spend some time on this -- I'm inclined to remove the multiply-referenced dictionary from the integration tests and leave it for follow-up work. Another option is to change Java to not perform assertions on the dictionary ids when comparing schemas.
@wesm I've created a new issue for #4316 (comment): https://issues.apache.org/jira/browse/ARROW-5355
I removed the multiply-referenced dictionary from the integration tests. I think the dictionary-encoding stuff in Java will need a little bit of work -- it isn't clear to me, for example, why […]
I'll work on this.
Done.
Squashed commits:
- More refactoring
- Continued refactoring
- Begin removing fixed dictionaries from codebase
- Fix up Unify implementation and tests
- More refactoring, consolidation
- Revert changes to builder_dict.*
- …low on work around this can be done
@xhochy if you are available to peek at this or approve, that would be helpful. I'm happy to address post-merge feedback as well.
Hi @wesm, looks like an issue with a dependency. I'll investigate (https://travis-ci.org/apache/arrow/jobs/533676609#L536).
Corresponding issue in the rustyline repo: kkawakam/rustyline#217. I'm checking what's changed with the latest nightly. CC @andygrove
I'm inclined to merge this with the Rust build broken since there are a lot of PRs that need to be rebased... if anyone has any objection or wants to look more at the changes please let me know in the next hour or two.
I've logged https://issues.apache.org/jira/browse/ARROW-5360
@wesm I'm happy that we don't stop the train for other languages because of the Rust issue. It's only occurring on nightly, and we at least know what the issue is.
Merging so we can begin rebasing other PRs.
…ROW-3144

At the moment however, all the `DictionaryMemo` use is internal; it should probably be promoted to arguments (with defaults) to the R functions. I'll do this here or on another PR if this one is merged first so that `r/` builds again on travis. This now needs the C++ lib up to date, e.g. on my setup I get it through `brew install apache-arrow --HEAD`, and there is no conditional compiling so that it still works with previous versions. Let me know if that's ok.

Follow up from #4316.

Author: Romain Francois <[email protected]>

Closes #4413 from romainfrancois/ARROW-5361/dictionary and squashes the following commits:

b0de1a8 <Romain Francois> R should pass now
2556c16 <Romain Francois> document()
fa0440f <Romain Francois> update R to changes from ARROW-3144 #4316
…DictionaryBuilder

Adds support for building and writing delta dictionaries. Moves the `dictionary` Vector pointer to the Data class, similar to #4316. Forked from #4476 since this adds support for delta dictionaries to the DictionaryBuilder. Will rebase this PR after that's merged. All the work is in the last commit, here: b12d842

Author: ptaylor <[email protected]>

Closes #4502 from trxcllnt/js/delta-dictionaries and squashes the following commits:

6a70a25 <ptaylor> make dictionarybuilder and recordbatchwriter support delta dictionaries
This patch moves the dictionary member out of DictionaryType to a new
member on the internal ArrayData structure. As a result, serializing
and deserializing schemas requires only a single IPC message, and
schemas have no knowledge of what the dictionary values are.
The objective of this change is to correct a long-standing Arrow C++
design problem with dictionary-encoded arrays where the dictionary
values must be known at schema construction time. This has plagued us
all over the codebase:

- Reading dictionary-encoded data in pieces (for example, Parquet row groups) is not simple because each row group may have a different dictionary
- A dictionary that is not known up front, or that changes, would invalidate the pre-existing schema, causing subsequent RecordBatch objects to be incompatible
- The complete dictionary must be available before a schema can be sent, having possibly unbounded size
- Combining data with different dictionaries can require an expensive type unification
The summary of what can be learned from this is: do not put data in
type objects, only metadata. Dictionaries are data, not metadata.
There are a number of unavoidable API changes (straightforward for
library users to fix) but otherwise no functional difference in the
library.
As you can see the change is quite complex as significant parts of IPC
read/write, JSON integration testing, and Flight needed to be reworked
to alter the control flow around schema resolution and handling the
first record batch.
Key APIs changed:

- The `DictionaryType` constructor requires a `DataType` for the dictionary value type instead of the dictionary itself. The `dictionary` factory method is correspondingly changed. The `dictionary` accessor method on `DictionaryType` is replaced with `value_type`.
- The `DictionaryArray` constructor and `DictionaryArray::FromArrays` must be passed the dictionary values as an additional argument.
- `DictionaryMemo` is exposed in the public API as it is now required for granular interactions with IPC messages with such functions as `ipc::ReadSchema` and `ipc::ReadRecordBatch`.
- A `DictionaryMemo*` argument is added to several low-level public functions in `ipc/writer.h` and `ipc/reader.h`.
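To make the list above concrete, here is a minimal sketch of building a dictionary-encoded array with the post-change API. The builder calls and the Status-returning FromArrays overload are assumptions based on the bullets above rather than code lifted from the patch:

```cpp
#include <memory>

#include <arrow/api.h>

// Build indices and dictionary values separately, then combine them.
arrow::Status BuildDictExample(std::shared_ptr<arrow::Array>* out) {
  // The type now records only the index type and value type, not the values.
  auto dict_type = arrow::dictionary(arrow::int16(), arrow::utf8());

  // Dictionary values ("the data") live in an ordinary array...
  arrow::StringBuilder values_builder;
  ARROW_RETURN_NOT_OK(values_builder.AppendValues({"foo", "bar", "baz"}));
  std::shared_ptr<arrow::Array> dict_values;
  ARROW_RETURN_NOT_OK(values_builder.Finish(&dict_values));

  // ...as do the indices into that dictionary.
  arrow::Int16Builder indices_builder;
  ARROW_RETURN_NOT_OK(indices_builder.AppendValues({0, 1, 2, 0, 1}));
  std::shared_ptr<arrow::Array> indices;
  ARROW_RETURN_NOT_OK(indices_builder.Finish(&indices));

  // FromArrays now takes the dictionary values as an extra argument.
  return arrow::DictionaryArray::FromArrays(dict_type, indices, dict_values, out);
}
```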
Some other incidental changes:
- Because DictionaryType objects could previously be reused in Schemas, such dictionaries would be "deduplicated" in IPC messages in passing. This is no longer possible by the same trick, so dictionary reuse will have to be handled in a different way (I opened ARROW-5340 to investigate)
- As a result of this, an integration test that featured dictionary reuse has been changed to not reuse dictionaries. Technically this is a regression, but I didn't want to block the patch over it
- R is added to allow_failures in Travis CI for now