simple roundtrip #1

Merged
merged 3 commits into from
Nov 8, 2016

julienledem

Still TODO: actually read the values from a vector and write them back to a new one.
This one just reads the buffers in a vector and writes them back.

@wesm
Owner

wesm commented Nov 3, 2016

How do I run this program? I don't know how to set the right classpath.

@julienledem
Author

julienledem commented Nov 4, 2016

To try it out:

```
mvn package
java -cp tools/target/arrow-tools-0.1.1-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.FileRoundtrip -in foo -out bar
```
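The tool round-trips the raw buffers of a vector rather than its values. As a minimal, stdlib-only sketch of that idea, the class below writes a list of byte buffers to a byte stream and reads them back, assuming a hypothetical length-prefixed layout. `BufferRoundtrip` and its layout are illustrative only; they are not the Arrow IPC file format or the actual `FileRoundtrip` implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical layout for illustration only; this is NOT the Arrow IPC file format.
class BufferRoundtrip {
  // Writes the buffer count, then each buffer as <int length><bytes>.
  static byte[] write(List<byte[]> buffers) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(buffers.size());
      for (byte[] buffer : buffers) {
        out.writeInt(buffer.length);
        out.write(buffer);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads the buffers back in the order they were written.
  static List<byte[]> read(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int count = in.readInt();
      List<byte[]> buffers = new ArrayList<>(count);
      for (int i = 0; i < count; i++) {
        byte[] buffer = new byte[in.readInt()];
        in.readFully(buffer);
        buffers.add(buffer);
      }
      return buffers;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // True if every buffer survives a write-then-read cycle byte for byte.
  static boolean roundtrips(List<byte[]> buffers) {
    List<byte[]> copy = read(write(buffers));
    if (copy.size() != buffers.size()) {
      return false;
    }
    for (int i = 0; i < buffers.size(); i++) {
      if (!Arrays.equals(buffers.get(i), copy.get(i))) {
        return false;
      }
    }
    return true;
  }
}
```

Reading values (the TODO above) would additionally require interpreting each buffer according to the vector's type, which this sketch deliberately does not attempt.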

@wesm
Owner

wesm commented Nov 7, 2016

Thank you! I will start with this and work on an initial "hello world" integration test

@wesm wesm merged commit aee552a into wesm:roundtrip-tool Nov 8, 2016
@wesm
Owner

wesm commented Nov 8, 2016

Merged these commits into my branch. Very helpful, thank you

wesm added a commit that referenced this pull request Jan 20, 2017
…on format.

Author: Wes McKinney <[email protected]>
Author: Nong Li <[email protected]>

Closes apache#292 from nongli/file and squashes the following commits:

18890a9 [Wes McKinney] Message fixes. Fix Java test suite. Integration tests pass
f187539 [Nong Li] Merge pull request #1 from wesm/file-change-cpp-impl
e3af434 [Wes McKinney] Remove unused variable
664d5be [Wes McKinney] Fixes, stream tests pass again
ba8db91 [Wes McKinney] Redo MessageSerializer with unions. Still has bugs
21854cc [Wes McKinney] Restore Block.bodyLength to long
7c6f7ef [Nong Li] Update to restore Block behavior
27b3909 [Nong Li] [ARROW-499]: [Java] Update file serialization to use the streaming serialization format.
@julienledem julienledem deleted the roundtrip-tool branch April 25, 2017 18:31
wesm pushed a commit that referenced this pull request Jul 26, 2017
…XXX in plasma protocol.

Related to apache#878, add DCHECK for ReadXXX.

Author: Yeolar <[email protected]>

Closes apache#887 from Yeolar/fixtypo_plasma_and_add_DCHECK and squashes the following commits:

4df63bc [Yeolar] clang-format for too long lines.
143d254 [Yeolar] Update, compile passed.
09ff103 [Yeolar] Fix conflicts.
b951d8d [Yeolar] Merge pull request #1 from apache/master
ebae611 [Yeolar] Fix typo in plasma protocol; add DCHECK for ReadXXX in plasma protocol.
wesm pushed a commit that referenced this pull request Aug 9, 2017
…ties

As per apache#872, I am upgrading Jackson to the latest version on the current release train (2.7.1 --> 2.7.9).

Author: Matt Darwin <(none)>
Author: Matt <[email protected]>

Closes apache#929 from mattdarwin/ARROW-1242-upgrade-jackson and squashes the following commits:

d059517 [Matt Darwin] 1242 upgraing jackson to 2.7.9
bc3b6a0 [Matt] Merge pull request #1 from apache/master
wesm pushed a commit that referenced this pull request Aug 9, 2017
NB: this commit excludes the Jackson and logback upgrades, since they are handled in 871 and 872.

Author: Matt Darwin <(none)>
Author: Matt Darwin <[email protected]>
Author: Matt <[email protected]>

Closes apache#873 from mattdarwin/upgrade-libs and squashes the following commits:

9b51f46 [Matt Darwin] Merge branch 'master' into upgrade-libs
284a4ce [Matt Darwin] Merge branch 'master' of https://github.com/apache/arrow
79550b1 [Matt Darwin] rolling back lilith to 0.9.44 since 8 doesn't support java 7
c63eef6 [Matt Darwin] Merge branch 'master' into upgrade-libs
bc3b6a0 [Matt] Merge pull request #1 from apache/master
8599ba0 [Matt Darwin] backing out guava upgrade
80d81e6 [Matt Darwin] downgrading guava to 20 for java 7 compatibility
806f348 [Matt Darwin] Merge branch 'master' into upgrade-libs
8aafb7e [Matt Darwin] correcting indentation in BaseValueVector
94c1469 [Matt Darwin] upgrading netty to 4.0.49
cff5596 [Matt Darwin] reverting to netty 4.0.41.Final
568737d [Matt Darwin] switching to Collections from Guava for empty iterator
c194e48 [Matt Darwin] upgraded hppc to 0.7.2
38be468 [Matt Darwin] upgrading libs except jackson and logback
wesm pushed a commit that referenced this pull request Aug 11, 2017
…(take 2)

Sorry, this was still not fixed properly: the logback version is specified separately in two places.

Fixed properly this time.

Author: Matt Darwin <(none)>
Author: Matt <[email protected]>

Closes apache#960 from mattdarwin/ARROW-1240-upgrade-logback and squashes the following commits:

3492f66 [Matt Darwin] upgrading logback in tools/pom.xml
206b48d [Matt Darwin] Merge branch 'master' into ARROW-1240-upgrade-logback
284a4ce [Matt Darwin] Merge branch 'master' of https://github.com/apache/arrow
bc3b6a0 [Matt] Merge pull request #1 from apache/master
3e2f676 [Matt Darwin] Merge branch 'master' into ARROW-1240-upgrade-logback
caed163 [Matt Darwin] upgrading slf4j to 1.7.25
wesm pushed a commit that referenced this pull request Aug 11, 2017
…ties (take 2)

Sorry, PR apache#929 failed to actually change the Jackson version, because the `jackson.version` property defined in java/pom.xml is not used in java/vector/pom.xml.

That's now fixed in this PR.

Author: Matt Darwin <(none)>
Author: Matt <[email protected]>

Closes apache#957 from mattdarwin/ARROW-1242-upgrade-jackson and squashes the following commits:

ad15e5f [Matt Darwin] Merge branch 'master' into ARROW-1242-upgrade-jackson
ee29d65 [Matt Darwin] Merge branch 'master' of https://github.com/apache/arrow into ARROW-1242-upgrade-jackson
06d7745 [Matt Darwin] upgrading jackson to 2.7.9 PROPERLY this time...
284a4ce [Matt Darwin] Merge branch 'master' of https://github.com/apache/arrow
d059517 [Matt Darwin] 1242 upgraing jackson to 2.7.9
bc3b6a0 [Matt] Merge pull request #1 from apache/master
wesm pushed a commit that referenced this pull request Jan 17, 2018
…is alive before enqueue new record when download file.

Downloading a file with pyarrow sometimes raises queue.Full exceptions.
jira: https://issues.apache.org/jira/browse/ARROW-2002
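The fix checks that the writer thread is still alive before enqueuing. The original change is in Python; the following is a hypothetical Java analogue of the same idea (`SafeEnqueue` is invented for this sketch): instead of failing as soon as a bounded queue is full, the producer retries with a short timeout for as long as the consumer thread is alive, and gives up only once that thread has exited.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical producer-side helper: a slow writer no longer causes a "queue full"
// failure, and a dead writer no longer causes the producer to block forever.
class SafeEnqueue {
  static <T> boolean enqueue(BlockingQueue<T> queue, T item, Thread writer) {
    try {
      while (writer.isAlive()) {
        // Wait up to 100 ms for the writer to drain an element.
        if (queue.offer(item, 100, TimeUnit.MILLISECONDS)) {
          return true;
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the interrupt status
    }
    return false;  // writer is gone (or we were interrupted), so give up
  }
}
```

Note that a writer that is alive but permanently stalled would keep the producer waiting; the sketch only distinguishes "slow" from "dead".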

Author: kmiku7 <[email protected]>

Closes apache#1485 from kmiku7/master and squashes the following commits:

8d5f905 [kmiku7] fix queue.FULL exception when writer thread write data slowly.
722182b [kmiku7] Merge pull request #1 from apache/master
wesm pushed a commit that referenced this pull request Jan 24, 2018
…lue data

Modified BinaryBuilder::Resize(int64_t) so that when building BinaryArrays with a known size, space is also reserved for value_data_builder_ to prevent internal reallocation.
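To illustrate why reserving value-data space matters, here is a hypothetical Java analogue of a binary builder; the real change is in the C++ `BinaryBuilder`. `BinaryBuilderSketch`, its `reserve` method, and the `dataReallocations` counter are invented for this sketch. Values are stored as an offsets array plus one contiguous byte buffer, so reserving both up front means appends of a known total size never grow the byte buffer.

```java
import java.util.Arrays;

// Hypothetical binary builder: offsets[i]..offsets[i + 1] delimit value i inside `data`.
class BinaryBuilderSketch {
  private int[] offsets = new int[1];  // offsets[0] == 0
  private byte[] data = new byte[0];   // concatenated value bytes
  private int length = 0;              // number of values appended
  private int dataLength = 0;          // bytes used in `data`
  int dataReallocations = 0;           // how often the value-data buffer had to grow

  // Reserve room for `capacity` values and `dataCapacity` value bytes up front.
  void reserve(int capacity, int dataCapacity) {
    if (offsets.length < capacity + 1) {
      offsets = Arrays.copyOf(offsets, capacity + 1);
    }
    if (data.length < dataCapacity) {
      data = Arrays.copyOf(data, dataCapacity);
    }
  }

  void append(byte[] value) {
    if (length + 2 > offsets.length) {
      offsets = Arrays.copyOf(offsets, 2 * (length + 2));
    }
    int needed = dataLength + value.length;
    if (needed > data.length) {
      // Grow geometrically; this is the reallocation that reserving avoids.
      data = Arrays.copyOf(data, Math.max(needed, 2 * data.length));
      dataReallocations++;
    }
    System.arraycopy(value, 0, data, dataLength, value.length);
    dataLength = needed;
    offsets[++length] = dataLength;
  }

  byte[] value(int i) {
    return Arrays.copyOfRange(data, offsets[i], offsets[i + 1]);
  }
}
```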

Author: Panchen Xue <[email protected]>

Closes apache#1481 from xuepanchen/master and squashes the following commits:

707b67b [Panchen Xue] ARROW-1712: [C++] Fix lint errors
360e601 [Panchen Xue] Merge branch 'master' of https://github.com/xuepanchen/arrow
d4bbd15 [Panchen Xue] ARROW-1712: [C++] Modify test case for BinaryBuilder::ReserveData() and change arguments for offsets_builder_.Resize()
77f8f3c [Panchen Xue] Merge pull request #5 from apache/master
bc5db7d [Panchen Xue] ARROW-1712: [C++] Remove unneeded data member in BinaryBuilder and modify test case
5a5b70e [Panchen Xue] Merge pull request #4 from apache/master
8e4c892 [Panchen Xue] Merge pull request #3 from xuepanchen/xuepanchen-arrow-1712
d3c8202 [Panchen Xue] ARROW-1945: [C++] Fix a small typo
0b07895 [Panchen Xue] ARROW-1945: [C++] Add data_capacity_ to track capacity of value data
18f90fb [Panchen Xue] ARROW-1945: [C++] Add data_capacity_ to track capacity of value data
bbc6527 [Panchen Xue] ARROW-1945: [C++] Update test case for BinaryBuild data value space reservation
15e045c [Panchen Xue] Add test case for array-test.cc
5a5593e [Panchen Xue] Update again ReserveData(int64_t) method for BinaryBuilder
9b5e805 [Panchen Xue] Update ReserveData(int64_t) method signature for BinaryBuilder
8dd5eaa [Panchen Xue] Update builder.cc
b002e0b [Panchen Xue] Remove override keyword from ReserveData(int64_t) method for BinaryBuilder
de318f4 [Panchen Xue] Implement ReserveData(int64_t) method for BinaryBuilder
e0434e6 [Panchen Xue] Add ReserveData(int64_t) and value_data_capacity() for methods for BinaryBuilder
5ebfb32 [Panchen Xue] Add capacity() method for TypedBufferBuilder
5b73c1c [Panchen Xue] Update again BinaryBuilder::Resize(int64_t capacity) in builder.cc
d021c54 [Panchen Xue] Merge pull request #2 from xuepanchen/xuepanchen-arrow-1712
232024e [Panchen Xue] Update BinaryBuilder::Resize(int64_t capacity) in builder.cc
c2f8dc4 [Panchen Xue] Merge pull request #1 from apache/master
wesm pushed a commit that referenced this pull request Feb 2, 2018
This PR moves the `Table` class out of the Vector hierarchy and adds optimized dataframe operations to it. It currently implements an optimized `scan()` method, `filter(predicate)`, `count()`, and `countBy(column_name)` (the last only works on dictionary-encoded columns).

Some usage examples, based on the file generated by `js/test/data/tables/generate.py`:
``` js
> let table = Table.from(...);
> table.count()
1000000
> table.filter(col('lat').gteq(0)).count()
499718
> table.countBy('origin').toJSON()
{ Charlottesville: 166839,
  'New York': 166251,
  'San Francisco': 166642,
  Seattle: 166659,
  'Terre Haute': 166756,
  'Washington, DC': 166853 }
> table.filter(col('lng').gteq(0)).countBy('origin').toJSON()
{ Charlottesville: 83109,
  'New York': 83221,
  'San Francisco': 83515,
  Seattle: 83362,
  'Terre Haute': 83314,
  'Washington, DC': 83479 }
```
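One reason `countBy` is restricted to dictionary-encoded columns is that counting can then work on small integer indices instead of hashing strings per row. The following is a hypothetical Java sketch of that idea (`DictionaryCountBy` is invented here and is not the Arrow JS API): it keeps one counter per dictionary entry, applies a row predicate as a filter, and maps the counters back to dictionary values at the end.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical count/filter/countBy over a dictionary-encoded column.
class DictionaryCountBy {
  // Counts rows accepted by `keepRow`, grouped by the dictionary value at each row.
  static Map<String, Integer> countBy(String[] dictionary, int[] indices, IntPredicate keepRow) {
    int[] counts = new int[dictionary.length];  // one counter per dictionary entry
    for (int row = 0; row < indices.length; row++) {
      if (keepRow.test(row)) {
        counts[indices[row]]++;  // integer increment, no per-row string hashing
      }
    }
    Map<String, Integer> result = new LinkedHashMap<>();
    for (int i = 0; i < dictionary.length; i++) {
      result.put(dictionary[i], counts[i]);
    }
    return result;
  }
}
```

A filtered count such as `table.filter(col('lat').gteq(0)).countBy('origin')` then corresponds to passing a predicate over another column's values.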
There are performance tests for the dataframe operations; to run them, you must first generate the test data by running `npm run create:perfdata`.

The PR also includes @trxcllnt's refactor of the JS implementation to make it more closely resemble the C++ implementation. This refactor resolves multiple JIRAs: ARROW-1903, ARROW-1898, ARROW-1502, ARROW-1952 (partially), and ARROW-1985.

Author: Paul Taylor <[email protected]>
Author: Brian Hulette <[email protected]>
Author: Brian Hulette <[email protected]>

Closes apache#1482 from TheNeuralBit/table-scan-perf and squashes the following commits:

52f1e0e [Brian Hulette] <, > are not commutative, misc cleanup
04b1838 [Brian Hulette] even more table tests
16b9ccb [Brian Hulette] Merge pull request #4 from trxcllnt/js-cpp-refactor
fe300df [Paul Taylor] fix closure es5/umd toString() iterator
3d5240a [Paul Taylor] fix more externs
10c48ad [Paul Taylor] Merge branch 'table-scan-perf' of github.com:ccri/arrow into js-cpp-refactor
dbe7f81 [Brian Hulette] Add more Table unit tests
1910962 [Brian Hulette] Add optional bind callback to scan
5bdf17f [Brian Hulette] Fix perf
8cf2473 [Brian Hulette] Merge remote-tracking branch 'origin/master' into table-scan-perf
4a41b18 [Paul Taylor] add src/predicate to the list of exports we should save from uglify
5a91fab [Paul Taylor] add more view, predicate externs
f6adfb3 [Brian Hulette] Create predicate namespace
f7bb0ed [Paul Taylor] Merge branch 'table-scan-perf' of github.com:ccri/arrow into js-cpp-refactor
e148ee4 [Paul Taylor] Merge branch 'extern-woes' into js-cpp-refactor
25cdc4a [Paul Taylor] add src/predicate to the list of exports we should save from uglify
dc7c728 [Paul Taylor] add more view, predicate externs
25e6af7 [Brian Hulette] Create predicate namespace
579ab1f [Brian Hulette] Merge pull request #2 from trxcllnt/js-cpp-refactor
f3cde1a [Paul Taylor] fix lint
9769773 [Paul Taylor] fix vector perf tests
016ba78 [Brian Hulette] Merge pull request #1 from trxcllnt/js-cpp-refactor
272d293 [Paul Taylor] Merge pull request #4 from ccri/empty-table
7bc7363 [Brian Hulette] Fix exception for empty Table
8ddce0a [Paul Taylor] check bounds in getChildAt(i) to avoid NPEs
f1dead0 [Paul Taylor] compute chunked nested childData list correctly
18807c6 [Paul Taylor] rename ChunkData's fields so it's more clear they're not semantically similar to other similarly named fields
7e43b78 [Paul Taylor] add test:integration npm script
a5f200f [Paul Taylor] Merge pull request #3 from ccri/table-from-struct
c8cd286 [Brian Hulette] Add Table.fromStruct
a00415e [Brian Hulette] Fix perf
54d4f5b [Paul Taylor] lazily allocate table and recordbatch columns, support NestedView's getChildAt(i) method in ChunkedView
40b3638 [Paul Taylor] run integration tests with local data for coverage stats
fe31ee0 [Paul Taylor] slice the flat data values before returning an iterator of them
e537789 [Paul Taylor] make it easier to run all integration tests from local data
c0fd2f9 [Paul Taylor] use the dictionary of the last chunked vector list for chunked dictionary vectors
e33c068 [Paul Taylor] Merge pull request #2 from ccri/fixed-size-list
5bb63af [Brian Hulette] Don't read OFFSET vector for FixedSizeList
614b688 [Paul Taylor] add asEpochMs to date and timestamp vectors
87334a5 [Paul Taylor] Merge branch 'table-scan-perf' of github.com:ccri/arrow into js-cpp-refactor
b7f5bfb [Paul Taylor] rename numRows to length, add table.getColumn()
e81082f [Paul Taylor] export vector views, allow cloning data as another type
700a47c [Paul Taylor] export visitors
e859e13 [Paul Taylor] fix package.json bin entry
0620cfd [Brian Hulette] use Math.fround
0126dc4 [Brian Hulette] Don't recompute total length
e761eee [Brian Hulette] Rename asJSON to toJSON
6c91ed4 [Paul Taylor] Merge branch 'master' of github.com:apache/arrow into js-cpp-refactor-merge_with-table-scan-perf
d2b18d5 [Paul Taylor] Merge remote-tracking branch 'ccri/table-scan-perf' into js-cpp-refactor-merge_with-table-scan-perf
f3f3b86 [Paul Taylor] rename table.ts to recordbatch.ts in preparation for merging latest
e3f629d [Paul Taylor] fix rest of the mangling issues
fa7c17a [Paul Taylor] passing all tests except es5 umd mangler ones
e20decd [Brian Hulette] Add license headers
edcbdbe [Brian Hulette] cleanup
20717d5 [Brian Hulette] Fixed countBy(string)
7244887 [Brian Hulette] Add table unit tests...
6719147 [Brian Hulette] Add DataFrame.countBy operation
2f4a349 [Brian Hulette] Minor tweaks
2e118ab [Brian Hulette] linter
a788db3 [Brian Hulette] Cleanup
a9fff89 [Brian Hulette] Move Table out of the Vector hierarchy
1d60aa1 [Brian Hulette] Moved DataFrame ops to Table. DataFrame is now an interface
e8979ba [Brian Hulette] Refactor DataFrame to extend Vector<StructRow>
6a41d68 [Brian Hulette] clean up table benchmarks
2744c63 [Brian Hulette] Remove Chunked/Simple DataFrame distinction
aa999f8 [Brian Hulette] Add DictionaryVector optimization for equals predicate
4d9e8c0 [Brian Hulette] Add concept of predicates for filtering dataframes
796f45d [Brian Hulette] add DataFrame filter and count ops
30f0330 [Brian Hulette] Add basic DataFrame impl ...
a1edac2 [Brian Hulette] Add perf tests for table scans
d18d915 [Paul Taylor] fix struct and map rows
61dc699 [Paul Taylor] WIP -- refactor types to closer match arrow-cpp
62db338 [Paul Taylor] update dependencies and add es6+ umd targets to jest transform ignore patterns to fix ci
6ff18e9 [Paul Taylor] ship es2015 commonJS in main package to avoid confusion
74e828a [Paul Taylor] fix typings issues (ARROW-1903)
wesm pushed a commit that referenced this pull request Jan 30, 2019
https://issues.apache.org/jira/browse/ARROW-3965

This creates an object that holds the BaseAllocator and Calendar used to configure the translation from a JDBC ResultSet to an Arrow vector.
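As a sketch of the configuration-object pattern described here, the class below bundles an allocator and a Calendar behind a builder. `ConverterConfig` and its methods are hypothetical and are not the actual `JdbcToArrowConfig` API; the allocator is a type parameter so the sketch compiles without Arrow on the classpath. The default UTC calendar with the root locale follows the commit notes in this thread.

```java
import java.util.Calendar;
import java.util.Locale;
import java.util.Objects;
import java.util.TimeZone;

// Hypothetical stand-in for a JDBC-to-Arrow configuration object.
final class ConverterConfig<A> {
  private final A allocator;
  private final Calendar calendar;

  private ConverterConfig(A allocator, Calendar calendar) {
    this.allocator = Objects.requireNonNull(allocator, "allocator must not be null");
    this.calendar = calendar;
  }

  A getAllocator() {
    return allocator;
  }

  Calendar getCalendar() {
    return calendar;
  }

  // Builder with a UTC / root-locale calendar as the default.
  static final class Builder<T> {
    private T allocator;
    private Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ROOT);

    Builder<T> setAllocator(T allocator) {
      this.allocator = allocator;
      return this;
    }

    Builder<T> setCalendar(Calendar calendar) {
      this.calendar = calendar;
      return this;
    }

    ConverterConfig<T> build() {
      return new ConverterConfig<>(allocator, calendar);
    }
  }
}
```

Passing one immutable object keeps the conversion method signatures stable as new options are added.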

Author: Mike Pigott <[email protected]>
Author: Michael Pigott <[email protected]>

Closes apache#3133 from mikepigott/jdbc-to-arrow-config and squashes the following commits:

be95426 <Mike Pigott> ARROW-3965: JDBC-To-Arrow Config Builder javadocs.
d6c64a7 <Mike Pigott> ARROW-3965: JdbcToArrowConfigBuilder
d7ca982 <Mike Pigott> Merge branch 'master' into jdbc-to-arrow-config
789c8c8 <Michael Pigott> Merge pull request #4 from apache/master
e5b19ee <Michael Pigott> Merge pull request #3 from apache/master
3b17c29 <Michael Pigott> Merge pull request #2 from apache/master
5b1b364 <Mike Pigott> Merge branch 'master' into jdbc-to-arrow-config
881c6c8 <Michael Pigott> Merge pull request #1 from apache/master
bb3165b <Mike Pigott> Updating the function calls to use the JdbcToArrowConfig versions.
68c91e7 <Mike Pigott> Modifying the jdbcToArrowSchema and jdbcToArrowVectors methods to receive JdbcToArrowConfig objects.
8d6cf00 <Mike Pigott> Documentation for public static VectorSchemaRoot sqlToArrow(Connection connection, String query, JdbcToArrowConfig config)
4f1260c <Mike Pigott> Adding documentation for public static VectorSchemaRoot sqlToArrow(ResultSet resultSet, JdbcToArrowConfig config)
df632e3 <Mike Pigott> Updating the SQL tests to include JdbcToArrowConfig versions.
b270044 <Mike Pigott> Updated validaton & documentation, and unit tests for the new JdbcToArrowConfig.
da77cbe <Mike Pigott> Creating a configuration class for the JDBC-to-Arrow converter.
wesm pushed a commit that referenced this pull request Feb 5, 2019
https://issues.apache.org/jira/browse/ARROW-3923

Hello!  I was reading through the JDBC source code and I noticed that a java.util.Calendar was required for creating an Arrow Schema and Arrow Vectors from a JDBC ResultSet, when none is required.

This change makes the Calendar optional.
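Making the Calendar optional implies a choice about how timestamps are interpreted when none is given. A minimal sketch, assuming a null Calendar maps to a timestamp without a time zone (per the "Allowing for timestamps without a time zone" commit below) and that zone-less wall-clock values are stored as if they were UTC, the convention the Arrow format uses for zone-less timestamps. `OptionalCalendarTimestamps` and its type string are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Calendar;

// Hypothetical helper showing how an optional Calendar could drive timestamp conversion.
class OptionalCalendarTimestamps {
  // Describes an Arrow-style timestamp type: a null calendar means "no time zone".
  static String timestampType(Calendar calendar) {
    String zone = (calendar == null) ? "null" : calendar.getTimeZone().getID();
    return "Timestamp(MILLISECOND, " + zone + ")";
  }

  // Converts a wall-clock value to epoch milliseconds.
  // Without a calendar, the wall-clock fields are stored as if they were UTC;
  // with one, they are interpreted in the calendar's time zone.
  static long toEpochMillis(LocalDateTime wallClock, Calendar calendar) {
    if (calendar == null) {
      return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
    }
    return wallClock.atZone(calendar.getTimeZone().toZoneId()).toInstant().toEpochMilli();
  }
}
```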

Unit Tests:
The existing SureFire plugin configuration uses a UTC calendar for the database, which is the default Calendar in the existing code. Consequently, no changes to the unit tests are required to provide adequate coverage for the change.

Author: Michael Pigott <[email protected]>
Author: Mike Pigott <[email protected]>

Closes apache#3066 from mikepigott/jdbc-timestamp-no-calendar and squashes the following commits:

4d95da0 <Mike Pigott> ARROW-3923: Supporting a null Calendar in the config, and reverting the breaking change.
cd9a230 <Mike Pigott> Merge branch 'master' into jdbc-timestamp-no-calendar
509a1cc <Michael Pigott> Merge pull request #5 from apache/master
789c8c8 <Michael Pigott> Merge pull request #4 from apache/master
e5b19ee <Michael Pigott> Merge pull request #3 from apache/master
3b17c29 <Michael Pigott> Merge pull request #2 from apache/master
881c6c8 <Michael Pigott> Merge pull request #1 from apache/master
089cff4 <Mike Pigott> Format fixes
a58a4a5 <Mike Pigott> Fixing calendar usage.
e12832a <Mike Pigott> Allowing for timestamps without a time zone.
wesm pushed a commit that referenced this pull request Feb 6, 2019
https://issues.apache.org/jira/browse/ARROW-3966

This change includes apache#3133 and supports a new configuration item called "Include Metadata." If true, metadata from the JDBC ResultSetMetaData object is carried over into the Schema's Field metadata. For now, this includes:
* Catalog Name
* Table Name
* Column Name
* Column Type Name

Author: Mike Pigott <[email protected]>
Author: Michael Pigott <[email protected]>

Closes apache#3134 from mikepigott/jdbc-column-metadata and squashes the following commits:

02f2f34 <Mike Pigott> ARROW-3966: Picking up lost change to support null calendars.
7049c36 <Mike Pigott> Merge branch 'master' into jdbc-column-metadata
e9a9b2b <Michael Pigott> Merge pull request #6 from apache/master
65741a9 <Mike Pigott> ARROW-3966: Code review feedback
cc6cc88 <Mike Pigott> ARROW-3966: Using a 1:N loop instead of a 0:N-1 loop for fewer index offsets in code.
cfb2ba6 <Mike Pigott> ARROW-3966: Using a helper method for building a UTC calendar with root locale.
2928513 <Mike Pigott> ARROW-3966: Moving the metadata flag assignment into the builder.
69022c2 <Mike Pigott> ARROW-3966: Fixing merge.
4a6de86 <Mike Pigott> Merge branch 'master' into jdbc-column-metadata
509a1cc <Michael Pigott> Merge pull request #5 from apache/master
789c8c8 <Michael Pigott> Merge pull request #4 from apache/master
e5b19ee <Michael Pigott> Merge pull request #3 from apache/master
3b17c29 <Michael Pigott> Merge pull request #2 from apache/master
d847ebc <Mike Pigott> Fixing file location
1ceac9e <Mike Pigott> Merge branch 'master' into jdbc-column-metadata
881c6c8 <Michael Pigott> Merge pull request #1 from apache/master
03091a8 <Mike Pigott> Unit tests for including result set metadata.
72d64cc <Mike Pigott> Affirming the field metadata is empty when the configuration excludes field metadata.
7b4527c <Mike Pigott> Test for the include-metadata flag in the configuration.
7e9ce37 <Mike Pigott> Merge branch 'jdbc-to-arrow-config' into jdbc-column-metadata
bb3165b <Mike Pigott> Updating the function calls to use the JdbcToArrowConfig versions.
a6fb1be <Mike Pigott> Fixing function call
5bfd6a2 <Mike Pigott> Merge branch 'jdbc-to-arrow-config' into jdbc-column-metadata
68c91e7 <Mike Pigott> Modifying the jdbcToArrowSchema and jdbcToArrowVectors methods to receive JdbcToArrowConfig objects.
b5b0cb1 <Mike Pigott> Merge branch 'jdbc-to-arrow-config' into jdbc-column-metadata
8d6cf00 <Mike Pigott> Documentation for public static VectorSchemaRoot sqlToArrow(Connection connection, String query, JdbcToArrowConfig config)
4f1260c <Mike Pigott> Adding documentation for public static VectorSchemaRoot sqlToArrow(ResultSet resultSet, JdbcToArrowConfig config)
e34a9e7 <Mike Pigott> Fixing formatting.
fe097c8 <Mike Pigott> Merge branch 'jdbc-to-arrow-config' into jdbc-column-metadata
df632e3 <Mike Pigott> Updating the SQL tests to include JdbcToArrowConfig versions.
b270044 <Mike Pigott> Updated validaton & documentation, and unit tests for the new JdbcToArrowConfig.
da77cbe <Mike Pigott> Creating a configuration class for the JDBC-to-Arrow converter.
a78c770 <Mike Pigott> Updating Javadocs.
523387f <Mike Pigott> Updating the API to support an optional 'includeMetadata' field.
5af1b5b <Mike Pigott> Separating out the field-type creation from the field creation.
wesm pushed a commit that referenced this pull request Feb 9, 2019
wesm pushed a commit that referenced this pull request Feb 9, 2019
wesm pushed a commit that referenced this pull request Feb 25, 2019
…mpute module

Author: Nicolas Trinquier <[email protected]>
Author: Nicolas Trinquier <[email protected]>
Author: Neville Dipale <[email protected]>

Closes apache#3741 from ntrinquier/ARROW-4605 and squashes the following commits:

344379a <Nicolas Trinquier> Initialize vectors with a capacity
257d235 <Nicolas Trinquier> Add support for null values in limit and filter
f0578f6 <Nicolas Trinquier> Add tests for limit and filter with BinaryArray
728884b <Nicolas Trinquier> Merge pull request #1 from nevi-me/ARROW-4605
58d1f5c <Nicolas Trinquier> Merge branch 'ARROW-4605' into ARROW-4605
5a1047c <Nicolas Trinquier> Name variables consistently
2e9616b <Nicolas Trinquier> Add documentation for the limit function
2f44a8a <Nicolas Trinquier> Use the size of the array as limit instead of returning an error
6422e18 <Neville Dipale> cargo fmt
2a389a3 <Neville Dipale> create BinaryArray directly from byte slice to prevent converting to String > &str > &
b20ea6d <Nicolas Trinquier> Do bound checking in limit function
32a2f85 <Nicolas Trinquier> Add tests for limit and filter
0ca0412 <Nicolas Trinquier> Rewrite filter and limit using macros
d216fa0 <Nicolas Trinquier> Move filter and limit to array_ops
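The commits above describe `limit` and `filter` kernels in the Rust compute module, where `limit` uses the array size instead of returning an error and both functions handle null values. A hypothetical Java analogue of those two behaviors (`LimitFilterSketch` is invented; null elements stand in for Arrow nulls, and this is not the Rust API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical limit/filter kernels over a nullable array.
class LimitFilterSketch {
  // Returns the first n values; n larger than the array is clamped rather than rejected.
  static Integer[] limit(Integer[] values, int n) {
    if (n < 0) {
      throw new IllegalArgumentException("limit must be non-negative");
    }
    return Arrays.copyOf(values, Math.min(n, values.length));
  }

  // Keeps values whose mask entry is true; null values are carried through as nulls.
  static Integer[] filter(Integer[] values, boolean[] mask) {
    if (mask.length != values.length) {
      throw new IllegalArgumentException("mask and values must have the same length");
    }
    List<Integer> kept = new ArrayList<>();
    for (int i = 0; i < values.length; i++) {
      if (mask[i]) {
        kept.add(values[i]);
      }
    }
    return kept.toArray(new Integer[0]);
  }
}
```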
wesm pushed a commit that referenced this pull request Aug 8, 2019
This updates the language in `install_arrow()` to follow the README revision that will land in https://github.com/apache/arrow/pull/4948/files#diff-563b2cb2c8c2d51b2ff6b177e2d84286R33.

The [Jira ticket](https://issues.apache.org/jira/browse/ARROW-6142) requested three things; this is `#2` in the list. On `#1`, I defer to the C++ installation docs, which are already included in the install_arrow message, rather than duplicating content here. `#3` is out of scope.

Closes apache#5027 from nealrichardson/no-ppa and squashes the following commits:

80b142e <Neal Richardson> s/arrow/Arrow/
44c9659 <Neal Richardson> Tweak language again
36cfe28 <Neal Richardson> Further linux install revisions
79bd7e0 <Neal Richardson> One more PPurge
63f75bd <Neal Richardson> Revise install_arrow instructions for Linux

Authored-by: Neal Richardson <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
wesm pushed a commit that referenced this pull request Sep 4, 2019
As discussed in apache#4993 (comment), we often compare values repeatedly, and the comparisons differ only in their parameters (vector to compare, start index, etc.).

With the current API, a new RangeEqualVisitor object has to be created each time a comparison is performed, which adds non-trivial performance overhead.

To address this problem, we make the RangeEqualVisitor reusable, and allow the client to change parameters of an existing visitor.
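A hypothetical sketch of the reuse pattern, with integer arrays standing in for vectors: the comparison range is wrapped in a small parameter object (as in the "Move out Range from the visitor params" commit), and one comparator instance is re-pointed at a new right-hand vector instead of being reallocated. `ComparisonRange` and `ReusableRangeEquals` are invented names, not the Arrow Java API.

```java
// Hypothetical parameter object holding the range to compare.
final class ComparisonRange {
  final int leftStart;
  final int rightStart;
  final int length;

  ComparisonRange(int leftStart, int rightStart, int length) {
    this.leftStart = leftStart;
    this.rightStart = rightStart;
    this.length = length;
  }
}

// Hypothetical reusable comparator: one instance serves many comparisons.
class ReusableRangeEquals {
  private final int[] left;
  private int[] right;

  ReusableRangeEquals(int[] left) {
    this.left = left;
  }

  // Points the comparator at a different right-hand vector; returns this for chaining.
  ReusableRangeEquals reset(int[] right) {
    this.right = right;
    return this;
  }

  boolean rangeEquals(ComparisonRange range) {
    if (range.leftStart < 0 || range.rightStart < 0 || range.length < 0
        || range.leftStart + range.length > left.length
        || range.rightStart + range.length > right.length) {
      return false;  // out-of-bounds ranges are treated as unequal
    }
    for (int i = 0; i < range.length; i++) {
      if (left[range.leftStart + i] != right[range.rightStart + i]) {
        return false;
      }
    }
    return true;
  }
}
```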

Closes apache#5195 from liyafan82/fly_0826_reuse and squashes the following commits:

ffe0e6a <liyafan82> Merge pull request #1 from pravindra/pull-5195
073bc78 <Pindikura Ravindra> Test: Move out Range from the visitor params
7482414 <liyafan82>  Wrapper visit parameters into a pojo
53c1e0b <liyafan82> Merge branch 'master' into fly_0826_reuse
a1f7046 <liyafan82>  Make range equal visitor reusable

Lead-authored-by: liyafan82 <[email protected]>
Co-authored-by: Pindikura Ravindra <[email protected]>
Co-authored-by: liyafan82 <[email protected]>
Signed-off-by: Pindikura Ravindra <[email protected]>
wesm pushed a commit that referenced this pull request Feb 24, 2020
…comments.

The Reset methods allow the data structures to be reused, so they don't have to be allocated over and over again.
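The change itself is in the Go implementation; a hypothetical Java sketch of the same idea follows (`ReusableIntArray` is invented). `reset` points an existing object at new contents and reuses its backing storage whenever it is already large enough, so repeated resets of similar-sized data allocate only once.

```java
// Hypothetical reusable int array with a reset method.
class ReusableIntArray {
  private int[] storage = new int[0];
  private int length = 0;
  int allocations = 0;  // how many times backing storage was (re)allocated

  // Replaces the contents, allocating only if the existing storage is too small.
  void reset(int[] values) {
    if (storage.length < values.length) {
      storage = new int[values.length];
      allocations++;
    }
    System.arraycopy(values, 0, storage, 0, values.length);
    length = values.length;
  }

  int length() {
    return length;
  }

  int value(int i) {
    if (i < 0 || i >= length) {
      throw new IndexOutOfBoundsException(Integer.toString(i));
    }
    return storage[i];
  }
}
```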

Closes apache#6430 from richardartoul/ra/merge-upstream and squashes the following commits:

5a08281 <Richard Artoul> Add license to test file
d76be05 <Richard Artoul> Add test for data reset
d102b1f <Richard Artoul> Add tests
d3e6e67 <Richard Artoul> cleanup comments
c8525ae <Richard Artoul> Add Reset method to int array (#5)
489ca25 <Richard Artoul> Fix array.setData() to retain before release (#4)
88cd05f <Richard Artoul> Add reset method to Data (#3)
6d1b277 <Richard Artoul> Add Reset() method to String array (#2)
dca2303 <Richard Artoul> Add Reset method to buffer and cleanup comments (#1)

Lead-authored-by: Richard Artoul <[email protected]>
Co-authored-by: Richard Artoul <[email protected]>
Signed-off-by: Sebastien Binet <[email protected]>
wesm pushed a commit that referenced this pull request May 10, 2020
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). apache#7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, this result is not visible because `arrow-utility-test` emits a large number of error messages.

```
$ git log | head -1
commit ed5f534
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
	 20 - arrow-utility-test (Failed)
Errors while running CTest
```
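When one noisy test floods a CI log, you can still recover the per-test verdicts by filtering ctest's result lines. The following is a small Python sketch. Its regular expression assumes ctest's default `N/M Test #K: name .... Passed|Failed` line layout, as in the log above. Other ctest versions or statuses such as `Not Run` may need a different pattern, and `summarize` is a hypothetical helper.

```python
import re

# Matches ctest result lines such as
#   " 1/51 Test  #1: arrow-array-test .....   Passed    4.62 sec"
#   "20/51 Test #20: arrow-utility-test ...***Failed    5.65 sec"
RESULT_RE = re.compile(r"Test\s+#\d+:\s+(\S+)\s+\.+\s*\**(Passed|Failed)\b")


def summarize(log_text):
    """Return (passed, failed) lists of test names found in ctest output."""
    passed, failed = [], []
    for line in log_text.splitlines():
        m = RESULT_RE.search(line)
        if m:
            (passed if m.group(2) == "Passed" else failed).append(m.group(1))
    return passed, failed
```

Lines that are not per-test results, such as `Start` lines or the final `The following tests FAILED:` list, contain no `Test #K:` marker and are ignored.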

Closes apache#7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
wesm pushed a commit that referenced this pull request May 11, 2020
…lure on big-endian platforms

This PR reads element data through the endianness-independent FlatBuffers accessor API instead of through a raw pointer. This fixes the failure of `TestPlasmaSerialization.DeleteReply` in `plasma-serialization-tests`.
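FlatBuffers stores scalars in little-endian order on every platform, so reading a field through a raw pointer only gives the right value on little-endian hosts. The following Python sketch uses `struct` to simulate both kinds of read for a four-byte integer. The helper names are hypothetical, and this is not the Plasma serialization code.

```python
import struct


def flatbuffer_int32(value):
    """Encode an int32 the way FlatBuffers lays it out: little-endian."""
    return struct.pack("<i", value)


def read_portable(buf):
    # Endianness-aware read: always decode as little-endian.
    return struct.unpack("<i", buf)[0]


def read_as_big_endian_host(buf):
    # Simulates dereferencing a raw int32 pointer on a big-endian host,
    # which treats the first byte as the most significant one.
    return struct.unpack(">i", buf)[0]


wire = flatbuffer_int32(3)
print(read_portable(wire))            # 3
print(read_as_big_endian_host(wire))  # 50331648 (0x03000000)
```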

Without this PR
```
1: [==========] Running 14 tests from 1 test case.
1: [----------] Global test environment set-up.
1: [----------] 14 tests from TestPlasmaSerialization
1: [ RUN      ] TestPlasmaSerialization.CreateRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-kk8t88p9/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.CreateRequest (2 ms)
1: [ RUN      ] TestPlasmaSerialization.CreateReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-97gspx5v/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.CreateReply (0 ms)
1: [ RUN      ] TestPlasmaSerialization.SealRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-dkksx76p/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.SealRequest (1 ms)
1: [ RUN      ] TestPlasmaSerialization.SealReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-oqbs9vm0/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.SealReply (0 ms)
1: [ RUN      ] TestPlasmaSerialization.GetRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-d7q6h5q4/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.GetRequest (1 ms)
1: [ RUN      ] TestPlasmaSerialization.GetReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-sxsncs72/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.GetReply (1 ms)
1: [ RUN      ] TestPlasmaSerialization.ReleaseRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-njc3g3b5/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.ReleaseRequest (0 ms)
1: [ RUN      ] TestPlasmaSerialization.ReleaseReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-917ybxmo/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.ReleaseReply (1 ms)
1: [ RUN      ] TestPlasmaSerialization.DeleteRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-1kwauefv/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.DeleteRequest (0 ms)
1: [ RUN      ] TestPlasmaSerialization.DeleteReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-4ftq28pq/fileXXXXXX'
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:271: Failure
1: Value of: error_vec[0] == PlasmaError::ObjectExists
1:   Actual: false
1: Expected: true
1: [  FAILED  ] TestPlasmaSerialization.DeleteReply (1 ms)
1: [ RUN      ] TestPlasmaSerialization.EvictRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-vl97870w/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.EvictRequest (0 ms)
1: [ RUN      ] TestPlasmaSerialization.EvictReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-3am9a6rv/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.EvictReply (1 ms)
1: [ RUN      ] TestPlasmaSerialization.DataRequest
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-plye5tmm/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.DataRequest (0 ms)
1: [ RUN      ] TestPlasmaSerialization.DataReply
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma/test/serialization_tests.cc:87: file path: '/tmp/ser-test-mbu6lqsq/fileXXXXXX'
1: [       OK ] TestPlasmaSerialization.DataReply (1 ms)
1: [----------] 14 tests from TestPlasmaSerialization (9 ms total)
1:
1: [----------] Global test environment tear-down
1: [==========] 14 tests from 1 test case ran. (9 ms total)
1: [  PASSED  ] 13 tests.
1: [  FAILED  ] 1 test, listed below:
1: [  FAILED  ] TestPlasmaSerialization.DeleteReply
1:
1:  1 FAILED TEST
1: /home/ishizaki/Arrow/arrow/cpp/src/plasma
1/3 Test #1: plasma-serialization-tests .......***Failed    0.27 sec
...
3/3 Test #3: plasma-external-store-tests ......   Passed    0.46 sec
```

With this PR
```
$ ctest
Test project /home/ishizaki/Arrow/arrow/cpp/src/plasma
    Start 1: plasma-serialization-tests
1/3 Test #1: plasma-serialization-tests .......   Passed    0.26 sec
    Start 2: plasma-client-tests
2/3 Test #2: plasma-client-tests ..............   Passed   14.99 sec
    Start 3: plasma-external-store-tests
3/3 Test #3: plasma-external-store-tests ......   Passed    0.49 sec

100% tests passed, 0 tests failed out of 3

Label Time Summary:
plasma-tests    =  15.74 sec (3 tests)
unittest        =  15.74 sec (3 tests)

Total Test time (real) =  15.74 sec
```

Closes apache#7148 from kiszk/ARROW-8759

Authored-by: Kazuaki Ishizaki <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
wesm pushed a commit that referenced this pull request Apr 25, 2021
Stack trace from a deadlocked run:

```
#0  0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#3  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#4  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#5  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#6  0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#7  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#8  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#9  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```

The callback `ListObjectsV2Handler` is invoked recursively while the mutex is held. Because that mutex is non-reentrant, the thread deadlocks waiting on a lock it already holds.

To fix it, I removed the mutex from `TreeWalker` and used `arrow::util::internal::TaskGroup` instead of manually tracking the number and status of in-flight requests.
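You can reproduce the re-entry in miniature with Python's `concurrent.futures.Future`. Its `add_done_callback` runs the callback immediately, on the calling thread, when the future has already completed, much like the `AddCallback` → `invoke` frames in the trace. The following sketch is an analogy, not the Arrow code. It uses a timeout so the self-deadlock shows up as a failed acquire instead of a hang, and it does not model the `TaskGroup`-based fix.

```python
import threading
from concurrent.futures import Future


def register_under_lock(lock):
    """Register a callback on an already-finished future while holding `lock`.

    Returns a list recording whether the callback could take `lock` itself.
    """
    outcome = []

    def handler(fut):
        # Stand-in for ListObjectsV2Handler: it needs the same lock.
        # A plain acquire() would block forever here; the timeout lets the
        # sketch report the self-deadlock instead of hanging.
        acquired = lock.acquire(timeout=0.1)
        outcome.append(acquired)
        if acquired:
            lock.release()

    fut = Future()            # created directly only for this demonstration
    fut.set_result("page 1")  # the "response" is already available
    with lock:
        # The future is done, so add_done_callback calls handler right now,
        # on this thread, while the lock is still held.
        fut.add_done_callback(handler)
    return outcome


print(register_under_lock(threading.Lock()))   # [False]: would deadlock
print(register_under_lock(threading.RLock()))  # [True]: reentrant lock is fine
```

A reentrant lock would mask the problem, but this commit removes the mutex entirely instead.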

Closes apache#9842 from westonpace/bugfix/arrow-12040

Lead-authored-by: Weston Pace <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
wesm pushed a commit that referenced this pull request Jul 28, 2021
Before the change, the ASan stack frames were not symbolized:

```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x522f09 in
    #1 0x7f28ae5826f4 in
    #2 0x7f28ae57fa5d in
    #3 0x7f28ae58cb0f in
    #4 0x7f28ae58bda0 in
    ...
```

After the change, the frames include function names and source locations:
```
Direct leak of 65536 byte(s) in 1 object(s) allocated from:
    #0 0x522f09 in posix_memalign (/build/cpp/debug/arrow-dataset-file-csv-test+0x522f09)
    #1 0x7f28ae5826f4 in arrow::(anonymous namespace)::SystemAllocator::AllocateAligned(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:213:24
    #2 0x7f28ae57fa5d in arrow::BaseMemoryPoolImpl<arrow::(anonymous namespace)::SystemAllocator>::Allocate(long, unsigned char**) /arrow/cpp/src/arrow/memory_pool.cc:405:5
    #3 0x7f28ae58cb0f in arrow::PoolBuffer::Reserve(long) /arrow/cpp/src/arrow/memory_pool.cc:717:9
    #4 0x7f28ae58bda0 in arrow::PoolBuffer::Resize(long, bool) /arrow/cpp/src/arrow/memory_pool.cc:741:7
    ...
```

Closes apache#10498 from westonpace/feature/ARROW-13027--c-fix-asan-stack-traces-in-ci

Authored-by: Weston Pace <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
wesm pushed a commit that referenced this pull request May 16, 2022
Error log of the Valgrind failure:
```
[----------] 3 tests from TestArrowReadDeltaEncoding
[ RUN      ] TestArrowReadDeltaEncoding.DeltaBinaryPacked
[       OK ] TestArrowReadDeltaEncoding.DeltaBinaryPacked (812 ms)
[ RUN      ] TestArrowReadDeltaEncoding.DeltaByteArray
==12587== Conditional jump or move depends on uninitialised value(s)
==12587==    at 0x4F12C57: Advance (bit_stream_utils.h:426)
==12587==    by 0x4F12C57: parquet::(anonymous namespace)::DeltaBitPackDecoder<parquet::PhysicalType<(parquet::Type::type)1> >::GetInternal(int*, int) (encoding.cc:2216)
==12587==    by 0x4F13823: Decode (encoding.cc:2091)
==12587==    by 0x4F13823: parquet::(anonymous namespace)::DeltaByteArrayDecoder::SetData(int, unsigned char const*, int) (encoding.cc:2360)
==12587==    by 0x4E89EF5: parquet::(anonymous namespace)::ColumnReaderImplBase<parquet::PhysicalType<(parquet::Type::type)6> >::InitializeDataDecoder(parquet::DataPage const&, long) (column_reader.cc:797)
==12587==    by 0x4E9AE63: ReadNewPage (column_reader.cc:614)
==12587==    by 0x4E9AE63: HasNextInternal (column_reader.cc:576)
==12587==    by 0x4E9AE63: parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)6> >::ReadRecords(long) (column_reader.cc:1228)
==12587==    by 0x4DFB19F: parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch(long) (reader.cc:467)
==12587==    by 0x4DF513C: parquet::arrow::ColumnReaderImpl::NextBatch(long, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:108)
==12587==    by 0x4DFB74D: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadColumn(int, std::vector<int, std::allocator<int> > const&, parquet::arrow::ColumnReader*, std::shared_ptr<arrow::ChunkedArray>*) (reader.cc:273)
==12587==    by 0x4E11FDA: operator() (reader.cc:1180)
==12587==    by 0x4E11FDA: arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > arrow::internal::OptionalParallelForAsync<parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::shared_ptr<arrow::ChunkedArray> >(bool, std::vector<std::shared_ptr<parquet::arrow::ColumnReaderImpl>, std::allocator<arrow::Future<std::vector<std::shared_ptr<arrow::ChunkedArray>, std::allocator<arrow::Future> > > > >, parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*)::{lambda(unsigned long, std::shared_ptr<parquet::arrow::ColumnReaderImpl>)#1}&, arrow::internal::Executor*) (parallel.h:95)
==12587==    by 0x4E126A9: parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, arrow::internal::Executor*) (reader.cc:1198)
==12587==    by 0x4E12F50: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadRowGroups(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:1160)
==12587==    by 0x4DFA2BC: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::vector<int, std::allocator<int> > const&, std::shared_ptr<arrow::Table>*) (reader.cc:198)
==12587==    by 0x4DFA392: parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadTable(std::shared_ptr<arrow::Table>*) (reader.cc:289)
==12587==    by 0x1DCE62: parquet::arrow::TestArrowReadDeltaEncoding::ReadTableFromParquetFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<arrow::Table>*) (arrow_reader_writer_test.cc:4174)
==12587==    by 0x2266D2: parquet::arrow::TestArrowReadDeltaEncoding_DeltaByteArray_Test::TestBody() (arrow_reader_writer_test.cc:4209)
==12587==    by 0x4AD2C9B: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2607)
==12587==    by 0x4AC9DD1: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2643)
==12587==    by 0x4AA4C02: testing::Test::Run() (gtest.cc:2682)
==12587==    by 0x4AA563A: testing::TestInfo::Run() (gtest.cc:2861)
==12587==    by 0x4AA600F: testing::TestSuite::Run() (gtest.cc:3015)
==12587==    by 0x4AB631B: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5855)
==12587==    by 0x4AD3CE7: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2607)
==12587==    by 0x4ACB063: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2643)
==12587==    by 0x4AB47B6: testing::UnitTest::Run() (gtest.cc:5438)
==12587==    by 0x4218918: RUN_ALL_TESTS() (gtest.h:2490)
==12587==    by 0x421895B: main (gtest_main.cc:52)
```

Closes apache#11725 from pitrou/ARROW-14704-parquet-valgrind

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
wesm pushed a commit that referenced this pull request May 16, 2022
TODOs:
Convert cheat sheet to PDF and hide slide #1.

Closes apache#12445 from pachadotdev/patch-4

Lead-authored-by: Stephanie Hazlitt <[email protected]>
Co-authored-by: Pachá <[email protected]>
Co-authored-by: Mauricio Vargas <[email protected]>
Co-authored-by: Pachá <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>