ARROW-24: C++: Implement a logical Table container type #16

Closed
wesm wants to merge 3 commits

wesm (Member) commented Mar 5, 2016

A table enables us to interpret a collection of Arrow arrays as a logical table or "data frame"-like structure. Each column may consist of one or more "primitive" Arrow memory containers.

One current limitation is that table column names must be strings. This is at least consistent with most storage media and up-stack table implementations (e.g. R's data.frame).

The arrangement of data is currently somewhat limited: a table is a vector of chunked columns, where a column may consist of just one data chunk. Since a Table might be assembled from a vector of row batches (coming across the wire), "pivoting" the row batches might have performance implications that we can examine further down the road.
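The layout described above can be sketched roughly as follows. This is only an illustration and not the code in this patch: the `sketch` namespace, `Int64Chunk`, `MakeExampleTable`, and every method name are hypothetical, and a plain `std::vector<int64_t>` stands in for a primitive Arrow array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a primitive Arrow array: one contiguous chunk.
struct Int64Chunk {
  std::vector<int64_t> values;
};

// A column has a string name and one or more chunks.
class Column {
 public:
  Column(std::string name, std::vector<std::shared_ptr<Int64Chunk>> chunks)
      : name_(std::move(name)), chunks_(std::move(chunks)) {}

  const std::string& name() const { return name_; }
  std::size_t num_chunks() const { return chunks_.size(); }

  // The logical length is the sum of the chunk lengths.
  std::size_t length() const {
    std::size_t total = 0;
    for (const auto& chunk : chunks_) total += chunk->values.size();
    return total;
  }

 private:
  std::string name_;
  std::vector<std::shared_ptr<Int64Chunk>> chunks_;
};

// A table is a named vector of columns that share one logical length.
class Table {
 public:
  Table(std::string name, std::vector<std::shared_ptr<Column>> columns)
      : name_(std::move(name)), columns_(std::move(columns)) {}

  const std::string& name() const { return name_; }
  std::size_t num_columns() const { return columns_.size(); }
  std::size_t num_rows() const {
    return columns_.empty() ? 0 : columns_[0]->length();
  }
  const std::shared_ptr<Column>& column(std::size_t i) const { return columns_[i]; }

  // Columns may be chunked differently but must agree on the row count.
  bool Validate() const {
    for (const auto& col : columns_) {
      if (col->length() != num_rows()) return false;
    }
    return true;
  }

 private:
  std::string name_;
  std::vector<std::shared_ptr<Column>> columns_;
};

// Builds a two-column table: "a" arrives in two chunks, "b" in one.
inline Table MakeExampleTable() {
  auto a1 = std::make_shared<Int64Chunk>(Int64Chunk{{1, 2, 3}});
  auto a2 = std::make_shared<Int64Chunk>(Int64Chunk{{4, 5}});
  auto b1 = std::make_shared<Int64Chunk>(Int64Chunk{{10, 20, 30, 40, 50}});
  std::vector<std::shared_ptr<Int64Chunk>> a_chunks{a1, a2};
  std::vector<std::shared_ptr<Int64Chunk>> b_chunks{b1};
  std::vector<std::shared_ptr<Column>> columns{
      std::make_shared<Column>("a", a_chunks),
      std::make_shared<Column>("b", b_chunks)};
  Table table("example", columns);
  assert(table.Validate());
  return table;
}

}  // namespace sketch
```

In this sketch, a column built from several row batches would simply carry one chunk per batch, so no data needs to be copied or "pivoted" to assemble the table.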

asfgit closed this in 612fbc7 Mar 5, 2016
wesm deleted the ARROW-24 branch March 5, 2016 02:00
std::string name_;

std::shared_ptr<Schema> schema_;
std::vector<std::shared_ptr<Column> > columns_;
Contributor

Style nitpick: C++11 no longer requires a space between consecutive closing angle brackets.
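For reference, a small self-contained illustration of the point (the `Column` struct and `MakeColumns` helper here are placeholders, not the classes from this patch): under C++11 both spellings compile and name the same type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

struct Column {};  // placeholder type for illustration only

// Older spelling: the space keeps `> >` from being lexed as operator>>.
typedef std::vector<std::shared_ptr<Column> > ColumnsWithSpace;

// C++11 and later: `>>` closes two nested template argument lists.
using ColumnsNoSpace = std::vector<std::shared_ptr<Column>>;

static_assert(std::is_same<ColumnsWithSpace, ColumnsNoSpace>::value,
              "both spellings name the same type");

// Hypothetical helper that builds n empty columns.
inline ColumnsNoSpace MakeColumns(std::size_t n) {
  ColumnsNoSpace columns;
  for (std::size_t i = 0; i < n; ++i) {
    columns.push_back(std::make_shared<Column>());
  }
  assert(columns.size() == n);
  return columns;
}
```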

Member Author

I wonder if anyone has created a c++11-mode for Emacs. I have mostly done this because of Emacs formatting, which is still C++0x-based.

Contributor

I'm not familiar with the GitHub toolchain, but it might be nice to run clang-format on C++ check-ins to keep this consistent, or at least to make it the standard for the project. It even supports Emacs :-) http://clang.llvm.org/docs/ClangFormat.html#emacs-integration

@emkornfield
Contributor

Sorry for the multiple rounds of review; I had interruptions along the way. I don't think I have anything left to add.

@wesm
Member Author

wesm commented Mar 17, 2016

No problem, it's very much appreciated. I'll incorporate the feedback either in #28 or in a follow-up patch.

wesm added a commit to wesm/arrow that referenced this pull request Mar 19, 2016
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 2, 2018
This is based off of pull request apache/parquet-cpp#14.

Author: Kalon Mills

Closes apache#16 from kalaxy/libparquet-library-update-build and squashes the following commits:

0ce51db [Kalon Mills] Add script for automating build env setup.
82a198c [Kalon Mills] Make thrift build on mac only a warning when not specified explicitly.
d096c64 [Kalon Mills] Update build instructions.
6709182 [Kalon Mills] Support thrift dependency in thirdparty scripts for linux.
bedd0d4 [Kalon Mills] Remove thirdparty code lz4 from repo.
30c2b7e [Kalon Mills] Support build environment configuration of LZ4 library.
73e7785 [Kalon Mills] Support build environment configuration of Snappy library.
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 4, 2018
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 6, 2018
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 7, 2018
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 8, 2018
xuechendi added a commit to xuechendi/arrow that referenced this pull request Dec 17, 2019
kou pushed a commit that referenced this pull request May 10, 2020
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). #7131 enabled a minimal set of tests as a starting point.

I confirmed locally that these tests pass with the current master. In the current TravisCI environment this result is not visible, because `arrow-utility-test` produces a large number of error messages.

```
$ git log | head -1
commit ed5f534
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
	 20 - arrow-utility-test (Failed)
Errors while running CTest
```

Closes #7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki
Signed-off-by: Sutou Kouhei
projjal added a commit to projjal/arrow that referenced this pull request Aug 7, 2020
projjal added a commit to projjal/arrow that referenced this pull request Aug 20, 2020
jayhomn-bitquill pushed a commit to Bit-Quill/arrow that referenced this pull request Aug 10, 2022
…e#16)

* Add support for getUnicodeStream

* nit: remove outofscope changes

* Fix small mistake in one of the added tests

* nit: remove unused imports

* Simplify PR
rtpsw pushed a commit to rtpsw/arrow that referenced this pull request Oct 23, 2022
icexelloss pushed a commit to icexelloss/arrow that referenced this pull request Oct 28, 2022
felipecrv added a commit to felipecrv/arrow that referenced this pull request Apr 5, 2024
…ormatting buffer

With ASAN, this reproduces the issue.

    [ RUN      ] Formatting.Timestamp
    =================================================================
    ==4191383==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff1804c48f at pc 0x5608edffe39d bp 0x7fff1804c110 sp 0x7fff1804c108
    WRITE of size 1 at 0x7fff1804c48f thread T0
        #0 0x5608edffe39c in arrow::internal::detail::FormatOneChar(char, char**) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting.h:132:67
        #1 0x5608ee035c00 in arrow::internal::detail::FormatYYYY_MM_DD(arrow_vendored::date::year_month_day, char**) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting.h:351:5
        #2 0x5608ee05e8a0 in decltype(std::declval<arrow::StringAppender&>()(std::basic_string_view<char, std::char_traits<char> >{})) arrow::internal::StringFormatter<arrow::TimestampType, void>::operator()<std::chrono::duration<long, std::ratio<1l, 1000l> >, arrow::StringAppender&>(std::chrono::duration<long, std::ratio<1l, 1000l> >, long, arrow::StringAppender&) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting.h:521:5
        #3 0x5608ee05d60f in decltype(std::declval<arrow::internal::StringFormatter<arrow::TimestampType, void>&>()(std::chrono::duration<long, std::ratio<1l, 1l> >{}, std::declval<long&>(), std::declval<arrow::StringAppender&>())) arrow::util::VisitDuration<arrow::internal::StringFormatter<arrow::TimestampType, void>&, long&, arrow::StringAppender&>(arrow::TimeUnit::type, arrow::internal::StringFormatter<arrow::TimestampType, void>&, long&, arrow::StringAppender&) /home/felipeo/code/arrow/cpp/src/arrow/util/time.h:60:14
        #4 0x5608ee05d122 in decltype(std::declval<arrow::StringAppender&>()(std::basic_string_view<char, std::char_traits<char> >{})) arrow::internal::StringFormatter<arrow::TimestampType, void>::operator()<arrow::StringAppender&>(long, arrow::StringAppender&) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting.h:527:12
        #5 0x5608edfeffb3 in void arrow::AssertFormatting<arrow::internal::StringFormatter<arrow::TimestampType, void>, long>(arrow::internal::StringFormatter<arrow::TimestampType, void>&, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting_util_test.cc:52:3
        #6 0x5608edfece95 in arrow::Formatting_Timestamp_Test::TestBody() /home/felipeo/code/arrow/cpp/src/arrow/util/formatting_util_test.cc:540:5
        #7 0x7fd95d7901de in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd901de) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #8 0x7fd95d784905 in testing::Test::Run() (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd84905) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #9 0x7fd95d784a84 in testing::TestInfo::Run() (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd84a84) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #10 0x7fd95d785038 in testing::TestSuite::Run() (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd85038) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #11 0x7fd95d78573e in testing::internal::UnitTestImpl::RunAllTests() (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd8573e) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #12 0x7fd95d7907a6 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd907a6) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #13 0x7fd95d784b4b in testing::UnitTest::Run() (/home/felipeo/code/arrow/cpp/ninja/debug/libarrow_testing.so.1600+0xd84b4b) (BuildId: dd9af0bafdb1786050262e8f6002568a9f08ecf6)
        #14 0x5608ee54506d in RUN_ALL_TESTS() /usr/include/gtest/gtest.h:2490:46
        #15 0x5608ee544fb9 in main /home/felipeo/code/arrow/cpp/src/arrow/util/logging_test.cc:129:10
        #16 0x7fd93de29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
        #17 0x7fd93de29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
        #18 0x5608ed7b4f64 in _start (/home/felipeo/code/arrow/cpp/ninja/debug/arrow-utility-test+0x1224f64) (BuildId: 81cfdc36b7a960a7249ecd5884beaa869140ab89)

    Address 0x7fff1804c48f is located in stack of thread T0 at offset 399 in frame
        #0 0x5608ee05dc9f in decltype(std::declval<arrow::StringAppender&>()(std::basic_string_view<char, std::char_traits<char> >{})) arrow::internal::StringFormatter<arrow::TimestampType, void>::operator()<std::chrono::duration<long, std::ratio<1l, 1000l> >, arrow::StringAppender&>(std::chrono::duration<long, std::ratio<1l, 1000l> >, long, arrow::StringAppender&) /home/felipeo/code/arrow/cpp/src/arrow/util/formatting.h:486
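
The report above is a one-byte write past a stack buffer while a date is being formatted. As a generic illustration of that class of bug, and not the Arrow code or a claim about the actual root cause, here is a minimal sketch of a formatter that checks the remaining capacity before every write. It fills the buffer from the end backwards, which is an assumption of the sketch. `sketch::BackwardWriter` and `sketch::FormatDate` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

namespace sketch {

// Hypothetical helper: writes characters right-to-left into a fixed buffer
// and refuses to write once the buffer is full.
class BackwardWriter {
 public:
  BackwardWriter(char* begin, char* end)
      : begin_(begin), cursor_(end), end_(end) {}

  // Writes one character, or returns false if no space is left.
  bool Put(char c) {
    if (cursor_ == begin_) return false;
    *--cursor_ = c;
    return true;
  }

  // Writes `value` in decimal, zero-padded on the left to `min_width` digits.
  bool PutPadded(uint64_t value, int min_width) {
    int written = 0;
    do {
      if (!Put(static_cast<char>('0' + value % 10))) return false;
      value /= 10;
      ++written;
    } while (value != 0);
    while (written < min_width) {
      if (!Put('0')) return false;
      ++written;
    }
    return true;
  }

  std::string str() const { return std::string(cursor_, end_); }

 private:
  char* begin_;
  char* cursor_;
  char* end_;
};

// Formats YYYY-MM-DD using an N-byte stack buffer.
// Returns false instead of overflowing when the text does not fit.
template <std::size_t N>
bool FormatDate(uint64_t year, unsigned month, unsigned day, std::string* out) {
  std::array<char, N> buffer;
  BackwardWriter writer(buffer.data(), buffer.data() + buffer.size());
  const bool ok = writer.PutPadded(day, 2) && writer.Put('-') &&
                  writer.PutPadded(month, 2) && writer.Put('-') &&
                  writer.PutPadded(year, 4);
  if (ok) *out = writer.str();
  return ok;
}

}  // namespace sketch
```

With a 10-byte buffer a four-digit year fits exactly, while a five-digit year needs 11 bytes and is rejected rather than written out of bounds.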
2 participants