parquet reader error: presetNullsConsumed_ == presetNullsSize_ #9238

Closed
qqibrow opened this issue Mar 25, 2024 · 4 comments
Labels
bug (Something isn't working), parquet, triage (Newly created issue that needs attention)

Comments

@qqibrow
Collaborator

qqibrow commented Mar 25, 2024

Bug description

E0322 22:03:58.533190 2827883 Exceptions.h:69] Line: ../.././velox/dwio/parquet/reader/ParquetData.h:94, Function:setNulls, Expression: presetNullsConsumed_ == presetNullsSize_ (843 vs. 2767), Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
  what():  Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (843 vs. 2767)
Retriable: False
Expression: presetNullsConsumed_ == presetNullsSize_
Function: setNulls
File: ../.././velox/dwio/parquet/reader/ParquetData.h
Line: 94
Stack trace:
# 0  std::shared_ptr<facebook::velox::VeloxException::State const> facebook::velox::VeloxException::State::make<facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1}>(facebook::velox::VeloxException::Type, facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1})
# 1  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 2  facebook::velox::VeloxRuntimeError::VeloxRuntimeError(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::basic_string_view<char, std::char_traits<char> >)
# 3  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  facebook::velox::parquet::ParquetData::setNulls(boost::intrusive_ptr<facebook::velox::Buffer>&, int)
# 5  facebook::velox::parquet::ListColumnReader::setLengthsFromRepDefs(facebook::velox::parquet::PageReader&)
# 6  facebook::velox::parquet::(anonymous namespace)::readLeafRepDefs(facebook::velox::dwio::common::SelectiveColumnReader*, int, bool)
# 7  facebook::velox::parquet::(anonymous namespace)::readLeafRepDefs(facebook::velox::dwio::common::SelectiveColumnReader*, int, bool)
# 8  facebook::velox::parquet::ensureRepDefs(facebook::velox::dwio::common::SelectiveColumnReader&, int)
# 9  facebook::velox::parquet::MapColumnReader::read(int, folly::Range<int const*>, unsigned long const*)
# 10 facebook::velox::dwio::common::SelectiveStructColumnReaderBase::read(int, folly::Range<int const*>, unsigned long const*)
# 11 facebook::velox::parquet::StructColumnReader::read(int, folly::Range<int const*>, unsigned long const*)
# 12 facebook::velox::dwio::common::SelectiveStructColumnReaderBase::next(unsigned long, std::shared_ptr<facebook::velox::BaseVector>&, facebook::velox::dwio::common::Mutation const*)
# 13 facebook::velox::parquet::ParquetRowReader::Impl::next(unsigned long, std::shared_ptr<facebook::velox::BaseVector>&, facebook::velox::dwio::common::Mutation const*)
# 14 facebook::velox::parquet::ParquetRowReader::next(unsigned long, std::shared_ptr<facebook::velox::BaseVector>&, facebook::velox::dwio::common::Mutation const*)
# 15 main
# 16 __libc_start_main
# 17 _start

*** Aborted at 1711145038 (Unix time, try 'date -d @1711145038') ***
*** Signal 6 (SIGABRT) (0x3e47002b266b) received by PID 2827883 (pthread TID 0x7fba50379b40) (linux TID 2827883) (maybe from PID 2827883, UID 15943) (code: -6), stack trace: ***
    @ 0000000002069cd3 folly::symbolizer::(anonymous namespace)::innerSignalHandler(int, siginfo_t*, void*)
                       /home/lniu/code/presto/presto-native-execution/dependencies/folly/_build/../folly/experimental/symbolizer/SignalHandler.cpp:449
    @ 0000000002069db4 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       /home/lniu/code/presto/presto-native-execution/dependencies/folly/_build/../folly/experimental/symbolizer/SignalHandler.cpp:470
    @ 000000000001441f (unknown)
    @ 000000000004300b gsignal
    @ 0000000000022858 abort
    @ 000000000009e910 (unknown)
    @ 00000000000aa38b (unknown)
    @ 00000000000aa3f6 std::terminate()
    @ 00000000000aa6a8 __cxa_throw
    @ 0000000001f1c522 __cxa_throw
                       /home/lniu/code/presto/presto-native-execution/dependencies/folly/_build/../folly/experimental/exception_tracer/ExceptionTracerLib.cpp:159
    @ 0000000001df424a void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
                       /home/lniu/code/velox_new/velox/_build/debug/../.././velox/common/base/Exceptions.h:85
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/common/base/Exceptions.cpp
    @ 00000000017adaaf facebook::velox::parquet::ParquetData::setNulls(boost::intrusive_ptr<facebook::velox::Buffer>&, int)
                       /home/lniu/code/velox_new/velox/_build/debug/../.././velox/dwio/parquet/reader/ParquetData.h:94
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/RepeatedColumnReader.cpp
    @ 00000000017ad4c5 facebook::velox::parquet::ListColumnReader::setLengthsFromRepDefs(facebook::velox::parquet::PageReader&)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/RepeatedColumnReader.cpp:286
    @ 00000000017abd95 facebook::velox::parquet::(anonymous namespace)::readLeafRepDefs(facebook::velox::dwio::common::SelectiveColumnReader*, int, bool)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/RepeatedColumnReader.cpp:45
    @ 00000000017abe17 facebook::velox::parquet::(anonymous namespace)::readLeafRepDefs(facebook::velox::dwio::common::SelectiveColumnReader*, int, bool)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/RepeatedColumnReader.cpp:50
    @ 00000000017ac329 facebook::velox::parquet::ensureRepDefs(facebook::velox::dwio::common::SelectiveColumnReader&, int)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/RepeatedColumnReader.cpp:108
    @ 00000000017acd79 facebook::velox::parquet::MapColumnReader::read(int, folly::Range<int const*>, unsigned long const*)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/RepeatedColumnReader.cpp:190
    @ 0000000001a96e55 facebook::velox::dwio::common::SelectiveStructColumnReaderBase::read(int, folly::Range<int const*>, unsigned long const*)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/common/SelectiveStructColumnReader.cpp:170
    @ 00000000017b42e7 facebook::velox::parquet::StructColumnReader::read(int, folly::Range<int const*>, unsigned long const*)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/StructColumnReader.cpp:103
    @ 0000000001a96663 facebook::velox::dwio::common::SelectiveStructColumnReaderBase::next(unsigned long, std::shared_ptr<facebook::velox::BaseVector>&, facebook::velox::dwio::common::Mutation const*)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/common/SelectiveStructColumnReader.cpp:91
    @ 00000000010b80cb facebook::velox::parquet::ParquetRowReader::Impl::next(unsigned long, std::shared_ptr<facebook::velox::BaseVector>&, facebook::velox::dwio::common::Mutation const*)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:789
    @ 00000000010b1492 facebook::velox::parquet::ParquetRowReader::next(unsigned long, std::shared_ptr<facebook::velox::BaseVector>&, facebook::velox::dwio::common::Mutation const*)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:876
    @ 000000000109b8b5 main
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/tests/reader/ParquetReaderExample.cpp:101
    @ 0000000000024082 __libc_start_main
    @ 0000000001098ccd _start
Aborted

System information

Velox System Info v0.0.2
Commit: 1e186e548833750cdee4b95d829711ddad78aba1
CMake Version: 3.16.3
System: Linux-5.4.0-1063-aws
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

N/A
qqibrow added the bug, triage, and parquet labels on Mar 25, 2024
@qqibrow
Collaborator Author

qqibrow commented Mar 25, 2024

File to reproduce the issue:
https://www.dropbox.com/scl/fi/pseqazjyity58yy5bzbn5/test7057647351746625583parquet?rlkey=7dol5hxsfeq7tuhqr2lzpprnk&dl=0

lniu@lniu-FXGFKFV Downloads % parquet head  test7057647351746625583parquet
{"test": [{"0": [], "1": [0], "2": [1, 2]}]}
{"test": []}
{"test": [{"3": null, "4": [3, 4, 5]}, {"5": [6], "6": [7], "7": null}]}
{"test": [{"8": [8, 9, 10, 11, 12, 13], "9": [14]}]}
{"test": []}
{"test": [{}, {"11": null, "12": [16], "13": [], "14": [], "15": null, "16": [], "17": [17, 18, 19, 20, 21], "18": [], "19": null, "10": [15]}]}
{"test": [{"20": [22, 23]}, {}]}
lniu@lniu-FXGFKFV Downloads % parquet schema test7057647351746625583parquet
{
  "type" : "record",
  "name" : "hive_schema",
  "fields" : [ {
    "name" : "test",
    "type" : [ "null", {
      "type" : "array",
      "items" : {
        "type" : "map",
        "values" : [ "null", {
          "type" : "array",
          "items" : "int"
        } ]
      }
    } ],
    "default" : null
  } ]
}
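
For readers following along in C++, the Avro-style schema above can be pictured as nested standard containers. This is only an illustrative sketch: the aliases `IntArray`, `Entry`, and `TestColumn` and the helpers `thirdRow` and `countNullValues` are hypothetical names made up for this example, not Velox types or APIs. It models the third row printed by `parquet head` and shows that `test` is a nullable array whose elements are maps with nullable array values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Illustrative aliases only (assumptions for this sketch, not Velox types).
using IntArray = std::optional<std::vector<int>>;      // nullable array<int>
using Entry = std::map<std::string, IntArray>;         // map<string, array<int>>
using TestColumn = std::optional<std::vector<Entry>>;  // nullable array<map<...>>

// The third row shown in the `parquet head` output:
// {"test": [{"3": null, "4": [3, 4, 5]}, {"5": [6], "6": [7], "7": null}]}
TestColumn thirdRow() {
  return std::vector<Entry>{
      Entry{{"3", std::nullopt}, {"4", std::vector<int>{3, 4, 5}}},
      Entry{{"5", std::vector<int>{6}},
            {"6", std::vector<int>{7}},
            {"7", std::nullopt}},
  };
}

// Counts the map values that are null across all maps in one row.
std::size_t countNullValues(const TestColumn& row) {
  std::size_t nulls = 0;
  if (!row) {
    return nulls;  // The whole array is null, so there are no maps to inspect.
  }
  for (const Entry& entry : *row) {
    for (const auto& keyAndValue : entry) {
      if (!keyAndValue.second) {
        ++nulls;
      }
    }
  }
  return nulls;
}

// Example use:
//   assert(thirdRow()->size() == 2);
//   assert(countNullValues(thirdRow()) == 2);
```

The outer `std::vector` is the part that matters for this issue: each row holds a list of maps, not a single map.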

@qqibrow
Collaborator Author

qqibrow commented Mar 25, 2024

@jaystarshot @hitarth @qqibrow are working on it.

@qqibrow
Collaborator Author

qqibrow commented Mar 26, 2024

This looks like a schema mismatch issue:
the parquet schema output shows that test should be an array, but Velox reports it as a map:

velox type: ROW<test:MAP<VARCHAR,ARRAY<INTEGER>>>

@hitarth
Collaborator

hitarth commented Mar 26, 2024

Here is the schema of the file:

message hive_schema {
  optional group test (LIST) {
    repeated group array (MAP) {
      repeated group key_value (MAP_KEY_VALUE) {
        required binary key (UTF8);
        optional group value (LIST) {
          repeated int32 array;
        }
      }
    }
  }
}
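
The schema above can be resolved to a logical type by hand, which makes the reported mismatch easier to see. The sketch below is a simplified model and not Velox's reader code: `Node`, `typeOf`, `listElementType`, and `issueSchema` are hypothetical names introduced for illustration. It assumes the LIST backward-compatibility rules from the Parquet format specification, under which a repeated primitive, a repeated multi-field group, or a repeated single-field group named `array` (or `<list name>_tuple`) is itself the list element.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified schema model; not a Velox or parquet-cpp class.
enum class Repetition { kRequired, kOptional, kRepeated };
enum class Annotation { kNone, kList, kMap, kMapKeyValue, kUtf8 };

struct Node {
  std::string name;
  Repetition repetition;
  Annotation annotation;
  std::string physicalType;    // Empty for groups.
  std::vector<Node> children;  // Empty for primitive fields.

  bool isGroup() const { return physicalType.empty(); }
};

std::string typeOf(const Node& node);

// Resolves the element type of a LIST-annotated group.
std::string listElementType(const Node& list) {
  const Node& repeated = list.children.at(0);
  const bool repeatedIsElement =
      !repeated.isGroup() ||           // Rule 1: repeated primitive.
      repeated.children.size() > 1 ||  // Rule 2: group with several fields.
      repeated.name == "array" ||      // Rule 3: legacy two-level names.
      repeated.name == list.name + "_tuple";
  // Otherwise assume the standard three-level layout, where the element is
  // the only child of the repeated group.
  return typeOf(repeatedIsElement ? repeated : repeated.children.at(0));
}

// Renders a node as a Velox-style type string (repetition is ignored here).
std::string typeOf(const Node& node) {
  if (!node.isGroup()) {
    if (node.physicalType == "int32") return "INTEGER";
    if (node.physicalType == "binary" && node.annotation == Annotation::kUtf8) {
      return "VARCHAR";
    }
    return node.physicalType;
  }
  if (node.annotation == Annotation::kList) {
    return "ARRAY<" + listElementType(node) + ">";
  }
  if (node.annotation == Annotation::kMap) {
    const Node& keyValue = node.children.at(0);
    return "MAP<" + typeOf(keyValue.children.at(0)) + "," +
        typeOf(keyValue.children.at(1)) + ">";
  }
  std::string row = "ROW<";
  for (std::size_t i = 0; i < node.children.size(); ++i) {
    if (i > 0) row += ",";
    row += node.children[i].name + ":" + typeOf(node.children[i]);
  }
  return row + ">";
}

// Builds the schema posted in this issue, leaf fields first.
Node issueSchema() {
  Node item{"array", Repetition::kRepeated, Annotation::kNone, "int32", {}};
  Node value{"value", Repetition::kOptional, Annotation::kList, "", {item}};
  Node key{"key", Repetition::kRequired, Annotation::kUtf8, "binary", {}};
  Node keyValue{"key_value", Repetition::kRepeated, Annotation::kMapKeyValue,
                "", {key, value}};
  Node maps{"array", Repetition::kRepeated, Annotation::kMap, "", {keyValue}};
  Node test{"test", Repetition::kOptional, Annotation::kList, "", {maps}};
  return Node{"hive_schema", Repetition::kRequired, Annotation::kNone, "",
              {test}};
}
```

Under these assumptions, `typeOf(issueSchema())` yields `ROW<test:ARRAY<MAP<VARCHAR,ARRAY<INTEGER>>>>`, which agrees with the Avro-style schema printed by `parquet schema`. The type Velox reported, `ROW<test:MAP<VARCHAR,ARRAY<INTEGER>>>`, is missing the outer `ARRAY`. The repeated group named `array` carries the `MAP` annotation itself, so a reader that does not treat it as the list element would plausibly drop that level, though this thread does not confirm the root cause.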
