[CH] failed ut: "read data from orc file format" and "test table bucketed by all typed columns" #7823

Closed
taiyang-li opened this issue Nov 6, 2024 · 3 comments · Fixed by #7917
Labels: bug (Something isn't working), triage

Comments

@taiyang-li
Contributor

Backend

CH (ClickHouse)

Bug description

09:04:33.819 ERROR org.apache.spark.task.TaskResources: Task 1 failed by error: 
org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Input value -1094795586 of a column "date_field" exceeds the range of type Date32: (in file/uri /data1/liyang/cppproject/gluten/backends-clickhouse/target/scala-2.12/test-classes/orc-data/all_data_types_with_non_primitive_type.snappy.orc): While executing SubstraitFileSource
0. ./contrib/llvm-project/libcxx/include/exception:141: Poco::Exception::Exception(String const&, int) @ 0x0000000030bfadb4
1. ./build_gcc/./src/Common/Exception.cpp:109: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000001a9d6c09
2. ./src/Common/Exception.h:110: DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009e064c5
3. ./src/Common/Exception.h:128: DB::Exception::Exception<int&, String const&>(int, FormatStringHelperImpl<std::type_identity<int&>::type, std::type_identity<String const&>::type>, int&, String const&) @ 0x0000000029b9b3fa
4. ./build_gcc/./src/Processors/Formats/Impl/NativeORCBlockInputFormat.cpp:1554: DB::ORCColumnToCHColumn::readColumnFromORCColumn(orc::ColumnVectorBatch const*, orc::Type const*, String const&, bool, std::shared_ptr<DB::IDataType const>) const @ 0x0000000029b84eb7
5. ./build_gcc/./src/Processors/Formats/Impl/NativeORCBlockInputFormat.cpp:1601: DB::ORCColumnToCHColumn::readColumnFromORCColumn(orc::ColumnVectorBatch const*, orc::Type const*, String const&, bool, std::shared_ptr<DB::IDataType const>) const @ 0x0000000029b793aa
6. ./build_gcc/./src/Processors/Formats/Impl/NativeORCBlockInputFormat.cpp:1892: DB::ORCColumnToCHColumn::orcColumnsToCHChunk(DB::Chunk&, std::unordered_map<String, std::pair<orc::ColumnVectorBatch const*, orc::Type const*>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, std::pair<orc::ColumnVectorBatch const*, orc::Type const*>>>>&, unsigned long, DB::BlockMissingValues*) @ 0x0000000029b75a04
7. ./build_gcc/./src/Processors/Formats/Impl/NativeORCBlockInputFormat.cpp:1136: DB::ORCColumnToCHColumn::orcTableToCHChunk(DB::Chunk&, orc::Type const*, orc::ColumnVectorBatch const*, unsigned long, DB::BlockMissingValues*) @ 0x0000000029b7134a
8. ./build_gcc/./src/Processors/Formats/Impl/NativeORCBlockInputFormat.cpp:1034: DB::NativeORCBlockInputFormat::read() @ 0x0000000029b70986
9. ./build_gcc/./src/Processors/Formats/IInputFormat.cpp:19: DB::IInputFormat::generate() @ 0x0000000029a96486
10. ./build_gcc/./utils/extern-local-engine/Storages/SubstraitSource/SubstraitFileSource.cpp:375: local_engine::NormalFileReader::pull(DB::Chunk&) @ 0x000000001b9d23d7
11. ./build_gcc/./utils/extern-local-engine/Storages/SubstraitSource/SubstraitFileSource.cpp:113: local_engine::SubstraitFileSource::generate() @ 0x000000001b9cc72d
12. ./build_gcc/./src/Processors/ISource.cpp:139: DB::ISource::tryGenerate() @ 0x0000000029a3fa38
13. ./build_gcc/./src/Processors/ISource.cpp:108: DB::ISource::work() @ 0x0000000029a3eec1
14. ./build_gcc/./src/Processors/Executors/ExecutionThreadContext.cpp:47: DB::ExecutionThreadContext::executeTask() @ 0x0000000029a7c98e
15. ./build_gcc/./src/Processors/Executors/PipelineExecutor.cpp:289: DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x0000000029a601b1
16. ./build_gcc/./src/Processors/Executors/PipelineExecutor.cpp:163: DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x0000000029a5f012
17. ./build_gcc/./src/Processors/Executors/PullingPipelineExecutor.cpp:54: DB::PullingPipelineExecutor::pull(DB::Chunk&) @ 0x0000000029a8c116
18. ./build_gcc/./src/Processors/Executors/PullingPipelineExecutor.cpp:65: DB::PullingPipelineExecutor::pull(DB::Block&) @ 0x0000000029a8c676
19. ./build_gcc/./utils/extern-local-engine/Parser/LocalExecutor.cpp:68: local_engine::LocalExecutor::hasNext() @ 0x000000001b2b4233
20. ./build_gcc/./utils/extern-local-engine/local_engine_jni.cpp:308: Java_org_apache_gluten_vectorized_BatchIterator_nativeHasNext @ 0x0000000009dd7493

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@taiyang-li taiyang-li added bug Something isn't working triage labels Nov 6, 2024
@taiyang-li
Contributor Author

Reason: After ClickHouse/ClickHouse#69473, the DataBuffer constructor no longer sets each element to zero. As a result, after ColumnReader::next, batch.data[i] is initialized only for rows where batch.notNull[i] != 0. For rows where batch.notNull[i] == 0, batch.data[i] is left uninitialized.
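One consumer-side way to stay safe under this behaviour is to consult the null mask before touching the data slot. The sketch below is illustrative only and is not necessarily what the eventual fix does. `FakeLongBatch`, `Date32Column`, `toDate32Days`, and the two range constants are hypothetical names introduced here; the struct only mimics the `data` and `notNull` fields of an ORC long batch, and the bounds assume Date32 covers 1900-01-01 through 2299-12-31 as days since the Unix epoch.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the two fields of an ORC long batch used here.
struct FakeLongBatch {
    std::vector<int64_t> data;   // per-row values; garbage where the row is null
    std::vector<char> notNull;   // 1 = value present, 0 = null
};

// Hypothetical output column: values plus a null map.
struct Date32Column {
    std::vector<int32_t> days;
    std::vector<char> isNull;
};

// Assumed Date32 bounds in days since 1970-01-01 (1900-01-01 .. 2299-12-31).
constexpr int64_t kMinDate32Day = -25567;
constexpr int64_t kMaxDate32Day = 120529;

Date32Column toDate32Days(const FakeLongBatch& batch) {
    Date32Column out;
    out.days.reserve(batch.data.size());
    out.isNull.reserve(batch.data.size());
    for (std::size_t i = 0; i < batch.data.size(); ++i) {
        if (!batch.notNull[i]) {
            // Null row: never read batch.data[i]; emit a default value instead.
            out.days.push_back(0);
            out.isNull.push_back(1);
            continue;
        }
        const int64_t v = batch.data[i];
        // Range-check only the rows that actually carry a value.
        if (v < kMinDate32Day || v > kMaxDate32Day)
            throw std::out_of_range("value exceeds the assumed Date32 range");
        out.days.push_back(static_cast<int32_t>(v));
        out.isNull.push_back(0);
    }
    return out;
}
```

With this ordering, a null row holding leftover bytes is skipped before the range check runs, so it cannot raise the "exceeds the range of type Date32" error.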

@taiyang-li
Contributor Author

taiyang-li commented Nov 6, 2024

The change that left batch.data[i] uninitialized:

   template <class T>
   DataBuffer<T>::DataBuffer(MemoryPool& pool, uint64_t newSize)
-      : memoryPool(pool), buf(nullptr), currentSize(0), currentCapacity(0) {
-    resize(newSize);
+      : memoryPool_(pool), buf_(nullptr), currentSize_(0), currentCapacity_(0) {
+    reserve(newSize);
+    currentSize_ = newSize;
   } 
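The effect of the diff above can be sketched with a deliberately simplified buffer. `SketchBuffer` and `lowBitsAsInt32` are hypothetical names and this is not ORC's `DataBuffer`. To keep the example well defined, it assumes fresh memory is pre-filled with the byte 0xBE, as some debug allocators and sanitizers do, rather than reading truly indeterminate memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical, simplified buffer. It is NOT ORC's DataBuffer; it only mimics
// the two constructor behaviours discussed in this issue.
class SketchBuffer {
public:
    enum class Init { ZeroFill, LeaveAsAllocated };

    SketchBuffer(std::size_t size, Init init)
        : size_(size), buf_(new int64_t[size]) {
        // Assumption: simulate an allocator that poisons fresh memory with
        // 0xBE, so the sketch never reads indeterminate values.
        std::memset(buf_.get(), 0xBE, size * sizeof(int64_t));
        if (init == Init::ZeroFill) {
            // Old behaviour: the constructor zeroed every element.
            std::memset(buf_.get(), 0, size * sizeof(int64_t));
        }
        // New behaviour: reserving capacity and setting the size writes nothing.
    }

    int64_t operator[](std::size_t i) const { return buf_[i]; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    std::unique_ptr<int64_t[]> buf_;
};

// Keep only the low 32 bits, as a reader narrowing to a 32-bit day count would.
inline int32_t lowBitsAsInt32(int64_t v) {
    const uint32_t low = static_cast<uint32_t>(static_cast<uint64_t>(v));
    int32_t out;
    std::memcpy(&out, &low, sizeof(out));
    return out;
}
```

Under the 0xBE assumption, `lowBitsAsInt32` applied to an element of a `LeaveAsAllocated` buffer gives -1094795586, which is the bit pattern 0xBEBEBEBE and matches the value in the error message. That match is an inference from the arithmetic, not something the log itself states.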

@taiyang-li
Contributor Author

The failing tests are currently ignored by #7821.

@taiyang-li taiyang-li changed the title [CH] failed ut: "read data from orc file format" [CH] failed ut: "read data from orc file format" and "test table bucketed by all typed columns" Nov 8, 2024