s3 read_ahead_count and reading errors #5096

Closed
devinrsmith opened this issue Feb 1, 2024 · 1 comment · Fixed by #5137
Labels: bug (Something isn't working), parquet (Related to the Parquet integration), s3, triage

Comments

@devinrsmith (Member):

from deephaven import parquet
from deephaven.experimental.s3 import S3Instructions

from datetime import timedelta

y = parquet.read(
    "s3://drivestats-parquet/drivestats/year=2023/month=02/2023-02-1.parquet",
    special_instructions=S3Instructions(
        "us-west-004",
        aws_access_key_id="0045f0571db506a0000000007",
        aws_secret_access_key="K004cogT4GIeHHfhCyPPLsPBT4NyY1A",
        endpoint_override="https://s3.us-west-004.backblazeb2.com/",
        read_ahead_count=16,
        read_timeout=timedelta(seconds=30),
        fragment_size=8192,
    ),
).coalesce()

causes:

2024-02-01T00:11:10.414Z | heduler-Concurrent-1 | ERROR | i.d.s.s.SessionService    | Internal Error '61a9b31b-5749-47e4-8b62-88cd27f7265d' java.io.UncheckedIOException: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
        at io.deephaven.parquet.table.pagestore.VariablePageSizeColumnChunkPageStore.getPage(VariablePageSizeColumnChunkPageStore.java:130)
        at io.deephaven.parquet.table.pagestore.VariablePageSizeColumnChunkPageStore.getPageContaining(VariablePageSizeColumnChunkPageStore.java:170)
        at io.deephaven.parquet.table.pagestore.VariablePageSizeColumnChunkPageStore.getPageContaining(VariablePageSizeColumnChunkPageStore.java:22)
        at io.deephaven.engine.page.PageStore.fillChunk(PageStore.java:81)
        at io.deephaven.parquet.table.region.ParquetColumnRegionBase.fillChunk(ParquetColumnRegionBase.java:57)
        at io.deephaven.engine.table.impl.sources.regioned.DeferredColumnRegionBase.fillChunk(DeferredColumnRegionBase.java:82)
        at io.deephaven.engine.page.PageStore.fillChunk(PageStore.java:85)
        at io.deephaven.engine.table.impl.sources.regioned.RegionedColumnSourceBase.fillChunk(RegionedColumnSourceBase.java:56)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.getSnapshotDataAsChunkList(ConstructSnapshot.java:1641)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.serializeAllTable(ConstructSnapshot.java:1531)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.lambda$constructBackplaneSnapshotInPositionSpace$2(ConstructSnapshot.java:698)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.callDataSnapshotFunction(ConstructSnapshot.java:1210)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.callDataSnapshotFunction(ConstructSnapshot.java:1152)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.constructBackplaneSnapshotInPositionSpace(ConstructSnapshot.java:703)
        at io.deephaven.server.barrage.BarrageMessageProducer.getSnapshot(BarrageMessageProducer.java:2244)
        at io.deephaven.server.barrage.BarrageMessageProducer.updateSubscriptionsSnapshotAndPropagate(BarrageMessageProducer.java:1336)
        at io.deephaven.server.barrage.BarrageMessageProducer$UpdatePropagationJob.run(BarrageMessageProducer.java:1012)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:97)
        at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
        at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:112)
        at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
        at org.xerial.snappy.Snappy.uncompress(Snappy.java:551)
        at org.apache.parquet.hadoop.codec.SnappyDecompressor.uncompress(SnappyDecompressor.java:30)
        at org.apache.parquet.hadoop.codec.NonBlockedDecompressor.decompress(NonBlockedDecompressor.java:73)
        at org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
        at java.base/java.io.DataInputStream.readFully(DataInputStream.java:208)
        at java.base/java.io.DataInputStream.readFully(DataInputStream.java:179)
        at org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:286)
        at org.apache.parquet.bytes.BytesInput.copy(BytesInput.java:202)
        at io.deephaven.parquet.compress.DeephavenCompressorAdapterFactory$CodecWrappingCompressorAdapter.decompress(DeephavenCompressorAdapterFactory.java:145)
        at io.deephaven.parquet.base.ColumnPageReaderImpl.readDataPage(ColumnPageReaderImpl.java:278)
        at io.deephaven.parquet.base.ColumnPageReaderImpl.materialize(ColumnPageReaderImpl.java:112)
        at io.deephaven.parquet.table.pagestore.topage.ToPage.getResult(ToPage.java:59)
        at io.deephaven.parquet.table.pagestore.topage.ToPage.toPage(ToPage.java:87)
        at io.deephaven.parquet.table.pagestore.ColumnChunkPageStore.toPage(ColumnChunkPageStore.java:141)
        at io.deephaven.parquet.table.pagestore.VariablePageSizeColumnChunkPageStore.getPage(VariablePageSizeColumnChunkPageStore.java:127)
        ... 23 more

The reproducer above depends on #5087 (though the error should not be caused by it). The query succeeds with read_ahead_count=0 or 1.
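FAILED_TO_UNCOMPRESS(5) is what Snappy reports when the bytes it is handed are not the compressed stream that was written, which is consistent with read-ahead buffers returning stale or mis-offset data. A minimal stand-alone illustration of that failure mode, using zlib as a stand-in for the Snappy codec (purely to show why corrupted page bytes surface as a decompression error rather than, say, wrong values):

```python
import zlib

# Compress a "page" of column data, as a Parquet writer would.
page = b"column data " * 100
compressed = zlib.compress(page)

# Intact bytes round-trip fine.
assert zlib.decompress(compressed) == page

# If a read-ahead buffer hands back bytes from the wrong offset
# (simulated here by dropping the first 4 bytes), the decompressor
# rejects the stream -- the zlib analogue of FAILED_TO_UNCOMPRESS(5).
try:
    zlib.decompress(compressed[4:])
except zlib.error as e:
    print("decompression failed:", e)
```

The same reasoning explains the later PageHeader errors: once the reader's byte stream is offset or interleaved, the Thrift page-header parser sees garbage field types instead of a valid header.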

@devinrsmith devinrsmith added bug Something isn't working triage parquet Related to the Parquet integration s3 labels Feb 1, 2024
@devinrsmith devinrsmith added this to the 3. Triage milestone Feb 1, 2024
@devinrsmith (Member, Author):

This has also manifested as other errors:

2024-02-05T18:58:31.749Z | heduler-Concurrent-1 | ERROR | i.d.s.s.SessionService    | Internal Error '3e438f10-130c-44ab-bf2c-1e7d0a26a259' java.io.UncheckedIOException: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 14
        at io.deephaven.parquet.base.ColumnChunkReaderImpl.getDictionary(ColumnChunkReaderImpl.java:186)                                                                                                             
        at io.deephaven.util.datastructures.LazyCachingFunction.apply(LazyCachingFunction.java:48)                                                                                                                   
        at io.deephaven.parquet.base.ColumnPageReaderImpl.getDataReader(ColumnPageReaderImpl.java:570)                                                                                                               
        at io.deephaven.parquet.base.ColumnPageReaderImpl.readPageV1(ColumnPageReaderImpl.java:387)                                                                                                                  
        at io.deephaven.parquet.base.ColumnPageReaderImpl.readDataPage(ColumnPageReaderImpl.java:270)                                                                                                                
        at io.deephaven.parquet.base.ColumnPageReaderImpl.materialize(ColumnPageReaderImpl.java:114)                                                                                                                 
        at io.deephaven.parquet.table.pagestore.topage.ToPage.getResult(ToPage.java:59)                                                                                                                              
        at io.deephaven.parquet.table.pagestore.topage.ToPage.toPage(ToPage.java:87)                                                                                                                                 
        at io.deephaven.parquet.table.pagestore.ColumnChunkPageStore.toPage(ColumnChunkPageStore.java:142)                                                                                                           
        at io.deephaven.parquet.table.pagestore.VariablePageSizeColumnChunkPageStore.getPage(VariablePageSizeColumnChunkPageStore.java:128)                                                                          
        at io.deephaven.parquet.table.pagestore.VariablePageSizeColumnChunkPageStore.getPageContaining(VariablePageSizeColumnChunkPageStore.java:165)                                                                
        at io.deephaven.parquet.table.pagestore.VariablePageSizeColumnChunkPageStore.getPageContaining(VariablePageSizeColumnChunkPageStore.java:23)                                                                 
        at io.deephaven.engine.page.PageStore.fillChunk(PageStore.java:81)                                                                                                                                           
        at io.deephaven.parquet.table.region.ParquetColumnRegionBase.fillChunk(ParquetColumnRegionBase.java:57)                                                                                                      
        at io.deephaven.engine.table.impl.sources.regioned.DeferredColumnRegionBase.fillChunk(DeferredColumnRegionBase.java:82)                                                                                      
        at io.deephaven.engine.page.PageStore.fillChunk(PageStore.java:85)                                                                                                                                           
        at io.deephaven.engine.table.impl.sources.regioned.RegionedColumnSourceBase.fillChunk(RegionedColumnSourceBase.java:56)                                                                                      
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.getSnapshotDataAsChunkList(ConstructSnapshot.java:1641)                                                                                           
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.serializeAllTable(ConstructSnapshot.java:1531)                                                                                                    
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.lambda$constructBackplaneSnapshotInPositionSpace$2(ConstructSnapshot.java:698)                                                                    
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.callDataSnapshotFunction(ConstructSnapshot.java:1210)                                                                                             
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.callDataSnapshotFunction(ConstructSnapshot.java:1152)                                                                                             
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.constructBackplaneSnapshotInPositionSpace(ConstructSnapshot.java:703)                                                                             
        at io.deephaven.server.barrage.BarrageMessageProducer.getSnapshot(BarrageMessageProducer.java:2244)                                                                                                          
        at io.deephaven.server.barrage.BarrageMessageProducer.updateSubscriptionsSnapshotAndPropagate(BarrageMessageProducer.java:1336)                                                                              
        at io.deephaven.server.barrage.BarrageMessageProducer$UpdatePropagationJob.run(BarrageMessageProducer.java:1012)                                                                                             
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)                                                                                                                         
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)                                                                                                                                        
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)                                                                                  
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)                                                                                                                 
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)                                                                                                                 
        at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:97)                                                                                            
        at java.base/java.lang.Thread.run(Thread.java:1583)                                                                                                                                                          
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 14                                                                                                    
        at org.apache.parquet.format.Util.read(Util.java:366)                                                                                                                                                        
        at org.apache.parquet.format.Util.readPageHeader(Util.java:133)                                                                                                                                              
        at org.apache.parquet.format.Util.readPageHeader(Util.java:128)                                                                                                                                              
        at io.deephaven.parquet.base.ColumnChunkReaderImpl.readDictionary(ColumnChunkReaderImpl.java:208)                                                                                                            
        at io.deephaven.parquet.base.ColumnChunkReaderImpl.getDictionary(ColumnChunkReaderImpl.java:184)                                                                                                             
        ... 32 more                                                                                                                                                                                                  
Caused by: shaded.parquet.org.apache.thrift.protocol.TProtocolException: don't know what type: 14                                                                                                                    
        at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.getTType(TCompactProtocol.java:899)                                                                                                            
        at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:558)                                                                                                      
        at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:155)                                                                                                                    
        at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1032)                                                                                                                  
        at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1025)                                                                                                                  
        at org.apache.parquet.format.PageHeader.read(PageHeader.java:902)                                                                                                                                            
        at org.apache.parquet.format.Util.read(Util.java:363)                                                                                                                                                        
        ... 36 more                                                      
2024-02-05T18:55:51.530Z | heduler-Concurrent-2 | ERROR | i.d.s.s.SessionService    | Internal Error '33847ca6-bab0-497a-968b-cbd21051f4e7' java.io.UncheckedIOException: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@6952bdb8
        at io.deephaven.parquet.base.ColumnChunkReaderImpl.getDictionary(ColumnChunkReaderImpl.java:186)
        at io.deephaven.util.datastructures.LazyCachingFunction.apply(LazyCachingFunction.java:48)
        at io.deephaven.parquet.base.ColumnPageReaderImpl.getDictionary(ColumnPageReaderImpl.java:606)
        at io.deephaven.parquet.table.pagestore.topage.ToPageWithDictionary.getResult(ToPageWithDictionary.java:57)
        at io.deephaven.parquet.table.pagestore.topage.ToPage.toPage(ToPage.java:87)
        at io.deephaven.parquet.table.pagestore.ColumnChunkPageStore.toPage(ColumnChunkPageStore.java:142)
        at io.deephaven.parquet.table.pagestore.VariablePageSizeColumnChunkPageStore.getPage(VariablePageSizeColumnChunkPageStore.java:128)
        at io.deephaven.parquet.table.pagestore.VariablePageSizeColumnChunkPageStore.getPageContaining(VariablePageSizeColumnChunkPageStore.java:165)
        at io.deephaven.parquet.table.pagestore.VariablePageSizeColumnChunkPageStore.getPageContaining(VariablePageSizeColumnChunkPageStore.java:23)
        at io.deephaven.engine.page.PageStore.fillChunk(PageStore.java:81)
        at io.deephaven.parquet.table.region.ParquetColumnRegionBase.fillChunk(ParquetColumnRegionBase.java:57)
        at io.deephaven.engine.table.impl.sources.regioned.DeferredColumnRegionBase.fillChunk(DeferredColumnRegionBase.java:82)
        at io.deephaven.engine.page.PageStore.fillChunk(PageStore.java:85)
        at io.deephaven.engine.table.impl.sources.regioned.RegionedColumnSourceBase.fillChunk(RegionedColumnSourceBase.java:56)
        at io.deephaven.engine.table.impl.sources.regioned.RegionedColumnSourceObject$AsValues.fillChunk(RegionedColumnSourceObject.java:32)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.getSnapshotDataAsChunkList(ConstructSnapshot.java:1641)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.serializeAllTable(ConstructSnapshot.java:1531)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.lambda$constructBackplaneSnapshotInPositionSpace$2(ConstructSnapshot.java:698)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.callDataSnapshotFunction(ConstructSnapshot.java:1210)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.callDataSnapshotFunction(ConstructSnapshot.java:1152)
        at io.deephaven.engine.table.impl.remote.ConstructSnapshot.constructBackplaneSnapshotInPositionSpace(ConstructSnapshot.java:703)
        at io.deephaven.server.barrage.BarrageMessageProducer.getSnapshot(BarrageMessageProducer.java:2244)
        at io.deephaven.server.barrage.BarrageMessageProducer.updateSubscriptionsSnapshotAndPropagate(BarrageMessageProducer.java:1336)
        at io.deephaven.server.barrage.BarrageMessageProducer$UpdatePropagationJob.run(BarrageMessageProducer.java:1012)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:97)
        at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@6952bdb8
        at org.apache.parquet.format.Util.read(Util.java:366)
        at org.apache.parquet.format.Util.readPageHeader(Util.java:133)
        at org.apache.parquet.format.Util.readPageHeader(Util.java:128)
        at io.deephaven.parquet.base.ColumnChunkReaderImpl.readDictionary(ColumnChunkReaderImpl.java:208)
        at io.deephaven.parquet.base.ColumnChunkReaderImpl.getDictionary(ColumnChunkReaderImpl.java:184)
        ... 30 more
Caused by: shaded.parquet.org.apache.thrift.protocol.TProtocolException: Required field 'uncompressed_page_size' was not found in serialized data! Struct: org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@6952bdb8
        at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1114)
        at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1025)
        at org.apache.parquet.format.PageHeader.read(PageHeader.java:902)
        at org.apache.parquet.format.Util.read(Util.java:363)
        ... 34 more

@devinrsmith devinrsmith changed the title s3 read_ahead_count and FAILED_TO_UNCOMPRESS s3 read_ahead_count and reading errors Feb 5, 2024
@devinrsmith devinrsmith self-assigned this Feb 6, 2024
devinrsmith added a commit to devinrsmith/deephaven-core that referenced this issue Feb 10, 2024
devinrsmith added a commit that referenced this issue Feb 14, 2024
Adds in port of PooledObjectReference from DHE

Fixes #5096

---------

Co-authored-by: Ryan Caudy <[email protected]>
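The fix's title suggests the root cause: a pooled read-ahead buffer could be reclaimed (and refilled for another request) while a decompressor still held it. A hypothetical sketch of the reference-counting idea behind a PooledObjectReference, illustrative only and not the actual DHE implementation:

```python
import threading

class PooledObjectReference:
    """Hypothetical sketch: a pooled object that may only be
    returned to the pool once every acquirer has released it."""

    def __init__(self, obj, pool):
        self._obj = obj
        self._pool = pool
        self._refs = 1          # the creator holds the initial reference
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("object already returned to pool")
            self._refs += 1
            return self._obj

    def release(self):
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                # Safe to recycle: no reader can still observe the buffer.
                self._pool.append(self._obj)

pool = []
ref = PooledObjectReference(bytearray(8192), pool)
buf = ref.acquire()    # a decompressor takes a reference
ref.release()          # the read-ahead machinery drops its reference
assert not pool        # buffer is NOT recycled while the reader holds it
ref.release()          # the reader finishes
assert len(pool) == 1  # only now does the buffer return to the pool
```

Without this guard, the pool could hand the same buffer to a second request while the first decompressor was still reading it, producing exactly the interleaved-bytes symptoms above.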