
[Go] Random segmentation faults when calling Read() on a pqarrow.RecordReader #29

Open
reiades opened this issue Aug 1, 2024 · 7 comments
Labels: Type: bug (Something isn't working)


reiades commented Aug 1, 2024

Hello!

I am currently using github.com/apache/arrow/go/v16/parquet to read the records of a downloaded S3 Parquet file (75 KB, stored in a bytes.Buffer). My implementation is the following:

```go
mem := memory.NewCheckedAllocator(memory.DefaultAllocator)
pf, err := file.NewParquetReader(bytes.NewReader(buf.Bytes()), file.WithReadProps(parquet.NewReaderProperties(mem)))
if err != nil {
	return nil, err
}
defer pf.Close()

reader, err := pqarrow.NewFileReader(pf, pqarrow.ArrowReadProperties{Parallel: true, BatchSize: pf.NumRows()}, mem)
if err != nil {
	return nil, err
}

rr, err := reader.GetRecordReader(ctx, nil, nil)
if err != nil {
	return nil, err
}
defer rr.Release()

rec, err := rr.Read() // <-- problem line
if err != nil && err != io.EOF {
	return nil, err
}
if rec == nil {
	return nil, nil
}
defer rec.Release()

// ... parse the file
```

I am reading the same file each time, and the majority of the reads into rec are successful. However, on occasion, I get a segmentation fault inside of rr.Read(). I have confirmed that the file is successfully downloaded each time and that buf.Bytes() is identical on successful and failed reads. I have also confirmed that I can read the schema from the file on both successful and failed reads, which leads me to believe something is happening inside the RecordReader:

```go
schema := pf.MetaData().Schema
log.Info(fmt.Sprintf("Schema:%s", schema)) // prints out the right schema each time
```

Here is the portion of the stack trace that I thought could be helpful for debugging:

```
SIGSEGV: segmentation violation
PC=0x4cb0c8 m=11 sigcode=1 addr=0x7ffbfdf94013e8

goroutine 150888 gp=0x4006db0a80 m=11 mp=0x4000780808 [runnable]:
github.com/apache/arrow/go/v16/parquet/internal/bmi.extractBitsGo(0xffffffffffffffff?, 0xffffffffffffffff?)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/internal/bmi/bmi.go:242 +0xcc fp=0x41bc72bae0 sp=0x41bc72bae0 pc=0x12818ac
github.com/apache/arrow/go/v16/parquet/internal/bmi.ExtractBits(...)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/internal/bmi/bmi.go:38
github.com/apache/arrow/go/v16/parquet/file.defLevelsBatchToBitmap({0x45f221c000?, 0x1?, 0x1?}, 0x400, {0xbc72bbb8?, 0x41?, 0x0?, 0x874c?}, {0x3b0f7d0, 0x41bcba5cc0}, ...)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/level_conversion.go:155 +0x180 fp=0x41bc72bb70 sp=0x41bc72bae0 pc=0x12f2ad0
github.com/apache/arrow/go/v16/parquet/file.defLevelsToBitmapInternal({0x45f221c000, 0x400, 0x2c000}, {0x1?, 0x0?, 0x0?, 0x1?}, 0x41bc72bcc0, 0x1)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/level_conversion.go:175 +0x198 fp=0x41bc72bc40 sp=0x41bc72bb70 pc=0x12f2d68
github.com/apache/arrow/go/v16/parquet/file.DefLevelsToBitmap(...)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/level_conversion.go:186
github.com/apache/arrow/go/v16/parquet/file.(*recordReader).ReadRecordData(0x41bb5c8000, 0x11)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/record_reader.go:545 +0x218 fp=0x41bc72bd40 sp=0x41bc72bc40 pc=0x12f8aa8
github.com/apache/arrow/go/v16/parquet/file.(*recordReader).ReadRecords(0x41bb5c8000, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/record_reader.go:632 +0x294 fp=0x41bc72bde0 sp=0x41bc72bd40 pc=0x12f8e84
github.com/apache/arrow/go/v16/parquet/pqarrow.(*leafReader).LoadBatch(0x41bb5c8060, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/column_readers.go:104 +0xd8 fp=0x41bc72be30 sp=0x41bc72bde0 pc=0x1767e48
github.com/apache/arrow/go/v16/parquet/pqarrow.(*listReader).LoadBatch(0x41bc72bee8?, 0x41bc72bf3c?)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/column_readers.go:360 +0x2c fp=0x41bc72be50 sp=0x41bc72be30 pc=0x17690fc
github.com/apache/arrow/go/v16/parquet/pqarrow.(*ColumnReader).NextBatch(0x41b9013190, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:131 +0x34 fp=0x41bc72be70 sp=0x41bc72be50 pc=0x176e9d4
github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next.func1(0x5, 0x41bc72bf38?)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:655 +0x50 fp=0x41bc72bef0 sp=0x41bc72be70 pc=0x17729a0
github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next.func2()
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:708 +0x100 fp=0x41bc72bfd0 sp=0x41bc72bef0 pc=0x1772850
runtime.goexit({})
	/root/.gimme/versions/go1.22.5.linux.arm64/src/runtime/asm_arm64.s:1222 +0x4 fp=0x41bc72bfd0 sp=0x41bc72bfd0 pc=0x4df0a4
created by github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next in goroutine 253
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:699 +0x2e8
...
```

It seems that the segmentation fault is happening inside of (*recordReader).next, so I was curious whether anyone familiar with this library has insight into why this is happening. I can share a longer stack trace if that would be helpful. I am on v16, but I saw the same error in v13 as well. Thanks in advance!

Component(s)

Go

reiades added the Type: bug label on Aug 1, 2024

joellubi commented Aug 1, 2024

Hi @reiades, thanks for opening this issue and sharing so much detail. Can you share more about how often this issue occurs? If there's a certain number of iterations or level of concurrency at which it can be reliably reproduced, it will be much easier to isolate the problem.
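
For anyone trying to reproduce, something like this hypothetical stress harness is roughly what I'd start with (readParquet is a placeholder standing in for the snippet from the issue description, and the counts are just knobs to vary):

```go
// Hypothetical reproduction harness: run the read path in a tight loop from
// several goroutines to see whether the failure rate scales with the
// iteration count or with the level of concurrency.
var wg sync.WaitGroup
for w := 0; w < 8; w++ { // concurrency level to vary
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 100000; i++ { // iteration count to vary
			// readParquet is hypothetical; it wraps the snippet from the issue.
			if err := readParquet(context.Background(), buf); err != nil {
				log.Fatal(err)
			}
		}
	}()
}
wg.Wait()
```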


reiades commented Aug 1, 2024

Hi @joellubi, thanks for responding! It is pretty tough to reproduce, as it happens very randomly. The above code runs every minute in a single goroutine (side note: I know that seems excessive and unnecessary if the file is the exact same each time - it definitely is, but there are other requirements at play). I see this segfault anywhere from 0 to 4 times a day. I will try to think of other ways to reproduce the problem, because I know that's probably not very helpful :(

I am trying to iterate through rr (rr, err := reader.GetRecordReader(ctx, nil, nil)) by calling rr.Next() rather than just calling rr.Read() to see if I get segfaults there.
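
Roughly this shape, for reference (a minimal sketch; rr is the reader from the snippet above, and the Err() check comes from the array.RecordReader interface):

```go
// Drain the reader batch by batch with Next()/Record() instead of a single
// Read(). The record returned by Record() is owned by the reader, so Retain()
// it if it needs to outlive the loop iteration.
for rr.Next() {
	rec := rr.Record()
	// ... parse rec ...
	_ = rec
}
if err := rr.Err(); err != nil && err != io.EOF {
	return nil, err
}
```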


joellubi commented Aug 1, 2024

OK, got it - I'll take a look and see if I have any luck reproducing. One more question: what OS and architecture are you seeing this on?


reiades commented Aug 2, 2024

Linux and arm64 - thank you so much!


joellubi commented Aug 6, 2024

Hi @reiades. I haven't had any luck reproducing this yet. Do you have a sample parquet file you can share that you know has had this issue? If that's not possible, could you share the schema, numRows, any encodings used, etc?
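
Something along these lines should dump the details I'm after (an untested sketch; the metadata accessors are named from memory, so worth double-checking against the version you're on):

```go
// Print the schema, row count, and per-column-chunk encodings/compression
// for a file already opened as pf (the *file.Reader from the issue snippet).
fmt.Println(pf.MetaData().Schema)
fmt.Println("num rows:", pf.NumRows())
for rg := 0; rg < pf.NumRowGroups(); rg++ {
	rgMeta := pf.RowGroup(rg).MetaData()
	for c := 0; c < rgMeta.NumColumns(); c++ {
		col, err := rgMeta.ColumnChunk(c)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("row group %d, column %d: encodings=%v, compression=%v\n",
			rg, c, col.Encodings(), col.Compression())
	}
}
```

The parquet_reader tool under parquet/cmd in this module prints similar information, if that's easier to run.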


reiades commented Aug 13, 2024

Hello - sorry, I have been a bit busy and had to put this aside. I don't think I will be able to share a sample parquet file, but I can tell you more about the schema, the number of rows, and the encodings used. I will get back to you; thanks again for your help!

assignUser transferred this issue from apache/arrow on Aug 30, 2024
zeroshade (Member) commented:

@reiades I've been skimming through the open Issues here and saw this one. Just wanted to poke you to see if you can get back with any information that might help us reproduce this issue so we can address and fix it.
