Verify row index implementation #7

Open
scritchley opened this issue Mar 16, 2017 · 2 comments

@scritchley
Owner

scritchley commented Mar 16, 2017

I'm not sure whether the existing row index implementation is correct. The documentation is somewhat hard to interpret, particularly these sections from https://orc.apache.org/docs/spec-index.html:

To record positions, each stream needs a sequence of numbers. For uncompressed streams, the position is the byte offset of the RLE run’s start location followed by the number of values that need to be consumed from the run. In compressed streams, the first number is the start of the compression chunk in the stream, followed by the number of decompressed bytes that need to be consumed, and finally the number of values consumed in the RLE.

For columns with multiple streams, the sequences of positions in each stream are concatenated. That was an unfortunate decision on my part that we should fix at some point, because it makes code that uses the indexes error-prone.
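
To make the quoted rules concrete, here is a minimal sketch in Go of how a concatenated position list could be split back into per-stream positions. It follows only the two quoted paragraphs: it assumes every stream is RLE-encoded, contributing two numbers when uncompressed and three when compressed. The names `StreamPosition` and `SplitPositions` are hypothetical and are not taken from this library or from the ORC specification. Real files may record a different number of positions for some stream kinds, so treat this as an illustration of the layout rather than a verified decoder.

```go
package main

import "fmt"

// StreamPosition is a hypothetical holder for one stream's seek position.
type StreamPosition struct {
	ChunkStart uint64 // compressed only: start of the compression chunk in the stream
	ByteOffset uint64 // uncompressed: byte offset of the RLE run's start;
	// compressed: decompressed bytes to consume within the chunk
	RunValues uint64 // values to consume from the RLE run
}

// SplitPositions splits the concatenated positions of one row index entry
// into one StreamPosition per stream. Assumption: every stream is RLE-encoded
// and uses 2 numbers (uncompressed) or 3 numbers (compressed).
func SplitPositions(positions []uint64, streams int, compressed bool) ([]StreamPosition, error) {
	width := 2
	if compressed {
		width = 3
	}
	if len(positions) != streams*width {
		return nil, fmt.Errorf("expected %d positions for %d streams, got %d",
			streams*width, streams, len(positions))
	}
	out := make([]StreamPosition, 0, streams)
	for i := 0; i < len(positions); i += width {
		p := positions[i : i+width]
		if compressed {
			out = append(out, StreamPosition{ChunkStart: p[0], ByteOffset: p[1], RunValues: p[2]})
		} else {
			out = append(out, StreamPosition{ByteOffset: p[0], RunValues: p[1]})
		}
	}
	return out, nil
}

func main() {
	// Two compressed streams whose positions were concatenated into one list.
	got, err := SplitPositions([]uint64{0, 120, 7, 4096, 16, 3}, 2, true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", got)
}
```

The length check is the part that matters for correctness: because the positions of all streams share one flat list, a reader that assumes the wrong number of positions for any single stream will misread every stream after it.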

scritchley modified the milestone: v1 release (Sep 5, 2017)

@mattatcha

I think I am running into a bug due to row indexes.

If I write more than 10,000 rows to a single file, Athena returns the following error:

HIVE_CURSOR_ERROR: index (0) must be less than size (0)

orc-tools has no problem reading the metadata or scanning the file, though.

@athum
Contributor

athum commented Jul 21, 2020

@scritchley, is there any update on whether the row index implementation is correct?
