panic reading AWS-generated parquet file #2963

Closed
kmitchener opened this issue Jul 25, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@kmitchener
Contributor

Describe the bug

Given some parquet files generated by AWS cost and usage reporting, a query executed against those files results in a panic.

From datafusion-cli, using latest master as of the time of this bug report:

CREATE EXTERNAL TABLE costs 
    STORED AS PARQUET 
    PARTITIONED BY (year, month) 
    LOCATION 'C:/tmp/aws-costs/cur/cost-and-usage/cost-and-usage';

select line_item_resource_id, sum(line_item_unblended_cost) from costs where resource_tags_user_application = 'FERDA' group by 1;
thread 'thread 'thread 'thread 'tokio-runtime-workerthread 'thread 'thread 'tokio-runtime-workertokio-runtime-worker' panicked at 'tokio-runtime-workertokio-runtime-workertokio-runtime-workerthread '' panicked at 'thread 'range end index 105 out of range for slice of length 104tokio-runtime-worker' panicked at 'thread '' panicked at '' panicked at '' panicked at 'tokio-runtime-workerrange end index 113 out of range for slice of length 96tokio-runtime-worker', ' panicked at 'range end index 89 out of range for slice of length 80' panicked at 'range end index 97 out of range for slice of length 96range end index 113 out of range for slice of length 80range end index 111 out of range for slice of length 72range end index 89 out of range for slice of length 80', ', library\core\src\slice\index.rsrange end index 111 out of range for slice of length 72', tokio-runtime-worker', ' panicked at '', ', library\core\src\slice\index.rslibrary\core\src\slice\index.rslibrary\core\src\slice\index.rs', :' panicked at 'range end index 89 out of range for slice of length 80library\core\src\slice\index.rslibrary\core\src\slice\index.rslibrary\core\src\slice\index.rs:::library\core\src\slice\index.rs73range end index 113 out of range for slice of length 112', :::737373::', library\core\src\slice\index.rs737373:::735library\core\src\slice\index.rs::::555:
:73555


573:



:55

ArrowError(ExternalError(Execution("Join Error: task 764 panicked")))

It seems to be this line_item_resource_id field in particular, as I was able to group by other fields without issue.

To Reproduce

I don't think I should upload our AWS costs files for general consumption, so I'm open to suggestions on how to recreate a test case.

Expected behavior

not panic :)


@kmitchener added the bug label Jul 25, 2022
@kmitchener
Contributor Author

kmitchener commented Jul 25, 2022

Can be triggered just by selecting that column, and also when running the select against only one of the parquet files rather than all 3.

select count(distinct(line_item_resource_id)) from costs;

thread 'tokio-runtime-worker' panicked at 'range end index 76 out of range for slice of length 72', library\core\src\slice\index.rs:73:5
ArrowError(ExternalError(Execution("Join Error: task 868 panicked")))
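
For readers unfamiliar with this class of panic: the message comes from Rust's slice indexing, which panics when a range ends past the end of a buffer. The sketch below is only an illustration of that mechanism, not the DataFusion or parquet code path involved here. The helper names (`read_unchecked`, `read_checked`, `panic_message_for`) are hypothetical, and the numbers are chosen only to mirror the message above.

```rust
use std::panic;

/// Hypothetical helper: slices `len` bytes starting at `offset`, trusting the
/// caller's numbers the way a decoder might trust lengths stored in a file.
/// Panics if `offset + len` is greater than `buf.len()`.
fn read_unchecked(buf: &[u8], offset: usize, len: usize) -> &[u8] {
    &buf[offset..offset + len]
}

/// Checked variant: returns `None` instead of panicking when the requested
/// range does not fit in the buffer (or when `offset + len` overflows).
fn read_checked(buf: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    buf.get(offset..offset.checked_add(len)?)
}

/// Runs `read_unchecked` on a zeroed buffer of `buf_len` bytes and returns the
/// panic message, if any. `catch_unwind` is used here only so the message can
/// be inspected; it is not a recommendation for production error handling.
fn panic_message_for(buf_len: usize, offset: usize, len: usize) -> Option<String> {
    let result = panic::catch_unwind(move || {
        let buf = vec![0u8; buf_len];
        read_unchecked(&buf, offset, len).len()
    });
    match result {
        Ok(_) => None,
        Err(payload) => {
            // Panic payloads are usually a `String` or a `&str`.
            if let Some(s) = payload.downcast_ref::<String>() {
                Some(s.clone())
            } else if let Some(s) = payload.downcast_ref::<&str>() {
                Some(s.to_string())
            } else {
                Some(String::from("<non-string panic payload>"))
            }
        }
    }
}
```

Calling `panic_message_for(72, 68, 8)` asks for bytes 68..76 of a 72-byte buffer, which is the same shape of failure as the report above, while `read_checked` on the same inputs returns `None`. Inside a tokio worker, a panic like this is what surfaces to the caller as the "Join Error: task ... panicked" line.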

@kmitchener changed the title from "panic executing specific simple query against partitioned parquet files in a listing directory" to "panic reading AWS-generated parquet file" Jul 25, 2022
@comphead
Contributor

perhaps related to #2910

@kmitchener
Contributor Author

> perhaps related to #2910

Yes! I checked out your branch and built it, and these queries work with your patch. Can you update your PR to close this issue as well?

@comphead
Contributor

@alamb hopefully this can be closed as well, as part of #2968

@alamb alamb closed this as completed Jul 29, 2022