Changes to PR#1183: use apache/parquet-testing files #1194
The UsedModules.cfg file, which is generated from --saveUsedModules was not being ignored by git, and I don't think we would ever want a UsedModules.cfg pushed into the repo, so moving it into the gitignore will resolve any annoyance from having to selectively not include that file.
This PR adds a check for the Arrow float type on the Chapel side, which was left out previously, allowing float data types to be read with `ak.read_parquet()`. With this addition, the file `alltypes_plain.parquet` from the Apache website now runs through `ak.read_parquet()` without fail.
Add line to pass Arrow float to Chapel side
* Modified drop() to allow for row and column dropping. Added functionality for dropping rows in new function called by drop. Moved column drop to new function called by drop. Updated the drop row algorithm to set the new column values in place using the superclass's __setitem__. Updated docstrings to match new functionality. Set default axis to 0 to conform to pandas.
* Added ak.DataFrame.drop() documentation.
* Resolving Merge Conflicts
* Added test to validate axis can only be 1 or 0.
* Updated code based on PR comments. Removed unneeded if blocks and a few other style changes.
* Updating unknown axis error.
* Correcting typo
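The row and column behaviour described in this commit message can be sketched as follows. This is only a minimal illustration on a plain dict of lists, not Arkouda's `ak.DataFrame.drop()` implementation: the name `drop_sketch` and the dict-of-lists frame are assumptions made for the example, and unlike the commit (which sets values in place) it returns a new frame.

```python
from typing import Any, Dict, Iterable, List


def drop_sketch(
    frame: Dict[str, List[Any]], labels: Iterable[Any], axis: int = 0
) -> Dict[str, List[Any]]:
    """Drop rows (axis=0) or columns (axis=1) from a dict-of-lists frame.

    Hypothetical stand-in for illustration only; axis defaults to 0,
    mirroring the pandas convention mentioned in the commit message.
    """
    if axis == 0:
        # Treat labels as positional row indices to remove.
        rows_to_drop = set(labels)
        n_rows = len(next(iter(frame.values()), []))
        keep = [i for i in range(n_rows) if i not in rows_to_drop]
        return {name: [col[i] for i in keep] for name, col in frame.items()}
    if axis == 1:
        # Treat labels as column names to remove.
        cols_to_drop = set(labels)
        return {
            name: list(col)
            for name, col in frame.items()
            if name not in cols_to_drop
        }
    # Any other axis value is rejected, matching the "axis can only be 1 or 0" test.
    raise ValueError(f"No axis named {axis}; axis must be 0 (rows) or 1 (columns)")


frame = {"a": [1, 2, 3], "b": [4, 5, 6]}
rows_dropped = drop_sketch(frame, [0])          # removes the first row
cols_dropped = drop_sketch(frame, ["a"], axis=1)  # removes column "a"
```

Splitting the two branches mirrors the commit's choice to move row dropping and column dropping into separate functions called by `drop`.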
In Chapel 1.26, various C types from 'SysCTypes', 'SysBasic' and 'CPtr' are being brought together into a single (new) module, 'CTypes'. In this PR, I'm adding a new user-level 'CTypes' compatibility module that brings the pieces that Arkouda relies upon together so that it can use the new names and organization yet still be compiled with older compilers.

Signed-off-by: Brad Chamberlain <[email protected]>
* Catch all Parquet errors and report them to the client

  In the initial Parquet error handling code, the only errors that were being reported were errors that came from status codes of calls to the Parquet API, but all errors in the C++ code went unhandled and would result in a server crash. This PR wraps all the C++ functions in a try/catch to catch all errors to be reported back to the client rather than crashing the server. This approach seems to align with the philosophy of "The server should never crash, errors should just be reported to the client".

  Additionally, a bug was identified with throwing errors in `forall` loops within a function, which would cause a try/catch wrapping the throwing function to not catch the resulting `TaskErrors` error throw, but a workaround for that is to wrap `forall` loops in a try/catch which just throws the error, which then allows the overarching try/catch to catch and report the error, where previously it would crash the server.
* Add unsupported type error message to getType function
* Remove the ARROWUNDEFINED variable and add some throws
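The "never crash, report the error to the client" idea from this commit message can be illustrated with a small sketch. The PR itself does this with try/catch blocks around the C++ functions; the Python decorator below is only an analogy, and `report_errors`, `read_column`, and the `("OK" | "ERROR", message)` reply shape are hypothetical names chosen for the example.

```python
import functools
from typing import Any, Callable, Tuple


def report_errors(handler: Callable[..., Any]) -> Callable[..., Tuple[str, Any]]:
    """Wrap a handler so any exception becomes an error reply instead of a crash."""

    @functools.wraps(handler)
    def wrapper(*args: Any, **kwargs: Any) -> Tuple[str, Any]:
        try:
            return ("OK", handler(*args, **kwargs))
        except Exception as exc:
            # Convert the failure into a message the client can display.
            return ("ERROR", f"{type(exc).__name__}: {exc}")

    return wrapper


@report_errors
def read_column(path: str) -> str:
    # Hypothetical handler: fails for anything that is not a Parquet file.
    if not path.endswith(".parquet"):
        raise ValueError(f"unsupported file: {path}")
    return f"read {path}"


ok_reply = read_column("data.parquet")
error_reply = read_column("data.csv")
```

Wrapping at the handler boundary keeps the error path in one place, so a failure inside any handler reaches the caller as a reply rather than terminating the process.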
I tested this locally, and the switch to `--optional-parquet` works as expected, but there are still some failures in the file.
I assume those are outside the scope of this PR, so I think this is good to go. Should I be working to resolve those failures at the moment?
…r and updated columns1 to pass assertion against ans
4af6f8c to 629636d
I rebased on top of master to get Ben's latest Arrow error handling, which is showing up here as a lot of noise (both in commits and files changed). All of this is from master and not new functionality; this noise shouldn't show up in 1183 because it's the same git history as master. The only changes I've made are:
I'm not sure the column names change is what we want, but with these changes it looks like we can get through the
Wow, looks OK to me. I guess these are a bunch of changes from master into this branch.
* add parquet test files with license and readme
* added and skipped test against parquet standard files
* add option for running parquet file tests
* Update marker name to match decorator (#1193)

Co-authored-by: Pierce Hayes <[email protected]>

* Changes to PR#1183: use apache/parquet-testing files (#1194)
* Add UsedModules.cfg to .gitignore (#1179)

The UsedModules.cfg file, which is generated from --saveUsedModules was not being ignored by git, and I don't think we would ever want a UsedModules.cfg pushed into the repo, so moving it into the gitignore will resolve any annoyance from having to selectively not include that file.

* Add line to pass Arrow float to Chapel side

This PR adds a check for the Arrow float type on the Chapel side, which was left out previously, allowing float data types to be read with `ak.read_parquet()`. With this addition, the file `alltypes_plain.parquet` from the Apache website now runs through `ak.read_parquet()` without fail.

* Closes #1165 drop on axis (#1177)
* Modified drop() to allow for row and column dropping. Added functionality for dropping rows in new function called by drop. Moved column drop to new function called by drop. Updated the drop row algorithm to set the new column values in place using the superclass's __setitem__. Updated docstrings to match new functionality. Set default axis to 0 to conform to pandas.
* Added ak.DataFrame.drop() documentation.
* Resolving Merge Conflicts
* Added test to validate axis can only be 1 or 0.
* Updated code based on PR comments. Removed unneeded if blocks and a few other style changes.
* Updating unknown axis error.
* Correcting typo
* Update Arkouda to work with CTypes (#1185)

In Chapel 1.26, various C types from 'SysCTypes', 'SysBasic' and 'CPtr' are being brought together into a single (new) module, 'CTypes'. In this PR, I'm adding a new user-level 'CTypes' compatibility module that brings the pieces that Arkouda relies upon together so that it can use the new names and organization yet still be compiled with older compilers.

Signed-off-by: Brad Chamberlain <[email protected]>

* Catch all Parquet errors and report them to the client (#1188)
* Catch all Parquet errors and report them to the client

In the initial Parquet error handling code, the only errors that were being reported were errors that came from status codes of calls to the Parquet API, but all errors in the C++ code went unhandled and would result in a server crash. This PR wraps all the C++ functions in a try/catch to catch all errors to be reported back to the client rather than crashing the server. This approach seems to align with the philosophy of "The server should never crash, errors should just be reported to the client".

Additionally, a bug was identified with throwing errors in `forall` loops within a function, which would cause a try/catch wrapping the throwing function to not catch the resulting `TaskErrors` error throw, but a workaround for that is to wrap `forall` loops in a try/catch which just throws the error, which then allows the overarching try/catch to catch and report the error, where previously it would crash the server.

* Add unsupported type error message to getType function
* Remove the ARROWUNDEFINED variable and add some throws
* add parquet test files with license and readme
* added and skipped test against parquet standard files
* add option for running parquet file tests
* Update marker name to match decorator
* updated error message string in ParquetMsg with wrong format specifier and updated columns1 to pass assertion against ans

Co-authored-by: Ben McDonald <[email protected]>
Co-authored-by: Ben McDonald <[email protected]>
Co-authored-by: reuster986 <[email protected]>
Co-authored-by: Ethan-DeBandi99 <[email protected]>
Co-authored-by: Brad Chamberlain <[email protected]>
Co-authored-by: Pierce Hayes <[email protected]>
Co-authored-by: pierce314159 <[email protected]>
Co-authored-by: Pierce Hayes <[email protected]>
Co-authored-by: Ben McDonald <[email protected]>
Co-authored-by: Ben McDonald <[email protected]>
Co-authored-by: Ethan-DeBandi99 <[email protected]>
Co-authored-by: Brad Chamberlain <[email protected]>
I rebased on top of master to get Ben's latest Arrow error handling, which is showing up here as a lot of noise (both in commits and files changed). All of this is from master and not new functionality; this noise shouldn't show up in 1183 because it's the same git history as master.
The only changes I've made are:
* Changed the `"--run-optional-parquet"` flag to `"--optional-parquet"`
* Updated `columns1` to not fail `self.assertListEqual(columns, ans)`
This PR updates the `parquet-testing` branch, so merging it will update PR #1183.
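For readers unfamiliar with how an opt-in flag such as `--optional-parquet` is usually wired into pytest, here is a minimal `conftest.py` sketch. It follows the standard pytest pattern for skipping marked tests unless a command-line option is given; it is not copied from this PR, and the marker name `optional_parquet`, the help text, and the skip reason are assumptions made for the example.

```python
import pytest


def pytest_addoption(parser):
    # Register the opt-in flag; tests behind it stay skipped by default.
    parser.addoption(
        "--optional-parquet",
        action="store_true",
        default=False,
        help="run optional tests against the parquet-testing files",
    )


def pytest_configure(config):
    # Declare the (hypothetical) marker so pytest does not warn about it.
    config.addinivalue_line(
        "markers", "optional_parquet: test needs --optional-parquet to run"
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--optional-parquet"):
        return  # flag given: leave the marked tests enabled
    skip_marker = pytest.mark.skip(reason="need --optional-parquet option to run")
    for item in items:
        if "optional_parquet" in item.keywords:
            item.add_marker(skip_marker)
```

With a setup like this, a test decorated with `@pytest.mark.optional_parquet` is skipped in a plain `pytest` run and executed when `pytest --optional-parquet` is used. The marker name in the decorator has to match the one checked during collection, which is the kind of mismatch the "Update marker name to match decorator" commit refers to.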