-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closes #1181: use apache/parquet-testing files #1183
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pytest.mark.optional_parquet
needs to be updated to @pytest.mark.run_optional_parquet
, or config.addinivalue_line("markers", "run_optional_parquet: mark test as slow to run")
should be updated to config.addinivalue_line("markers", "optional_parquet: mark test as slow to run")
. Otherwise, you will get the following warning that the custom marker is not registered:
tests/parquet_test.py:124
/Users/ethandebandi/Documents/git/arkouda/tests/parquet_test.py:124: PytestUnknownMarkWarning: Unknown pytest.mark.optional_parquet - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/mark.html
@pytest.mark.optional_parquet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good once the marker and decorator are updated to match like Ethan suggests
Co-authored-by: Pierce Hayes <[email protected]>
* Add UsedModules.cfg to .gitignore (#1179) The UsedModules.cfg file, which is generated from --saveUsedModules was not being ignored by git, and I don't think we would ever want a UsedModules.cfg pushed into the repo, so moving it into the gitignore will resolve any annoyance from having to selectively not include that file. * Add line to pass Arrow float to Chapel side This PR adds a check for the Arrow float type on the Chapel side, which was left out previously, allowing float data types to be read with `ak.read_parquet()`. With this addition, the file `alltypes_plain.parquet` from the Apache website now runs through `ak.read_parquet()` without fail. * Closes #1165 drop on axis (#1177) * Modified drop() to allow for row and column dropping. Added functionality for dropping rows in new function called by drop. Moved column drop to new function called by drop. Updated the drop row algorithm to set the new column values in place using the superclass's __setitem__. Updated docstrings to match new functionality. Set default axis to 0 to conform to pandas. * Added ak.DataFrame.drop() documentation. * Resolving Merge Conflicts * Added test to validate axis can only be 1 or 0. * Updated code based on PR comments. Removed unneeded if blocks and a few other style changes. * Updating unknown axis error. * Correcting typo * Update Arkouda to work with CTypes (#1185) In Chapel 1.26, various C types from 'SysCTypes', 'SysBasic' and 'CPtr' are being brought together into a single (new) module, 'CTypes'. In this PR, I'm adding a new user-level 'CTypes' compatibility module that brings the pieces that Arkouda relies upon together so that it can use the new names and organization yet still be compiled with older compilers. --- Signed-off-by: Brad Chamberlain <[email protected]> * Catch all Parquet errors and report them to the client (#1188) * Catch all Parquet errors and report them to the client In the initial Parquet error handling code, the only errors that were being reported were errors that came from status codes of calls to the Parquet API, but all errors in the C++ code went unhandled and would result in a server crash. This PR wraps all the C++ functions in a try/catch to catch all errors to be reported back to the client rather than crashing the server. This approach seems to align with the philosophy of "The server should never crash, errors should just be reported to the client". Additionally, a bug was identified with throwing errors in `forall` loops within a function, which would cause a try/catch wrapping the throwing function to not catch the resulting `TaskErrors` error throw, but a workaround for that is to wrap `forall` loops in a try/catch which just throws the error, which then allows the overarching try/catch to catch and report the error, where previously it would crash the server. * Add unsupported type error message to getType function * Remove the ARROWUNDEFINED variable and add some throws * add parquet test files with license and readme * added and skipped test against parquet standard files * add option for running parquet file tests * Update marker name to match decorator * updated error message string in ParquetMsg with wrong format specifier and updated columns1 to pass assertion against ans Co-authored-by: Ben McDonald <[email protected]> Co-authored-by: Ben McDonald <[email protected]> Co-authored-by: reuster986 <[email protected]> Co-authored-by: Ethan-DeBandi99 <[email protected]> Co-authored-by: Brad Chamberlain <[email protected]> Co-authored-by: Pierce Hayes <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok to me now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now
This PR incorporates standard test files from github.com/apache/parquet-testing and adds a unit tests that checks
ak.get_datasets
andak.read_parquet
against them. Attribution and license are included.Currently, the new unit test is skipped by default because it fails, but it can be run manually with
python3 -m pytest tests/parquet_test.py --optional-parquet
.