Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Add basic predicate-pushdown optimization (#433) * basic predicate-pushdown support * remove explict Dispatch class * use _Frame.fillna * cleanup comments * test coverage * improve test coverage * add xfail test for dt accessor in predicate and fix test_show.py * fix some naming issues * add config and use assert_eq * add logging events when predicate-pushdown bails * move bail logic earlier in function * address easier code review comments * typo fix * fix creation_info access bug * convert any expression to DNF * csv test coverage * include IN coverage * improve test rigor * address code review * skip parquet tests when deps are not installed * fix bug * add pyarrow dep to cluster workers * roll back test skipping changes Co-authored-by: Charles Blackmon-Luca <[email protected]> * Add workflow to keep datafusion dev branch up to date (#440) * Condition for BinaryExpr, filter, input_ref, rexcall, and rexliteral * Updates for test_filter * more of test_filter.py working with the exception of some date pytests * Updates to dates and parsing dates like postgresql does * Update gpuCI `RAPIDS_VER` to `22.06` (#434) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Bump black to 22.3.0 (#443) * Check for ucx-py nightlies when updating gpuCI (#441) * Simplify gpuCI updating workflow * Add check for cuML nightly version * Refactored to adjust for better type management * Refactor schema and statements * update types * fix syntax issues and renamed function name calls * Add handling for newer `prompt_toolkit` versions in cmd tests (#447) * Add handling for newer prompt-toolkit version * Place compatibility code in _compat * Fix version for gha-find-replace (#446) * Improved error handling and code clean up * move pieces of logical.rs to seperated files to ensure code readability * left join working * Update versions of Java dependencies (#445) * Update versions for java dependencies with cves * Rerun tests * Update jackson databind version (#449) * Update versions for java dependencies with cves * Rerun tests * update jackson-databind dependency * Disable SQL server functionality (#448) * Disable SQL server functionality * Update docs/source/server.rst Co-authored-by: Ayush Dattagupta <[email protected]> * Disable server at lowest possible level * Skip all server tests * Add tests to ensure server is disabled * Fix CVE fix test Co-authored-by: Ayush Dattagupta <[email protected]> * Update dask pinnings for release (#450) * Add Java source code to source distribution (#451) * Bump `httpclient` dependency (#453) * Revert "Disable SQL server functionality (#448)" This reverts commit 37a3a61. * Bump httpclient version * Unpin Dask/distributed versions (#452) * Unpin dask/distributed post release * Remove dask/distributed version ceiling * Add jsonschema to ci testing (#454) * Add jsonschema to ci env * Fix typo in config schema * Switch tests from `pd.testing.assert_frame_equal` to `dd.assert_eq` (#365) * Start moving tests to dd.assert_eq * Use assert_eq in datetime filter test * Resolve most resulting test failures * Resolve remaining test failures * Convert over tests * Convert more tests * Consolidate select limit cpu/gpu test * Remove remaining assert_series_equal * Remove explicit cudf imports from many tests * Resolve rex test failures * Remove some additional compute calls * Consolidate sorting tests with getfixturevalue * Fix failed join test * Remove breakpoint * Use custom assert_eq function for tests * Resolve test failures / seg faults * Remove unnecessary testing utils * Resolve local test failures * Generalize RAND test * Avoid closing client if using independent cluster * Fix failures on Windows * Resolve black failures * Make random test variables more clear * First basic working checkpoint for group by * Set max pin on antlr4-python-runtime (#456) * Set max pin on antlr4-python-runtime due to incompatibilities with fugue_sql * update comment on antlr max pin version * Updates to style * stage pre-commit changes for upstream merge * Fix black failures * Updates to Rust formatting * Fix rust lint and clippy * Remove jar building step which is no longer needed * Remove Java from github workflows matrix * Removes jar and Java references from test.yml * Update Release workflow to remove references to Java * Update rust.yml to remove references from linux-build-lib * Add pre-commit.sh file to provide pre-commit support for Rust in a convenient script * Removed overlooked jdk references * cargo clippy auto fixes * Address all Rust clippy warnings * Include setuptools-rust in conda build recipie * Include setuptools-rust in conda build recipie, in host and run * Adjustments for conda build, committing for others to help with error and see it occurring in CI * Include sql.yaml in package files * Include pyarrow in run section of conda build to ensure tests pass * include setuptools-rust in host and run of conda since removing caused errors * to_string() method had been removed in rust and not removed here, caused conda run_test.py to fail when this line was hit * Replace commented out tests with pytest.skip and bump version of pyarrow to 7.0.0 * Fix setup.py syntax issue introduced on last commit by find/replace * Rename Datafusion -> DataFusion and Apache DataFusion -> Arrow DataFusion * Fix docs build environment * Include Rust compiler in docs environment * Bump Rust compiler version to 1.59 * Ok, well readthedocs didn't like that * Store libdask_planner.so and retrieve it between github workflows * Cache the Rust library binary * Remove Cargo.lock from git * Remove unused datafusion-expr crate * Build datafusion at each test step instead of caching binaries * Remove maven and jar cache steps from test-upstream.yaml * Removed dangling 'build' workflow step reference * Lowered PyArrow version to 6.0.1 since cudf has a hard requirement on that version for the version of cudf we are using * Add Rust build step to test in dask cluster * Install setuptools-rust for pip to use for bare requirements import * Include pyarrow 6.0.1 via conda as a bare minimum dependency * Remove cudf dependency for python 3.9 which is causing build issues on windows * Address documentation from review * Install Rust as readthedocs post_create_environment step * Run rust install non-interactively * Run rust install non-interactively * Rust isn't available in PyPi so remove that dependency * Append ~/.cargo/bin to the PATH * Print out some environment information for debugging * Print out some environment information for debugging * More - Increase verbosity * More - Increase verbosity * More - Increase verbosity * Switch RTD over to use Conda instead of Pip since having issues with Rust and pip * Try to use mamba for building docs environment * Partial review suggestion address, checking CI still works * Skip mistakenly enabled tests * Use DataFusion master branch, and fix syntax issues related to the version bump * More updates after bumping DataFusion version to master * Use actions-rs in github workflows debug flag for setup.py * Remove setuptools-rust from conda * Use re-exported Rust types for BuiltinScalarFunction * Move python imports to TYPE_CHECKING section where applicable * Address review concerns and remove pre-commit.sh file * Pin to a specific github rev for DataFusion Co-authored-by: Richard (Rick) Zamora <[email protected]> Co-authored-by: Charles Blackmon-Luca <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Ayush Dattagupta <[email protected]>
- Loading branch information