-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge main
into datafusion-sql-planner
#654
Merged
charlesbluca
merged 45 commits into
dask-contrib:datafusion-sql-planner
from
charlesbluca:merge-upstream-main
Aug 2, 2022
Merged
Merge main
into datafusion-sql-planner
#654
charlesbluca
merged 45 commits into
dask-contrib:datafusion-sql-planner
from
charlesbluca:merge-upstream-main
Aug 2, 2022
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* basic predicate-pushdown support * remove explict Dispatch class * use _Frame.fillna * cleanup comments * test coverage * improve test coverage * add xfail test for dt accessor in predicate and fix test_show.py * fix some naming issues * add config and use assert_eq * add logging events when predicate-pushdown bails * move bail logic earlier in function * address easier code review comments * typo fix * fix creation_info access bug * convert any expression to DNF * csv test coverage * include IN coverage * improve test rigor * address code review * skip parquet tests when deps are not installed * fix bug * add pyarrow dep to cluster workers * roll back test skipping changes Co-authored-by: Charles Blackmon-Luca <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Simplify gpuCI updating workflow * Add check for cuML nightly version
…ontrib#447) * Add handling for newer prompt-toolkit version * Place compatibility code in _compat
* Update versions for java dependencies with cves * Rerun tests
* Update versions for java dependencies with cves * Rerun tests * update jackson-databind dependency
* Disable SQL server functionality * Update docs/source/server.rst Co-authored-by: Ayush Dattagupta <[email protected]> * Disable server at lowest possible level * Skip all server tests * Add tests to ensure server is disabled * Fix CVE fix test Co-authored-by: Ayush Dattagupta <[email protected]>
* Revert "Disable SQL server functionality (dask-contrib#448)" This reverts commit 37a3a61. * Bump httpclient version
* Unpin dask/distributed post release * Remove dask/distributed version ceiling
* Add jsonschema to ci env * Fix typo in config schema
…ask-contrib#365) * Start moving tests to dd.assert_eq * Use assert_eq in datetime filter test * Resolve most resulting test failures * Resolve remaining test failures * Convert over tests * Convert more tests * Consolidate select limit cpu/gpu test * Remove remaining assert_series_equal * Remove explicit cudf imports from many tests * Resolve rex test failures * Remove some additional compute calls * Consolidate sorting tests with getfixturevalue * Fix failed join test * Remove breakpoint * Use custom assert_eq function for tests * Resolve test failures / seg faults * Remove unnecessary testing utils * Resolve local test failures * Generalize RAND test * Avoid closing client if using independent cluster * Fix failures on Windows * Resolve black failures * Make random test variables more clear
* Set max pin on antlr4-python-runtime due to incompatibilities with fugue_sql * update comment on antlr max pin version
* Move / minimize number of cudf / dask-cudf imports * Add tests for GPU-related errors * Fix unbound local error * Fix ddf value error
* Use map_partitions to compute limit / offset * Use partition_info to extract partition_index
* Switch to dask dev images * Use mamba for conda installs in images * Remove sleep call for installation * Use timeout / until to wait for cluster to be initialized
* Add documentation for FugueSQL integrations * Minor nitpick around autodoc obj -> class
* added timestampdiff * initial work for timestampdiff * Added test cases for timestampdiff * Update interval month dtype mapping * Add datetimesubOperator * Uncomment timestampdiff literal tests * Update logic for handling interval_months for pandas/cudf series and scalars * Add negative diff testcases, and gpu tests * Update reinterpret and timedelta to explicitly cast to int64 instead of int * Simplify cast_column_to_type mapping logic * Add scalar handling to castOperation and reuse it for reinterpret Co-authored-by: rajagurnath <[email protected]>
* Use dask nightly conda packages for upstream testing * Add independent cluster testing to nightly upstream CI [test-upstream] * Remove unnecessary dask install [test-upstream] * Remove strict channel policy to allow nightly dask installs * Use nightly Dask packages in independent cluster test [test-upstream] * Use channels argument to install Dask conda nightlies [test-upstream] * Fix channel expression * [test-upstream] * Need to add mamba update command to get dask conda nightlies * Use conda nightlies for dask-sql import test * Add import test to upstream nightly tests * [test-upstream] * Make sure we have nightly Dask for import tests [test-upstream]
* Explicitly install libstdcxx-ng in clusters * Make pyarrow dependency consistent across testing * Make libstdcxx-ng dep a min version * Add cloudpickle to cluster dependencies * cloudpickle must be in the scheduler environment * Bump cloudpickle version * Move cloudpickle install to workers * Fix pyarrow constraint in cluster spec
* Unpin dask/distributed post release * Remove dask/distributed version ceiling
…ask-contrib#573) * Make sure scheduler has Dask nightlies in upstream cluster testing * empty commit to [test-upstream]
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…rib#555) * Remove black/isort deps as we prefer pre-commit * Unpin all non python/jdk dependencies * Minor package corrections for py3.9 jdk11 env * Set min version constraints for all non-testing dependencies * Pin all non-test deps for 3.8 testing * Bump sklearn min version to 1.0.0 * Bump pyarrow min version to 1.0.1 * Fix pip notation for fugue * Use unpinned deps for cluster testing for now * Add fugue deps to environments, bump pandas to 1.0.2 * Add back antlr4 version ceiling * Explicitly mark all fugue dependencies * Alter test_analyze to avoid rtol * Bump pandas to 1.0.5 to fix upstream numpy issues * Alter datetime casting util to dodge panda casting failures * Bump pandas to 1.1.0 for groupby dropna support * Simplify string dtype check for get_supported_aggregations * Add check_dtype=False back to test_group_by_nan * Bump cluster to python 3.9 * Bump fastapi to 0.69.0, resolve remaining JDBC failures * Typo - correct pandas version * Generalize test_multi_case_when's dtype check * Bump pandas to 1.1.1 to resolve flaky test failures * Constrain mlflow for windows python 3.8 testing * Selectors don't work for conda env files * Problems seem to persist in 1.1.1, bump to 1.1.2 * Remove accidental debug changes * [test-upstream] * Use python 3.9 for upstream cluster testing [test-upstream] * Updated missed pandas pinning * Unconstrain mlflow to see if Windows failures persist * Add min version for protobuf * Bump pyarrow min version to allow for newer protobuf versions
* Remove antlr4-python3-runtime constraint from 3.9+ test envs * Revert "Remove antlr4-python3-runtime constraint from 3.9+ test envs" This reverts commit ef30656. * Add max version constraint for fugue in 3.9+ envs * Constrain Fugue in remaining env/setup files * Clarify fugue constraint comments * Add pinning back to python 3.8 jdk11 tests * More reversions to python 3.8 jdk11 testing env
* Add gpuCI environment file * Add documentation for GPU tests / environment * Add GPU testing to docs page
* initial * improvements * bugfixes * Move UDF validation to registration, cache relevant info Co-authored-by: Charles Blackmon-Luca <[email protected]>
…ntrib#640) * Set Dask-sql as the default Fugue Dask engine when installed * Set Dask-sql as the default Fugue Dask engine when installed * Add max version constraint for `fugue` (dask-contrib#639) * Remove antlr4-python3-runtime constraint from 3.9+ test envs * Revert "Remove antlr4-python3-runtime constraint from 3.9+ test envs" This reverts commit ef30656. * Add max version constraint for fugue in 3.9+ envs * Constrain Fugue in remaining env/setup files * Clarify fugue constraint comments * Add pinning back to python 3.8 jdk11 tests * More reversions to python 3.8 jdk11 testing env * update * update * update * fix tests * update tests * update a few things * update * fix * conda install fugue in testing envs * Remove diff from features notebook * Alter documentation to mention automatic registration of execution engine * Expand FugueSQL notebook * Don't manually close client in simple statement test Co-authored-by: Charles Blackmon-Luca <[email protected]>
Codecov Report
@@ Coverage Diff @@
## datafusion-sql-planner #654 +/- ##
=========================================================
Coverage ? 62.13%
=========================================================
Files ? 73
Lines ? 3631
Branches ? 749
=========================================================
Hits ? 2256
Misses ? 1226
Partials ? 149 📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
ayushdg
approved these changes
Aug 1, 2022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment but changes lgtm
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.