[ASV] add index, columns, shape benchmarks #2725
Comments
anmyachev added a commit to anmyachev/modin that referenced this issue on Feb 11, 2021
Signed-off-by: Anatoly Myachev <[email protected]>
6 tasks
dchigarev pushed a commit that referenced this issue on Feb 12, 2021
Signed-off-by: Anatoly Myachev <[email protected]>
aregm added a commit to aregm/modin that referenced this issue on Feb 18, 2021
* FIX-modin-project#2195: fix describe error for datasets with datetimes (modin-project#2272)
* FIX-modin-project#2195: fix describe error for datasets with datetimes Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2195: add test Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2195: enable fix Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2195: Update modin/pandas/test/dataframe/test_reduction.py Co-authored-by: Dmitry Chigarev <[email protected]> Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#1906: fixed incorrect behaviour of 'groupby.__getattr' (modin-project#2276) Signed-off-by: Dmitry Chigarev <[email protected]>
* FIX-modin-project#2277: applied Title Case to the names of DATASET_SIZE_DICT keys (modin-project#2278) Signed-off-by: Dmitry Chigarev <[email protected]>
* FIX-modin-project#2280: use 32 bytes in secrets.token_hex (modin-project#2286) Signed-off-by: Anatoly Myachev <[email protected]>
* TEST-modin-project#2260: use recommended pandas testing api (modin-project#2273)
* TEST-modin-project#2260: use recommended pandas testing api Signed-off-by: Anatoly Myachev <[email protected]>
* TEST-modin-project#2260: replace getSeriesData with test_data Signed-off-by: Anatoly Myachev <[email protected]>
* TEST-modin-project#2260: remove assert_categories_equal Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2254: handling dict functions at groupby.agg improved (modin-project#2267) Signed-off-by: Dmitry Chigarev <[email protected]>
* FEAT-modin-project#2282: support DataFrame.[count|max|min|sum] for OmniSci backend (modin-project#2283) Signed-off-by: ienkovich <[email protected]>
* FIX-modin-project#1976: indices matching at reduction functions fixed (modin-project#2270) Signed-off-by: Dmitry Chigarev <[email protected]>
* FEAT-modin-project#2299: support value_counts in OmniSci backend.
(modin-project#2300) Signed-off-by: ienkovich <[email protected]>
* FIX-modin-project#1765: Fix support of s3 in `read_parquet` (modin-project#2287) Signed-off-by: Alexey Prutskov <[email protected]>
* FIX-modin-project#2285: Default to pandas warning message improved (modin-project#2302) Signed-off-by: Dmitry Chigarev <[email protected]>
* FEAT-modin-project#2303: fix OmniSci aggregates and add mean (modin-project#2304) Signed-off-by: ienkovich <[email protected]>
* FIX-modin-project#2258: return 'Commit Message formatting' topic (modin-project#2306) Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2133 modin-project#2265: Fix binary operations for modin frames in case when partitioning isn't aligned (modin-project#2256) Signed-off-by: Alexey Prutskov <[email protected]>
* FIX-modin-project#2239: Compute row index start using pandas (modin-project#2240)
* FIX-modin-project#2239: Compute row index start using pandas Signed-off-by: Devin Petersohn <[email protected]>
* FIX-modin-project#2239: Documentation Signed-off-by: Devin Petersohn <[email protected]>
* FIX-modin-project#2239: Improve testing for case Signed-off-by: Devin Petersohn <[email protected]>
* FIX-modin-project#2253: loc assignment fixed in case of (1, 1) shape frame (modin-project#2316) Signed-off-by: Dmitry Chigarev <[email protected]>
* FIX-modin-project#2311: fixed performance bottleneck at reduction operations (modin-project#2314) Signed-off-by: Dmitry Chigarev <[email protected]>
* TEST-modin-project#2288: Cover by tests delimiters parameters of read_csv (modin-project#2310) Signed-off-by: Alexander Myskov <[email protected]>
* FIX-modin-project#2234: update dask_deps in setup.py (modin-project#2325) Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2326: move s3fs import in _read function (modin-project#2327) Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2329: TypeError while creating cluster (modin-project#2330) Signed-off-by:
Anatoly Myachev <[email protected]>
* FIX-#0000: Indexing regression (modin-project#2333)
* FIX-#0000: Indexing regression Signed-off-by: Devin Petersohn <[email protected]>
* FIX-#0000: Fix `loc` Signed-off-by: Devin Petersohn <[email protected]>
* FIX-#0000: Fix DatetimeIndex Signed-off-by: Devin Petersohn <[email protected]>
* FIX-#0000: Fix Datetime and checks Signed-off-by: Devin Petersohn <[email protected]>
* DOCS-modin-project#2334: Add tutorials to main repo (modin-project#2335) Signed-off-by: Devin Petersohn <[email protected]>
* DOCS-modin-project#2193: Add contributing doc in checklist (modin-project#2216)
* DOCS-modin-project#2193: update contributing doc Signed-off-by: Anatoly Myachev <[email protected]>
* REFACTOR-modin-project#2343: refactor offset, _read_rows, partitioned_file (modin-project#2344) Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#1927: Fix performance issue related to `sparse` attribute access (modin-project#2318) Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2269: Move `default_to_pandas` logic from API layer to backend (modin-project#2332)
* FIX-modin-project#2269: Move `default_to_pandas` logic from API layer to backend Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2269: Added a test which calls _apply_agg_function Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2269: Added required arguments for groupby_agg Moved wrap_udf_function into backend because omnisci doesn't support executing lambdas. Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2269: Use correct default_to_pandas for groupby in backend, refactor default to pandas functions in BaseQC Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2269: Renamed new default_to_pandas_groupby function into private function of Pandas backend because it is not used anywhere else.
Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2269: Fixed specification of backend now it is possible to specify --backend=PandasOnDask, --backend=PandasOnRay or --backend=PandasOnPython, not just --backend=BaseOnPython. Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2269: Fix BaseOnPython tests Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2269: Remove default_to_pandas_groupby Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2269: logic of dropping 'by' moved back to API level Signed-off-by: Dmitry Chigarev <[email protected]> Co-authored-by: Gregory Shimansky <[email protected]> Co-authored-by: Dmitry Chigarev <[email protected]>
* TEST-modin-project#2292: Cover by tests Datetime Handling parameters of read_csv (modin-project#2336) Signed-off-by: Alexander Myskov <[email protected]>
* FEAT-modin-project#2271: Add implementation of `groupby.shift` (modin-project#2323) Signed-off-by: Alexey Prutskov <[email protected]>
* FIX-modin-project#2348: Fix default to pandas warnings (modin-project#2349) Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2357: Fix path to documentation for contributing (modin-project#2358) Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2352: remove deprecated option: 'num-redis-shards' (modin-project#2353) Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2339: Fix links to documentation (modin-project#2361) Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2354: use conda activate instead of conda run (modin-project#2355) Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2363: introduce getter and setter for index name (modin-project#2368) Signed-off-by: ienkovich <[email protected]>
* FEAT-modin-project#1844: upgrade pyarrow to 1.0 (modin-project#2347) Signed-off-by: Anatoly Myachev <[email protected]> *
FIX-modin-project#2365: Fix `Series.value_counts` when `dropna=False` (modin-project#2366) Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2369: Update pandas version to 1.1.4 (modin-project#2371) Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2322: add aligning partition' blocks (modin-project#2367) Signed-off-by: Anatoly Myachev <[email protected]>
* Bump version to 0.8.2 (modin-project#2383) Signed-off-by: Devin Petersohn <[email protected]>
* FIX-modin-project#2386: add new location for import ray functions (modin-project#2387) Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2388: Fixed requirements for omnisci binaries (modin-project#2389) Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2380: don't ignore lengths parameter for dask engine (modin-project#2381) Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2390: Fix inserting Series into DataFrame (modin-project#2391) Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-2200: Enable Calcite by default in OmniSci backend (modin-project#2385) Signed-off-by: Alexander Myskov <[email protected]>
* TEST-modin-project#2289: Columns, Index Locations and Names parameters of read_csv (modin-project#2319) Signed-off-by: Alexander Myskov <[email protected]>
* REFACTOR-modin-project#2397: remove redundant assigment (modin-project#2398) Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2363: fix index name setter in OmniSci backend (modin-project#2379) Signed-off-by: ienkovich <[email protected]>
* Merged groupby_agg and groupby_dict_agg to implement dictionary functions aggregations (modin-project#2317)
* FIX-modin-project#2254: Added dictionary functions to groupby aggregate tests Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2254: Initial implementation of dictionary functions aggregation Signed-off-by: Gregory Shimansky <[email protected]> *
FIX-modin-project#2254: Remove lambda wrapper to allow dictionary to go to backend Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2254: Fixed AttributeError not being thrown from getattr Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2254: Lint fixes Signed-off-by: Gregory Shimansky <[email protected]>
* FEAT-modin-project#2363: fix index name setter in OmniSci backend Signed-off-by: ienkovich <[email protected]>
* FIX-modin-project#2254: Removed obsolete groupby_dict_agg API function Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2254: Fixed dict aggregate for base backend Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2254: Address reformatting comments Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2254: Remove whitespace Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2254: Removed redundant argument conversion because it is already done inside of base backend. Signed-off-by: Gregory Shimansky <[email protected]> Co-authored-by: ienkovich <[email protected]>
* FIX-modin-project#2406: filter dictionary aggregation keys to limit them to keys only present in current partition (modin-project#2407)
* FIX-modin-project#2406: Added test to detect this bug Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2406: Added filter for keys absent in current partition Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2406: Attempt to fix broken test on BaseOnPython backend. This test gets a corrupted dataframe with "col2" removed by previous test cases.
Signed-off-by: Gregory Shimansky <[email protected]>
* DOCS-modin-project#2413: Add examples page to documentation (modin-project#2414)
* Resolves modin-project#2413 Signed-off-by: Devin Petersohn <[email protected]>
* DOCS-modin-project#2415: Add comparisons section to documentation with stubs (modin-project#2416) Signed-off-by: Devin Petersohn <[email protected]>
* DOCS-modin-project#2417: add sklearn example (modin-project#2425) Signed-off-by: reshamas <[email protected]>
* DOCS-modin-project#2421: Fixes bad link on contributing from architecture.rst (modin-project#2427) Signed-off-by: Victor Fomin <[email protected]>
* DOCS-modin-project#2419: Updated CONTRIBUTING.rst (modin-project#2423) Signed-off-by: Victor Fomin <[email protected]>
* DOCS-modin-project#2426,DOCS-modin-project#2424: Fixed two issues (modin-project#2431)
  - Closes modin-project#2424, CONTRIBUTING.rst does not render the commit message formatting example
  - Closes modin-project#2426, Bad links in index.rst
  - Renamed CONTRIBUTING.rst into contributing.rst
  Signed-off-by: Victor Fomin <[email protected]>
* DOCS-modin-project#2420: Changed documentation to numpydoc style (modin-project#2429) Signed-off-by: Mohammed Kashif <[email protected]> Co-authored-by: Mohammed Kashif <[email protected]>
* DOCS-modin-project#2433: Updated README.md with modin_vs_dask.md doc (modin-project#2435) Signed-off-by: Abdulelah S.
Al Mesfer <[email protected]>
* FIX-modin-project#2450: fix CI recipe (modin-project#2449) Signed-off-by: Dmitry Chigarev <[email protected]>
* DOCS-modin-project#2437: Add documentation contrasting Modin and Dask (modin-project#2441)
* Resolves modin-project#2437 Signed-off-by: Devin Petersohn <[email protected]>
* FEAT-modin-project#2444: add docker file for nyc on omnisci (modin-project#2445) Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2458: fix 'psutil' install (modin-project#2452) Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2456: update taxi queries with .copy usage (modin-project#2457) Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2447: add docker file for census on omnisci (modin-project#2448) Also add instructions for building docker images Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2470: revert b867edf (modin-project#2471) Signed-off-by: Alexander Myskov <[email protected]>
* FIX-modin-project#2473: Some configuration values should not be transformed (modin-project#2476)
* FIX-modin-project#2473: Some configuration values should not be transformed Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#2473: Add tests for ExactStr Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#2402: Fix read_excel when files come from older windows (modin-project#2403)
* Resolves modin-project#2402
* Search for the content files instead of assuming location Signed-off-by: Devin Petersohn <[email protected]>
* REFACTOR-modin-project#2467: Convert internal base dataframe objects to ABC (modin-project#2468) Signed-off-by: Devin Petersohn <[email protected]>
* FIX-modin-project#2459: Updated TeamCity tests image to use Ray as base image (modin-project#2460) Signed-off-by: Gregory Shimansky <[email protected]>
* TEST-modin-project#2488: Increase commitlint message length limit to 88 characters from 70 (modin-project#2489)
Signed-off-by: Devin Petersohn <[email protected]>
* DOCS-modin-project#2439: Add Documentation for Modin vs. pandas (modin-project#2487) Signed-off-by: Devin Petersohn <[email protected]>
* TEST-modin-project#2290: Cover by tests General Parsing Configuration parameters of read_csv (modin-project#2331) Signed-off-by: Alexander Myskov <[email protected]>
* FIX-modin-project#2453: Remove sorting indices for equal values in `Series.value_counts` (modin-project#2454) Signed-off-by: Igoshev, Yaroslav <[email protected]>
* TEST-modin-project#2291: Cover by tests NA and Missing Data Handling parameters of read_csv (modin-project#2337) Signed-off-by: Alexander Myskov <[email protected]>
* REFACTOR-modin-project#2496: Change internal reader names to dispatcher (modin-project#2497)
* Resolves modin-project#2496 Signed-off-by: Devin Petersohn <[email protected]>
* TEST-modin-project#2294: add iteration parameters for read_csv tests (modin-project#2477) Signed-off-by: Alexander Myskov <[email protected]>
* FIX-modin-project#2463: Added test with callable functions as aggregate argument (modin-project#2503) Signed-off-by: Gregory Shimansky <[email protected]>
* TEST-modin-project#2296: Error Handling parameters of read_csv (modin-project#2501) Signed-off-by: Alexander Myskov <[email protected]>
* TEST-modin-project#2295: Cover by tests Quoting, Compression, and File Format parameters of read_csv (modin-project#2495) Co-authored-by: Anatoly Myachev <[email protected]> Signed-off-by: Alexander Myskov <[email protected]>
* FEAT-modin-project#2479: integrate asv (modin-project#2484)
* FEAT-modin-project#2479: integrate asv Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2479: add merge pytest-benchmark in asv style Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2479: add CI job for check asv benchmarks Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2479: increase verbosity Signed-off-by: Anatoly Myachev
<[email protected]>
* FEAT-modin-project#2479: use launch-method=spawn Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2479: add CpuCount usage to control number of partitions Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2479: change: TestDatasetSize -> MODIN_TEST_DATASET_SIZE Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2374: remove extra code; add pandas way to handle duplicate values in reindex func for binary operations (modin-project#2378) Signed-off-by: Anatoly Myachev <[email protected]>
* TEST-modin-project#2297: Cover by tests Internal parameters of read_csv (modin-project#2502) Signed-off-by: Alexander Myskov <[email protected]>
* Ensure excel reader closes file if it is passed as path (modin-project#2514) Signed-off-by: Vasilij Litvinov <[email protected]>
* FEAT-modin-project#2375: implementation of multi-column groupby aggregation (modin-project#2461) Signed-off-by: Dmitry Chigarev <[email protected]>
* FIX-modin-project#2442: fixed Series assignment with different indices (modin-project#2443) Signed-off-by: Dmitry Chigarev <[email protected]>
* FEAT-modin-project#2013: merge_asof that is a little more efficient (modin-project#2510)
* FEAT-modin-project#2013: merge_asof that is a little more efficient.
Signed-off-by: Itamar Turner-Trauring <[email protected]> Signed-off-by: Devin Petersohn <[email protected]>
* DOCS-modin-project#2436: Explicit local / single node backend (modin-project#2483) Signed-off-by: raphaelauv <[email protected]>
* Fix indices when reading Excel files in parallel (modin-project#2526) Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#2527: Use random name for hdf file test, clean file after testing (modin-project#2528) Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#2524: Update pandas version to 1.1.5 (modin-project#2525) Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2408: Fix read_csv and read_table args when used inside a decora… (modin-project#2486) Signed-off-by: Weiwen Gu <[email protected]>
* FIX-modin-project#2169: avoid unnecessary index access in groupby (modin-project#2469) Signed-off-by: Dmitry Chigarev <[email protected]>
* FIX-modin-project#2313: improved handling non-numeric types at 'mean' when 'axis=1' (modin-project#2535) Signed-off-by: Dmitry Chigarev <[email protected]>
* TEST-modin-project#2509: Io tests refactoring (modin-project#2523)
* TEST-modin-project#2509: refactor read_csv tests Signed-off-by: Alexander Myskov <[email protected]>
  TEST-modin-project#2509: refactor tests with warnings Signed-off-by: Alexander Myskov <[email protected]>
  TEST-modin-project#2509: read_parquet tests refactoring Signed-off-by: Alexander Myskov <[email protected]>
  TEST-modin-project#2509: read_json tests refactoring Signed-off-by: Alexander Myskov <[email protected]>
  TEST-modin-project#2509: read_excel tests refactoring Signed-off-by: Alexander Myskov <[email protected]>
  TEST-modin-project#2509: read_hdf tests refactoring Signed-off-by: Alexander Myskov <[email protected]>
  TEST-modin-project#2509: add html and sql tests Signed-off-by: Alexander Myskov <[email protected]>
  TEST-modin-project#2509: fwf tests refactoring Signed-off-by: Alexander Myskov <[email
protected]>
  TEST-modin-project#2509: further tests refactoring Signed-off-by: Alexander Myskov <[email protected]>
  TEST-modin-project#2509: mark xfailed tests and fix Signed-off-by: Alexander Myskov <[email protected]>
  TEST-modin-project#2509: fix Signed-off-by: Alexander Myskov <[email protected]>
  TEST-modin-project#2509: further refactoring Signed-off-by: Alexander Myskov <[email protected]>
  TEST-modin-project#2509: correct teardown stage Signed-off-by: Alexander Myskov <[email protected]>
* TEST-modin-project#2509: mark failed tests Signed-off-by: Alexander Myskov <[email protected]>
* TEST-modin-project#2509: fix Signed-off-by: Alexander Myskov <[email protected]>
* TEST-modin-project#2509: correct test_HDFStore test Signed-off-by: Alexander Myskov <[email protected]>
* TEST-modin-project#2509: use common teardown function Signed-off-by: Alexander Myskov <[email protected]>
* TEST-modin-project#2509: typo fix Signed-off-by: Alexander Myskov <[email protected]>
* TEST-modin-project#2509: fix Signed-off-by: Alexander Myskov <[email protected]>
* TEST-modin-project#2509: addressing review comments Co-authored-by: Anatoly Myachev <[email protected]> Signed-off-by: Alexander Myskov <[email protected]>
* TEST-modin-project#2509: addressing review comments Signed-off-by: Alexander Myskov <[email protected]> Co-authored-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2540: add __iter__ implementation (modin-project#2541) Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2520: add most important operations for asv benchmarks (modin-project#2539)
* FEAT-modin-project#2520: add most important operations for asv benchmarks Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2520: add groupby microbenchmarks Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2520: address review comments Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2498: Fix possible number of partitions for
Dask engine (modin-project#2532) Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2550: remove decorators usage for asv tested functions (modin-project#2551) Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2236: Handling of space limited Ray Plasma directories (modin-project#2547) Signed-off-by: Alexander Myskov <[email protected]>
* DOCS-modin-project#2518: add asv usage topic (modin-project#2549)
* DOCS-modin-project#2518: add asv usage topic Signed-off-by: Anatoly Myachev <[email protected]>
* DOCS-modin-project#2518: fix style Signed-off-by: Anatoly Myachev <[email protected]>
* DOCS-modin-project#2518: address review comments Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2491: optimized groupby dictionary aggregation (modin-project#2534) Signed-off-by: Dmitry Chigarev <[email protected]>
* FEAT-modin-project#2553: add ability to run microbenchmarks for old Modin version (modin-project#2554) Signed-off-by: Anatoly Myachev <[email protected]>
* Fix .loc[] assignment for Modin Series (modin-project#2555) Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#2482: improved handling non-str 'by' (modin-project#2548) Signed-off-by: Dmitry Chigarev <[email protected]>
* Fix taxi-runner.py cluster example (modin-project#2557)
* Added regression test
* Fix modin package installation Signed-off-by: Anatoly Myachev <[email protected]>
* Fix loc/iloc assignments when columns are selected (modin-project#2536)
* FIX-modin-project#1620: Add test for reported issue Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#1620: Use pandas.reindex() properly Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#1620: Improve tests Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#1620: Convert lookups to values for both indices and columns Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#1620: Add test for
.loc[] ordering Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#1620: XFail a test that unearths internal sorting Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#1620: Improve test robustness a bit per code review Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#2559: Ignore files from /proc/ when detecting file leaks (modin-project#2560) Signed-off-by: Vasilij Litvinov <[email protected]>
* Switch to Ray from conda-forge (modin-project#2562)
* FIX-modin-project#2561: Switch to Ray from conda-forge, abandon pip caching Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#2561: Remove pip caching from push CI actions Signed-off-by: Vasilij Litvinov <[email protected]>
* FIX-modin-project#2566: Ensure `Series.unique` does not return a scalar when there is only one unique value (modin-project#2567)
* FIX-modin-project#2566: Ensure unique doesn't return a scalar using np.atleast_1d Signed-off-by: Richard Lin <[email protected]>
* FIX-modin-project#2566: Check array shapes match for test_unique Signed-off-by: Richard Lin <[email protected]>
* FIX-modin-project#2566: Reduce unique dimensions using constructor instead Signed-off-by: Richard Lin <[email protected]>
* FIX-modin-project#2572: fixed arrow version in OmniSci dependencies (modin-project#2571) Signed-off-by: Dmitry Chigarev <[email protected]>
* DOCS-modin-project#2578: fix simple typo, parition -> partition (modin-project#2573) There is a small typo in modin/engines/dask/pandas_on_dask/frame/partition.py, modin/engines/ray/pandas_on_ray/frame/partition.py. Should read `partition` rather than `parition`.
Signed-off-by: Tim Gates <[email protected]>
* FIX-#0000: pin xlrd<=1.2.0 (modin-project#2594) Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2543: fixed handling 'as_index' at groupby dictionary renaming aggregation (modin-project#2592) Signed-off-by: Dmitry Chigarev <[email protected]>
* Release commit for version 0.8.3 (modin-project#2597) Signed-off-by: Devin Petersohn <[email protected]>
* REFACTOR-modin-project#2580: Move automatic engine init to after data ingestion (modin-project#2581)
* REFACTOR-modin-project#2580: Move automatic engine init to after data ingestion
* Resolves modin-project#2580
  Instead of automatically starting the engine when Modin is imported, we start it after the first time the user reads or creates a dataframe. This is intended to help downstream libraries not need the engine to check for typing, as well as clear up some transient errors that can occur with certain engines on large machines.
  I have also added a warning message that informs the user how to clear the message. We will likely need a way to suppress these errors, because many users will not care about them and potentially want to suppress.
  We will probably also want to add a benchmarking page on best practices for benchmarking because this change can give the impression of a performance degradation on data ingestion even though nothing is changing from that perspective.
Signed-off-by: Devin Petersohn <[email protected]>
* REFACTOR-modin-project#2580: Add to experimental API Signed-off-by: Devin Petersohn <[email protected]>
* REFACTOR-modin-project#2580: Add `read_feather` and `read_clipboard` Signed-off-by: Devin Petersohn <[email protected]>
* REFACTOR-modin-project#2580: Remove redundant error message Signed-off-by: Devin Petersohn <[email protected]>
* TEST-modin-project#2598: Add test for clean install from source (modin-project#2599)
* TEST-modin-project#2598: Add test for clean install from source
* Resolves modin-project#2598 This change adds a test for installing Modin without all of the testing dependencies. It is intended to test how a user who does not have all of the test dependencies will see a Modin import.
* TEST-modin-project#2598: Target Python3 Signed-off-by: Devin Petersohn <[email protected]>
* FIX-modin-project#976: add encoding parameter to read_csv call (modin-project#2593)
* FIX-modin-project#976: add failed test Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#976: add encoding parameter to read_csv call Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#976: fix test in experimental mode Signed-off-by: Anatoly Myachev <[email protected]>
* FEAT-modin-project#2342: Add axis partitions API (modin-project#2515) Signed-off-by: Igoshev, Yaroslav <[email protected]> Co-authored-by: Devin Petersohn <[email protected]>
* Fixed MultiIndex.from_frame implementation (modin-project#2587) Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2608: Disable proxy for commands running inside container (modin-project#2609) Signed-off-by: Gregory Shimansky <[email protected]>
* FIX-modin-project#2601: reduce data size for some asv tests (modin-project#2602) Signed-off-by: Anatoly Myachev <[email protected]>
* FIX-modin-project#2611: Fixed crash and sklearn version (modin-project#2612) Signed-off-by: Gregory Shimansky <[email protected]> *
FEAT-modin-project#2604: add docker file with plasticc benchmark on omnisci (modin-project#2605)
* FEAT-modin-project#2604: add docker file with plasticc benchmark on omnisci
* FEAT-modin-project#2604: change xgboost verbose_eval Signed-off-by: Anatoly Myachev <[email protected]>
* DOCS-modin-project#2618: Add code of conduct (modin-project#2619)
* Resolves modin-project#2618 Signed-off-by: Devin Petersohn <[email protected]>
* FEAT-modin-project#2373: Add distributed xgboost on Modin with Ray (modin-project#2545) Signed-off-by: Alexey Prutskov <[email protected]> Co-authored-by: Devin Petersohn <[email protected]>
* FEAT-2624: Improve performance of read_* methods when file handles are passed in (modin-project#2625) Signed-off-by: Zain Patel <[email protected]>
* FIX-modin-project#2616: Add config for num partitions, deprecate DEFAULT_NPARTITIONS (modin-project#2622) Signed-off-by: Devin Petersohn <[email protected]>
* FEAT-modin-project#2091: add distributed dataframe compare (modin-project#2579) Signed-off-by: Khang Vu <[email protected]>
* DOCS-modin-project#2649: Fix github pr template's dead link. (modin-project#2650) Signed-off-by: William Ma <[email protected]>
* FEAT-modin-project#2606: Support creating DataFrame from remote partitions (modin-project#2613) Signed-off-by: Igoshev, Yaroslav <[email protected]>
* FIX-modin-project#2637: Fix deprecation warnings due to invalid escape sequences. (modin-project#2641) Signed-off-by: Karthikeyan Singaravelan <[email protected]>
* REFACTOR-modin-project#2648: Correct uses of MapReduceFunction and metadata manipu… (modin-project#2655)
* REFACTOR-modin-project#2648: Correct uses of MapReduceFunction and metadata manipulation Resolves modin-project#2648 Removes some code that is problematic for performance. There was a mix of use cases for modifying the external metadata and internal metadata, and some problematic components of these APIs that could hide bugs.
The implementation has been updated to ensure that these bugs do not resurface. Previously, the internal and external indices were compared, and then updated according to some arguments that were passed in. This is not scalable because collecting the indices is expensive. The possible bugs hidden in this implementation decision could end up being very difficult to detect: it implicitly updates the internal or external indices based on a somewhat cryptic string pattern combined with a boolean flag. Another very large issue is that sometimes external indices are updated based on the partition lengths metadata. This was likely done to solve a use case of not using the APIs properly. This implementation has been removed and replaced with something more explicit. If the internal indices need to be updated, they are updated explicitly via existing APIs. Likewise if external indices need to be updated, they are updated with a different API. Several QueryCompiler APIs had to be reverted because they were misusing the ReductionFunction or MapReduceFunction, thus the need for the implicit modification of metadata. When this implicit modification was removed, these APIs no longer worked, and so were reverted until they can be reimplemented using correct APIs. The following APIs were reverted as a part of this commit: * `is_monotonic_increasing` * `is_monotonic_decreasing` * `value_counts` * `searchsorted` * `dt_tz` * `dt_freq` Signed-off-by: Devin Petersohn <[email protected]> * REFACTOR-modin-project#2648: Remove debug code Signed-off-by: Devin Petersohn <[email protected]> * REFACTOR-modin-project#2648: Fix explicit rename Signed-off-by: Devin Petersohn <[email protected]> * DOCS-2653: Fix links in Modin's documentation (modin-project#2654) Signed-off-by: Alexey Prutskov <[email protected]> * FEAT-modin-project#2663: Add algebraic operator `from_labels` (modin-project#2665) Resolves modin-project#2663 This operator is necessary for efficient `reset_index` operations. 
See this paper for more information on the operator: http://www.vldb.org/pvldb/vol13/p2033-petersohn.pdf Co-authored-by: William Ma <[email protected]> Signed-off-by: Devin Petersohn <[email protected]> * FIX-modin-project#2672: pin numpy>=1.16.5,<1.20 (modin-project#2673) Signed-off-by: Anatoly Myachev <[email protected]> * FEAT-modin-project#2675: Added benchmark for sort_values (modin-project#2676) Signed-off-by: Gregory Shimansky <[email protected]> * FEAT-modin-project#2664: Add `to_labels` algebraic operator (modin-project#2666) Resolves modin-project#2664 This add the algebraic operator for `to_labels`, which enables Modin to better optimize the movement of data to metadata. See more in the paper about the algebraic operator: http://www.vldb.org/pvldb/vol13/p2033-petersohn.pdf Co-authored-by: William Ma <[email protected]> Signed-off-by: Devin Petersohn <[email protected]> * FIX-modin-project#1806: Resolved error when reverting to Pandas for Multiindex (modin-project#2660) Signed-off-by: Todd Yu <[email protected]> * FIX-modin-project#2614: Up python version for test jobs (modin-project#2615) Signed-off-by: Igoshev, Yaroslav <[email protected]> * DOCS-2633: Add documentation for distributed XGBoost on Modin (modin-project#2640) Signed-off-by: Alexey Prutskov <[email protected]> * FIX-modin-project#2667: Change names of files for development env (modin-project#2668) Signed-off-by: Alexey Prutskov <[email protected]> * FIX-modin-project#2658: Move backend check in xgb to train/predict (modin-project#2659) Signed-off-by: Alexey Prutskov <[email protected]> * FEAT-modin-project#2451: Read multiple csv files simultaneously via glob paths (modin-project#2662) Signed-off-by: William Ma <[email protected]> * FIX-modin-project#2681: pin numpy<1.20.0 for docker containers with omnisci (modin-project#2682) Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2670: some updates to improve asv tests stability (modin-project#2671) * 
TEST-modin-project#2670: some updates to improve asv tests stability Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2670: fixes Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2670: data_size -> shape Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2670: use dict approach Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2670: use CpuCount when Npartitions isn't defined Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2670: fix ASV_DATASET_SIZE Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2670: update TimeSortValues Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2670: modify asv tests for using with old modin version Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2670: reply to review comments Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2670: use env variables for default values Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2686: add fillna benchmark (modin-project#2687) * TEST-modin-project#2686: add fillna benchmark Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2686: reply to review comments Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2686: add inplace parameter Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2692: add drop benchmark (modin-project#2693) * TEST-modin-project#2692: add drop benchmark Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2692: add one column case Signed-off-by: Anatoly Myachev <[email protected]> * FIX-modin-project#2688: Update ray.ObjectID to ray.ObjectRef for Ray 2.0 (modin-project#2695) * FIX-modin-project#2688: Update ray.ObjectID to ray.ObjectRef for Ray 2.0 Resovles modin-project#2688 Signed-off-by: Devin Petersohn <[email protected]> * FIX-modin-project#2688: Address comments 
Signed-off-by: Devin Petersohn <[email protected]> * TEST-modin-project#2707: add lint check for ASV benchmarks (modin-project#2708) Signed-off-by: Dmitry Chigarev <[email protected]> * TEST-modin-project#2699: add append benchmark (modin-project#2700) Signed-off-by: Anatoly Myachev <[email protected]> * FIX-modin-project#2684: Add method level docs for Modin XGBoost (modin-project#2685) Signed-off-by: Alexey Prutskov <[email protected]> * TEST-modin-project#2694: add head benchmark (modin-project#2696) * TEST-modin-project#2694: add head benchmark Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2694: add small number for head op Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2705: add 'value_counts' benchmarks (modin-project#2706) * TEST-modin-project#2705: add 'value_counts' benchmarks Signed-off-by: Dmitry Chigarev <[email protected]> * TEST-modin-project#2705: apply suggestions from review Signed-off-by: Dmitry Chigarev <[email protected]> * FIX-modin-project#2709: fixed typo in '_copartition' (modin-project#2710) Signed-off-by: Dmitry Chigarev <[email protected]> * FIX-modin-project#2596: Update pandas version to 1.2.1 (modin-project#2600) Co-authored-by: Alexey Prutskov <[email protected]> Co-authored-by: Devin Petersohn <[email protected]> Co-authored-by: Dmitry Chigarev <[email protected]> Co-authored-by: Devin Petersohn <[email protected]> Signed-off-by: Igoshev, Yaroslav <[email protected]> * TEST-modin-project#2690: add astype benchmark (modin-project#2691) * TEST-modin-project#2690: add astype benchmark Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2690: add category dtype; use df.types Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2690: add case with one column Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2702: add loc/iloc benchmark (modin-project#2703) * TEST-modin-project#2702: add loc/iloc benchmark Signed-off-by: Anatoly 
Myachev <[email protected]> * TEST-modin-project#2702: add multiindex loc bench Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2702: add row_loc check Signed-off-by: Anatoly Myachev <[email protected]> * TEST-modin-project#2716: add describe bench (modin-project#2718) Signed-off-by: Anatoly Myachev <[email protected]> * DOCS-modin-project#2717: Fix version of Modin for building latest docs (modin-project#2719) Signed-off-by: Alexey Prutskov <[email protected]> * FEAT-modin-project#1611: Add mod operation (modin-project#2726) Signed-off-by: Alina <[email protected]> * TEST-modin-project#2725: add index, columns, shape benchmarks (modin-project#2727) Signed-off-by: Anatoly Myachev <[email protected]> * FIX-modin-project#2305: fix handling of renaming aggregation (modin-project#2732) Signed-off-by: Dmitry Chigarev <[email protected]> * FIX-modin-project#2362: fix key handling in 'Series.__setitem__' (modin-project#2731) Signed-off-by: Dmitry Chigarev <[email protected]> * TEST-modin-project#2722: add ASV read_csv skiprows benchmark (modin-project#2724) * TEST-modin-project#2722: add ASV read_csv skiprows benchmark Co-authored-by: Anatoly Myachev <[email protected]> Signed-off-by: Alexander Myskov <[email protected]> * FIX-modin-project#2735: move '.reindex' logic about axis dispatching from the base class (modin-project#2736) Signed-off-by: Dmitry Chigarev <[email protected]> * TEST-modin-project#1496: add tests for setting new column with different from frame length (modin-project#2733) Signed-off-by: Dmitry Chigarev <[email protected]> * REFACTOR-modin-project#2739: io tests refactoring (modin-project#2740) Signed-off-by: Alexander Myskov <[email protected]> * TEST-modin-project#2753: add GroupBy benchmarsk with huge amount of groups (modin-project#2754) Signed-off-by: Dmitry Chigarev <[email protected]> * FIX-modin-project#2362: fix handling slices in 'DataFrame.__setitem__' (modin-project#2741) Signed-off-by: Dmitry Chigarev <[email 
protected]> * FIX-modin-project#2742: fix performance degradation for dictionary GroupBy aggregation (modin-project#2743) * FIX-modin-project#2742: changed callable functions to its names in dict aggregation Signed-off-by: Dmitry Chigarev <[email protected]> * FIX-modin-project#2742: commends added Signed-off-by: Dmitry Chigarev <[email protected]> * FIX-modin-project#2737: fix handling of dates for read_csv with OmniSci backend (modin-project#2738) Co-authored-by: Anatoly Myachev <[email protected]> Signed-off-by: Alexander Myskov <[email protected]> * DOCS-modin-project#2584: Add CODEOWNERS file (modin-project#2759) * Resolves modin-project#2584 Signed-off-by: Devin Petersohn <[email protected]> Co-authored-by: Anatoly Myachev <[email protected]> Co-authored-by: Dmitry Chigarev <[email protected]> Co-authored-by: ienkovich <[email protected]> Co-authored-by: Alexey Prutskov <[email protected]> Co-authored-by: Devin Petersohn <[email protected]> Co-authored-by: amyskov <[email protected]> Co-authored-by: YarShev <[email protected]> Co-authored-by: Gregory Shimansky <[email protected]> Co-authored-by: Dmitry Chigarev <[email protected]> Co-authored-by: Gregory Shimansky <[email protected]> Co-authored-by: Reshama Shaikh <[email protected]> Co-authored-by: vfdev <[email protected]> Co-authored-by: Mohammed Kashif <[email protected]> Co-authored-by: Mohammed Kashif <[email protected]> Co-authored-by: Abdulelah S. 
Al Mesfer <[email protected]> Co-authored-by: Abolfazl Shahbazi <[email protected]> Co-authored-by: Vasily Litvinov <[email protected]> Co-authored-by: Itamar Turner-Trauring <[email protected]> Co-authored-by: raphaelauv <[email protected]> Co-authored-by: Weiwen Gu <[email protected]> Co-authored-by: Richard Lin <[email protected]> Co-authored-by: Tim Gates <[email protected]> Co-authored-by: Devin Petersohn <[email protected]> Co-authored-by: Zain Patel <[email protected]> Co-authored-by: Khang Vu <[email protected]> Co-authored-by: William Ma <[email protected]> Co-authored-by: Karthikeyan Singaravelan <[email protected]> Co-authored-by: Todd Yu <[email protected]> Co-authored-by: Alina Bykovskaya <[email protected]>
No description provided.