Add _SINGLE_PROCESS property to CachedDataSet #1905
Thank you for your contribution! 😄 I've fixed the linter in another branch, so it should pass now.
I've left one minor suggestion and then it can be merged!
kedro/io/cached_dataset.py
Outdated
# for parallelism within a Spark pipeline please consider
# ``ThreadRunner`` instead
We can remove these two sentences, because for the CachedDataSet
this is not related to Spark in any way.
Sure! Thanks for the suggestion
@MerelTheisenQB maybe we could keep the suggestion to use ThreadRunner?
# for parallelism please consider ``ThreadRunner`` instead
Congratulations on your first PR! 🎉 Great work.
For future reference, working on a branch in the kedro-org/kedro repo is totally fine and is the way we normally do it. It simplifies the workflow quite a bit 🙂.
Thanks @jmholzer! I've followed the process in the contribution guidelines. Next time, I'll create a branch direct on
Ohh I see 😃 I did the same for my first PR. Thanks for reminding us, let me see about creating an issue to update our contributor guidelines.
Signed-off-by: Carla Vieira <[email protected]> Signed-off-by: carlaprv <[email protected]>
Signed-off-by: carlaprv <[email protected]>
Thanks again for the contribution! ⭐ ⭐ ⭐
Signed-off-by: Carla Vieira <[email protected]> Signed-off-by: Ahdra Merali <[email protected]>
Signed-off-by: Carla Vieira <[email protected]> Signed-off-by: nickolasrm <[email protected]>
* Release/0.18.3 (#1856) * Update release version and release notes Signed-off-by: Nok Chan <[email protected]> * Update missing release notes Signed-off-by: Nok Chan <[email protected]> * update vresion Signed-off-by: Nok Chan <[email protected]> * update release notes Signed-off-by: Nok Chan <[email protected]> Signed-off-by: Nok Chan <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Remove comment from code example Signed-off-by: Ahdra Merali <[email protected]> * Remove more comments Signed-off-by: Ahdra Merali <[email protected]> * Add YAML formatting Signed-off-by: Ahdra Merali <[email protected]> * Add missing import Signed-off-by: Ahdra Merali <[email protected]> * Remove even more comments Signed-off-by: Ahdra Merali <[email protected]> * Remove more even more comments Signed-off-by: Ahdra Merali <[email protected]> * Add pickle requirement to extras_require Signed-off-by: Ahdra Merali <[email protected]> * Try fix YAML docs Signed-off-by: Ahdra Merali <[email protected]> * Try fix YAML docs pt 2 Signed-off-by: Ahdra Merali <[email protected]> * Fix code snippets in docs (#1876) * Fix code snippets Signed-off-by: Ahdra Merali <[email protected]> * Separate code blocks Signed-off-by: Ahdra Merali <[email protected]> * Lint Signed-off-by: Ahdra Merali <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Fix issue with specifying format for SparkHiveDataSet (#1857) Signed-off-by: jstammers <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Update RELEASE.md (#1883) * Update RELEASE.md * fix broken link * Update RELEASE.md Co-authored-by: Merel Theisen <[email protected]> Co-authored-by: Merel Theisen <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Deprecate `kedro test` and `kedro lint` (#1873) * Deprecating `kedro test` and `kedro lint` Signed-off-by: Nok Chan <[email protected]> * Deprecate commands Signed-off-by: Nok Chan <[email protected]> * Make kedro looks prettier * Update 
Linting Signed-off-by: Nok <[email protected]> Signed-off-by: Nok Chan <[email protected]> Signed-off-by: Nok <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Fix micro package pull from PyPI (#1848) Signed-off-by: Florian Gaudin-Delrieu <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Update Error message for `VersionNotFoundError` to handle Permission related issues better (#1881) * Update message for VersionNotFoundError Signed-off-by: Ankita Katiyar <[email protected]> * Add test for VersionNotFoundError for cloud protocols * Update test_data_catalog.py Update NoVersionFoundError test * minor linting update * update docs link + styling changes * Revert "update docs link + styling changes" This reverts commit 6088e00. * Update test with styling changes * Update RELEASE.md Signed-off-by: ankatiyar <[email protected]> Signed-off-by: Ankita Katiyar <[email protected]> Signed-off-by: ankatiyar <[email protected]> Co-authored-by: Ahdra Merali <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Update experiment tracking documentation with working examples (#1893) Signed-off-by: Merel Theisen <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Add NHS AI Lab and ReSpo.Vision to companies list (#1878) Signed-off-by: Ahdra Merali <[email protected]> * Document how users can use pytest instead of kedro test (#1879) * Add best_practices.md with introductory sections Signed-off-by: Jannic Holzer <[email protected]> * Add pytest and pytest-cov sections Signed-off-by: Jannic Holzer <[email protected]> * Add pytest-cov coverage report Signed-off-by: Jannic Holzer <[email protected]> * Add sections on pytest-cov Signed-off-by: Jannic Holzer <[email protected]> * Add automated_testing to index.rst Signed-off-by: Jannic Holzer <[email protected]> * Reformat third-party library names and clean grammar. 
Signed-off-by: Jannic Holzer <[email protected]> * Add link to virtual environment docs Signed-off-by: Jannic Holzer <[email protected]> * Add example of good test naming Signed-off-by: Jannic Holzer <[email protected]> * Improve link accessibility Signed-off-by: Jannic Holzer <[email protected]> * Improve pytest docs link accessibility Signed-off-by: Jannic Holzer <[email protected]> * Add reminder link to virtual environment docs Signed-off-by: Jannic Holzer <[email protected]> * Fix formatting in link to coverage docs Signed-off-by: Jannic Holzer <[email protected]> * Remove reference to /src under 'Run your tests' Signed-off-by: Jannic Holzer <[email protected]> * Modify references to <project_name> to <package_name> Signed-off-by: Jannic Holzer <[email protected]> * Fix sentence structure Signed-off-by: Jannic Holzer <[email protected]> * Fix broken databricks doc link Signed-off-by: Jannic Holzer <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Capitalise Kedro-Viz in the "Visualize layers" section (#1899) * Capitalised kedro-viz Signed-off-by: yash6318 <[email protected]> * capitalised Kedro viz Signed-off-by: yash6318 <[email protected]> * Updated set_up_experiment_tracking.md Co-authored-by: Deepyaman Datta <[email protected]> Signed-off-by: yash6318 <[email protected]> Signed-off-by: yash6318 <[email protected]> Co-authored-by: Deepyaman Datta <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Fix linting on autmated test page (#1906) Signed-off-by: Merel Theisen <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Add _SINGLE_PROCESS property to CachedDataSet (#1905) Signed-off-by: Carla Vieira <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Update the tutorial of "Visualise pipelines" (#1913) * Change a file extention to match the previous article Signed-off-by: dinotuku <[email protected]> * Add a missing import Signed-off-by: 
dinotuku <[email protected]> * Change both preprocessed datasets to parquet files Signed-off-by: dinotuku <[email protected]> * Change data type to ParquetDataSet for parquet files Signed-off-by: dinotuku <[email protected]> * Add a note for installing seaborn if it is not installed Signed-off-by: dinotuku <[email protected]> Signed-off-by: dinotuku <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Document how users can use linting tools instead of `kedro lint` (#1904) * Add documentation for linting tools Signed-off-by: Ankita Katiyar <[email protected]> * Revert changes to commands_reference.md Signed-off-by: Ankita Katiyar <[email protected]> * Update linting docs with suggestions Signed-off-by: Ankita Katiyar <[email protected]> * Update linting doc Signed-off-by: Ankita Katiyar <[email protected]> Signed-off-by: Ankita Katiyar <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Make core config accessible in dict get way (#1870) Signed-off-by: Merel Theisen <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Create dependabot.yml configuration file for version updates (#1862) * Create dependabot.yml configuration file * Update dependabot.yml Signed-off-by: SajidAlamQB <[email protected]> * add target-branch Signed-off-by: SajidAlamQB <[email protected]> * Update dependabot.yml Signed-off-by: SajidAlamQB <[email protected]> * limit dependabot to just dependency folder Signed-off-by: SajidAlamQB <[email protected]> * Update test_requirements.txt Signed-off-by: SajidAlamQB <[email protected]> * Update MANIFEST.in Signed-off-by: SajidAlamQB <[email protected]> * fix e2e Signed-off-by: SajidAlamQB <[email protected]> * Update continue_config.yml Signed-off-by: SajidAlamQB <[email protected]> * Update requirements.txt Signed-off-by: SajidAlamQB <[email protected]> * Update requirements.txt Signed-off-by: SajidAlamQB <[email protected]> * fix link Signed-off-by: SajidAlamQB <[email protected]> * revert 
Signed-off-by: SajidAlamQB <[email protected]> * Delete requirements.txt Signed-off-by: SajidAlamQB <[email protected]> Signed-off-by: SajidAlamQB <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Update dependabot config (#1928) Signed-off-by: Ahdra Merali <[email protected]> * Update robots.txt (#1929) Signed-off-by: Ahdra Merali <[email protected]> * fix broken link (#1950) Signed-off-by: Ahdra Merali <[email protected]> * Update dependabot.yml config (#1938) * Update dependabot.yml Signed-off-by: SajidAlamQB <[email protected]> * pin jupyterlab_services to requirments Signed-off-by: SajidAlamQB <[email protected]> * lint Signed-off-by: SajidAlamQB <[email protected]> Signed-off-by: SajidAlamQB <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Update setup.py Jinja2 dependencies (#1954) Signed-off-by: Ahdra Merali <[email protected]> * Update pip-tools requirement from ~=6.5 to ~=6.9 in /dependency (#1957) Updates the requirements on [pip-tools](https://github.com/jazzband/pip-tools) to permit the latest version. - [Release notes](https://github.com/jazzband/pip-tools/releases) - [Changelog](https://github.com/jazzband/pip-tools/blob/master/CHANGELOG.md) - [Commits](jazzband/pip-tools@6.5.0...6.9.0) --- updated-dependencies: - dependency-name: pip-tools dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Ahdra Merali <[email protected]> * Update toposort requirement from ~=1.5 to ~=1.7 in /dependency (#1956) Updates the requirements on [toposort]() to permit the latest version. --- updated-dependencies: - dependency-name: toposort dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Sajid Alam <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Add deprecation warning to package_name argument in session create() (#1953) Signed-off-by: Merel Theisen <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Remove redundant `resolve_load_version` call (#1911) * remove a redundant function call Signed-off-by: Nok Chan <[email protected]> * Remove redundant resolove_load_version & fix test Signed-off-by: Nok Chan <[email protected]> * Fix HoloviewWriter tests with more specific error message pattern & Lint Signed-off-by: Nok Chan <[email protected]> * Rename tests Signed-off-by: Nok Chan <[email protected]> Signed-off-by: Nok Chan <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> * Make docstring in test starter match real starters (#1916) Signed-off-by: Ahdra Merali <[email protected]> * Try to fix formatting error Signed-off-by: Merel Theisen <[email protected]> * Specify pickle import Signed-off-by: Nok Chan <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> Signed-off-by: jstammers <[email protected]> Signed-off-by: Nok <[email protected]> Signed-off-by: Florian Gaudin-Delrieu <[email protected]> Signed-off-by: Ankita Katiyar <[email protected]> Signed-off-by: ankatiyar <[email protected]> Signed-off-by: Merel Theisen <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> Signed-off-by: yash6318 <[email protected]> Signed-off-by: Carla Vieira <[email protected]> Signed-off-by: dinotuku <[email protected]> Signed-off-by: Ankita Katiyar <[email protected]> Signed-off-by: SajidAlamQB <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Nok <[email protected]> Co-authored-by: Jimmy Stammers <[email protected]> Co-authored-by: Merel 
Theisen <[email protected]> Co-authored-by: Florian Gaudin-Delrieu <[email protected]> Co-authored-by: Ankita Katiyar <[email protected]> Co-authored-by: Yetunde Dada <[email protected]> Co-authored-by: Jannic <[email protected]> Co-authored-by: Yash Agrawal <[email protected]> Co-authored-by: Deepyaman Datta <[email protected]> Co-authored-by: Carla Vieira <[email protected]> Co-authored-by: Kuan Tung <[email protected]> Co-authored-by: Sajid Alam <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Merel Theisen <[email protected]> Co-authored-by: Merel Theisen <[email protected]>
Description
Solves #1888
Development notes
The CachedDataSet cannot be used with the ParallelRunner. This PR adds the _SINGLE_PROCESS property to it, just like in DeltaTableDataSet.
Before this PR, trying to use CachedDataSet and ParallelRunner together was failing.
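To illustrate the idea, here is a minimal sketch of how a class-level _SINGLE_PROCESS flag can let a runner reject a dataset up front instead of failing partway through a run. It does not import Kedro: the two dataset classes and both helper functions are hypothetical stand-ins written for this example, and the exact check and error message used by the real ParallelRunner may differ.

```python
class CachedDataSetSketch:
    """Hypothetical stand-in for a dataset that must stay in one process."""

    # Class-level flag, mirroring the property this PR adds.
    _SINGLE_PROCESS = True


class PlainDataSetSketch:
    """Hypothetical stand-in for a dataset that does not declare the flag."""


def find_single_process_datasets(catalog):
    """Return the sorted names of datasets flagged as single-process only."""
    return sorted(
        name
        for name, dataset in catalog.items()
        # Datasets without the attribute are treated as safe by default.
        if getattr(dataset, "_SINGLE_PROCESS", False)
    )


def validate_for_parallel_run(catalog):
    """Raise before the run starts if any dataset is single-process only."""
    flagged = find_single_process_datasets(catalog)
    if flagged:
        raise AttributeError(
            f"These datasets cannot be used with multiprocessing: {flagged}. "
            "For parallelism, consider a thread-based runner instead."
        )


catalog = {"cached": CachedDataSetSketch(), "plain": PlainDataSetSketch()}
print(find_single_process_datasets(catalog))  # prints ['cached']
```

Reading the flag with getattr and a default of False means that existing datasets which never declare it keep working unchanged; only datasets that opt in are rejected.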
Checklist
RELEASE.md
file