Handle hive-partitioning in NVTabular.dataset.Dataset #677
Click to view CI ResultsGitHub pull request #677 of commit 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3, no merge conflicts. Running as SYSTEM Setting status of 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1989/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential ghub_token Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10 > git rev-parse 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3^{commit} # timeout=10 Checking out Revision 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 # timeout=10 Commit message: "expand testing and fix bug" > git rev-list --no-walk 1f60ba950f935d104c7c8fa21742158698eba3eb # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins427843502843120314.sh Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular Installing build dependencies: started Installing build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing wheel metadata: started Preparing wheel metadata: finished with status 'done' Installing collected packages: nvtabular Running setup.py develop for nvtabular Successfully installed nvtabular All done! ✨ 🍰 ✨ 93 files would be left unchanged. /conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images warn(f"Likely recursive symlink detected to {resolved_path}") ============================= test session starts ============================== platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0 collected 763 items / 2 skipped / 761 selected |
Click to view CI ResultsGitHub pull request #677 of commit 89f29ce11125543cc1b4ca94b74cbbe0d4583adc, no merge conflicts. Running as SYSTEM Setting status of 89f29ce11125543cc1b4ca94b74cbbe0d4583adc to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1990/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential ghub_token Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10 > git rev-parse 89f29ce11125543cc1b4ca94b74cbbe0d4583adc^{commit} # timeout=10 Checking out Revision 89f29ce11125543cc1b4ca94b74cbbe0d4583adc (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 89f29ce11125543cc1b4ca94b74cbbe0d4583adc # timeout=10 Commit message: "Merge branch 'main' into hive-partitioning" > git rev-list --no-walk 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins1838599612458303638.sh Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular Installing build dependencies: started Installing 
build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing wheel metadata: started Preparing wheel metadata: finished with status 'done' Installing collected packages: nvtabular Running setup.py develop for nvtabular Successfully installed nvtabular All done! ✨ 🍰 ✨ 94 files would be left unchanged. /conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images warn(f"Likely recursive symlink detected to {resolved_path}") ============================= test session starts ============================== platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0 collected 770 items / 2 skipped / 768 selected |
Sounds great, @rjzamora! This PR will allow incremental training and evaluation of sequential recommender models and time series, as it allows splitting data by time windows. |
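For anyone following along, here is a rough sketch of what "splitting data by time windows" can look like with a hive-partitioned layout, where each directory is named `key=value`. It uses only the Python standard library and is not the code in this PR; the `day` partition key, the file paths, and the helper names `parse_hive_path` and `group_by_window` are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def parse_hive_path(path):
    """Extract key=value partition fields from the directories of a path."""
    fields = {}
    # Skip the last component, which is the file name itself.
    for segment in PurePosixPath(path).parts[:-1]:
        if "=" in segment:
            key, _, value = segment.partition("=")
            fields[key] = value
    return fields


def group_by_window(paths, key):
    """Group file paths by the value of one partition key (e.g. a day)."""
    groups = defaultdict(list)
    for path in paths:
        groups[parse_hive_path(path).get(key)].append(path)
    return dict(groups)


# Hypothetical dataset layout: one directory per day.
files = [
    "data/day=2021-03-01/part.0.parquet",
    "data/day=2021-03-01/part.1.parquet",
    "data/day=2021-03-02/part.0.parquet",
]
windows = group_by_window(files, "day")

# Train on earlier windows, evaluate on the next one.
for day in sorted(windows):
    print(day, windows[day])
```

With a layout like this, each time window maps to a set of files, so an incremental train/evaluate loop only has to walk the sorted window keys.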
Click to view CI ResultsGitHub pull request #677 of commit 0eedde3f05679f9948ceda0136fbe06b704e6c6b, no merge conflicts. Running as SYSTEM Setting status of 0eedde3f05679f9948ceda0136fbe06b704e6c6b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1999/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential ghub_token Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10 > git rev-parse 0eedde3f05679f9948ceda0136fbe06b704e6c6b^{commit} # timeout=10 Checking out Revision 0eedde3f05679f9948ceda0136fbe06b704e6c6b (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 0eedde3f05679f9948ceda0136fbe06b704e6c6b # timeout=10 Commit message: "Merge branch 'main' into hive-partitioning" > git rev-list --no-walk 2dcc4a9ed07a2ff8254449e1291ebc2e4281ddbf # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins5306605500818761616.sh Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular Installing build dependencies: started Installing build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing wheel metadata: started Preparing wheel metadata: finished with status 'done' Installing collected packages: nvtabular Running setup.py develop for nvtabular Successfully installed nvtabular All done! ✨ 🍰 ✨ 94 files would be left unchanged. /conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images warn(f"Likely recursive symlink detected to {resolved_path}") ============================= test session starts ============================== platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0 collected 770 items / 2 skipped / 768 selected |
Click to view CI ResultsGitHub pull request #677 of commit 6503e5069dea0f27dff7643a464d336bb1d752a8, no merge conflicts. Running as SYSTEM Setting status of 6503e5069dea0f27dff7643a464d336bb1d752a8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2011/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential ghub_token Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10 > git rev-parse 6503e5069dea0f27dff7643a464d336bb1d752a8^{commit} # timeout=10 Checking out Revision 6503e5069dea0f27dff7643a464d336bb1d752a8 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10 Commit message: "Merge branch 'main' into hive-partitioning" > git rev-list --no-walk 42f58af7c1b8c1b29f31c482329dbf6bdd410c24 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins5873751992607622620.sh Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular Installing build dependencies: started Installing build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing wheel metadata: started Preparing wheel metadata: finished with status 'done' Installing collected packages: nvtabular Running setup.py develop for nvtabular Successfully installed nvtabular All done! ✨ 🍰 ✨ 95 files would be left unchanged. /conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images warn(f"Likely recursive symlink detected to {resolved_path}") ============================= test session starts ============================== platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0 collected 767 items / 2 skipped / 765 selected |
Is it possible that the dataloader CI failures are being caused by this PR? |
I don't think so. We've had some flaky tests around this for a while now (#397), but for some reason these errors seem more common now. |
rerun tests |
Click to view CI ResultsGitHub pull request #677 of commit 6503e5069dea0f27dff7643a464d336bb1d752a8, no merge conflicts. Running as SYSTEM Setting status of 6503e5069dea0f27dff7643a464d336bb1d752a8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2015/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential ghub_token Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10 > git rev-parse 6503e5069dea0f27dff7643a464d336bb1d752a8^{commit} # timeout=10 Checking out Revision 6503e5069dea0f27dff7643a464d336bb1d752a8 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10 Commit message: "Merge branch 'main' into hive-partitioning" > git rev-list --no-walk 40a7f2b1c8e4e6743f499c4904ba1db32fbba0a2 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins6076946174219903460.sh Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular Installing build dependencies: started Installing build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing wheel metadata: started Preparing wheel metadata: finished with status 'done' Installing collected packages: nvtabular Running setup.py develop for nvtabular Successfully installed nvtabular All done! ✨ 🍰 ✨ 95 files would be left unchanged. /conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images warn(f"Likely recursive symlink detected to {resolved_path}") ============================= test session starts ============================== platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0 collected 767 items / 2 skipped / 765 selected |
rerun tests |
Click to view CI ResultsGitHub pull request #677 of commit 6503e5069dea0f27dff7643a464d336bb1d752a8, no merge conflicts. Running as SYSTEM Setting status of 6503e5069dea0f27dff7643a464d336bb1d752a8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2016/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential ghub_token Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10 > git rev-parse 6503e5069dea0f27dff7643a464d336bb1d752a8^{commit} # timeout=10 Checking out Revision 6503e5069dea0f27dff7643a464d336bb1d752a8 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10 Commit message: "Merge branch 'main' into hive-partitioning" > git rev-list --no-walk 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins2330873152729625782.sh Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular Installing build dependencies: started Installing 
build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing wheel metadata: started Preparing wheel metadata: finished with status 'done' Installing collected packages: nvtabular Running setup.py develop for nvtabular Successfully installed nvtabular All done! ✨ 🍰 ✨ 95 files would be left unchanged. /conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images warn(f"Likely recursive symlink detected to {resolved_path}") ============================= test session starts ============================== platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0 collected 767 items / 2 skipped / 765 selected |
Click to view CI ResultsGitHub pull request #677 of commit a51acdb4154c37628ec0252a55c1d38e8dde1869, no merge conflicts. Running as SYSTEM Setting status of a51acdb4154c37628ec0252a55c1d38e8dde1869 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2017/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential ghub_token Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10 > git rev-parse a51acdb4154c37628ec0252a55c1d38e8dde1869^{commit} # timeout=10 Checking out Revision a51acdb4154c37628ec0252a55c1d38e8dde1869 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10 Commit message: "Merge branch 'main' into hive-partitioning" > git rev-list --no-walk 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins4907412045382266287.sh Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular Installing build dependencies: started Installing 
build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing wheel metadata: started Preparing wheel metadata: finished with status 'done' Installing collected packages: nvtabular Running setup.py develop for nvtabular Successfully installed nvtabular All done! ✨ 🍰 ✨ 95 files would be left unchanged. /conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images warn(f"Likely recursive symlink detected to {resolved_path}") ============================= test session starts ============================== platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0 collected 767 items / 2 skipped / 765 selected |
rerun tests |
Click to view CI ResultsGitHub pull request #677 of commit a51acdb4154c37628ec0252a55c1d38e8dde1869, no merge conflicts. Running as SYSTEM Setting status of a51acdb4154c37628ec0252a55c1d38e8dde1869 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2018/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential ghub_token Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10 > git rev-parse a51acdb4154c37628ec0252a55c1d38e8dde1869^{commit} # timeout=10 Checking out Revision a51acdb4154c37628ec0252a55c1d38e8dde1869 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10 Commit message: "Merge branch 'main' into hive-partitioning" > git rev-list --no-walk a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins1242628838666643737.sh Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular Installing build dependencies: started Installing 
build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing wheel metadata: started Preparing wheel metadata: finished with status 'done' Installing collected packages: nvtabular Running setup.py develop for nvtabular Successfully installed nvtabular All done! ✨ 🍰 ✨ 95 files would be left unchanged. /conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images warn(f"Likely recursive symlink detected to {resolved_path}") ============================= test session starts ============================== platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0 collected 767 items / 2 skipped / 765 selected |
@jperez999 This CI error looks familiar and I think you tried to explain it to me, but I didn't fully understand what's causing it. Thoughts on how to resolve it? |
@karlhigley OK, so I think (and follow me on this) the reason for the error is that the median stat operator collects a median value of NA for the continuous columns (sometimes both x and y, sometimes just one of them). When we go to apply the NA value, we hit the error we are seeing. I think this is because of the way we set up the dataset at the beginning here: https://github.com/NVIDIA/NVTabular/blob/main/tests/conftest.py#L108-L112 This inserts the NA values, and I think we are seeing the random positions coincidentally land on the median indexes. |
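To make the suspected failure mode concrete, here is a minimal sketch in plain NumPy (not NVTabular or cuDF, and not the actual test fixture). It assumes the median is computed in a way that lets nulls propagate; the array, the random seed, and the `fill_nulls` helper are made up for illustration.

```python
import numpy as np

# Build a small continuous column and null out a few random positions,
# loosely mirroring what a test fixture that injects NA values might do.
rng = np.random.default_rng(0)
x = np.arange(10, dtype="float64")
null_positions = rng.choice(len(x), size=2, replace=False)
x[null_positions] = np.nan

# np.median propagates NaN, so the collected "median" is itself null.
fill_propagating = np.median(x)
# np.nanmedian ignores NaN and returns a usable fill value.
fill_skipping = np.nanmedian(x)


def fill_nulls(values, fill):
    """Replace NaN entries with the given fill value (hypothetical helper)."""
    return np.where(np.isnan(values), fill, values)


# Filling with a null median leaves the nulls in place.
still_null = bool(np.isnan(fill_nulls(x, fill_propagating)).any())
# Filling with the null-skipping median removes them.
clean = not bool(np.isnan(fill_nulls(x, fill_skipping)).any())

print("median is NaN:", bool(np.isnan(fill_propagating)))
print("nulls remain after fill:", still_null)
print("nulls removed with nanmedian:", clean)
```

If something like this is what is happening, any downstream step that assumes the filled column has no nulls would fail only on runs where the collected statistic happens to be NA, which would match the flakiness.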
rerun tests |
Click to view CI ResultsGitHub pull request #677 of commit a51acdb4154c37628ec0252a55c1d38e8dde1869, no merge conflicts. Running as SYSTEM Setting status of a51acdb4154c37628ec0252a55c1d38e8dde1869 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2019/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential ghub_token Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10 > git rev-parse a51acdb4154c37628ec0252a55c1d38e8dde1869^{commit} # timeout=10 Checking out Revision a51acdb4154c37628ec0252a55c1d38e8dde1869 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10 Commit message: "Merge branch 'main' into hive-partitioning" > git rev-list --no-walk a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins3416488090804215825.sh Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular Installing build dependencies: started Installing 
build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing wheel metadata: started Preparing wheel metadata: finished with status 'done' Installing collected packages: nvtabular Running setup.py develop for nvtabular Successfully installed nvtabular All done! ✨ 🍰 ✨ 95 files would be left unchanged. /conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images warn(f"Likely recursive symlink detected to {resolved_path}") ============================= test session starts ============================== platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0 collected 767 items / 2 skipped / 765 selected |
Click to view CI ResultsGitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts. Running as SYSTEM Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2026/ and message: 'Pending' Using context: Jenkins Unit Test Run Building in workspace /var/jenkins_home/workspace/nvtabular_tests using credential ghub_token Cloning the remote Git repository Cloning repository https://github.com/NVIDIA/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git using GIT_ASKPASS to set credentials github token setup > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10 Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10 Commit message: "Merge branch 'main' into hive-partitioning" > git rev-list --no-walk 2d552f6806f843cc0da94110e76e281e1db982b8 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins5070192051355054301.sh Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular Installing build dependencies: started Installing build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing wheel metadata: started Preparing wheel metadata: finished with status 'done' Installing collected packages: nvtabular Running setup.py develop for nvtabular Successfully installed nvtabular All done! ✨ 🍰 ✨ 95 files would be left unchanged. /conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images warn(f"Likely recursive symlink detected to {resolved_path}") ============================= test session starts ============================== platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0 collected 767 items / 2 skipped / 765 selected |
rerun tests |
@karlhigley @jperez999 @rjzamora I think the NA failure in the unit tests is unrelated to this PR. I spent some time debugging this and left my thoughts here: #687 (comment). That PR has a 'fix', but I would like to figure out why this is happening.
rerun tests
rerun tests
rerun tests
rerun tests
Rerun tests
* add Dataset.shuffle_by_keys
* support npartitions
* adding partition_on option to to_parquet
* fix _metadata creation for partitioned data
* expand testing and fix bug
* avoid shuffle when we dont need it
Closes #642
Addresses global shuffle component of #641
The purpose of this PR is to improve handling of hive-partitioned parquet data in NVTabular. Since the Dataset API already uses `dask.dataframe.read_parquet`, there is currently no "correctness" issue with reading hive-partitioned data. However, (1) there is no convenient mechanism to write hive-partitioned data, and (2) the read stage typically results in many small partitions (rather than a single partition for each input directory).

The `Dataset.to_parquet` method now supports a `partition_on=` argument. This is designed to match the same option in `dask.dataframe`/`dask_cudf`. If the user passes a list of 1+ columns with this argument, the output data will be shuffled at IO time into a distinct directory for each unique combination of those `partition_on` column values. When multiple columns are used for partitioning (e.g. `["month", "day"]`), the directory structure is nested (so that the full path for an output file will look something like `"/month=Mar/day=30/part.0.parquet"`).

A new `Dataset.partition_by_keys` method performs a global shuffle on the specified column group (`keys`) and returns a new (shuffled) Dataset. For general Dataset objects, this method will simply call `ddf.shuffle()` under the hood. For Dataset objects that are backed by hive-partitioned data, however, we use the metadata stored in the file paths to avoid a full shuffle. In the future, this optimization can be pushed even further by directly aggregating all IO tasks within the same hive-partition. However, I suspect that such an optimization should be implemented in `dask.dataframe`.

Example Usage
This will produce a directory structure like:
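As a minimal, library-free sketch of that layout, the nested `column=value` paths for `partition_on=["month", "day"]` can be derived as shown below. The sample rows and the `hive_path` helper are hypothetical and are not part of NVTabular; real writers may also emit several `part.*.parquet` files per directory.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def hive_path(root, row, partition_on, part=0):
    """Hypothetical helper: build the nested `column=value` path for one row."""
    path = PurePosixPath(root)
    for col in partition_on:
        path = path / f"{col}={row[col]}"
    return path / f"part.{part}.parquet"


# Illustrative data only; these values do not come from the PR.
rows = [
    {"month": "Mar", "day": 30, "x": 1.0},
    {"month": "Mar", "day": 31, "x": 2.0},
    {"month": "Apr", "day": 1, "x": 3.0},
    {"month": "Mar", "day": 30, "x": 4.0},
]
partition_on = ["month", "day"]

# Group the remaining column by the unique combination of partition values.
groups = defaultdict(list)
for row in rows:
    groups[hive_path("/", row, partition_on)].append(row["x"])

layout = sorted(str(p) for p in groups)
for p in layout:
    print(p)
```

Each unique `(month, day)` combination maps to exactly one leaf directory, which is why a reader can later recover the partition values from the path alone.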
Then, you can read the data back in with NVT, and ensure that the ddf partitions are shuffled by `keys`:
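The NVTabular call itself needs a GPU environment, so what follows is only a library-free sketch of the idea behind the hive-aware path: when `keys` are hive-partition columns, the `column=value` directory names already say which files hold which key values, so files can be grouped by key without moving any rows. `parse_hive_path` and `group_files_by_keys` are hypothetical helpers, not NVTabular functions, and the file list is illustrative.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def parse_hive_path(path):
    """Return {column: value} for every `column=value` directory in `path`."""
    parts = PurePosixPath(path).parent.parts
    return dict(p.split("=", 1) for p in parts if "=" in p)


def group_files_by_keys(paths, keys):
    """Map each unique key combination to the files that contain it."""
    groups = defaultdict(list)
    for path in paths:
        meta = parse_hive_path(path)
        groups[tuple(meta[k] for k in keys)].append(path)
    return dict(groups)


# Illustrative file list; partition values parsed from paths are strings.
files = [
    "/data/month=Mar/day=30/part.0.parquet",
    "/data/month=Mar/day=30/part.1.parquet",
    "/data/month=Mar/day=31/part.0.parquet",
    "/data/month=Apr/day=1/part.0.parquet",
]

by_month = group_files_by_keys(files, ["month"])
by_month_day = group_files_by_keys(files, ["month", "day"])
```

In this sketch, every file for a given key lands in the same group, so an output partition per group is already "shuffled" by `keys`. Columns that are not encoded in the paths would still need a regular shuffle.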