Get embedding sizes refactor #1127

Merged
merged 11 commits into main from get-embedding-sizes-fix on Sep 21, 2021

Conversation

jperez999
Contributor

No description provided.

@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #1127 of commit e2f9d5700e3e5811df6cdef789f5032b5d9aa961, no merge conflicts.
Running as SYSTEM
Setting status of e2f9d5700e3e5811df6cdef789f5032b5d9aa961 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3489/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1127/*:refs/remotes/origin/pr/1127/* # timeout=10
 > git rev-parse e2f9d5700e3e5811df6cdef789f5032b5d9aa961^{commit} # timeout=10
Checking out Revision e2f9d5700e3e5811df6cdef789f5032b5d9aa961 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e2f9d5700e3e5811df6cdef789f5032b5d9aa961 # timeout=10
Commit message: "get embedding sizes now working"
 > git rev-list --no-walk 5333ebff2ed0a69be248f36577b2257ec2255c1b # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3137795139931049970.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.0.4)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1)
Terminated
Build was aborted
Aborted by admin
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8067199914536865628.sh

@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #1127 of commit df397656eb01c05c19e12dabe8ff2f7b15aa3488, no merge conflicts.
Running as SYSTEM
Setting status of df397656eb01c05c19e12dabe8ff2f7b15aa3488 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3501/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1127/*:refs/remotes/origin/pr/1127/* # timeout=10
 > git rev-parse df397656eb01c05c19e12dabe8ff2f7b15aa3488^{commit} # timeout=10
Checking out Revision df397656eb01c05c19e12dabe8ff2f7b15aa3488 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f df397656eb01c05c19e12dabe8ff2f7b15aa3488 # timeout=10
Commit message: "Merge branch 'main' into get-embedding-sizes-fix"
 > git rev-list --no-walk bbf74327e67177bdb82fea187ba7aae8193b40d3 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins554783608279010990.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.0.4)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1)
running develop
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'cpp'
warning: no files found matching '*.cu' under directory 'cpp'
warning: no files found matching '*.cuh' under directory 'cpp'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/cpp
creating build/temp.linux-x86_64-3.8/cpp/nvtabular
creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+73.gdf39765 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+73.gdf39765 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+73.gdf39765 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+73.gdf39765 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
creating build/lib.linux-x86_64-3.8
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 0.6.0+73.gdf39765 is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Processing dependencies for nvtabular==0.6.0+73.gdf39765
Searching for protobuf==3.17.3
Best match: protobuf 3.17.3
Adding protobuf 3.17.3 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tensorflow-metadata==1.2.0
Best match: tensorflow-metadata 1.2.0
Processing tensorflow_metadata-1.2.0-py3.8.egg
tensorflow-metadata 1.2.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tensorflow_metadata-1.2.0-py3.8.egg
Searching for pyarrow==4.0.1
Best match: pyarrow 4.0.1
Adding pyarrow 4.0.1 to easy-install.pth file
Installing plasma_store script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tqdm==4.61.2
Best match: tqdm 4.61.2
Processing tqdm-4.61.2-py3.8.egg
tqdm 4.61.2 is already the active version in easy-install.pth
Installing tqdm script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tqdm-4.61.2-py3.8.egg
Searching for numba==0.54.0
Best match: numba 0.54.0
Processing numba-0.54.0-py3.8-linux-x86_64.egg
numba 0.54.0 is already the active version in easy-install.pth
Installing pycc script to /var/jenkins_home/.local/bin
Installing numba script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg
Searching for pandas==1.2.5
Best match: pandas 1.2.5
Processing pandas-1.2.5-py3.8-linux-x86_64.egg
pandas 1.2.5 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg
Searching for distributed==2021.4.1
Best match: distributed 2021.4.1
Processing distributed-2021.4.1-py3.8.egg
distributed 2021.4.1 is already the active version in easy-install.pth
Installing dask-ssh script to /var/jenkins_home/.local/bin
Installing dask-scheduler script to /var/jenkins_home/.local/bin
Installing dask-worker script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/distributed-2021.4.1-py3.8.egg
Searching for dask==2021.4.1
Best match: dask 2021.4.1
Processing dask-2021.4.1-py3.8.egg
dask 2021.4.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg
Searching for PyYAML==5.4.1
Best match: PyYAML 5.4.1
Processing PyYAML-5.4.1-py3.8-linux-x86_64.egg
PyYAML 5.4.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg
Searching for six==1.15.0
Best match: six 1.15.0
Adding six 1.15.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for googleapis-common-protos==1.53.0
Best match: googleapis-common-protos 1.53.0
Processing googleapis_common_protos-1.53.0-py3.8.egg
googleapis-common-protos 1.53.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/googleapis_common_protos-1.53.0-py3.8.egg
Searching for absl-py==0.12.0
Best match: absl-py 0.12.0
Processing absl_py-0.12.0-py3.8.egg
absl-py 0.12.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/absl_py-0.12.0-py3.8.egg
Searching for numpy==1.20.2
Best match: numpy 1.20.2
Adding numpy 1.20.2 to easy-install.pth file
Installing f2py script to /var/jenkins_home/.local/bin
Installing f2py3 script to /var/jenkins_home/.local/bin
Installing f2py3.8 script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for setuptools==58.0.4
Best match: setuptools 58.0.4
Adding setuptools 58.0.4 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for llvmlite==0.37.0
Best match: llvmlite 0.37.0
Processing llvmlite-0.37.0-py3.8-linux-x86_64.egg
llvmlite 0.37.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/llvmlite-0.37.0-py3.8-linux-x86_64.egg
Searching for pytz==2021.1
Best match: pytz 2021.1
Adding pytz 2021.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for python-dateutil==2.8.2
Best match: python-dateutil 2.8.2
Adding python-dateutil 2.8.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for zict==2.0.0
Best match: zict 2.0.0
Processing zict-2.0.0-py3.8.egg
zict 2.0.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg
Searching for tornado==6.1
Best match: tornado 6.1
Processing tornado-6.1-py3.8-linux-x86_64.egg
tornado 6.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg
Searching for toolz==0.11.1
Best match: toolz 0.11.1
Processing toolz-0.11.1-py3.8.egg
toolz 0.11.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/toolz-0.11.1-py3.8.egg
Searching for tblib==1.7.0
Best match: tblib 1.7.0
Processing tblib-1.7.0-py3.8.egg
tblib 1.7.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg
Searching for sortedcontainers==2.4.0
Best match: sortedcontainers 2.4.0
Processing sortedcontainers-2.4.0-py3.8.egg
sortedcontainers 2.4.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg
Searching for psutil==5.8.0
Best match: psutil 5.8.0
Processing psutil-5.8.0-py3.8-linux-x86_64.egg
psutil 5.8.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg
Searching for msgpack==1.0.2
Best match: msgpack 1.0.2
Processing msgpack-1.0.2-py3.8-linux-x86_64.egg
msgpack 1.0.2 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/msgpack-1.0.2-py3.8-linux-x86_64.egg
Searching for cloudpickle==1.6.0
Best match: cloudpickle 1.6.0
Processing cloudpickle-1.6.0-py3.8.egg
cloudpickle 1.6.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/cloudpickle-1.6.0-py3.8.egg
Searching for click==8.0.1
Best match: click 8.0.1
Processing click-8.0.1-py3.8.egg
click 8.0.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/click-8.0.1-py3.8.egg
Searching for partd==1.2.0
Best match: partd 1.2.0
Processing partd-1.2.0-py3.8.egg
partd 1.2.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg
Searching for fsspec==2021.8.1
Best match: fsspec 2021.8.1
Processing fsspec-2021.8.1-py3.8.egg
fsspec 2021.8.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/fsspec-2021.8.1-py3.8.egg
Searching for HeapDict==1.0.1
Best match: HeapDict 1.0.1
Processing HeapDict-1.0.1-py3.8.egg
HeapDict 1.0.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg
Searching for locket==0.2.1
Best match: locket 0.2.1
Processing locket-0.2.1-py3.8.egg
locket 0.2.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg
Finished processing dependencies for nvtabular==0.6.0+73.gdf39765
Running black --check
All done! ✨ 🍰 ✨
128 files would be left unchanged.
Running flake8
Running isort
Skipped 2 files
Running bandit
Running pylint
************* Module nvtabular.ops.categorify
nvtabular/ops/categorify.py:504:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module nvtabular.ops.fill
nvtabular/ops/fill.py:67:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
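The two I1101 notes above are informational; pylint's own message spells out the fix if they ever need silencing. A sketch of the suggested config entry, assuming a .pylintrc at the repo root (this log does not confirm where the repo keeps its pylint config):

# .pylintrc (hypothetical location; could equally live in setup.cfg)
[MASTER]
extension-pkg-allow-list=nvtabular_cpp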

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1526 items / 1 skipped / 1525 selected

tests/unit/test_dask_nvt.py ............................................ [ 2%]
..................................................................... [ 7%]
tests/unit/test_io.py .................................................. [ 10%]
........................................................................ [ 15%]
..........ssssssss.....................................................s [ 20%]
s [ 20%]
tests/unit/test_notebooks.py ...F.. [ 20%]
tests/unit/test_tf4rec.py . [ 20%]
tests/unit/test_tools.py ...................... [ 22%]
tests/unit/test_triton_inference.py .............................. [ 24%]
tests/unit/columns/test_column_schemas.py .............................. [ 26%]
................................................... [ 29%]
tests/unit/columns/test_column_selector.py .................... [ 30%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 30%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 32%]
................................................... [ 35%]
tests/unit/framework_utils/test_torch_layers.py . [ 35%]
tests/unit/loader/test_dataloader_backend.py .. [ 36%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 38%]
........................................s.. [ 40%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 42%]
...................................................FF.. [ 46%]
tests/unit/ops/test_column_similarity.py ........................ [ 48%]
tests/unit/ops/test_ops.py ............................................. [ 50%]
........................................................................ [ 55%]
........................................................................ [ 60%]
........................................................................ [ 65%]
........................................................................ [ 69%]
........................................................................ [ 74%]
............................................. [ 77%]
tests/unit/ops/test_ops_schema.py ................................FFFF.. [ 80%]
..........................................................FFFF.......... [ 84%]
..................................................FFFF.................. [ 89%]
.......................... [ 91%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 91%]
tests/unit/workflow/test_workflow.py ................................... [ 93%]
.......................................................... [ 97%]
tests/unit/workflow/test_workflow_node.py ........... [ 98%]
tests/unit/workflow/test_workflow_ops.py .. [ 98%]
tests/unit/workflow/test_workflow_schemas.py ....................... [100%]

=================================== FAILURES ===================================
____________________________ test_movielens_example ____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-28/test_movielens_example0')

def test_movielens_example(tmpdir):
    _get_random_movielens_data(tmpdir, 10000, dataset="movie")
    _get_random_movielens_data(tmpdir, 10000, dataset="ratings")
    _get_random_movielens_data(tmpdir, 5000, dataset="ratings", valid=True)

    triton_model_path = os.path.join(tmpdir, "models")
    os.environ["INPUT_DATA_DIR"] = str(tmpdir)
    os.environ["MODEL_PATH"] = triton_model_path

    notebook_path = os.path.join(
        dirname(TEST_PATH),
        "examples/getting-started-movielens/",
        "02-ETL-with-NVTabular.ipynb",
    )
    _run_notebook(tmpdir, notebook_path)

    def _modify_tf_nb(line):
        return line.replace(
            # don't require graphviz/pydot
            "tf.keras.utils.plot_model(model)",
            "# tf.keras.utils.plot_model(model)",
        )

    def _modify_tf_triton(line):
        # models are already preloaded
        line = line.replace("triton_client.load_model", "# triton_client.load_model")
        line = line.replace("triton_client.unload_model", "# triton_client.unload_model")
        return line

    notebooks = []
    try:
        import torch  # noqa

        notebooks.append("03-Training-with-PyTorch.ipynb")
    except Exception:
        pass
    try:
        import nvtabular.inference.triton  # noqa
        import nvtabular.loader.tensorflow  # noqa

        notebooks.append("03-Training-with-TF.ipynb")
        has_tf = True

    except Exception:
        has_tf = False

    for notebook in notebooks:
        notebook_path = os.path.join(
            dirname(TEST_PATH),
            "examples/getting-started-movielens/",
            notebook,
        )
        if notebook == "03-Training-with-TF.ipynb":
            _run_notebook(tmpdir, notebook_path, transform=_modify_tf_nb)
        else:
>           _run_notebook(tmpdir, notebook_path)

tests/unit/test_notebooks.py:211:


tests/unit/test_notebooks.py:305: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python', '/tmp/pytest-of-jenkins/pytest-28/test_movielens_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7fa176467c10>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
>           raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python', '/tmp/pytest-of-jenkins/pytest-28/test_movielens_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-28/test_movielens_example0/notebook.py", line 60, in
EMBEDDING_TABLE_SHAPES, MH_EMBEDDING_TABLE_SHAPES = nvt.ops.get_embedding_sizes(proc)
ValueError: too many values to unpack (expected 2)
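This failure is the behavior change under test in this PR: the notebook still unpacks two values, while get_embedding_sizes now appears to return a single mapping. A minimal sketch of the dict-style call, assuming the refactored function returns {column_name: (cardinality, embedding_dim)}; the tiny dataframe and column names below are illustrative, not taken from the notebook:

import pandas as pd
import nvtabular as nvt
from nvtabular import ops

# Illustrative frame: one single-hot and one multi-hot (list) categorical.
# (nvt.Dataset also accepts cudf frames; pandas is used here for brevity.)
df = pd.DataFrame(
    {
        "userId": [1, 2, 3, 4],
        "genres": [["a"], ["a", "b"], ["b"], ["c"]],
    }
)

workflow = nvt.Workflow(["userId", "genres"] >> ops.Categorify())
workflow.fit(nvt.Dataset(df))

# Assumption: post-refactor this returns one dict keyed by column name
# rather than the (single-hot, multi-hot) tuple the notebook unpacks.
sizes = nvt.ops.get_embedding_sizes(workflow)
print(sizes)  # e.g. {"userId": (5, 16), "genres": (4, 16)}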
____________________________ test_mh_model_support _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-28/test_mh_model_support0')

def test_mh_model_support(tmpdir):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Reviewers": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Null_User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
            "Cont1": [0.3, 0.4, 0.5, 0.6],
            "Cont2": [0.3, 0.4, 0.5, 0.6],
            "Cat1": ["A", "B", "A", "C"],
        }
    )
    cat_names = ["Cat1", "Null_User", "Authors", "Reviewers"]  # , "Engaging User"]
    cont_names = ["Cont1", "Cont2"]
    label_name = ["Post"]
    out_path = os.path.join(tmpdir, "train/")
    os.mkdir(out_path)

    cats = cat_names >> ops.Categorify()
    conts = cont_names >> ops.Normalize()

    processor = nvt.Workflow(cats + conts + label_name)
    df_out = processor.fit_transform(nvt.Dataset(df)).to_ddf().compute()
    data_itr = torch_dataloader.TorchAsyncItr(
        nvt.Dataset(df_out),
        cats=cat_names,
        conts=cont_names,
        labels=label_name,
        batch_size=2,
    )
    emb_sizes = nvt.ops.get_embedding_sizes(processor)
    # check  for correct  embedding representation
>   assert len(emb_sizes[1].keys()) == 2  # Authors, Reviewers

E KeyError: 1

tests/unit/loader/test_torch_dataloader.py:547: KeyError
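The KeyError is the same refactor seen from the other side: emb_sizes[1] indexes the old (single-hot, multi-hot) tuple. With a single returned mapping, the multi-hot entries would be picked out by name instead; a hedged sketch of the adjusted assertion, reusing processor and the column names from the test body above:

emb_sizes = nvt.ops.get_embedding_sizes(processor)
# Select the multi-hot (list) columns by name instead of by tuple index.
mh_sizes = {name: emb_sizes[name] for name in ("Authors", "Reviewers")}
assert len(mh_sizes) == 2  # Authors, Reviewers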
____________________________ test_horovod_multigpu _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-28/test_horovod_multigpu0')

@pytest.mark.skipif(importlib.util.find_spec("horovod") is None, reason="needs horovod")
@pytest.mark.skipif(
    cupy.cuda.runtime.getDeviceCount() <= 1, reason="This unittest requires multiple gpu's to run"
)
def test_horovod_multigpu(tmpdir):

    json_sample = {
        "conts": {},
        "cats": {
            "genres": {
                "dtype": None,
                "cardinality": 50,
                "min_entry_size": 1,
                "max_entry_size": 5,
                "multi_min": 2,
                "multi_max": 4,
                "multi_avg": 3,
            },
            "movieId": {
                "dtype": None,
                "cardinality": 500,
                "min_entry_size": 1,
                "max_entry_size": 5,
            },
            "userId": {"dtype": None, "cardinality": 500, "min_entry_size": 1, "max_entry_size": 5},
        },
        "labels": {"rating": {"dtype": None, "cardinality": 2}},
    }
    cols = datagen._get_cols_from_schema(json_sample)
    df_gen = datagen.DatasetGen(datagen.UniformDistro(), gpu_frac=0.0001)

    target_path = os.path.join(tmpdir, "input/")
    os.mkdir(target_path)
    df_files = df_gen.full_df_create(10000, cols, output=target_path)

    # process them
    cat_features = ColumnSelector(["userId", "movieId", "genres"]) >> nvt.ops.Categorify()
    ratings = ColumnSelector(["rating"]) >> (lambda col: (col > 3).astype("int8"))
    output = cat_features + ratings

    proc = nvt.Workflow(output)
    train_iter = nvt.Dataset(df_files, part_size="10MB")
    proc.fit(train_iter)

    target_path_train = os.path.join(tmpdir, "train/")
    os.mkdir(target_path_train)

    proc.transform(train_iter).to_parquet(output_path=target_path_train, out_files_per_proc=5)

    # add new location
    target_path = os.path.join(tmpdir, "workflow/")
    os.mkdir(target_path)
    proc.save(target_path)

    curr_path = os.path.abspath(__file__)
    repo_root = os.path.relpath(os.path.normpath(os.path.join(curr_path, "../../../..")))
    hvd_example_path = os.path.join(repo_root, "examples/multi-gpu-movielens/torch_trainer.py")

    with subprocess.Popen(
        [
            "horovodrun",
            "-np",
            "2",
            "-H",
            "localhost:2",
            "python",
            hvd_example_path,
            "--dir_in",
            f"{tmpdir}",
            "--batch_size",
            "1024",
        ],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    ) as process:
        process.wait()
        stdout, stderr = process.communicate()
        print(str(stdout))
        print(str(stderr))
      assert "Training complete" in str(stdout)

E assert 'Training complete' in "b''"
E + where "b''" = str(b'')

tests/unit/loader/test_torch_dataloader.py:663: AssertionError
----------------------------- Captured stdout call -----------------------------
b''
b'[1,1]:Traceback (most recent call last):\n[1,1]: File "./examples/multi-gpu-movielens/torch_trainer.py", line 47, in <module>\n[1,1]: EMBEDDING_TABLE_SHAPES, MH_EMBEDDING_TABLE_SHAPES = nvt.ops.get_embedding_sizes(proc)\n[1,1]:ValueError: too many values to unpack (expected 2)\n[1,0]:Traceback (most recent call last):\n[1,0]: File "./examples/multi-gpu-movielens/torch_trainer.py", line 47, in <module>\n[1,0]: EMBEDDING_TABLE_SHAPES, MH_EMBEDDING_TABLE_SHAPES = nvt.ops.get_embedding_sizes(proc)\n[1,0]:ValueError: too many values to unpack (expected 2)\n--------------------------------------------------------------------------\nPrimary job terminated normally, but 1 process returned\na non-zero exit code. Per user-direction, the job has been aborted.\n--------------------------------------------------------------------------\n--------------------------------------------------------------------------\nmpirun detected that one or more processes exited with non-zero status, thus causing\nthe job to be terminated. The first process to do so was:\n\n Process name: [[35548,1],0]\n Exit code: 1\n--------------------------------------------------------------------------\n'
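Both horovod ranks die on the same get_embedding_sizes unpacking that broke the movielens notebook, so the single-dict call sketched earlier would apply to torch_trainer.py as well.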
______________ test_schema_out[selection0-op8-tags0-properties0] _______________

tags = [], properties = {}, selection = ['1']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...mension': 16}} == {}
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
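This and the seven following test_schema_out failures are all for the same op (op8, HashBucket) and follow one pattern: the op now writes computed metadata ("domain", "embedding_sizes") into the output column schema on top of whatever properties the caller supplied, so strict equality with the input properties no longer holds. A subset-style check would tolerate the added keys; a sketch of one possible test adjustment, not the fix actually committed:

# Caller-supplied properties must survive, but ops may add computed
# entries of their own, such as "domain" and "embedding_sizes".
for key, value in properties.items():
    assert schema1.properties[key] == value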
______________ test_schema_out[selection0-op8-tags0-properties1] _______________

tags = [], properties = {'p1': '1'}, selection = ['1']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...6}, 'p1': '1'} == {'p1': '1'}
E Omitting 1 identical items, use -vv to show
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
______________ test_schema_out[selection0-op8-tags1-properties0] _______________

tags = ['TAG1', 'TAG2'], properties = {}, selection = ['1']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...mension': 16}} == {}
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
______________ test_schema_out[selection0-op8-tags1-properties1] _______________

tags = ['TAG1', 'TAG2'], properties = {'p1': '1'}, selection = ['1']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...6}, 'p1': '1'} == {'p1': '1'}
E Omitting 1 identical items, use -vv to show
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
______________ test_schema_out[selection1-op8-tags0-properties0] _______________

tags = [], properties = {}, selection = ['2', '3']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...mension': 16}} == {}
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
______________ test_schema_out[selection1-op8-tags0-properties1] _______________

tags = [], properties = {'p1': '1'}, selection = ['2', '3']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...6}, 'p1': '1'} == {'p1': '1'}
E Omitting 1 identical items, use -vv to show
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
______________ test_schema_out[selection1-op8-tags1-properties0] _______________

tags = ['TAG1', 'TAG2'], properties = {}, selection = ['2', '3']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...mension': 16}} == {}
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
______________ test_schema_out[selection1-op8-tags1-properties1] _______________

tags = ['TAG1', 'TAG2'], properties = {'p1': '1'}, selection = ['2', '3']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...6}, 'p1': '1'} == {'p1': '1'}
E Omitting 1 identical items, use -vv to show
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
______________ test_schema_out[selection2-op8-tags0-properties0] _______________

tags = [], properties = {}, selection = ['1', '2', '3', '4']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...mension': 16}} == {}
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
______________ test_schema_out[selection2-op8-tags0-properties1] _______________

tags = [], properties = {'p1': '1'}, selection = ['1', '2', '3', '4']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...6}, 'p1': '1'} == {'p1': '1'}
E Omitting 1 identical items, use -vv to show
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
______________ test_schema_out[selection2-op8-tags1-properties0] _______________

tags = ['TAG1', 'TAG2'], properties = {}, selection = ['1', '2', '3', '4']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...mension': 16}} == {}
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
______________ test_schema_out[selection2-op8-tags1-properties1] _______________

tags = ['TAG1', 'TAG2'], properties = {'p1': '1'}
selection = ['1', '2', '3', '4']
op = <nvtabular.ops.hash_bucket.HashBucket object at 0x7fa08c903610>

@pytest.mark.parametrize("properties", [{}, {"p1": "1"}])
@pytest.mark.parametrize("tags", [[], ["TAG1", "TAG2"]])
@pytest.mark.parametrize(
    "op",
    [
        ops.Bucketize([1]),
        ops.Rename(postfix="_trim"),
        ops.Categorify(),
        ops.Categorify(encode_type="combo"),
        ops.Clip(0),
        ops.DifferenceLag("1"),
        ops.FillMissing(),
        ops.Groupby(["1"]),
        ops.HashBucket(1),
        ops.HashedCross(1),
        ops.JoinGroupby(["1"]),
        ops.ListSlice(0),
        ops.LogOp(),
        ops.Normalize(),
        ops.TargetEncoding(["1"]),
        ops.AddMetadata(tags=["excellent"], properties={"domain": {"min": 0, "max": 20}}),
    ],
)
@pytest.mark.parametrize("selection", [["1"], ["2", "3"], ["1", "2", "3", "4"]])
def test_schema_out(tags, properties, selection, op):
    # Create columnSchemas
    column_schemas = []
    all_cols = []
    for x in range(5):
        all_cols.append(str(x))
        column_schemas.append(ColumnSchema(str(x), tags=tags, properties=properties))

    # Turn to Schema
    schema = Schema(column_schemas)

    # run schema through op
    selector = ColumnSelector(selection)
    new_schema = op.compute_output_schema(schema, selector)

    # should have dtype float
    for col_name in selector.names:
        names_group = [name for name in new_schema.column_schemas if col_name in name]
        if names_group:
            for name in names_group:
                schema1 = new_schema.column_schemas[name]

                # should not be exactly the same name, having gone through operator
                assert schema1.dtype == op.output_dtype()
                if name in selector.names:
>                   assert schema1.properties == properties

E AssertionError: assert {'domain': {'...6}, 'p1': '1'} == {'p1': '1'}
E Omitting 1 identical items, use -vv to show
E Left contains 2 more items:
E {'domain': {'max': 1, 'min': 0},
E 'embedding_sizes': {'cardinality': 1, 'dimension': 16}}
E Use -v to get the full diff

tests/unit/ops/test_ops_schema.py:57: AssertionError
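
The four test_schema_out[...-op8-...] failures above share a single cause: after this refactor, HashBucket attaches embedding metadata ('domain' and 'embedding_sizes') to the columns it produces, so the test's strict equality check against the input properties no longer holds. A minimal sketch of the new behaviour, built only from names that appear in the failing test (the nvtabular.columns import paths are an assumption for this branch):

    from nvtabular import ops
    from nvtabular.columns.schema import ColumnSchema, Schema    # assumed path
    from nvtabular.columns.selector import ColumnSelector        # assumed path

    # Reproduce the failing case: one column pushed through HashBucket(1).
    schema = Schema([ColumnSchema("1", tags=[], properties={})])
    out_schema = ops.HashBucket(1).compute_output_schema(schema, ColumnSelector(["1"]))
    props = out_schema.column_schemas["1"].properties

    # The op now layers embedding metadata over the input properties,
    # matching the diff printed in the assertion errors above.
    assert props["domain"] == {"min": 0, "max": 1}
    assert props["embedding_sizes"] == {"cardinality": 1, "dimension": 16}
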
=============================== warnings summary ===============================
tests/unit/test_dask_nvt.py: 3 warnings
tests/unit/test_io.py: 24 warnings
tests/unit/test_tf4rec.py: 2 warnings
tests/unit/test_tools.py: 2 warnings
tests/unit/test_triton_inference.py: 5 warnings
tests/unit/loader/test_tf_dataloader.py: 50 warnings
tests/unit/loader/test_torch_dataloader.py: 16 warnings
tests/unit/ops/test_column_similarity.py: 7 warnings
tests/unit/ops/test_ops.py: 74 warnings
tests/unit/workflow/test_workflow.py: 31 warnings
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg/numba/cuda/compiler.py:865: NumbaPerformanceWarning: Grid size (1) < 2 * SM count (112) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))

tests/unit/test_io.py::test_validate_dataset_bad_schema
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:1105: UserWarning: Unable to sample column dtypes to infer nvt.Dataset schema, schema is empty.
warnings.warn(

tests/unit/test_io.py: 96 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/__init__.py:38: DeprecationWarning: ColumnGroup is deprecated, use ColumnSelector instead
warnings.warn("ColumnGroup is deprecated, use ColumnSelector instead", DeprecationWarning)

tests/unit/test_io.py: 24 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
tests/unit/workflow/test_workflow_node.py: 1 warning
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/node.py:47: FutureWarning: The ["a", "b", "c"] >> ops.Operator syntax for creating a ColumnGroup has been deprecated in NVTabular 21.09 and will be removed in a future version.
warnings.warn(

tests/unit/test_io.py: 36 warnings
tests/unit/workflow/test_workflow.py: 44 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py:89: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler will be used for execution. Please use the client argument to initialize a Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/test_io.py: 52 warnings
tests/unit/workflow/test_workflow.py: 35 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dask.py:372: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler will be used for this write operation. Please use the client argument to initialize a Dataset and/or Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/test_io.py: 36 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:511: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler is being used for this shuffle operation. Please use the client argument to initialize a Dataset and/or Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.1]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[f"{col}_filled"] = df[col].isna()

tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.1]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:126: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[col] = df[col].fillna(self.medians[col])

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/indexing.py:1637: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:54: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[f"{col}_filled"] = df[col].isna()

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:55: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[col] = df[col].fillna(self.fill_val)

tests/unit/ops/test_ops.py: 96 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:190: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[tmp] = _arange(len(df), like_df=df, dtype="int32")

tests/unit/ops/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/ops/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/ops/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/ops/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/ops/test_ops.py::test_groupby_op[id-True]
tests/unit/ops/test_ops.py::test_groupby_op[id-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

tests/unit/workflow/test_cpu_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 33 6 1 46% 32->36, 48-145
nvtabular/__init__.py 18 0 0 0 100%
nvtabular/columns/__init__.py 2 0 0 0 100%
nvtabular/columns/schema.py 209 17 103 20 88% 46->62, 49, 51, 53-56, 58, 98->109, 104, 147, 174, 260->267, 262, 263->265, 275, 292->297, 295->297, 308, 332, 339, 348, 351, 356->355
nvtabular/columns/selector.py 74 1 34 0 99% 121
nvtabular/dispatch.py 273 55 132 22 78% 36-40, 45-47, 53-63, 70-71, 99-101, 106-109, 113-118, 125, 144, 155, 161, 166->168, 179, 202-205, 244, 247, 253, 269, 276, 307->312, 310, 313, 316->320, 353, 364-367, 394-397, 427, 431, 472, 496, 498, 505
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 134 78 90 15 39% 30, 99, 103, 114-130, 140, 143-158, 162, 166-167, 173-198, 207-217, 220-227, 229->233, 234, 239-279, 282
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py 58 58 30 0 0% 16-111
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 32 15 14 1 52% 50, 74-82, 85-95
nvtabular/framework_utils/torch/models.py 45 6 28 10 75% 56, 57->61, 62, 67, 87->89, 90-91, 93->96, 96->100, 103, 107->109
nvtabular/framework_utils/torch/utils.py 75 10 30 9 82% 51->53, 53->55, 64, 70, 71->76, 75, 109, 118-120, 129-131
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 385 215 180 11 44% 82-86, 141-174, 195-218, 263-307, 338, 364-372, 380-387, 406, 428-444, 485-489, 527-537, 583-623, 629-645, 649-716, 723->726, 726->722, 762-772, 781, 791, 812, 818-844, 850-876, 883, 888-894
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 176 176 98 0 0% 27-332
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/inference/triton/model_pt.py 101 101 40 0 0% 27-220
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 183 18 72 11 87% 111, 114, 150, 235-246, 398, 408, 425->428, 436, 440->442, 442->438, 447, 449
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 353 76 166 28 75% 46-47, 257, 259, 272, 281, 299-313, 436->510, 441-444, 450-457, 462-506, 510->519, 570-571, 572->576, 619, 741, 743, 745, 751, 755-757, 759, 819-820, 847, 854-855, 861, 867, 963-964, 1081-1086, 1092, 1171, 1180
nvtabular/io/dataset_engine.py 24 1 0 0 96% 48
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 551 45 180 26 89% 34-35, 57, 76, 80->92, 89, 112, 122->127, 140, 142, 166->170, 173-179, 225-233, 248, 254, 272->274, 287, 306-316, 457-462, 500-505, 621->628, 689->694, 695-696, 816, 820, 824, 830, 862, 879, 883, 890->892, 1000->exit, 1010->1015, 1020->1030, 1035, 1057, 1080-1081
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 175 13 68 5 92% 24-25, 51, 79, 125, 128, 212, 221, 224, 267, 288-290
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 330 15 140 13 94% 111, 128, 143-144, 242->244, 254-258, 304-305, 344->348, 345->344, 419, 423-424, 454, 534, 559, 567
nvtabular/loader/tensorflow.py 163 22 52 7 86% 58, 66-69, 84, 98, 308, 344, 359-361, 390-392, 402-410, 413-416
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 22 0 0 0 100%
nvtabular/ops/add_metadata.py 9 0 0 0 100%
nvtabular/ops/bucketize.py 37 10 18 3 69% 53-55, 59->exit, 62-65, 84-87, 94
nvtabular/ops/categorify.py 624 74 332 48 85% 245, 247, 264, 268, 276, 284, 286, 313, 332-333, 357, 366, 377->381, 385-392, 474-475, 499-504, 591, 603-605, 622, 715, 733, 769, 847-848, 863-867, 868->832, 886, 894, 901->exit, 925, 928->931, 983, 988, 1004->1008, 1015-1018, 1029, 1033, 1035, 1042, 1047-1050, 1128, 1130, 1200->1223, 1206->1223, 1224-1229, 1266, 1285->1290, 1289, 1299->1296, 1304->1296, 1311, 1314, 1322-1332
nvtabular/ops/clip.py 18 2 6 3 79% 44, 52->54, 55
nvtabular/ops/column_similarity.py 118 25 38 5 74% 19-20, 78->exit, 108, 134, 198-199, 208-210, 218-234, 251->254, 255, 265
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 31 1 8 1 95% 69->71, 94
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 91 12 36 3 82% 63-67, 93, 121, 147, 151, 162-165
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 119 3 70 4 96% 73, 84, 94->96, 106->111, 141
nvtabular/ops/hash_bucket.py 41 2 20 2 93% 72, 106->112, 118
nvtabular/ops/hashed_cross.py 36 4 15 3 86% 53, 66, 81, 91
nvtabular/ops/internal/__init__.py 3 0 0 0 100%
nvtabular/ops/internal/concat_columns.py 11 0 0 0 100%
nvtabular/ops/internal/identity.py 6 1 0 0 83% 42
nvtabular/ops/internal/subset_columns.py 13 1 0 0 92% 53
nvtabular/ops/join_external.py 89 7 36 6 90% 20-21, 113, 115, 117, 159, 176->178, 215
nvtabular/ops/join_groupby.py 101 7 36 4 92% 108, 115, 124, 131->130, 215-216, 219-220
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 66 24 26 1 58% 21-22, 53-54, 104-118, 126-137
nvtabular/ops/logop.py 13 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 81 10 14 1 86% 70, 78-79, 85, 118-119, 141-142, 146, 157
nvtabular/ops/operator.py 66 3 14 1 95% 111, 189, 196
nvtabular/ops/rename.py 41 3 22 3 90% 47, 88-90
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 153 11 66 4 91% 167->171, 175->184, 232-233, 236-237, 249-255, 346->349, 362
nvtabular/tags.py 16 0 0 0 100%
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 321
nvtabular/tools/dataset_inspector.py 50 7 18 1 79% 32-39
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 102 43 46 8 52% 31-32, 36-37, 50, 61-62, 64-66, 69, 72, 78, 84, 90-126, 145, 149->153
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow/__init__.py 2 0 0 0 100%
nvtabular/workflow/node.py 240 18 116 10 89% 55, 93->98, 146, 248->252, 288, 302, 311, 329-334, 339, 388-389, 400->395, 453-458
nvtabular/workflow/workflow.py 221 15 112 7 93% 28-29, 47, 139, 195, 222-224, 332, 347-348, 366-367, 502, 514

TOTAL 7521 1547 3025 353 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.78%
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:16: could not import 's3fs': No module named 's3fs'
SKIPPED [8] tests/unit/test_io.py:555: could not import 'uavro': No module named 'uavro'
SKIPPED [2] tests/unit/test_io.py:914: Dask>=2021.07.1 required for file aggregation
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:521: not working correctly in ci environment
==== 15 failed, 1500 passed, 12 skipped, 794 warnings in 2067.71s (0:34:27) ====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins3787678591802848262.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1127 of commit cc1fe6709c19c9ccca1df772fba08dde2972ab63, no merge conflicts.
Running as SYSTEM
Setting status of cc1fe6709c19c9ccca1df772fba08dde2972ab63 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3503/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1127/*:refs/remotes/origin/pr/1127/* # timeout=10
 > git rev-parse cc1fe6709c19c9ccca1df772fba08dde2972ab63^{commit} # timeout=10
Checking out Revision cc1fe6709c19c9ccca1df772fba08dde2972ab63 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f cc1fe6709c19c9ccca1df772fba08dde2972ab63 # timeout=10
Commit message: "Merge branch 'get-embedding-sizes-fix' of https://github.com/jperez999/NVTabular into get-embedding-sizes-fix"
 > git rev-list --no-walk d1dd81f4e577dede3376d36c3bcea9de2919a943 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins7330693510522698979.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.0.4)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1)
running develop
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'cpp'
warning: no files found matching '*.cu' under directory 'cpp'
warning: no files found matching '*.cuh' under directory 'cpp'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/cpp
creating build/temp.linux-x86_64-3.8/cpp/nvtabular
creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+75.gcc1fe67 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+75.gcc1fe67 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+75.gcc1fe67 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+75.gcc1fe67 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
creating build/lib.linux-x86_64-3.8
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 0.6.0+75.gcc1fe67 is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Processing dependencies for nvtabular==0.6.0+75.gcc1fe67
Searching for protobuf==3.17.3
Best match: protobuf 3.17.3
Adding protobuf 3.17.3 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tensorflow-metadata==1.2.0
Best match: tensorflow-metadata 1.2.0
Processing tensorflow_metadata-1.2.0-py3.8.egg
tensorflow-metadata 1.2.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tensorflow_metadata-1.2.0-py3.8.egg
Searching for pyarrow==4.0.1
Best match: pyarrow 4.0.1
Adding pyarrow 4.0.1 to easy-install.pth file
Installing plasma_store script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tqdm==4.61.2
Best match: tqdm 4.61.2
Processing tqdm-4.61.2-py3.8.egg
tqdm 4.61.2 is already the active version in easy-install.pth
Installing tqdm script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tqdm-4.61.2-py3.8.egg
Searching for numba==0.54.0
Best match: numba 0.54.0
Processing numba-0.54.0-py3.8-linux-x86_64.egg
numba 0.54.0 is already the active version in easy-install.pth
Installing pycc script to /var/jenkins_home/.local/bin
Installing numba script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg
Searching for pandas==1.2.5
Best match: pandas 1.2.5
Processing pandas-1.2.5-py3.8-linux-x86_64.egg
pandas 1.2.5 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg
Searching for distributed==2021.4.1
Best match: distributed 2021.4.1
Processing distributed-2021.4.1-py3.8.egg
distributed 2021.4.1 is already the active version in easy-install.pth
Installing dask-ssh script to /var/jenkins_home/.local/bin
Installing dask-scheduler script to /var/jenkins_home/.local/bin
Installing dask-worker script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/distributed-2021.4.1-py3.8.egg
Searching for dask==2021.4.1
Best match: dask 2021.4.1
Processing dask-2021.4.1-py3.8.egg
dask 2021.4.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg
Searching for PyYAML==5.4.1
Best match: PyYAML 5.4.1
Processing PyYAML-5.4.1-py3.8-linux-x86_64.egg
PyYAML 5.4.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg
Searching for six==1.15.0
Best match: six 1.15.0
Adding six 1.15.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for googleapis-common-protos==1.53.0
Best match: googleapis-common-protos 1.53.0
Processing googleapis_common_protos-1.53.0-py3.8.egg
googleapis-common-protos 1.53.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/googleapis_common_protos-1.53.0-py3.8.egg
Searching for absl-py==0.12.0
Best match: absl-py 0.12.0
Processing absl_py-0.12.0-py3.8.egg
absl-py 0.12.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/absl_py-0.12.0-py3.8.egg
Searching for numpy==1.20.2
Best match: numpy 1.20.2
Adding numpy 1.20.2 to easy-install.pth file
Installing f2py script to /var/jenkins_home/.local/bin
Installing f2py3 script to /var/jenkins_home/.local/bin
Installing f2py3.8 script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for setuptools==58.0.4
Best match: setuptools 58.0.4
Adding setuptools 58.0.4 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for llvmlite==0.37.0
Best match: llvmlite 0.37.0
Processing llvmlite-0.37.0-py3.8-linux-x86_64.egg
llvmlite 0.37.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/llvmlite-0.37.0-py3.8-linux-x86_64.egg
Searching for pytz==2021.1
Best match: pytz 2021.1
Adding pytz 2021.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for python-dateutil==2.8.2
Best match: python-dateutil 2.8.2
Adding python-dateutil 2.8.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for zict==2.0.0
Best match: zict 2.0.0
Processing zict-2.0.0-py3.8.egg
zict 2.0.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg
Searching for tornado==6.1
Best match: tornado 6.1
Processing tornado-6.1-py3.8-linux-x86_64.egg
tornado 6.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg
Searching for toolz==0.11.1
Best match: toolz 0.11.1
Processing toolz-0.11.1-py3.8.egg
toolz 0.11.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/toolz-0.11.1-py3.8.egg
Searching for tblib==1.7.0
Best match: tblib 1.7.0
Processing tblib-1.7.0-py3.8.egg
tblib 1.7.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg
Searching for sortedcontainers==2.4.0
Best match: sortedcontainers 2.4.0
Processing sortedcontainers-2.4.0-py3.8.egg
sortedcontainers 2.4.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg
Searching for psutil==5.8.0
Best match: psutil 5.8.0
Processing psutil-5.8.0-py3.8-linux-x86_64.egg
psutil 5.8.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg
Searching for msgpack==1.0.2
Best match: msgpack 1.0.2
Processing msgpack-1.0.2-py3.8-linux-x86_64.egg
msgpack 1.0.2 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/msgpack-1.0.2-py3.8-linux-x86_64.egg
Searching for cloudpickle==1.6.0
Best match: cloudpickle 1.6.0
Processing cloudpickle-1.6.0-py3.8.egg
cloudpickle 1.6.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/cloudpickle-1.6.0-py3.8.egg
Searching for click==8.0.1
Best match: click 8.0.1
Processing click-8.0.1-py3.8.egg
click 8.0.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/click-8.0.1-py3.8.egg
Searching for partd==1.2.0
Best match: partd 1.2.0
Processing partd-1.2.0-py3.8.egg
partd 1.2.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg
Searching for fsspec==2021.8.1
Best match: fsspec 2021.8.1
Processing fsspec-2021.8.1-py3.8.egg
fsspec 2021.8.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/fsspec-2021.8.1-py3.8.egg
Searching for HeapDict==1.0.1
Best match: HeapDict 1.0.1
Processing HeapDict-1.0.1-py3.8.egg
HeapDict 1.0.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg
Searching for locket==0.2.1
Best match: locket 0.2.1
Processing locket-0.2.1-py3.8.egg
locket 0.2.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg
Finished processing dependencies for nvtabular==0.6.0+75.gcc1fe67
Running black --check
All done! ✨ 🍰 ✨
128 files would be left unchanged.
Running flake8
Running isort
Skipped 2 files
Running bandit
Running pylint
************* Module nvtabular.ops.categorify
nvtabular/ops/categorify.py:504:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module nvtabular.ops.fill
nvtabular/ops/fill.py:67:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1526 items / 1 skipped / 1525 selected

tests/unit/test_dask_nvt.py ............................................ [ 2%]
..................................................................... [ 7%]
tests/unit/test_io.py .................................................. [ 10%]
........................................................................ [ 15%]
..........ssssssss.....................................................s [ 20%]
s [ 20%]
tests/unit/test_notebooks.py ...F.. [ 20%]
tests/unit/test_tf4rec.py . [ 20%]
tests/unit/test_tools.py ...................... [ 22%]
tests/unit/test_triton_inference.py .............................. [ 24%]
tests/unit/columns/test_column_schemas.py .............................. [ 26%]
................................................... [ 29%]
tests/unit/columns/test_column_selector.py .................... [ 30%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 30%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 32%]
................................................... [ 35%]
tests/unit/framework_utils/test_torch_layers.py . [ 35%]
tests/unit/loader/test_dataloader_backend.py .. [ 36%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 38%]
........................................s.. [ 40%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 42%]
...................................................FF.. [ 46%]
tests/unit/ops/test_column_similarity.py ........................ [ 48%]
tests/unit/ops/test_ops.py ............................................. [ 50%]
........................................................................ [ 55%]
........................................................................ [ 60%]
........................................................................ [ 65%]
........................................................................ [ 69%]
........................................................................ [ 74%]
............................................. [ 77%]
tests/unit/ops/test_ops_schema.py ...................................... [ 80%]
........................................................................ [ 84%]
........................................................................ [ 89%]
.......................... [ 91%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 91%]
tests/unit/workflow/test_workflow.py ................................... [ 93%]
.......................................................... [ 97%]
tests/unit/workflow/test_workflow_node.py ........... [ 98%]
tests/unit/workflow/test_workflow_ops.py .. [ 98%]
tests/unit/workflow/test_workflow_schemas.py ....................... [100%]

=================================== FAILURES ===================================
____________________________ test_movielens_example ____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-30/test_movielens_example0')

def test_movielens_example(tmpdir):
    _get_random_movielens_data(tmpdir, 10000, dataset="movie")
    _get_random_movielens_data(tmpdir, 10000, dataset="ratings")
    _get_random_movielens_data(tmpdir, 5000, dataset="ratings", valid=True)

    triton_model_path = os.path.join(tmpdir, "models")
    os.environ["INPUT_DATA_DIR"] = str(tmpdir)
    os.environ["MODEL_PATH"] = triton_model_path

    notebook_path = os.path.join(
        dirname(TEST_PATH),
        "examples/getting-started-movielens/",
        "02-ETL-with-NVTabular.ipynb",
    )
    _run_notebook(tmpdir, notebook_path)

    def _modify_tf_nb(line):
        return line.replace(
            # don't require graphviz/pydot
            "tf.keras.utils.plot_model(model)",
            "# tf.keras.utils.plot_model(model)",
        )

    def _modify_tf_triton(line):
        # models are already preloaded
        line = line.replace("triton_client.load_model", "# triton_client.load_model")
        line = line.replace("triton_client.unload_model", "# triton_client.unload_model")
        return line

    notebooks = []
    try:
        import torch  # noqa

        notebooks.append("03-Training-with-PyTorch.ipynb")
    except Exception:
        pass
    try:
        import nvtabular.inference.triton  # noqa
        import nvtabular.loader.tensorflow  # noqa

        notebooks.append("03-Training-with-TF.ipynb")
        has_tf = True

    except Exception:
        has_tf = False

    for notebook in notebooks:
        notebook_path = os.path.join(
            dirname(TEST_PATH),
            "examples/getting-started-movielens/",
            notebook,
        )
        if notebook == "03-Training-with-TF.ipynb":
            _run_notebook(tmpdir, notebook_path, transform=_modify_tf_nb)
        else:
>           _run_notebook(tmpdir, notebook_path)

tests/unit/test_notebooks.py:211:


tests/unit/test_notebooks.py:305: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python', '/tmp/pytest-of-jenkins/pytest-30/test_movielens_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f3346fa2d90>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
>           raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python', '/tmp/pytest-of-jenkins/pytest-30/test_movielens_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-30/test_movielens_example0/notebook.py", line 60, in <module>
EMBEDDING_TABLE_SHAPES, MH_EMBEDDING_TABLE_SHAPES = nvt.ops.get_embedding_sizes(proc)
ValueError: too many values to unpack (expected 2)
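
This traceback (and the horovod failure further down) points at the interface change in this PR: get_embedding_sizes no longer returns a (single-hot, multi-hot) pair of tables here, so unpacking its result into two names raises ValueError. A hedged sketch of how the notebook call site could adapt, assuming the refactored function returns a single dict keyed by column name:

    # Old call site, which now raises "too many values to unpack":
    # EMBEDDING_TABLE_SHAPES, MH_EMBEDDING_TABLE_SHAPES = nvt.ops.get_embedding_sizes(proc)

    # Assumed new shape: one dict of {column_name: (cardinality, embedding_dim)}.
    EMBEDDING_TABLE_SHAPES = nvt.ops.get_embedding_sizes(proc)
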
____________________________ test_mh_model_support _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-30/test_mh_model_support0')

def test_mh_model_support(tmpdir):
    df = cudf.DataFrame(
        {
            "Authors": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Reviewers": [["User_A"], ["User_A", "User_E"], ["User_B", "User_C"], ["User_C"]],
            "Engaging User": ["User_B", "User_B", "User_A", "User_D"],
            "Null_User": ["User_B", "User_B", "User_A", "User_D"],
            "Post": [1, 2, 3, 4],
            "Cont1": [0.3, 0.4, 0.5, 0.6],
            "Cont2": [0.3, 0.4, 0.5, 0.6],
            "Cat1": ["A", "B", "A", "C"],
        }
    )
    cat_names = ["Cat1", "Null_User", "Authors", "Reviewers"]  # , "Engaging User"]
    cont_names = ["Cont1", "Cont2"]
    label_name = ["Post"]
    out_path = os.path.join(tmpdir, "train/")
    os.mkdir(out_path)

    cats = cat_names >> ops.Categorify()
    conts = cont_names >> ops.Normalize()

    processor = nvt.Workflow(cats + conts + label_name)
    df_out = processor.fit_transform(nvt.Dataset(df)).to_ddf().compute()
    data_itr = torch_dataloader.TorchAsyncItr(
        nvt.Dataset(df_out),
        cats=cat_names,
        conts=cont_names,
        labels=label_name,
        batch_size=2,
    )
    emb_sizes = nvt.ops.get_embedding_sizes(processor)
    # check  for correct  embedding representation
>   assert len(emb_sizes[1].keys()) == 2  # Authors, Reviewers

E KeyError: 1

tests/unit/loader/test_torch_dataloader.py:547: KeyError
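
This KeyError is the same change seen from the caller's side: emb_sizes is indexed with the integer 1, which only works when get_embedding_sizes returns a tuple. If the refactored function instead returns a flat dict (an assumption consistent with the ValueError above), the multi-hot columns would have to be selected by name:

    emb_sizes = nvt.ops.get_embedding_sizes(processor)
    # Pick out the multi-hot list columns by name rather than via emb_sizes[1].
    mh_sizes = {name: emb_sizes[name] for name in ["Authors", "Reviewers"]}
    assert len(mh_sizes) == 2
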
____________________________ test_horovod_multigpu _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-30/test_horovod_multigpu0')

@pytest.mark.skipif(importlib.util.find_spec("horovod") is None, reason="needs horovod")
@pytest.mark.skipif(
    cupy.cuda.runtime.getDeviceCount() <= 1, reason="This unittest requires multiple gpu's to run"
)
def test_horovod_multigpu(tmpdir):

    json_sample = {
        "conts": {},
        "cats": {
            "genres": {
                "dtype": None,
                "cardinality": 50,
                "min_entry_size": 1,
                "max_entry_size": 5,
                "multi_min": 2,
                "multi_max": 4,
                "multi_avg": 3,
            },
            "movieId": {
                "dtype": None,
                "cardinality": 500,
                "min_entry_size": 1,
                "max_entry_size": 5,
            },
            "userId": {"dtype": None, "cardinality": 500, "min_entry_size": 1, "max_entry_size": 5},
        },
        "labels": {"rating": {"dtype": None, "cardinality": 2}},
    }
    cols = datagen._get_cols_from_schema(json_sample)
    df_gen = datagen.DatasetGen(datagen.UniformDistro(), gpu_frac=0.0001)

    target_path = os.path.join(tmpdir, "input/")
    os.mkdir(target_path)
    df_files = df_gen.full_df_create(10000, cols, output=target_path)

    # process them
    cat_features = ColumnSelector(["userId", "movieId", "genres"]) >> nvt.ops.Categorify()
    ratings = ColumnSelector(["rating"]) >> (lambda col: (col > 3).astype("int8"))
    output = cat_features + ratings

    proc = nvt.Workflow(output)
    train_iter = nvt.Dataset(df_files, part_size="10MB")
    proc.fit(train_iter)

    target_path_train = os.path.join(tmpdir, "train/")
    os.mkdir(target_path_train)

    proc.transform(train_iter).to_parquet(output_path=target_path_train, out_files_per_proc=5)

    # add new location
    target_path = os.path.join(tmpdir, "workflow/")
    os.mkdir(target_path)
    proc.save(target_path)

    curr_path = os.path.abspath(__file__)
    repo_root = os.path.relpath(os.path.normpath(os.path.join(curr_path, "../../../..")))
    hvd_example_path = os.path.join(repo_root, "examples/multi-gpu-movielens/torch_trainer.py")

    with subprocess.Popen(
        [
            "horovodrun",
            "-np",
            "2",
            "-H",
            "localhost:2",
            "python",
            hvd_example_path,
            "--dir_in",
            f"{tmpdir}",
            "--batch_size",
            "1024",
        ],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    ) as process:
        process.wait()
        stdout, stderr = process.communicate()
        print(str(stdout))
        print(str(stderr))
>       assert "Training complete" in str(stdout)

E assert 'Training complete' in "b''"
E + where "b''" = str(b'')

tests/unit/loader/test_torch_dataloader.py:663: AssertionError
----------------------------- Captured stdout call -----------------------------
b''
b'[1,0]:Traceback (most recent call last):\n[1,0]: File "./examples/multi-gpu-movielens/torch_trainer.py", line 47, in <module>\n[1,0]: EMBEDDING_TABLE_SHAPES, MH_EMBEDDING_TABLE_SHAPES = nvt.ops.get_embedding_sizes(proc)\n[1,0]:ValueError: too many values to unpack (expected 2)\n[1,1]:Traceback (most recent call last):\n[1,1]: File "./examples/multi-gpu-movielens/torch_trainer.py", line 47, in <module>\n[1,1]: EMBEDDING_TABLE_SHAPES, MH_EMBEDDING_TABLE_SHAPES = nvt.ops.get_embedding_sizes(proc)\n[1,1]:ValueError: too many values to unpack (expected 2)\n--------------------------------------------------------------------------\nPrimary job terminated normally, but 1 process returned\na non-zero exit code. Per user-direction, the job has been aborted.\n--------------------------------------------------------------------------\n--------------------------------------------------------------------------\nmpirun detected that one or more processes exited with non-zero status, thus causing\nthe job to be terminated. The first process to do so was:\n\n Process name: [[59441,1],1]\n Exit code: 1\n--------------------------------------------------------------------------\n'
=============================== warnings summary ===============================
tests/unit/test_dask_nvt.py: 3 warnings
tests/unit/test_io.py: 24 warnings
tests/unit/test_tf4rec.py: 2 warnings
tests/unit/test_tools.py: 2 warnings
tests/unit/test_triton_inference.py: 5 warnings
tests/unit/loader/test_tf_dataloader.py: 50 warnings
tests/unit/loader/test_torch_dataloader.py: 16 warnings
tests/unit/ops/test_column_similarity.py: 7 warnings
tests/unit/ops/test_ops.py: 74 warnings
tests/unit/workflow/test_workflow.py: 31 warnings
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg/numba/cuda/compiler.py:865: NumbaPerformanceWarning: Grid size (1) < 2 * SM count (112) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))

tests/unit/test_io.py::test_validate_dataset_bad_schema
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:1105: UserWarning: Unable to sample column dtypes to infer nvt.Dataset schema, schema is empty.
warnings.warn(

tests/unit/test_io.py: 96 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/__init__.py:38: DeprecationWarning: ColumnGroup is deprecated, use ColumnSelector instead
warnings.warn("ColumnGroup is deprecated, use ColumnSelector instead", DeprecationWarning)

tests/unit/test_io.py: 24 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
tests/unit/workflow/test_workflow_node.py: 1 warning
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/node.py:47: FutureWarning: The ["a", "b", "c"] >> ops.Operator syntax for creating a ColumnGroup has been deprecated in NVTabular 21.09 and will be removed in a future version.
warnings.warn(

tests/unit/test_io.py: 36 warnings
tests/unit/workflow/test_workflow.py: 44 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py:89: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler will be used for execution. Please use the client argument to initialize a Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/test_io.py: 52 warnings
tests/unit/workflow/test_workflow.py: 35 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dask.py:372: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler will be used for this write operation. Please use the client argument to initialize a Dataset and/or Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/test_io.py: 36 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:511: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler is being used for this shuffle operation. Please use the client argument to initialize a Dataset and/or Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.1]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[f"{col}_filled"] = df[col].isna()

tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.1]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:126: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[col] = df[col].fillna(self.medians[col])

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/indexing.py:1637: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:54: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[f"{col}_filled"] = df[col].isna()

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:55: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[col] = df[col].fillna(self.fill_val)

tests/unit/ops/test_ops.py: 96 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:190: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[tmp] = _arange(len(df), like_df=df, dtype="int32")

tests/unit/ops/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/ops/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/ops/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/ops/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/ops/test_ops.py::test_groupby_op[id-True]
tests/unit/ops/test_ops.py::test_groupby_op[id-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))
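Here dask warns that head() only inspected the first partition, which the preceding filter left empty. A minimal sketch of the workaround the message names, on a toy dask dataframe:

import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"x": [1, 2, 3]}), npartitions=3)
filtered = ddf[ddf.x > 2]  # the first partition may come back empty
row = filtered.head(1, npartitions=-1)  # search every partition, not just the first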

tests/unit/workflow/test_cpu_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 33 6 1 46% 32->36, 48-145
nvtabular/__init__.py 18 0 0 0 100%
nvtabular/columns/__init__.py 2 0 0 0 100%
nvtabular/columns/schema.py 209 17 103 20 88% 46->62, 49, 51, 53-56, 58, 98->109, 104, 147, 174, 260->267, 262, 263->265, 275, 292->297, 295->297, 308, 332, 339, 348, 351, 356->355
nvtabular/columns/selector.py 74 1 34 0 99% 121
nvtabular/dispatch.py 273 55 132 22 78% 36-40, 45-47, 53-63, 70-71, 99-101, 106-109, 113-118, 125, 144, 155, 161, 166->168, 179, 202-205, 244, 247, 253, 269, 276, 307->312, 310, 313, 316->320, 353, 364-367, 394-397, 427, 431, 472, 496, 498, 505
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 134 78 90 15 39% 30, 99, 103, 114-130, 140, 143-158, 162, 166-167, 173-198, 207-217, 220-227, 229->233, 234, 239-279, 282
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py 58 58 30 0 0% 16-111
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 32 15 14 1 52% 50, 74-82, 85-95
nvtabular/framework_utils/torch/models.py 45 6 28 10 75% 56, 57->61, 62, 67, 87->89, 90-91, 93->96, 96->100, 103, 107->109
nvtabular/framework_utils/torch/utils.py 75 10 30 9 82% 51->53, 53->55, 64, 70, 71->76, 75, 109, 118-120, 129-131
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 385 215 180 11 44% 82-86, 141-174, 195-218, 263-307, 338, 364-372, 380-387, 406, 428-444, 485-489, 527-537, 583-623, 629-645, 649-716, 723->726, 726->722, 762-772, 781, 791, 812, 818-844, 850-876, 883, 888-894
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 176 176 98 0 0% 27-332
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/inference/triton/model_pt.py 101 101 40 0 0% 27-220
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 183 18 72 11 87% 111, 114, 150, 235-246, 398, 408, 425->428, 436, 440->442, 442->438, 447, 449
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 353 76 166 28 75% 46-47, 257, 259, 272, 281, 299-313, 436->510, 441-444, 450-457, 462-506, 510->519, 570-571, 572->576, 619, 741, 743, 745, 751, 755-757, 759, 819-820, 847, 854-855, 861, 867, 963-964, 1081-1086, 1092, 1171, 1180
nvtabular/io/dataset_engine.py 24 1 0 0 96% 48
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 551 45 180 26 89% 34-35, 57, 76, 80->92, 89, 112, 122->127, 140, 142, 166->170, 173-179, 225-233, 248, 254, 272->274, 287, 306-316, 457-462, 500-505, 621->628, 689->694, 695-696, 816, 820, 824, 830, 862, 879, 883, 890->892, 1000->exit, 1010->1015, 1020->1030, 1035, 1057, 1080-1081
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 175 13 68 5 92% 24-25, 51, 79, 125, 128, 212, 221, 224, 267, 288-290
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 330 15 140 13 94% 111, 128, 143-144, 242->244, 254-258, 304-305, 344->348, 345->344, 419, 423-424, 454, 534, 559, 567
nvtabular/loader/tensorflow.py 163 22 52 7 86% 58, 66-69, 84, 98, 308, 344, 359-361, 390-392, 402-410, 413-416
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 22 0 0 0 100%
nvtabular/ops/add_metadata.py 9 0 0 0 100%
nvtabular/ops/bucketize.py 37 10 18 3 69% 53-55, 59->exit, 62-65, 84-87, 94
nvtabular/ops/categorify.py 624 74 332 48 85% 245, 247, 264, 268, 276, 284, 286, 313, 332-333, 357, 366, 377->381, 385-392, 474-475, 499-504, 591, 603-605, 622, 715, 733, 769, 847-848, 863-867, 868->832, 886, 894, 901->exit, 925, 928->931, 983, 988, 1004->1008, 1015-1018, 1029, 1033, 1035, 1042, 1047-1050, 1128, 1130, 1200->1223, 1206->1223, 1224-1229, 1266, 1285->1290, 1289, 1299->1296, 1304->1296, 1311, 1314, 1322-1332
nvtabular/ops/clip.py 18 2 6 3 79% 44, 52->54, 55
nvtabular/ops/column_similarity.py 118 25 38 5 74% 19-20, 78->exit, 108, 134, 198-199, 208-210, 218-234, 251->254, 255, 265
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 31 1 8 1 95% 69->71, 94
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 91 12 36 3 82% 63-67, 93, 121, 147, 151, 162-165
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 119 3 70 4 96% 73, 84, 94->96, 106->111, 141
nvtabular/ops/hash_bucket.py 41 2 20 2 93% 72, 106->112, 118
nvtabular/ops/hashed_cross.py 36 4 15 3 86% 53, 66, 81, 91
nvtabular/ops/internal/__init__.py 3 0 0 0 100%
nvtabular/ops/internal/concat_columns.py 11 0 0 0 100%
nvtabular/ops/internal/identity.py 6 1 0 0 83% 42
nvtabular/ops/internal/subset_columns.py 13 1 0 0 92% 53
nvtabular/ops/join_external.py 89 7 36 6 90% 20-21, 113, 115, 117, 159, 176->178, 215
nvtabular/ops/join_groupby.py 101 7 36 4 92% 108, 115, 124, 131->130, 215-216, 219-220
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 66 24 26 1 58% 21-22, 53-54, 104-118, 126-137
nvtabular/ops/logop.py 13 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 81 10 14 1 86% 70, 78-79, 85, 118-119, 141-142, 146, 157
nvtabular/ops/operator.py 66 3 14 1 95% 111, 189, 196
nvtabular/ops/rename.py 41 3 22 3 90% 47, 88-90
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 153 11 66 4 91% 167->171, 175->184, 232-233, 236-237, 249-255, 346->349, 362
nvtabular/tags.py 16 0 0 0 100%
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 321
nvtabular/tools/dataset_inspector.py 50 7 18 1 79% 32-39
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 102 43 46 8 52% 31-32, 36-37, 50, 61-62, 64-66, 69, 72, 78, 84, 90-126, 145, 149->153
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow/__init__.py 2 0 0 0 100%
nvtabular/workflow/node.py 240 18 116 10 89% 55, 93->98, 146, 248->252, 288, 302, 311, 329-334, 339, 388-389, 400->395, 453-458
nvtabular/workflow/workflow.py 221 15 112 7 93% 28-29, 47, 139, 195, 222-224, 332, 347-348, 366-367, 502, 514

TOTAL 7521 1547 3025 353 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.78%
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:16: could not import 's3fs': No module named 's3fs'
SKIPPED [8] tests/unit/test_io.py:555: could not import 'uavro': No module named 'uavro'
SKIPPED [2] tests/unit/test_io.py:914: Dask>=2021.07.1 required for file aggregation
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:521: not working correctly in ci environment
==== 3 failed, 1512 passed, 12 skipped, 794 warnings in 2119.99s (0:35:19) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins9150767269801821051.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1127 of commit c76f67b8049d053658ab327c8969199735341105, no merge conflicts.
Running as SYSTEM
Setting status of c76f67b8049d053658ab327c8969199735341105 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3509/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1127/*:refs/remotes/origin/pr/1127/* # timeout=10
 > git rev-parse c76f67b8049d053658ab327c8969199735341105^{commit} # timeout=10
Checking out Revision c76f67b8049d053658ab327c8969199735341105 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c76f67b8049d053658ab327c8969199735341105 # timeout=10
Commit message: "Merge branch 'main' into get-embedding-sizes-fix"
 > git rev-list --no-walk 015c9d1b59ba1d6ff668b3d2161937ccfd960f77 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1510096760308657384.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.0.4)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1)
running develop
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'cpp'
warning: no files found matching '*.cu' under directory 'cpp'
warning: no files found matching '*.cuh' under directory 'cpp'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/cpp
creating build/temp.linux-x86_64-3.8/cpp/nvtabular
creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.gc76f67b -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.gc76f67b -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.gc76f67b -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+78.gc76f67b -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
creating build/lib.linux-x86_64-3.8
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 0.6.0+78.gc76f67b is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Processing dependencies for nvtabular==0.6.0+78.gc76f67b
Searching for protobuf==3.17.3
Best match: protobuf 3.17.3
Adding protobuf 3.17.3 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tensorflow-metadata==1.2.0
Best match: tensorflow-metadata 1.2.0
Processing tensorflow_metadata-1.2.0-py3.8.egg
tensorflow-metadata 1.2.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tensorflow_metadata-1.2.0-py3.8.egg
Searching for pyarrow==4.0.1
Best match: pyarrow 4.0.1
Adding pyarrow 4.0.1 to easy-install.pth file
Installing plasma_store script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tqdm==4.61.2
Best match: tqdm 4.61.2
Processing tqdm-4.61.2-py3.8.egg
tqdm 4.61.2 is already the active version in easy-install.pth
Installing tqdm script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tqdm-4.61.2-py3.8.egg
Searching for numba==0.54.0
Best match: numba 0.54.0
Processing numba-0.54.0-py3.8-linux-x86_64.egg
numba 0.54.0 is already the active version in easy-install.pth
Installing pycc script to /var/jenkins_home/.local/bin
Installing numba script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg
Searching for pandas==1.2.5
Best match: pandas 1.2.5
Processing pandas-1.2.5-py3.8-linux-x86_64.egg
pandas 1.2.5 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg
Searching for distributed==2021.4.1
Best match: distributed 2021.4.1
Processing distributed-2021.4.1-py3.8.egg
distributed 2021.4.1 is already the active version in easy-install.pth
Installing dask-ssh script to /var/jenkins_home/.local/bin
Installing dask-scheduler script to /var/jenkins_home/.local/bin
Installing dask-worker script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/distributed-2021.4.1-py3.8.egg
Searching for dask==2021.4.1
Best match: dask 2021.4.1
Processing dask-2021.4.1-py3.8.egg
dask 2021.4.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg
Searching for PyYAML==5.4.1
Best match: PyYAML 5.4.1
Processing PyYAML-5.4.1-py3.8-linux-x86_64.egg
PyYAML 5.4.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg
Searching for six==1.15.0
Best match: six 1.15.0
Adding six 1.15.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for googleapis-common-protos==1.53.0
Best match: googleapis-common-protos 1.53.0
Processing googleapis_common_protos-1.53.0-py3.8.egg
googleapis-common-protos 1.53.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/googleapis_common_protos-1.53.0-py3.8.egg
Searching for absl-py==0.12.0
Best match: absl-py 0.12.0
Processing absl_py-0.12.0-py3.8.egg
absl-py 0.12.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/absl_py-0.12.0-py3.8.egg
Searching for numpy==1.20.2
Best match: numpy 1.20.2
Adding numpy 1.20.2 to easy-install.pth file
Installing f2py script to /var/jenkins_home/.local/bin
Installing f2py3 script to /var/jenkins_home/.local/bin
Installing f2py3.8 script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for setuptools==58.0.4
Best match: setuptools 58.0.4
Adding setuptools 58.0.4 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for llvmlite==0.37.0
Best match: llvmlite 0.37.0
Processing llvmlite-0.37.0-py3.8-linux-x86_64.egg
llvmlite 0.37.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/llvmlite-0.37.0-py3.8-linux-x86_64.egg
Searching for pytz==2021.1
Best match: pytz 2021.1
Adding pytz 2021.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for python-dateutil==2.8.2
Best match: python-dateutil 2.8.2
Adding python-dateutil 2.8.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for zict==2.0.0
Best match: zict 2.0.0
Processing zict-2.0.0-py3.8.egg
zict 2.0.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg
Searching for tornado==6.1
Best match: tornado 6.1
Processing tornado-6.1-py3.8-linux-x86_64.egg
tornado 6.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg
Searching for toolz==0.11.1
Best match: toolz 0.11.1
Processing toolz-0.11.1-py3.8.egg
toolz 0.11.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/toolz-0.11.1-py3.8.egg
Searching for tblib==1.7.0
Best match: tblib 1.7.0
Processing tblib-1.7.0-py3.8.egg
tblib 1.7.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg
Searching for sortedcontainers==2.4.0
Best match: sortedcontainers 2.4.0
Processing sortedcontainers-2.4.0-py3.8.egg
sortedcontainers 2.4.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg
Searching for psutil==5.8.0
Best match: psutil 5.8.0
Processing psutil-5.8.0-py3.8-linux-x86_64.egg
psutil 5.8.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg
Searching for msgpack==1.0.2
Best match: msgpack 1.0.2
Processing msgpack-1.0.2-py3.8-linux-x86_64.egg
msgpack 1.0.2 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/msgpack-1.0.2-py3.8-linux-x86_64.egg
Searching for cloudpickle==1.6.0
Best match: cloudpickle 1.6.0
Processing cloudpickle-1.6.0-py3.8.egg
cloudpickle 1.6.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/cloudpickle-1.6.0-py3.8.egg
Searching for click==8.0.1
Best match: click 8.0.1
Processing click-8.0.1-py3.8.egg
click 8.0.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/click-8.0.1-py3.8.egg
Searching for partd==1.2.0
Best match: partd 1.2.0
Processing partd-1.2.0-py3.8.egg
partd 1.2.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg
Searching for fsspec==2021.8.1
Best match: fsspec 2021.8.1
Processing fsspec-2021.8.1-py3.8.egg
fsspec 2021.8.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/fsspec-2021.8.1-py3.8.egg
Searching for HeapDict==1.0.1
Best match: HeapDict 1.0.1
Processing HeapDict-1.0.1-py3.8.egg
HeapDict 1.0.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg
Searching for locket==0.2.1
Best match: locket 0.2.1
Processing locket-0.2.1-py3.8.egg
locket 0.2.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg
Finished processing dependencies for nvtabular==0.6.0+78.gc76f67b
Running black --check
All done! ✨ 🍰 ✨
128 files would be left unchanged.
Running flake8
Running isort
Skipped 2 files
Running bandit
Running pylint
************* Module nvtabular.ops.categorify
nvtabular/ops/categorify.py:504:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module nvtabular.ops.fill
nvtabular/ops/fill.py:67:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1530 items / 1 skipped / 1529 selected

tests/unit/test_dask_nvt.py ............................................ [ 2%]
..................................................................... [ 7%]
tests/unit/test_io.py .................................................. [ 10%]
........................................................................ [ 15%]
..........ssssssss.....................................................s [ 20%]
s [ 20%]
tests/unit/test_notebooks.py ...F.. [ 20%]
tests/unit/test_tf4rec.py . [ 20%]
tests/unit/test_tools.py ...................... [ 22%]
tests/unit/test_triton_inference.py .............................. [ 23%]
tests/unit/columns/test_column_schemas.py .............................. [ 25%]
................................................... [ 29%]
tests/unit/columns/test_column_selector.py .................... [ 30%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 30%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 32%]
................................................... [ 35%]
tests/unit/framework_utils/test_torch_layers.py . [ 35%]
tests/unit/loader/test_dataloader_backend.py .. [ 35%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 38%]
........................................s.. [ 40%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 42%]
....................................................... [ 46%]
tests/unit/ops/test_column_similarity.py ........................ [ 47%]
tests/unit/ops/test_ops.py ............................................. [ 50%]
........................................................................ [ 55%]
........................................................................ [ 60%]
........................................................................ [ 64%]
........................................................................ [ 69%]
........................................................................ [ 74%]
................................................. [ 77%]
tests/unit/ops/test_ops_schema.py ...................................... [ 80%]
........................................................................ [ 84%]
........................................................................ [ 89%]
.......................... [ 91%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 91%]
tests/unit/workflow/test_workflow.py ................................... [ 93%]
.......................................................... [ 97%]
tests/unit/workflow/test_workflow_node.py ........... [ 98%]
tests/unit/workflow/test_workflow_ops.py .. [ 98%]
tests/unit/workflow/test_workflow_schemas.py ....................... [100%]

=================================== FAILURES ===================================
____________________________ test_movielens_example ____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-36/test_movielens_example0')

def test_movielens_example(tmpdir):
    _get_random_movielens_data(tmpdir, 10000, dataset="movie")
    _get_random_movielens_data(tmpdir, 10000, dataset="ratings")
    _get_random_movielens_data(tmpdir, 5000, dataset="ratings", valid=True)

    triton_model_path = os.path.join(tmpdir, "models")
    os.environ["INPUT_DATA_DIR"] = str(tmpdir)
    os.environ["MODEL_PATH"] = triton_model_path

    notebook_path = os.path.join(
        dirname(TEST_PATH),
        "examples/getting-started-movielens/",
        "02-ETL-with-NVTabular.ipynb",
    )
    _run_notebook(tmpdir, notebook_path)

    def _modify_tf_nb(line):
        return line.replace(
            # don't require graphviz/pydot
            "tf.keras.utils.plot_model(model)",
            "# tf.keras.utils.plot_model(model)",
        )

    def _modify_tf_triton(line):
        # models are already preloaded
        line = line.replace("triton_client.load_model", "# triton_client.load_model")
        line = line.replace("triton_client.unload_model", "# triton_client.unload_model")
        return line

    notebooks = []
    try:
        import torch  # noqa

        notebooks.append("03-Training-with-PyTorch.ipynb")
    except Exception:
        pass
    try:
        import nvtabular.inference.triton  # noqa
        import nvtabular.loader.tensorflow  # noqa

        notebooks.append("03-Training-with-TF.ipynb")
        has_tf = True

    except Exception:
        has_tf = False

    for notebook in notebooks:
        notebook_path = os.path.join(
            dirname(TEST_PATH),
            "examples/getting-started-movielens/",
            notebook,
        )
        if notebook == "03-Training-with-TF.ipynb":
            _run_notebook(tmpdir, notebook_path, transform=_modify_tf_nb)
        else:
          _run_notebook(tmpdir, notebook_path)

tests/unit/test_notebooks.py:211:


tests/unit/test_notebooks.py:305: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python', '/tmp/pytest-of-jenkins/pytest-36/test_movielens_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f65e6a84430>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python', '/tmp/pytest-of-jenkins/pytest-36/test_movielens_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-36/test_movielens_example0/notebook.py", line 60, in
EMBEDDING_TABLE_SHAPES, MH_EMBEDDING_TABLE_SHAPES = nvt.ops.get_embedding_sizes(proc)
ValueError: too many values to unpack (expected 2)
=============================== warnings summary ===============================
tests/unit/test_dask_nvt.py: 3 warnings
tests/unit/test_io.py: 24 warnings
tests/unit/test_tf4rec.py: 2 warnings
tests/unit/test_tools.py: 2 warnings
tests/unit/test_triton_inference.py: 5 warnings
tests/unit/loader/test_tf_dataloader.py: 50 warnings
tests/unit/loader/test_torch_dataloader.py: 16 warnings
tests/unit/ops/test_column_similarity.py: 7 warnings
tests/unit/ops/test_ops.py: 74 warnings
tests/unit/workflow/test_workflow.py: 31 warnings
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg/numba/cuda/compiler.py:865: NumbaPerformanceWarning: Grid size (1) < 2 * SM count (112) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))

tests/unit/test_io.py::test_validate_dataset_bad_schema
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:1118: UserWarning: Unable to sample column dtypes to infer nvt.Dataset schema, schema is empty.
warnings.warn(

tests/unit/test_io.py: 96 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/__init__.py:38: DeprecationWarning: ColumnGroup is deprecated, use ColumnSelector instead
warnings.warn("ColumnGroup is deprecated, use ColumnSelector instead", DeprecationWarning)

tests/unit/test_io.py: 24 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
tests/unit/workflow/test_workflow_node.py: 1 warning
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/node.py:47: FutureWarning: The ["a", "b", "c"] >> ops.Operator syntax for creating a ColumnGroup has been deprecated in NVTabular 21.09 and will be removed in a future version.
warnings.warn(

tests/unit/test_io.py: 36 warnings
tests/unit/workflow/test_workflow.py: 44 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py:89: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler will be used for execution. Please use the client argument to initialize a Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/test_io.py: 52 warnings
tests/unit/workflow/test_workflow.py: 35 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dask.py:372: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler will be used for this write operation. Please use the client argument to initialize a Dataset and/or Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/test_io.py: 36 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:512: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler is being used for this shuffle operation. Please use the client argument to initialize a Dataset and/or Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/ops/test_column_similarity.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/column_similarity.py:109: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[name] = similarities

tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.1]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[f"{col}_filled"] = df[col].isna()

tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.1]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:126: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[col] = df[col].fillna(self.medians[col])

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/indexing.py:1637: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:54: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[f"{col}_filled"] = df[col].isna()

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:55: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[col] = df[col].fillna(self.fill_val)

tests/unit/ops/test_ops.py: 96 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:190: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[tmp] = _arange(len(df), like_df=df, dtype="int32")

tests/unit/ops/test_ops.py::test_join_external[True-True-left-host-pandas-parquet]
tests/unit/ops/test_ops.py::test_join_external[True-True-left-device-pandas-parquet]
tests/unit/ops/test_ops.py::test_join_external[True-True-inner-host-pandas-parquet]
tests/unit/ops/test_ops.py::test_join_external[True-True-inner-device-pandas-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:171: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
_ext.drop_duplicates(ignore_index=True, inplace=True)

tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/ops/test_ops.py::test_groupby_op[keys0-True]
tests/unit/ops/test_ops.py::test_groupby_op[keys0-False]
tests/unit/ops/test_ops.py::test_groupby_op[id-True]
tests/unit/ops/test_ops.py::test_groupby_op[id-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

tests/unit/workflow/test_cpu_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 18 0 0 0 100%
nvtabular/columns/__init__.py 2 0 0 0 100%
nvtabular/columns/schema.py 209 17 103 20 88% 46->62, 49, 51, 53-56, 58, 98->109, 104, 147, 174, 260->267, 262, 263->265, 275, 292->297, 295->297, 308, 332, 339, 348, 351, 356->355
nvtabular/columns/selector.py 74 1 34 0 99% 121
nvtabular/dispatch.py 280 55 138 23 78% 36-40, 45-47, 53-63, 70-71, 99-101, 106-109, 113-118, 125, 144, 155, 161, 166->168, 179, 202-205, 244->246, 253, 256, 262, 278, 285, 316->321, 319, 322, 325->329, 362, 373-376, 402-405, 435, 439, 480, 504, 506, 513
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 134 78 90 15 39% 30, 99, 103, 114-130, 140, 143-158, 162, 166-167, 173-198, 207-217, 220-227, 229->233, 234, 239-279, 282
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py 58 58 30 0 0% 16-111
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 32 2 14 2 91% 50, 91
nvtabular/framework_utils/torch/models.py 45 1 28 4 93% 57->61, 87->89, 93->96, 103
nvtabular/framework_utils/torch/utils.py 75 5 30 5 90% 51->53, 64, 71->76, 75, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 385 215 180 11 44% 82-86, 141-174, 195-218, 263-307, 338, 364-372, 380-387, 406, 428-444, 485-489, 527-537, 583-623, 629-645, 649-716, 723->726, 726->722, 762-772, 781, 791, 812, 818-844, 850-876, 883, 888-894
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 176 176 98 0 0% 27-332
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/inference/triton/model_pt.py 101 101 40 0 0% 27-220
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 183 18 72 11 87% 111, 114, 150, 235-246, 398, 408, 425->428, 436, 440->442, 442->438, 447, 449
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 361 76 174 28 76% 47-48, 258, 260, 273, 282, 300-314, 437->511, 442-445, 451-458, 463-507, 511->520, 571-572, 573->577, 620, 742, 744, 746, 752, 756-758, 760, 820-821, 848, 855-856, 862, 868, 964-965, 1082-1087, 1093, 1185, 1194
nvtabular/io/dataset_engine.py 24 1 0 0 96% 48
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 551 45 180 26 89% 34-35, 57, 76, 80->92, 89, 112, 122->127, 140, 142, 166->170, 173-179, 225-233, 248, 254, 272->274, 287, 306-316, 457-462, 500-505, 621->628, 689->694, 695-696, 816, 820, 824, 830, 862, 879, 883, 890->892, 1000->exit, 1010->1015, 1020->1030, 1035, 1057, 1080-1081
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 175 13 68 5 92% 24-25, 51, 79, 125, 128, 212, 221, 224, 267, 288-290
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 330 13 140 11 95% 128, 143-144, 242->244, 254-258, 304-305, 344->348, 345->344, 419, 423-424, 454, 559, 567
nvtabular/loader/tensorflow.py 163 22 52 7 86% 58, 66-69, 84, 98, 308, 344, 359-361, 390-392, 402-410, 413-416
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 22 0 0 0 100%
nvtabular/ops/add_metadata.py 9 0 0 0 100%
nvtabular/ops/bucketize.py 37 10 18 3 69% 53-55, 59->exit, 62-65, 84-87, 94
nvtabular/ops/categorify.py 626 70 334 47 86% 245, 247, 264, 268, 276, 284, 286, 313, 332-333, 357, 366, 377->381, 385-392, 474-475, 499-504, 622, 715, 733, 769, 847-848, 863-867, 868->832, 886, 894, 901->exit, 925, 928->931, 983, 988, 1010->1014, 1016->973, 1022-1025, 1037, 1041, 1043, 1050, 1055-1058, 1136, 1138, 1208->1231, 1214->1231, 1232-1237, 1274, 1293->1298, 1297, 1307->1304, 1312->1304, 1319, 1322, 1330-1340
nvtabular/ops/clip.py 18 2 6 3 79% 44, 52->54, 55
nvtabular/ops/column_similarity.py 118 25 38 5 74% 19-20, 78->exit, 108, 134, 198-199, 208-210, 218-234, 251->254, 255, 265
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 31 1 8 1 95% 69->71, 94
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 91 12 36 3 82% 63-67, 93, 121, 147, 151, 162-165
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 119 3 70 4 96% 73, 84, 94->96, 106->111, 141
nvtabular/ops/hash_bucket.py 41 2 20 2 93% 72, 106->112, 118
nvtabular/ops/hashed_cross.py 36 4 15 3 86% 53, 66, 81, 91
nvtabular/ops/internal/__init__.py 3 0 0 0 100%
nvtabular/ops/internal/concat_columns.py 11 0 0 0 100%
nvtabular/ops/internal/identity.py 6 1 0 0 83% 42
nvtabular/ops/internal/subset_columns.py 13 1 0 0 92% 53
nvtabular/ops/join_external.py 89 7 36 6 90% 20-21, 113, 115, 117, 159, 176->178, 215
nvtabular/ops/join_groupby.py 101 7 36 4 92% 108, 115, 124, 131->130, 215-216, 219-220
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 66 24 26 1 58% 21-22, 53-54, 104-118, 126-137
nvtabular/ops/logop.py 13 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 81 10 14 1 86% 70, 78-79, 85, 118-119, 141-142, 146, 157
nvtabular/ops/operator.py 66 3 14 1 95% 111, 189, 196
nvtabular/ops/rename.py 41 3 22 3 90% 47, 88-90
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 153 11 66 4 91% 167->171, 175->184, 232-233, 236-237, 249-255, 346->349, 362
nvtabular/tags.py 16 0 0 0 100%
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 321
nvtabular/tools/dataset_inspector.py 50 7 18 1 79% 32-39
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 102 43 46 8 52% 31-32, 36-37, 50, 61-62, 64-66, 69, 72, 78, 84, 90-126, 145, 149->153
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow/__init__.py 2 0 0 0 100%
nvtabular/workflow/node.py 240 18 116 10 89% 55, 93->98, 146, 248->252, 288, 302, 311, 329-334, 339, 388-389, 400->395, 453-458
nvtabular/workflow/workflow.py 221 15 112 7 93% 28-29, 47, 139, 195, 222-224, 332, 347-348, 366-367, 502, 514

TOTAL 7538 1485 3041 342 78%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 77.69%
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:16: could not import 's3fs': No module named 's3fs'
SKIPPED [8] tests/unit/test_io.py:555: could not import 'uavro': No module named 'uavro'
SKIPPED [2] tests/unit/test_io.py:914: Dask>=2021.07.1 required for file aggregation
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:521: not working correctly in ci environment
==== 1 failed, 1518 passed, 12 skipped, 808 warnings in 2398.52s (0:39:58) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins6901837598072310846.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1127 of commit 6675990fcf757141c11dd257bb59f984d10fecb5, no merge conflicts.
Running as SYSTEM
Setting status of 6675990fcf757141c11dd257bb59f984d10fecb5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3512/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1127/*:refs/remotes/origin/pr/1127/* # timeout=10
 > git rev-parse 6675990fcf757141c11dd257bb59f984d10fecb5^{commit} # timeout=10
Checking out Revision 6675990fcf757141c11dd257bb59f984d10fecb5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6675990fcf757141c11dd257bb59f984d10fecb5 # timeout=10
Commit message: "Merge branch 'get-embedding-sizes-fix' of https://github.com/jperez999/NVTabular into get-embedding-sizes-fix"
 > git rev-list --no-walk 8c42d9db4bfaae9baf91c8115abd1d68490bca58 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7372782873294563199.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.0.4)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1)
running develop
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'cpp'
warning: no files found matching '*.cu' under directory 'cpp'
warning: no files found matching '*.cuh' under directory 'cpp'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/cpp
creating build/temp.linux-x86_64-3.8/cpp/nvtabular
creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+80.g6675990 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+80.g6675990 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+80.g6675990 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+80.g6675990 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
creating build/lib.linux-x86_64-3.8
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 0.6.0+80.g6675990 is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Processing dependencies for nvtabular==0.6.0+80.g6675990
Searching for protobuf==3.17.3
Best match: protobuf 3.17.3
Adding protobuf 3.17.3 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tensorflow-metadata==1.2.0
Best match: tensorflow-metadata 1.2.0
Processing tensorflow_metadata-1.2.0-py3.8.egg
tensorflow-metadata 1.2.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tensorflow_metadata-1.2.0-py3.8.egg
Searching for pyarrow==4.0.1
Best match: pyarrow 4.0.1
Adding pyarrow 4.0.1 to easy-install.pth file
Installing plasma_store script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tqdm==4.61.2
Best match: tqdm 4.61.2
Processing tqdm-4.61.2-py3.8.egg
tqdm 4.61.2 is already the active version in easy-install.pth
Installing tqdm script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tqdm-4.61.2-py3.8.egg
Searching for numba==0.54.0
Best match: numba 0.54.0
Processing numba-0.54.0-py3.8-linux-x86_64.egg
numba 0.54.0 is already the active version in easy-install.pth
Installing pycc script to /var/jenkins_home/.local/bin
Installing numba script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg
Searching for pandas==1.2.5
Best match: pandas 1.2.5
Processing pandas-1.2.5-py3.8-linux-x86_64.egg
pandas 1.2.5 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg
Searching for distributed==2021.4.1
Best match: distributed 2021.4.1
Processing distributed-2021.4.1-py3.8.egg
distributed 2021.4.1 is already the active version in easy-install.pth
Installing dask-ssh script to /var/jenkins_home/.local/bin
Installing dask-scheduler script to /var/jenkins_home/.local/bin
Installing dask-worker script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/distributed-2021.4.1-py3.8.egg
Searching for dask==2021.4.1
Best match: dask 2021.4.1
Processing dask-2021.4.1-py3.8.egg
dask 2021.4.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg
Searching for PyYAML==5.4.1
Best match: PyYAML 5.4.1
Processing PyYAML-5.4.1-py3.8-linux-x86_64.egg
PyYAML 5.4.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg
Searching for six==1.15.0
Best match: six 1.15.0
Adding six 1.15.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for googleapis-common-protos==1.53.0
Best match: googleapis-common-protos 1.53.0
Processing googleapis_common_protos-1.53.0-py3.8.egg
googleapis-common-protos 1.53.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/googleapis_common_protos-1.53.0-py3.8.egg
Searching for absl-py==0.12.0
Best match: absl-py 0.12.0
Processing absl_py-0.12.0-py3.8.egg
absl-py 0.12.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/absl_py-0.12.0-py3.8.egg
Searching for numpy==1.20.2
Best match: numpy 1.20.2
Adding numpy 1.20.2 to easy-install.pth file
Installing f2py script to /var/jenkins_home/.local/bin
Installing f2py3 script to /var/jenkins_home/.local/bin
Installing f2py3.8 script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for setuptools==58.0.4
Best match: setuptools 58.0.4
Adding setuptools 58.0.4 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for llvmlite==0.37.0
Best match: llvmlite 0.37.0
Processing llvmlite-0.37.0-py3.8-linux-x86_64.egg
llvmlite 0.37.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/llvmlite-0.37.0-py3.8-linux-x86_64.egg
Searching for pytz==2021.1
Best match: pytz 2021.1
Adding pytz 2021.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for python-dateutil==2.8.2
Best match: python-dateutil 2.8.2
Adding python-dateutil 2.8.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for zict==2.0.0
Best match: zict 2.0.0
Processing zict-2.0.0-py3.8.egg
zict 2.0.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg
Searching for tornado==6.1
Best match: tornado 6.1
Processing tornado-6.1-py3.8-linux-x86_64.egg
tornado 6.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg
Searching for toolz==0.11.1
Best match: toolz 0.11.1
Processing toolz-0.11.1-py3.8.egg
toolz 0.11.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/toolz-0.11.1-py3.8.egg
Searching for tblib==1.7.0
Best match: tblib 1.7.0
Processing tblib-1.7.0-py3.8.egg
tblib 1.7.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg
Searching for sortedcontainers==2.4.0
Best match: sortedcontainers 2.4.0
Processing sortedcontainers-2.4.0-py3.8.egg
sortedcontainers 2.4.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg
Searching for psutil==5.8.0
Best match: psutil 5.8.0
Processing psutil-5.8.0-py3.8-linux-x86_64.egg
psutil 5.8.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg
Searching for msgpack==1.0.2
Best match: msgpack 1.0.2
Processing msgpack-1.0.2-py3.8-linux-x86_64.egg
msgpack 1.0.2 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/msgpack-1.0.2-py3.8-linux-x86_64.egg
Searching for cloudpickle==1.6.0
Best match: cloudpickle 1.6.0
Processing cloudpickle-1.6.0-py3.8.egg
cloudpickle 1.6.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/cloudpickle-1.6.0-py3.8.egg
Searching for click==8.0.1
Best match: click 8.0.1
Processing click-8.0.1-py3.8.egg
click 8.0.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/click-8.0.1-py3.8.egg
Searching for partd==1.2.0
Best match: partd 1.2.0
Processing partd-1.2.0-py3.8.egg
partd 1.2.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg
Searching for fsspec==2021.8.1
Best match: fsspec 2021.8.1
Processing fsspec-2021.8.1-py3.8.egg
fsspec 2021.8.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/fsspec-2021.8.1-py3.8.egg
Searching for HeapDict==1.0.1
Best match: HeapDict 1.0.1
Processing HeapDict-1.0.1-py3.8.egg
HeapDict 1.0.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg
Searching for locket==0.2.1
Best match: locket 0.2.1
Processing locket-0.2.1-py3.8.egg
locket 0.2.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg
Finished processing dependencies for nvtabular==0.6.0+80.g6675990
Running black --check
All done! ✨ 🍰 ✨
128 files would be left unchanged.
Running flake8
Running isort
Skipped 2 files
Running bandit
Running pylint
************* Module nvtabular.ops.categorify
nvtabular/ops/categorify.py:504:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module nvtabular.ops.fill
nvtabular/ops/fill.py:67:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1530 items / 1 skipped / 1529 selected

tests/unit/test_dask_nvt.py ............................................ [ 2%]
...........................................................F......... [ 7%]
tests/unit/test_io.py .................................................. [ 10%]
........................................................................ [ 15%]
..........ssssssss.....................................................s [ 20%]
s [ 20%]
tests/unit/test_notebooks.py ...... [ 20%]
tests/unit/test_tf4rec.py . [ 20%]
tests/unit/test_tools.py ...................... [ 22%]
tests/unit/test_triton_inference.py .............................. [ 23%]
tests/unit/columns/test_column_schemas.py .............................. [ 25%]
................................................... [ 29%]
tests/unit/columns/test_column_selector.py .................... [ 30%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 30%]
tests/unit/framework_utils/test_tf_layers.py ..F........................ [ 32%]
................................................... [ 35%]
tests/unit/framework_utils/test_torch_layers.py . [ 35%]
tests/unit/loader/test_dataloader_backend.py .. [ 35%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 38%]
........................................s.. [ 40%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 42%]
....................................................... [ 46%]
tests/unit/ops/test_column_similarity.py ........................ [ 47%]
tests/unit/ops/test_ops.py ............................................. [ 50%]
........................................................................ [ 55%]
........................................................................ [ 60%]
........................................................................ [ 64%]
........................................................................ [ 69%]
........................................................................ [ 74%]
................................................. [ 77%]
tests/unit/ops/test_ops_schema.py ...................................... [ 80%]
........................................................................ [ 84%]
........................................................................ [ 89%]
.......................... [ 91%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 91%]
tests/unit/workflow/test_workflow.py ................................... [ 93%]
.......................................................... [ 97%]
tests/unit/workflow/test_workflow_node.py ........... [ 98%]
tests/unit/workflow/test_workflow_ops.py .. [ 98%]
tests/unit/workflow/test_workflow_schemas.py ....................... [100%]

=================================== FAILURES ===================================
_________ test_dask_preproc_cpu[None-Shuffle.PER_WORKER-csv-no-header] _________

client = <Client: 'tcp://127.0.0.1:43141' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-39/test_dask_preproc_cpu_None_Shu2')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-39/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-39/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-39/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-39/parquet0')}
engine = 'csv-no-header', shuffle = <Shuffle.PER_WORKER: 1>, cpu = None

@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_preproc_cpu(client, tmpdir, datasets, engine, shuffle, cpu):
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    if engine == "parquet":
        df1 = cudf.read_parquet(paths[0])[mycols_pq]
        df2 = cudf.read_parquet(paths[1])[mycols_pq]
    elif engine == "csv":
        df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
        df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
    else:
        df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
        df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
    df0 = cudf.concat([df1, df2], axis=0)

    if engine in ("parquet", "csv"):
        dataset = Dataset(paths, part_size="1MB", cpu=cpu)
    else:
        dataset = Dataset(paths, names=allcols_csv, part_size="1MB", cpu=cpu)

    # Simple transform (normalize)
    cat_names = ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]
    conts = cont_names >> ops.FillMissing() >> ops.Normalize()
    workflow = Workflow(conts + cat_names + label_name, client=client)
    transformed = workflow.fit_transform(dataset)

    # Write out dataset
    output_path = os.path.join(tmpdir, "processed")
    transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=4)

    # Check the final result
  df_disk = dd_read_parquet(output_path, engine="pyarrow").compute()

tests/unit/test_dask_nvt.py:273:


../../../.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/base.py:285: in compute
(result,) = compute(self, traverse=False, **kwargs)
../../../.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/base.py:567: in compute
results = schedule(dsk, keys, **kwargs)
../../../.local/lib/python3.8/site-packages/distributed/client.py:2666: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
../../../.local/lib/python3.8/site-packages/distributed/client.py:1975: in gather
return self.sync(
../../../.local/lib/python3.8/site-packages/distributed/client.py:843: in sync
return sync(
../../../.local/lib/python3.8/site-packages/distributed/utils.py:353: in sync
raise exc.with_traceback(tb)
../../../.local/lib/python3.8/site-packages/distributed/utils.py:336: in f
result[0] = yield future
../../../.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py:762: in run
value = future.result()
../../../.local/lib/python3.8/site-packages/distributed/client.py:1840: in _gather
raise exception.with_traceback(traceback)
../../../.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/dataframe/io/parquet/core.py:381: in read_parquet_part
dfs = [
../../../.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/dataframe/io/parquet/core.py:382: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
../../../.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/dataframe/io/parquet/arrow.py:599: in read_partition
arrow_table = cls._read_table(
../../../.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/dataframe/io/parquet/arrow.py:2007: in _read_table
return _read_table_from_path(
../../../.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/dataframe/io/parquet/arrow.py:406: in _read_table_from_path
return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:217: in __init__
self.reader.open(source, use_memory_map=memory_map,
pyarrow/_parquet.pyx:949: in pyarrow._parquet.ParquetReader.open
???


???
E OSError: Couldn't deserialize thrift: TProtocolException: Invalid data

pyarrow/error.pxi:112: OSError
----------------------------- Captured stderr call -----------------------------
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
/var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg/numba/cuda/compiler.py:865: NumbaPerformanceWarning: Grid size (1) < 2 * SM count (112) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
/var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg/numba/cuda/compiler.py:865: NumbaPerformanceWarning: Grid size (1) < 2 * SM count (112) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
/var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg/numba/cuda/compiler.py:865: NumbaPerformanceWarning: Grid size (1) < 2 * SM count (112) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
/var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg/numba/cuda/compiler.py:865: NumbaPerformanceWarning: Grid size (1) < 2 * SM count (112) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
distributed.worker - WARNING - Compute Failed
Function: read_parquet_part
args: (<fsspec.implementations.local.LocalFileSystem object at 0x7fb8192f4dc0>, <bound method ArrowDatasetEngine.read_partition of <class 'dask.dataframe.io.parquet.arrow.ArrowLegacyEngine'>>, Empty DataFrame
Columns: [x, y, id, name-string, label]
Index: [], [(('/tmp/pytest-of-jenkins/pytest-39/test_dask_preproc_cpu_None_Shu2/processed/part_2.parquet', [0], []), {})], ['x', 'y', 'id', 'name-string', 'label'], None, {'partitions': <pyarrow.parquet.ParquetPartitions object at 0x7fb6849ecf10>, 'categories': [], 'filters': None})
kwargs: {}
Exception: OSError("Couldn't deserialize thrift: TProtocolException: Invalid data\n")

____________________ test_dense_embedding_layer[mean-stack] ____________________

aggregation = 'stack', combiner = 'mean'

@pytest.mark.parametrize("aggregation", ["stack", "concat"])
@pytest.mark.parametrize("combiner", ["sum", "mean"])  # TODO: add sqrtn
def test_dense_embedding_layer(aggregation, combiner):
    raw_good_columns = get_good_feature_columns()
    scalar_numeric, vector_numeric, one_hot, multi_hot = raw_good_columns
    one_hot_embedding = tf.feature_column.indicator_column(one_hot)
    multi_hot_embedding = tf.feature_column.embedding_column(multi_hot, 8, combiner=combiner)

    # should raise ValueError if passed categorical columns
    with pytest.raises(ValueError):
        embedding_layer = layers.DenseFeatures(raw_good_columns, aggregation=aggregation)

    if aggregation == "stack":
        # can't pass numeric to stack aggregation unless dims are 1
        with pytest.raises(ValueError):
            embedding_layer = layers.DenseFeatures(
                [
                    scalar_numeric,
                    vector_numeric,
                    one_hot_embedding,
                    multi_hot_embedding,
                ],
                aggregation=aggregation,
            )
        # can't have mismatched dims with stack aggregation
        with pytest.raises(ValueError):
            embedding_layer = layers.DenseFeatures(
                [one_hot_embedding, multi_hot_embedding], aggregation=aggregation
            )

        # reset b embedding to have matching dims
        multi_hot_embedding = tf.feature_column.embedding_column(multi_hot, 100, combiner=combiner)
        cols = [one_hot_embedding, multi_hot_embedding]
    else:
        cols = [scalar_numeric, vector_numeric, one_hot_embedding, multi_hot_embedding]

    embedding_layer = layers.DenseFeatures(cols, aggregation=aggregation)
    inputs = {
        "scalar_continuous": tf.keras.Input(name="scalar_continuous", shape=(1,), dtype=tf.float32),
        "vector_continuous": tf.keras.Input(
            name="vector_continuous__values", shape=(1,), dtype=tf.float32
        ),
        "one_hot": tf.keras.Input(name="one_hot", shape=(1,), dtype=tf.int64),
        "multi_hot": (
            tf.keras.Input(name="multi_hot__values", shape=(1,), dtype=tf.int64),
            tf.keras.Input(name="multi_hot__nnzs", shape=(1,), dtype=tf.int64),
        ),
    }
    if aggregation == "stack":
        inputs.pop("scalar_continuous")
        inputs.pop("vector_continuous")

    output = embedding_layer(inputs)
    model = tf.keras.Model(inputs=inputs, outputs=output)
    model.compile("sgd", "mse")

    # TODO: check for out-of-range categorical behavior
    scalar = np.array([0.1, -0.2, 0.3], dtype=np.float32)
    vector = np.random.randn(3, 128).astype("float32")
    one_hot = np.array([44, 21, 32])
    multi_hot_values = np.array([0, 2, 1, 4, 1, 3, 1])
    multi_hot_nnzs = np.array([1, 2, 4])
    x = {
        "scalar_continuous": scalar[:, None],
        "vector_continuous": vector.flatten()[:, None],
        "one_hot": one_hot[:, None],
        "multi_hot": (multi_hot_values[:, None], multi_hot_nnzs[:, None]),
    }
    if aggregation == "stack":
        x.pop("scalar_continuous")
        x.pop("vector_continuous")

    multi_hot_embedding_table = embedding_layer.embedding_tables["multi_hot"].numpy()
    multi_hot_embedding_rows = _compute_expected_multi_hot(
        multi_hot_embedding_table, multi_hot_values, multi_hot_nnzs, combiner
    )

    # check that shape and values match up
    y_hat = model(x).numpy()
    assert y_hat.shape[0] == 3
    if aggregation == "stack":
        assert len(y_hat.shape) == 3
        # len of columns is 2 because of mh (vals, nnzs) struct
        assert y_hat.shape[1] == (len(x))
        assert y_hat.shape[2] == 100
      np.testing.assert_allclose(y_hat[:, 0], multi_hot_embedding_rows, rtol=1e-05)

E AssertionError:
E Not equal to tolerance rtol=1e-05, atol=0
E
E Mismatched elements: 1 / 300 (0.333%)
E Max absolute difference: 1.4901161e-08
E Max relative difference: 1.8782282e-05
E x: array([[ 1.060052e-01, 1.583474e-01, -8.539265e-02, -2.269728e-03,
E 2.466256e-02, 6.518302e-03, 6.934836e-02, 2.290908e-01,
E -1.516305e-01, 5.413985e-02, 1.011277e-01, -6.577917e-02,...
E y: array([[ 1.060052e-01, 1.583474e-01, -8.539265e-02, -2.269728e-03,
E 2.466256e-02, 6.518302e-03, 6.934836e-02, 2.290908e-01,
E -1.516305e-01, 5.413985e-02, 1.011277e-01, -6.577917e-02,...

tests/unit/framework_utils/test_tf_layers.py:139: AssertionError
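
The mismatch above is one element out of 300, off by at most ~1.9e-5 relative — just past the test's rtol=1e-5, which is characteristic float32 reduction-order noise rather than an embedding-lookup bug. A minimal sketch of the failing comparison and a tolerance that absorbs such noise (the rtol=1e-4 value below is illustrative, not the project's chosen setting):

import numpy as np

expected = np.float32(1.0)
y_hat = np.float32(1.0 + 1.9e-5)  # off by ~1.9e-5 relative, as in the log above

# Fails at the test's tolerance:
#   np.testing.assert_allclose(y_hat, expected, rtol=1e-5)
# Passes once the tolerance allows a few extra float32 ulps:
np.testing.assert_allclose(y_hat, expected, rtol=1e-4)
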
=============================== warnings summary ===============================
tests/unit/test_dask_nvt.py: 3 warnings
tests/unit/test_io.py: 24 warnings
tests/unit/test_tf4rec.py: 2 warnings
tests/unit/test_tools.py: 2 warnings
tests/unit/test_triton_inference.py: 5 warnings
tests/unit/loader/test_tf_dataloader.py: 50 warnings
tests/unit/loader/test_torch_dataloader.py: 16 warnings
tests/unit/ops/test_column_similarity.py: 7 warnings
tests/unit/ops/test_ops.py: 74 warnings
tests/unit/workflow/test_workflow.py: 31 warnings
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg/numba/cuda/compiler.py:865: NumbaPerformanceWarning: Grid size (1) < 2 * SM count (112) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
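
This NumbaPerformanceWarning fires when a CUDA kernel is launched with a one-block grid, leaving most of the GPU's 56 SMs idle (hence the "2 * SM count (112)" in the message). A generic sketch of sizing the grid from the data — not NVTabular's own kernels:

from numba import cuda
import numpy as np

@cuda.jit
def scale(x, a):
    i = cuda.grid(1)  # global thread index
    if i < x.size:
        x[i] *= a

x = cuda.to_device(np.ones(1 << 20, dtype=np.float32))
threads = 256
blocks = (x.size + threads - 1) // threads  # 4096 blocks, well above 2 * SM count
scale[blocks, threads](x, np.float32(2.0))
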

tests/unit/test_io.py::test_validate_dataset_bad_schema
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:1123: UserWarning: Unable to sample column dtypes to infer nvt.Dataset schema, schema is empty.
warnings.warn(

tests/unit/test_io.py: 96 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/__init__.py:38: DeprecationWarning: ColumnGroup is deprecated, use ColumnSelector instead
warnings.warn("ColumnGroup is deprecated, use ColumnSelector instead", DeprecationWarning)

tests/unit/test_io.py: 24 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
tests/unit/workflow/test_workflow_node.py: 1 warning
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/node.py:47: FutureWarning: The ["a", "b", "c"] >> ops.Operator syntax for creating a ColumnGroup has been deprecated in NVTabular 21.09 and will be removed in a future version.
warnings.warn(
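
Both deprecation warnings point at the same migration: construct the selector explicitly instead of piping a bare list of column names into an operator. A sketch, assuming ColumnSelector is re-exported at the package root as the deprecation message implies:

import nvtabular as nvt
from nvtabular import ops

# Deprecated since 21.09:
#   cats = ["a", "b", "c"] >> ops.Categorify()
# Explicit selector instead:
cats = nvt.ColumnSelector(["a", "b", "c"]) >> ops.Categorify()
workflow = nvt.Workflow(cats)
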

tests/unit/test_io.py: 36 warnings
tests/unit/workflow/test_workflow.py: 44 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py:89: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler will be used for execution. Please use the client argument to initialize a Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/test_io.py: 52 warnings
tests/unit/workflow/test_workflow.py: 35 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dask.py:372: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler will be used for this write operation. Please use the client argument to initialize a Dataset and/or Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/test_io.py: 36 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:515: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler is being used for this shuffle operation. Please use the client argument to initialize a Dataset and/or Workflow object with distributed-execution enabled.
warnings.warn(
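
All three warnings share one remedy, already visible in the test code above (Workflow(conts + cat_names + label_name, client=client)): pass the distributed client to the object explicitly instead of relying on the global one. A minimal sketch, with import paths assumed:

from dask.distributed import Client
import nvtabular as nvt
from nvtabular import ops

client = Client()  # or attach to an existing cluster

# An explicit client enables distributed execution; omitting it falls back
# to the single-threaded scheduler even when a global client exists.
features = nvt.ColumnSelector(["x", "y", "id"]) >> ops.FillMissing() >> ops.Normalize()
workflow = nvt.Workflow(features, client=client)
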

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/indexing.py:1637: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops.py: 80 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[tmp] = _arange(len(df), like_df=df, dtype="int32")
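
Both SettingWithCopyWarnings come from chained indexing: slicing first and assigning into the result, which may write to a temporary copy. A standalone illustration of the pattern and the single .loc write pandas recommends:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

sub = df[df["a"] > 1]  # the slice may be a copy of df's data
sub["b"] = 0           # chained assignment -> SettingWithCopyWarning

df.loc[df["a"] > 1, "b"] = 0  # one .loc call writes into df directly
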

tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/ops/test_ops.py::test_groupby_op[keys0-True]
tests/unit/ops/test_ops.py::test_groupby_op[keys0-False]
tests/unit/ops/test_ops.py::test_groupby_op[id-True]
tests/unit/ops/test_ops.py::test_groupby_op[id-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))
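
head() only inspects the first partition by default, so a filter that empties that partition triggers this warning even though later partitions still hold rows. A small reproduction with the npartitions workaround the message suggests:

import dask.dataframe as dd
import pandas as pd

ddf = dd.from_pandas(pd.DataFrame({"a": range(4)}), npartitions=4)
filtered = ddf[ddf["a"] > 2]      # only the last partition matches

filtered.head(1)                  # warns: first partition yields 0 rows
filtered.head(1, npartitions=-1)  # scans every partition, no warning
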

tests/unit/workflow/test_cpu_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 18 0 0 0 100%
nvtabular/columns/__init__.py 2 0 0 0 100%
nvtabular/columns/schema.py 209 17 103 20 88% 46->62, 49, 51, 53-56, 58, 98->109, 104, 147, 174, 260->267, 262, 263->265, 275, 292->297, 295->297, 308, 332, 339, 348, 351, 356->355
nvtabular/columns/selector.py 74 1 34 0 99% 121
nvtabular/dispatch.py 290 55 144 23 79% 36-40, 45-47, 53-63, 70-71, 114-116, 121-124, 128-133, 140, 159, 170, 176, 181->183, 194, 217-220, 259->261, 268, 271, 277, 293, 300, 331->336, 334, 337, 340->344, 377, 388-391, 417-420, 450, 454, 495, 519, 521, 528
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 134 78 90 15 39% 30, 99, 103, 114-130, 140, 143-158, 162, 166-167, 173-198, 207-217, 220-227, 229->233, 234, 239-279, 282
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py 58 58 30 0 0% 16-111
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 32 2 14 2 91% 50, 91
nvtabular/framework_utils/torch/models.py 45 1 28 4 93% 57->61, 87->89, 93->96, 103
nvtabular/framework_utils/torch/utils.py 75 5 30 5 90% 51->53, 64, 71->76, 75, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 385 210 180 13 45% 82-86, 141-174, 195-218, 263-307, 338, 364-372, 380-387, 406, 428-444, 485-489, 527-537, 583-623, 629-645, 649-716, 723->726, 726->722, 762-772, 781, 791, 812, 818-844, 850-876, 883, 889->892, 893
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 176 176 98 0 0% 27-332
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/inference/triton/model_pt.py 101 101 40 0 0% 27-220
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 183 18 72 11 87% 111, 114, 150, 235-246, 398, 408, 425->428, 436, 440->442, 442->438, 447, 449
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 364 76 176 28 76% 48-49, 259, 261, 274, 283, 303-317, 440->514, 445-448, 454-461, 466-510, 514->523, 574-575, 576->580, 623, 745, 747, 749, 755, 759-761, 763, 823-824, 851, 858-859, 865, 871, 967-968, 1085-1090, 1096, 1190, 1199
nvtabular/io/dataset_engine.py 24 1 0 0 96% 48
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 551 45 180 26 89% 34-35, 57, 76, 80->92, 89, 112, 122->127, 140, 142, 166->170, 173-179, 225-233, 248, 254, 272->274, 287, 306-316, 457-462, 500-505, 621->628, 689->694, 695-696, 816, 820, 824, 830, 862, 879, 883, 890->892, 1000->exit, 1010->1015, 1020->1030, 1035, 1057, 1080-1081
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 175 13 68 5 92% 24-25, 51, 79, 125, 128, 212, 221, 224, 267, 288-290
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 330 13 140 11 95% 128, 143-144, 242->244, 254-258, 304-305, 344->348, 345->344, 419, 423-424, 454, 559, 567
nvtabular/loader/tensorflow.py 163 22 52 7 86% 58, 66-69, 84, 98, 308, 344, 359-361, 390-392, 402-410, 413-416
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 22 0 0 0 100%
nvtabular/ops/add_metadata.py 9 0 0 0 100%
nvtabular/ops/bucketize.py 37 10 18 3 69% 53-55, 59->exit, 62-65, 84-87, 94
nvtabular/ops/categorify.py 626 67 334 48 86% 245, 247, 264, 268, 276, 284, 286, 313, 332-333, 357, 366, 377->381, 385-392, 474-475, 500-501, 622, 715, 733, 769, 847-848, 863-867, 868->832, 886, 894, 901->exit, 925, 928->931, 983, 988, 1010->1014, 1016->973, 1022-1025, 1037, 1041, 1043, 1050, 1055-1058, 1136, 1138, 1208->1231, 1214->1231, 1232-1237, 1274, 1293->1298, 1297, 1307->1304, 1312->1304, 1319, 1322, 1330-1340
nvtabular/ops/clip.py 18 2 6 3 79% 44, 52->54, 55
nvtabular/ops/column_similarity.py 118 25 38 5 74% 19-20, 78->exit, 108, 134, 198-199, 208-210, 218-234, 251->254, 255, 265
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 31 1 8 1 95% 69->71, 94
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 91 12 36 3 82% 63-67, 93, 121, 147, 151, 162-165
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 119 3 70 4 96% 73, 84, 94->96, 106->111, 141
nvtabular/ops/hash_bucket.py 41 2 20 2 93% 72, 106->112, 118
nvtabular/ops/hashed_cross.py 36 4 15 3 86% 53, 66, 81, 91
nvtabular/ops/internal/__init__.py 3 0 0 0 100%
nvtabular/ops/internal/concat_columns.py 11 0 0 0 100%
nvtabular/ops/internal/identity.py 6 1 0 0 83% 42
nvtabular/ops/internal/subset_columns.py 13 1 0 0 92% 53
nvtabular/ops/join_external.py 92 18 36 7 76% 20-21, 114, 116, 118, 135-161, 177->179, 216->227, 221
nvtabular/ops/join_groupby.py 101 7 36 4 92% 108, 115, 124, 131->130, 215-216, 219-220
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 66 24 26 1 58% 21-22, 53-54, 104-118, 126-137
nvtabular/ops/logop.py 13 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 81 10 14 1 86% 70, 78-79, 85, 118-119, 141-142, 146, 157
nvtabular/ops/operator.py 66 1 14 1 98% 111
nvtabular/ops/rename.py 41 3 22 3 90% 47, 88-90
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 153 11 66 4 91% 167->171, 175->184, 232-233, 236-237, 249-255, 346->349, 362
nvtabular/tags.py 16 0 0 0 100%
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 321
nvtabular/tools/dataset_inspector.py 50 7 18 1 79% 32-39
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 102 43 46 8 52% 31-32, 36-37, 50, 61-62, 64-66, 69, 72, 78, 84, 90-126, 145, 149->153
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow/__init__.py 2 0 0 0 100%
nvtabular/workflow/node.py 240 18 116 10 89% 55, 93->98, 146, 248->252, 288, 302, 311, 329-334, 339, 388-389, 400->395, 453-458
nvtabular/workflow/workflow.py 221 15 112 7 93% 28-29, 47, 139, 195, 222-224, 332, 347-348, 366-367, 502, 514

TOTAL 7554 1486 3049 346 78%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 77.69%
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:16: could not import 's3fs': No module named 's3fs'
SKIPPED [8] tests/unit/test_io.py:555: could not import 'uavro': No module named 'uavro'
SKIPPED [2] tests/unit/test_io.py:914: Dask>=2021.07.1 required for file aggregation
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:521: not working correctly in ci environment
==== 2 failed, 1517 passed, 12 skipped, 762 warnings in 2211.65s (0:36:51) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1713918211077614622.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1127 of commit 69ee7369a54f55e87c5c435d35d598bc28ea9f12, no merge conflicts.
Running as SYSTEM
Setting status of 69ee7369a54f55e87c5c435d35d598bc28ea9f12 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/3513/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/1127/*:refs/remotes/origin/pr/1127/* # timeout=10
 > git rev-parse 69ee7369a54f55e87c5c435d35d598bc28ea9f12^{commit} # timeout=10
Checking out Revision 69ee7369a54f55e87c5c435d35d598bc28ea9f12 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 69ee7369a54f55e87c5c435d35d598bc28ea9f12 # timeout=10
Commit message: "remove unnecessary added param for call is checked within function"
 > git rev-list --no-walk 6675990fcf757141c11dd257bb59f984d10fecb5 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins5773827164776042572.sh
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (21.2.4)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (58.0.4)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.0)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.7.1)
running develop
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'cpp'
warning: no files found matching '*.cu' under directory 'cpp'
warning: no files found matching '*.cuh' under directory 'cpp'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/cpp
creating build/temp.linux-x86_64-3.8/cpp/nvtabular
creating build/temp.linux-x86_64-3.8/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+81.g69ee736 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+81.g69ee736 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+81.g69ee736 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO=0.6.0+81.g69ee736 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
creating build/lib.linux-x86_64-3.8
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/cpp/nvtabular/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-3.8/cpp/nvtabular/inference/fill.o -o build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.8/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Generating nvtabular/inference/triton/model_config_pb2.py from nvtabular/inference/triton/model_config.proto
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 0.6.0+81.g69ee736 is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Processing dependencies for nvtabular==0.6.0+81.g69ee736
Searching for protobuf==3.17.3
Best match: protobuf 3.17.3
Adding protobuf 3.17.3 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for tensorflow-metadata==1.2.0
Best match: tensorflow-metadata 1.2.0
Processing tensorflow_metadata-1.2.0-py3.8.egg
tensorflow-metadata 1.2.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tensorflow_metadata-1.2.0-py3.8.egg
Searching for pyarrow==4.0.1
Best match: pyarrow 4.0.1
Adding pyarrow 4.0.1 to easy-install.pth file
Installing plasma_store script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for tqdm==4.61.2
Best match: tqdm 4.61.2
Processing tqdm-4.61.2-py3.8.egg
tqdm 4.61.2 is already the active version in easy-install.pth
Installing tqdm script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tqdm-4.61.2-py3.8.egg
Searching for numba==0.54.0
Best match: numba 0.54.0
Processing numba-0.54.0-py3.8-linux-x86_64.egg
numba 0.54.0 is already the active version in easy-install.pth
Installing pycc script to /var/jenkins_home/.local/bin
Installing numba script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg
Searching for pandas==1.2.5
Best match: pandas 1.2.5
Processing pandas-1.2.5-py3.8-linux-x86_64.egg
pandas 1.2.5 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg
Searching for distributed==2021.4.1
Best match: distributed 2021.4.1
Processing distributed-2021.4.1-py3.8.egg
distributed 2021.4.1 is already the active version in easy-install.pth
Installing dask-ssh script to /var/jenkins_home/.local/bin
Installing dask-scheduler script to /var/jenkins_home/.local/bin
Installing dask-worker script to /var/jenkins_home/.local/bin

Using /var/jenkins_home/.local/lib/python3.8/site-packages/distributed-2021.4.1-py3.8.egg
Searching for dask==2021.4.1
Best match: dask 2021.4.1
Processing dask-2021.4.1-py3.8.egg
dask 2021.4.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg
Searching for PyYAML==5.4.1
Best match: PyYAML 5.4.1
Processing PyYAML-5.4.1-py3.8-linux-x86_64.egg
PyYAML 5.4.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg
Searching for six==1.15.0
Best match: six 1.15.0
Adding six 1.15.0 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for googleapis-common-protos==1.53.0
Best match: googleapis-common-protos 1.53.0
Processing googleapis_common_protos-1.53.0-py3.8.egg
googleapis-common-protos 1.53.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/googleapis_common_protos-1.53.0-py3.8.egg
Searching for absl-py==0.12.0
Best match: absl-py 0.12.0
Processing absl_py-0.12.0-py3.8.egg
absl-py 0.12.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/absl_py-0.12.0-py3.8.egg
Searching for numpy==1.20.2
Best match: numpy 1.20.2
Adding numpy 1.20.2 to easy-install.pth file
Installing f2py script to /var/jenkins_home/.local/bin
Installing f2py3 script to /var/jenkins_home/.local/bin
Installing f2py3.8 script to /var/jenkins_home/.local/bin

Using /usr/local/lib/python3.8/dist-packages
Searching for setuptools==58.0.4
Best match: setuptools 58.0.4
Adding setuptools 58.0.4 to easy-install.pth file

Using /var/jenkins_home/.local/lib/python3.8/site-packages
Searching for llvmlite==0.37.0
Best match: llvmlite 0.37.0
Processing llvmlite-0.37.0-py3.8-linux-x86_64.egg
llvmlite 0.37.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/llvmlite-0.37.0-py3.8-linux-x86_64.egg
Searching for pytz==2021.1
Best match: pytz 2021.1
Adding pytz 2021.1 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for python-dateutil==2.8.2
Best match: python-dateutil 2.8.2
Adding python-dateutil 2.8.2 to easy-install.pth file

Using /usr/local/lib/python3.8/dist-packages
Searching for zict==2.0.0
Best match: zict 2.0.0
Processing zict-2.0.0-py3.8.egg
zict 2.0.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg
Searching for tornado==6.1
Best match: tornado 6.1
Processing tornado-6.1-py3.8-linux-x86_64.egg
tornado 6.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg
Searching for toolz==0.11.1
Best match: toolz 0.11.1
Processing toolz-0.11.1-py3.8.egg
toolz 0.11.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/toolz-0.11.1-py3.8.egg
Searching for tblib==1.7.0
Best match: tblib 1.7.0
Processing tblib-1.7.0-py3.8.egg
tblib 1.7.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg
Searching for sortedcontainers==2.4.0
Best match: sortedcontainers 2.4.0
Processing sortedcontainers-2.4.0-py3.8.egg
sortedcontainers 2.4.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg
Searching for psutil==5.8.0
Best match: psutil 5.8.0
Processing psutil-5.8.0-py3.8-linux-x86_64.egg
psutil 5.8.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg
Searching for msgpack==1.0.2
Best match: msgpack 1.0.2
Processing msgpack-1.0.2-py3.8-linux-x86_64.egg
msgpack 1.0.2 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/msgpack-1.0.2-py3.8-linux-x86_64.egg
Searching for cloudpickle==1.6.0
Best match: cloudpickle 1.6.0
Processing cloudpickle-1.6.0-py3.8.egg
cloudpickle 1.6.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/cloudpickle-1.6.0-py3.8.egg
Searching for click==8.0.1
Best match: click 8.0.1
Processing click-8.0.1-py3.8.egg
click 8.0.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/click-8.0.1-py3.8.egg
Searching for partd==1.2.0
Best match: partd 1.2.0
Processing partd-1.2.0-py3.8.egg
partd 1.2.0 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg
Searching for fsspec==2021.8.1
Best match: fsspec 2021.8.1
Processing fsspec-2021.8.1-py3.8.egg
fsspec 2021.8.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/fsspec-2021.8.1-py3.8.egg
Searching for HeapDict==1.0.1
Best match: HeapDict 1.0.1
Processing HeapDict-1.0.1-py3.8.egg
HeapDict 1.0.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg
Searching for locket==0.2.1
Best match: locket 0.2.1
Processing locket-0.2.1-py3.8.egg
locket 0.2.1 is already the active version in easy-install.pth

Using /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg
Finished processing dependencies for nvtabular==0.6.0+81.g69ee736
Running black --check
All done! ✨ 🍰 ✨
128 files would be left unchanged.
Running flake8
Running isort
Skipped 2 files
Running bandit
Running pylint
************* Module nvtabular.ops.categorify
nvtabular/ops/categorify.py:504:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
************* Module nvtabular.ops.fill
nvtabular/ops/fill.py:67:15: I1101: Module 'nvtabular_cpp' has no 'inference' member, but source is unavailable. Consider adding this module to extension-pkg-allow-list if you want to perform analysis based on run-time introspection of living objects. (c-extension-no-member)
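(The two I1101 notes above are informational; pylint's own suggested remedy is a one-line config change. A minimal sketch, assuming the project keeps a .pylintrc at the repo root — pylint releases before 2.7 spell the option extension-pkg-whitelist:)

    [MASTER]
    extension-pkg-allow-list=nvtabular_cpp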


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
/usr/local/lib/python3.8/dist-packages/recommonmark/parser.py:75: UserWarning: Container node skipped: type=document
warn("Container node skipped: type={0}".format(mdnode.t))
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.12.1, forked-1.3.0, xdist-2.3.0
collected 1530 items / 1 skipped / 1529 selected

tests/unit/test_dask_nvt.py ............................................ [ 2%]
..................................................................... [ 7%]
tests/unit/test_io.py .................................................. [ 10%]
........................................................................ [ 15%]
..........ssssssss.....................................................s [ 20%]
s [ 20%]
tests/unit/test_notebooks.py ...... [ 20%]
tests/unit/test_tf4rec.py . [ 20%]
tests/unit/test_tools.py ...................... [ 22%]
tests/unit/test_triton_inference.py .............................. [ 23%]
tests/unit/columns/test_column_schemas.py .............................. [ 25%]
................................................... [ 29%]
tests/unit/columns/test_column_selector.py .................... [ 30%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 30%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 32%]
................................................... [ 35%]
tests/unit/framework_utils/test_torch_layers.py . [ 35%]
tests/unit/loader/test_dataloader_backend.py .. [ 35%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 38%]
........................................s.. [ 40%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 42%]
....................................................... [ 46%]
tests/unit/ops/test_column_similarity.py ........................ [ 47%]
tests/unit/ops/test_ops.py ............................................. [ 50%]
........................................................................ [ 55%]
........................................................................ [ 60%]
........................................................................ [ 64%]
........................................................................ [ 69%]
........................................................................ [ 74%]
................................................. [ 77%]
tests/unit/ops/test_ops_schema.py ...................................... [ 80%]
........................................................................ [ 84%]
........................................................................ [ 89%]
.......................... [ 91%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 91%]
tests/unit/workflow/test_workflow.py ................................... [ 93%]
.......................................................... [ 97%]
tests/unit/workflow/test_workflow_node.py ........... [ 98%]
tests/unit/workflow/test_workflow_ops.py .. [ 98%]
tests/unit/workflow/test_workflow_schemas.py ....................... [100%]

=============================== warnings summary ===============================
tests/unit/test_dask_nvt.py: 3 warnings
tests/unit/test_io.py: 24 warnings
tests/unit/test_tf4rec.py: 2 warnings
tests/unit/test_tools.py: 2 warnings
tests/unit/test_triton_inference.py: 5 warnings
tests/unit/loader/test_tf_dataloader.py: 50 warnings
tests/unit/loader/test_torch_dataloader.py: 16 warnings
tests/unit/ops/test_column_similarity.py: 7 warnings
tests/unit/ops/test_ops.py: 74 warnings
tests/unit/workflow/test_workflow.py: 31 warnings
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/var/jenkins_home/.local/lib/python3.8/site-packages/numba-0.54.0-py3.8-linux-x86_64.egg/numba/cuda/compiler.py:865: NumbaPerformanceWarning: Grid size (1) < 2 * SM count (112) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))

tests/unit/test_io.py::test_validate_dataset_bad_schema
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:1123: UserWarning: Unable to sample column dtypes to infer nvt.Dataset schema, schema is empty.
warnings.warn(

tests/unit/test_io.py: 96 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/__init__.py:38: DeprecationWarning: ColumnGroup is deprecated, use ColumnSelector instead
warnings.warn("ColumnGroup is deprecated, use ColumnSelector instead", DeprecationWarning)

tests/unit/test_io.py: 24 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
tests/unit/workflow/test_workflow_node.py: 1 warning
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/node.py:47: FutureWarning: The ["a", "b", "c"] >> ops.Operator syntax for creating a ColumnGroup has been deprecated in NVTabular 21.09 and will be removed in a future version.
warnings.warn(

tests/unit/test_io.py: 36 warnings
tests/unit/workflow/test_workflow.py: 44 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py:89: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler will be used for execution. Please use the client argument to initialize a Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/test_io.py: 52 warnings
tests/unit/workflow/test_workflow.py: 35 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dask.py:372: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler will be used for this write operation. Please use the client argument to initialize a Dataset and/or Workflow object with distributed-execution enabled.
warnings.warn(

tests/unit/test_io.py: 36 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:515: UserWarning: A global dask.distributed client has been detected, but the single-threaded scheduler is being used for this shuffle operation. Please use the client argument to initialize a Dataset and/or Workflow object with distributed-execution enabled.
warnings.warn(
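(The three warning groups above all point at the same opt-in: distributed execution only kicks in when a client is passed explicitly. A minimal sketch of that pattern, assuming a local dask.distributed cluster — the path and column name are illustrative, not taken from this CI run:)

    from dask.distributed import Client
    import nvtabular as nvt

    client = Client()  # or Client("scheduler-address:8786") for an existing cluster
    features = nvt.ColumnSelector(["x"]) >> nvt.ops.Normalize()
    workflow = nvt.Workflow(features, client=client)  # client= enables distributed execution
    workflow.fit(nvt.Dataset("data.parquet"))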

tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.1]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:125: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[f"{col}_filled"] = df[col].isna()

tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-parquet-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-0.1]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.01]
tests/unit/ops/test_ops.py::test_fill_median[True-True-op_columns1-csv-no-header-0.1]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:126: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[col] = df[col].fillna(self.medians[col])

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_ops.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/indexing.py:1637: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:54: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[f"{col}_filled"] = df[col].isna()

tests/unit/ops/test_ops.py::test_fill_missing[True-True-parquet]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py:55: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[col] = df[col].fillna(self.fill_val)
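(These SettingWithCopyWarning groups fire because the frame being mutated may be a view of another frame; assigning through .loc, or taking an explicit .copy() first, makes the intent unambiguous. A minimal, self-contained sketch of the .loc pattern the warnings recommend — the frame and column names are illustrative:)

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"col": [1.0, np.nan, 3.0]})
    df.loc[:, "col_filled"] = df["col"].isna()               # instead of df["col_filled"] = ...
    df.loc[:, "col"] = df["col"].fillna(df["col"].median())  # instead of df[col] = df[col].fillna(...)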

tests/unit/ops/test_ops.py: 80 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/join_external.py:191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[tmp] = _arange(len(df), like_df=df, dtype="int32")

tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-False]
tests/unit/ops/test_ops.py::test_groupby_op[keys0-True]
tests/unit/ops/test_ops.py::test_groupby_op[keys0-False]
tests/unit/ops/test_ops.py::test_groupby_op[id-True]
tests/unit/ops/test_ops.py::test_groupby_op[id-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/dask-2021.4.1-py3.8.egg/dask/dataframe/core.py:6610: UserWarning: Insufficient elements for head. 1 elements requested, only 0 elements available. Try passing larger npartitions to head.
warnings.warn(msg.format(n, len(r)))

tests/unit/workflow/test_cpu_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas-1.2.5-py3.8-linux-x86_64.egg/pandas/core/frame.py:3191: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]

-- Docs: https://docs.pytest.org/en/stable/warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

examples/multi-gpu-movielens/torch_trainer.py 65 0 6 1 99% 32->36
nvtabular/__init__.py 18 0 0 0 100%
nvtabular/columns/__init__.py 2 0 0 0 100%
nvtabular/columns/schema.py 209 17 103 20 88% 46->62, 49, 51, 53-56, 58, 98->109, 104, 147, 174, 260->267, 262, 263->265, 275, 292->297, 295->297, 308, 332, 339, 348, 351, 356->355
nvtabular/columns/selector.py 74 1 34 0 99% 121
nvtabular/dispatch.py 290 55 144 23 79% 36-40, 45-47, 53-63, 70-71, 114-116, 121-124, 128-133, 140, 159, 170, 176, 181->183, 194, 217-220, 259->261, 268, 271, 277, 293, 300, 331->336, 334, 337, 340->344, 377, 388-391, 417-420, 450, 454, 495, 519, 521, 528
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 134 78 90 15 39% 30, 99, 103, 114-130, 140, 143-158, 162, 166-167, 173-198, 207-217, 220-227, 229->233, 234, 239-279, 282
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 85 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py 58 58 30 0 0% 16-111
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 32 2 14 2 91% 50, 91
nvtabular/framework_utils/torch/models.py 45 1 28 4 93% 57->61, 87->89, 93->96, 103
nvtabular/framework_utils/torch/utils.py 75 5 30 5 90% 51->53, 64, 71->76, 75, 118-120
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 385 210 180 13 45% 82-86, 141-174, 195-218, 263-307, 338, 364-372, 380-387, 406, 428-444, 485-489, 527-537, 583-623, 629-645, 649-716, 723->726, 726->722, 762-772, 781, 791, 812, 818-844, 850-876, 883, 889->892, 893
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/model.py 176 176 98 0 0% 27-332
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/inference/triton/model_pt.py 101 101 40 0 0% 27-220
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 57 6 20 5 86% 22-23, 99, 103->107, 108, 110, 124
nvtabular/io/dask.py 183 18 72 11 87% 111, 114, 150, 235-246, 398, 408, 425->428, 436, 440->442, 442->438, 447, 449
nvtabular/io/dataframe_engine.py 61 5 28 6 88% 19-20, 50, 69, 88->92, 92->97, 94->97, 97->116, 125
nvtabular/io/dataset.py 364 76 176 28 76% 48-49, 259, 261, 274, 283, 303-317, 440->514, 445-448, 454-461, 466-510, 514->523, 574-575, 576->580, 623, 745, 747, 749, 755, 759-761, 763, 823-824, 851, 858-859, 865, 871, 967-968, 1085-1090, 1096, 1190, 1199
nvtabular/io/dataset_engine.py 24 1 0 0 96% 48
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 551 45 180 26 89% 34-35, 57, 76, 80->92, 89, 112, 122->127, 140, 142, 166->170, 173-179, 225-233, 248, 254, 272->274, 287, 306-316, 457-462, 500-505, 621->628, 689->694, 695-696, 816, 820, 824, 830, 862, 879, 883, 890->892, 1000->exit, 1010->1015, 1020->1030, 1035, 1057, 1080-1081
nvtabular/io/shuffle.py 31 6 16 5 77% 42, 44-45, 49, 59, 63
nvtabular/io/writer.py 175 13 68 5 92% 24-25, 51, 79, 125, 128, 212, 221, 224, 267, 288-290
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 330 13 140 11 95% 128, 143-144, 242->244, 254-258, 304-305, 344->348, 345->344, 419, 423-424, 454, 559, 567
nvtabular/loader/tensorflow.py 163 22 52 7 86% 58, 66-69, 84, 98, 308, 344, 359-361, 390-392, 402-410, 413-416
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 81 13 16 2 78% 25-27, 30-36, 111, 149-150
nvtabular/ops/__init__.py 22 0 0 0 100%
nvtabular/ops/add_metadata.py 9 0 0 0 100%
nvtabular/ops/bucketize.py 37 10 18 3 69% 53-55, 59->exit, 62-65, 84-87, 94
nvtabular/ops/categorify.py 626 67 334 48 86% 245, 247, 264, 268, 276, 284, 286, 313, 332-333, 357, 366, 377->381, 385-392, 474-475, 500-501, 622, 715, 733, 769, 847-848, 863-867, 868->832, 886, 894, 901->exit, 925, 928->931, 983, 988, 1010->1014, 1016->973, 1022-1025, 1037, 1041, 1043, 1050, 1055-1058, 1136, 1138, 1208->1231, 1214->1231, 1232-1237, 1274, 1293->1298, 1297, 1307->1304, 1312->1304, 1319, 1322, 1330-1340
nvtabular/ops/clip.py 18 2 6 3 79% 44, 52->54, 55
nvtabular/ops/column_similarity.py 118 25 38 5 74% 19-20, 78->exit, 108, 134, 198-199, 208-210, 218-234, 251->254, 255, 265
nvtabular/ops/data_stats.py 56 2 22 3 94% 91->93, 95, 97->87, 102
nvtabular/ops/difference_lag.py 31 1 8 1 95% 69->71, 94
nvtabular/ops/dropna.py 8 0 0 0 100%
nvtabular/ops/fill.py 91 12 36 3 82% 63-67, 93, 121, 147, 151, 162-165
nvtabular/ops/filter.py 20 1 6 1 92% 49
nvtabular/ops/groupby.py 119 3 70 4 96% 73, 84, 94->96, 106->111, 141
nvtabular/ops/hash_bucket.py 41 2 20 2 93% 72, 106->112, 118
nvtabular/ops/hashed_cross.py 36 4 15 3 86% 53, 66, 81, 91
nvtabular/ops/internal/__init__.py 3 0 0 0 100%
nvtabular/ops/internal/concat_columns.py 11 0 0 0 100%
nvtabular/ops/internal/identity.py 6 1 0 0 83% 42
nvtabular/ops/internal/subset_columns.py 13 1 0 0 92% 53
nvtabular/ops/join_external.py 92 18 36 7 76% 20-21, 114, 116, 118, 135-161, 177->179, 216->227, 221
nvtabular/ops/join_groupby.py 101 7 36 4 92% 108, 115, 124, 131->130, 215-216, 219-220
nvtabular/ops/lambdaop.py 39 6 18 6 79% 59, 63, 77, 89, 94, 103
nvtabular/ops/list_slice.py 66 24 26 1 58% 21-22, 53-54, 104-118, 126-137
nvtabular/ops/logop.py 13 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 81 10 14 1 86% 70, 78-79, 85, 118-119, 141-142, 146, 157
nvtabular/ops/operator.py 66 1 14 1 98% 111
nvtabular/ops/rename.py 41 3 22 3 90% 47, 88-90
nvtabular/ops/stat_operator.py 8 0 0 0 100%
nvtabular/ops/target_encoding.py 153 11 66 4 91% 167->171, 175->184, 232-233, 236-237, 249-255, 346->349, 362
nvtabular/tags.py 16 0 0 0 100%
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 236 1 62 1 99% 321
nvtabular/tools/dataset_inspector.py 50 7 18 1 79% 32-39
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 102 43 46 8 52% 31-32, 36-37, 50, 61-62, 64-66, 69, 72, 78, 84, 90-126, 145, 149->153
nvtabular/worker.py 82 5 38 7 90% 24-25, 82->99, 91, 92->99, 99->102, 108, 110, 111->113
nvtabular/workflow/__init__.py 2 0 0 0 100%
nvtabular/workflow/node.py 240 18 116 10 89% 55, 93->98, 146, 248->252, 288, 302, 311, 329-334, 339, 388-389, 400->395, 453-458
nvtabular/workflow/workflow.py 221 15 112 7 93% 28-29, 47, 139, 195, 222-224, 332, 347-348, 366-367, 502, 514

TOTAL 7554 1486 3049 346 78%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 77.69%
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:16: could not import 's3fs': No module named 's3fs'
SKIPPED [8] tests/unit/test_io.py:555: could not import 'uavro': No module named 'uavro'
SKIPPED [2] tests/unit/test_io.py:914: Dask>=2021.07.1 required for file aggregation
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:521: not working correctly in ci environment
========= 1519 passed, 12 skipped, 776 warnings in 2097.73s (0:34:57) ==========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8434363626674895033.sh

Contributor

@karlhigley karlhigley left a comment


This looks good! There's one stray chunk of code commented out in the middle, but I'm approving and merging anyway in the interest of time. We can submit a second PR to remove the comments.

Comment on lines +583 to +586
# nodes = list(set(nvt.workflow.node.iter_nodes([output_node])))
# for current in reversed(nodes):
# if current.op and hasattr(current.op, "get_embedding_sizes"):
# output.update(current.op.get_embedding_sizes(current.output_schema.column_names))

Suggested change
# nodes = list(set(nvt.workflow.node.iter_nodes([output_node])))
# for current in reversed(nodes):
# if current.op and hasattr(current.op, "get_embedding_sizes"):
# output.update(current.op.get_embedding_sizes(current.output_schema.column_names))
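(For context, the lines suggested for removal implement a graph walk over the workflow: visit every node reachable from the output and collect embedding sizes from any op that exposes them, e.g. Categorify. A self-contained sketch of that pattern, reconstructed from the snippet above — this is the old approach, not the refactored implementation:)

    import nvtabular as nvt

    def embedding_sizes_from_graph(output_node):
        # walk every node reachable from the output and ask each op that
        # reports embedding sizes for its (cardinality, dimension) pairs
        output = {}
        nodes = list(set(nvt.workflow.node.iter_nodes([output_node])))
        for current in reversed(nodes):
            if current.op and hasattr(current.op, "get_embedding_sizes"):
                output.update(current.op.get_embedding_sizes(current.output_schema.column_names))
        return output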

@karlhigley karlhigley merged commit a4b3def into NVIDIA-Merlin:main Sep 21, 2021
Comment on lines -1102 to +1120
-        sampled_dtypes = self.sample_dtypes(n)
-        dtypes = dict(zip(sampled_dtypes.index, sampled_dtypes))
+        _ddf = self.to_ddf()
+        dtypes = {
+            col_name: {"dtype": dtype, "is_list": False}
+            for col_name, dtype in _ddf.dtypes.items()
+        }
+        for partition_index in range(_ddf.npartitions):
+            _head = _ddf.partitions[partition_index].head(n)
+
+            if len(_head):
+                for col in _head.columns:
+                    dtypes[col] = {
+                        "dtype": dispatch._list_val_dtype(_head[col]) or _head[col].dtype,
+                        "is_list": dispatch._is_list_dtype(_head[col]),
+                    }
Collaborator


I added a special optimization in #1119 to avoid loading any of the partitions from remote storage (which is super slow). It looks like this change will now skip that optimization. I'll need to double-check whether this introduces a performance regression in the Criteo benchmark on GCP. If it does, I suggest we fix this ASAP.

Collaborator


Update: I addressed the previous comment in #1119. While making that change, I also realized that the highlighted code above will always read every partition in the dataset (which I am assuming is due to a missing break statement).
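(A sketch of what that fix might look like — the loop from the diff above with the presumed missing break added; the function name is hypothetical and this is not the actual follow-up patch:)

    from nvtabular import dispatch

    def sample_partition_dtypes(_ddf, n, dtypes):
        # scan partitions until one returns rows, then record per-column dtype info
        for partition_index in range(_ddf.npartitions):
            _head = _ddf.partitions[partition_index].head(n)
            if len(_head):
                for col in _head.columns:
                    dtypes[col] = {
                        "dtype": dispatch._list_val_dtype(_head[col]) or _head[col].dtype,
                        "is_list": dispatch._is_list_dtype(_head[col]),
                    }
                break  # presumed missing statement: without it, every partition is read
        return dtypes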

mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
* almost completely working embeddings sizes

* get embedding sizes now working

* fix error in test logic

* fix bugs in tests in ops

* joinexternal now casts all to dataset to infer and propagate schema

* remove unnecessary added param for call is checked within function