
Handle hive-partitioning in NVTabular.dataset.Dataset #677

Merged
merged 11 commits into from
Apr 1, 2021

Conversation

rjzamora
Collaborator

Closes #642
Addresses global shuffle component of #641

The purpose of this PR is to improve the handling of hive-partitioned parquet data in NVTabular. Since the Dataset API already uses dask.dataframe.read_parquet, there is currently no "correctness" issue with reading hive-partitioned data. However, (1) there is no convenient mechanism to write hive-partitioned data, and (2) the read stage typically results in many small partitions (rather than a single partition for each input directory).

  • Solution to (1): The Dataset.to_parquet method now supports a partition_on= argument. This is designed to match the same option in dask.dataframe/dask_cudf. If the user passes a list of 1+ columns with this argument, the output data will be shuffled at IO time into a distinct directory for each unique combination of those partition_on column values. When multiple columns are used for partitioning (e.g. ["month", "day"]), the directory structure is nested (so that the full path for an output file will look something like "/month=Mar/day=30/part.0.parquet").
  • Solution to (2): Since [FEA] Sequential / Session-based recommendation and time series support - Group by sorting values by timestamp #641 will need a mechanism to ensure a unique mapping between specified column groups and ddf partitions, this PR adds a Dataset.shuffle_by_keys method to perform a global shuffle on the specified column group (keys) and return a new (shuffled) Dataset. For general Dataset objects, this method will simply call ddf.shuffle() under the hood. For Dataset objects that are backed by hive-partitioned data, however, we use the metadata stored in the file paths to avoid a full shuffle. In the future, this optimization can be pushed even further by directly aggregating all IO tasks within the same hive-partition. However, I suspect that such an optimization should be implemented in dask.dataframe.
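For intuition, hive partitioning maps each unique combination of the partition_on column values to a nested directory path. A minimal sketch of that path construction (pure Python; the helper name and file layout are illustrative, not NVTabular's implementation):

```python
import os

def hive_partition_path(row, partition_on, root="out.parquet"):
    """Build the nested hive-style directory for one row's key values,
    e.g. out.parquet/month=Mar/day=30 for partition_on=["month", "day"]."""
    parts = [f"{col}={row[col]}" for col in partition_on]
    return os.path.join(root, *parts)

row = {"month": "Mar", "day": 30, "x": 1.5}
path = hive_partition_path(row, ["month", "day"])
```

All rows sharing the same key values land in the same directory, which is what lets the read stage later recover the partitioning from the file paths alone.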

Example Usage

import pandas as pd
import dask.dataframe as dd
import dask
import nvtabular as nvt

path = "fake.data"

# Create a sample ddf
ddf = dask.datasets.timeseries(
    start="2000-01-01",
    end="2000-01-03",
    freq="600s",
    partition_freq="6h",
    seed=42,
).reset_index()
ddf['timestamp'] = ddf['timestamp'].dt.round('D').dt.day

# Convert to a Dataset and write out hive-partitioned data to disk
keys = ["timestamp", "name"]
nvt.Dataset(ddf).to_parquet(path, partition_on=keys)

This will produce a directory structure like:

$ find fake.data/ -type d -print
fake.data/
fake.data/timestamp=1
fake.data/timestamp=1/name=Alice
fake.data/timestamp=1/name=Frank
fake.data/timestamp=1/name=Victor
fake.data/timestamp=1/name=George
fake.data/timestamp=1/name=Quinn
fake.data/timestamp=1/name=Kevin
fake.data/timestamp=1/name=Ursula
...

Then, you can read the data back in with NVT, and ensure that the ddf partitions are shuffled by keys:

ds = nvt.Dataset(path, engine="parquet").shuffle_by_keys(keys)
ds.to_ddf().compute()
      id         x         y timestamp    name
0    991 -0.750009 -0.587392         1   Alice
1   1022  0.866823 -0.682096         1   Alice
2    991  0.467775  0.683211         1   Alice
3    967  0.534984 -0.931405         1     Bob
4    991 -0.149716 -0.651939         1     Bob
..   ...       ...       ...       ...     ...
25   964  0.843602  0.598580         3  Yvonne
26   961  0.853070 -0.987596         3  Yvonne
27   947  0.934162  0.190069         3  Yvonne
28  1024 -0.107280  0.662606         3  Yvonne
29  1006  0.169090 -0.784889         3   Zelda

[288 rows x 5 columns]
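The guarantee shuffle_by_keys provides, that every unique key combination maps to exactly one ddf partition, can be illustrated with a toy hash-based shuffle (pure Python sketch, not the NVTabular/dask implementation):

```python
from collections import defaultdict

def shuffle_by_keys(rows, keys, npartitions=2):
    """Assign every row with the same key tuple to the same partition."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in keys)
        partitions[hash(key) % npartitions].append(row)
    return list(partitions.values())

rows = [
    {"timestamp": 1, "name": "Alice", "x": 0.1},
    {"timestamp": 1, "name": "Bob", "x": 0.2},
    {"timestamp": 2, "name": "Alice", "x": 0.3},
    {"timestamp": 1, "name": "Alice", "x": 0.4},
]
parts = shuffle_by_keys(rows, ["timestamp", "name"])
# Every (timestamp, name) pair appears in exactly one partition.
```

For hive-partitioned input, the same mapping can be read off the directory names, which is why the full shuffle can be skipped.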

@rjzamora
Collaborator Author

cc @gabrielspmoreira @benfred

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3, no merge conflicts.
Running as SYSTEM
Setting status of 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1989/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3^{commit} # timeout=10
Checking out Revision 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 # timeout=10
Commit message: "expand testing and fix bug"
 > git rev-list --no-walk 1f60ba950f935d104c7c8fa21742158698eba3eb # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins427843502843120314.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
93 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 763 items / 2 skipped / 761 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 38%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py ................... [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 76%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......F..........FF..........s [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-41/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Patricia Michael 990 1002 -0.005722 0.54568...n 1004 1075 0.821929 -0.615211
4320 Ray Xavier 1007 986 -0.643196 0.034432

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7cbc73d070>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:73:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f7cbc73dbe0>
dask_stats = x
y -0.020037743
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
______________________ test_gpu_dl[None-parquet-10-0.06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-41/test_gpu_dl_None_parquet_10_0_1')
df = name-cat name-string id label x y
0 Patricia Michael 990 1002 -0.005722 0.54568...n 1004 1075 0.821929 -0.615211
4320 Ray Xavier 1007 986 -0.643196 0.034432

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7d92066eb0>
batch_size = 10, part_mem_fraction = 0.06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, GPU_DEVICE_IDS[:2]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:106:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f7da2135e50>
dask_stats = x
y -0.020037743
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_gpu_dl[None-parquet-100-0.001] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-41/test_gpu_dl_None_parquet_100_00')
df = name-cat name-string id label x y
0 Patricia Michael 990 1002 -0.005722 0.54568...n 1004 1075 0.821929 -0.615211
4320 Ray Xavier 1007 986 -0.643196 0.034432

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7da21238e0>
batch_size = 100, part_mem_fraction = 0.001, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, GPU_DEVICE_IDS[:2]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:106:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f7d920667f0>
dask_stats = x
y -0.020037743
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 2 30 6 91% 44, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 245 30 112 20 85% 254, 256, 269, 278, 296-310, 409->474, 414-417, 422->432, 427-428, 439->437, 453->457, 503, 604->606, 606->615, 616, 623-624, 630, 636, 731-732, 844-849, 855, 880
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 27 150 12 93% 83-91, 136-143, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 269 15 112 9 94% 77-78, 107, 124, 131-132, 143, 203->205, 220, 243-244, 283->287, 358, 362-363, 457, 464
nvtabular/loader/tensorflow.py 120 8 52 7 90% 52, 60-63, 73, 83, 282, 309->313, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4898 1012 2022 224 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.62%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-0.001]
============ 3 failed, 750 passed, 12 skipped in 587.86s (0:09:47) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7476407945782677737.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 89f29ce11125543cc1b4ca94b74cbbe0d4583adc, no merge conflicts.
Running as SYSTEM
Setting status of 89f29ce11125543cc1b4ca94b74cbbe0d4583adc to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1990/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 89f29ce11125543cc1b4ca94b74cbbe0d4583adc^{commit} # timeout=10
Checking out Revision 89f29ce11125543cc1b4ca94b74cbbe0d4583adc (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 89f29ce11125543cc1b4ca94b74cbbe0d4583adc # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1838599612458303638.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
94 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 770 items / 2 skipped / 768 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 67%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 76%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .............................s [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 245 30 112 20 85% 254, 256, 269, 278, 296-310, 409->474, 414-417, 422->432, 427-428, 439->437, 453->457, 503, 604->606, 606->615, 616, 623-624, 630, 636, 731-732, 844-849, 855, 880
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 278 17 114 10 93% 84-85, 113, 130, 137-138, 149, 214->216, 224-228, 238, 261-262, 301->305, 376, 380-381, 475, 482
nvtabular/loader/tensorflow.py 120 11 52 7 88% 52, 60-63, 73, 83, 287, 302-304, 314->318, 347
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4907 1019 2024 226 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.53%

================= 759 passed, 13 skipped in 596.36s (0:09:56) ==================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5509550321025932911.sh

@gabrielspmoreira
Member

Sounds great @rjzamora! This PR will allow incremental training and evaluation of sequential recommender models and time series, as it allows splitting the data by time windows.

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 0eedde3f05679f9948ceda0136fbe06b704e6c6b, no merge conflicts.
Running as SYSTEM
Setting status of 0eedde3f05679f9948ceda0136fbe06b704e6c6b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1999/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 0eedde3f05679f9948ceda0136fbe06b704e6c6b^{commit} # timeout=10
Checking out Revision 0eedde3f05679f9948ceda0136fbe06b704e6c6b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0eedde3f05679f9948ceda0136fbe06b704e6c6b # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 2dcc4a9ed07a2ff8254449e1291ebc2e4281ddbf # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5306605500818761616.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
94 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 770 items / 2 skipped / 768 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 67%]
tests/unit/test_tf_dataloader.py ..FFFFFFFFFFF............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 76%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .............................s [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_True_1_parquet_1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7febf0c63e50>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7febf0c63310>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fec887571f0>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7febf0c630a0>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[True-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_True_10_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fecbafaa310>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecbafaa7c0>
dask_stats = x 0.017360521
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[True-100-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_True_100_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fecbb15c2e0>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecbb0f15e0>
dask_stats = x
y 0.00956418
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[True-100-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_True_100_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fecbb067370>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecbb067c10>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fecbafb3a90>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecbb04f700>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fec886ae6d0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fec886aec40>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fec88757a90>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecbaee6430>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7febf0dd63d0>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7febf0d823d0>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7febf0d714c0>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecb90ed910>
dask_stats = x
y 0.00956418
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fec002e6a90>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fec002e63a0>
dask_stats = x 0.017360521
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 245 30 112 20 85% 254, 256, 269, 278, 296-310, 409->474, 414-417, 422->432, 427-428, 439->437, 453->457, 503, 604->606, 606->615, 616, 623-624, 630, 636, 731-732, 844-849, 855, 880
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 278 25 114 11 90% 79, 84-85, 113, 130, 137-138, 142-146, 149, 214-218, 224-228, 238, 261-262, 301->305, 376, 380-381, 475, 482
nvtabular/loader/tensorflow.py 120 12 52 8 87% 52, 60-63, 73, 83, 287, 294, 302-304, 314->318, 347
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4907 1028 2024 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.31%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
============ 11 failed, 748 passed, 13 skipped in 587.95s (0:09:47) ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5032331935929203304.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 6503e5069dea0f27dff7643a464d336bb1d752a8, no merge conflicts.
Running as SYSTEM
Setting status of 6503e5069dea0f27dff7643a464d336bb1d752a8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2011/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 6503e5069dea0f27dff7643a464d336bb1d752a8^{commit} # timeout=10
Checking out Revision 6503e5069dea0f27dff7643a464d336bb1d752a8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 42f58af7c1b8c1b29f31c482329dbf6bdd410c24 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5873751992607622620.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .FFFFFFFFFFFF............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......FF..FF..FFFFFFFFFF.ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff031327790>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0312d2540>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff033a150d0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
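For context on this failure mode: `FillMedian` computes a float median (`1000.5`) for the int64 `id` column, and cudf's `fillna` refuses the fill value because casting it to the column dtype changes it. The check cudf applies can be sketched in plain Python (using builtin `int` as a stand-in for `np.int64`; `can_safely_cast` is a hypothetical name, not a cudf API):

```python
def can_safely_cast(fill_value, dtype):
    # Mirrors the check in cudf's NumericalColumn.fillna: cast the fill
    # value to the column dtype, then compare against the original value.
    # If the round-trip is lossy, cudf raises TypeError instead of filling.
    casted = dtype(fill_value)
    return casted == fill_value

# A float median of 1000.5 truncates to 1000 as int64 -> not equivalent,
# which is exactly the "Cannot safely cast non-equivalent float to int64"
# error raised in the tests above.
assert not can_safely_cast(1000.5, int)

# An integral-valued float round-trips losslessly, so it would be accepted.
assert can_safely_cast(1000.0, int)
```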
_____________________ test_tf_gpu_dl[True-1-parquet-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_1_parquet_1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef78727a90>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0286909c0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7fef78727ee0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef787a6e80>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0313a5c40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7fef787a62b0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_tf_gpu_dl[True-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_10_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff028119880>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff031361840>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff028119a00>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[True-100-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_100_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff018374580>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0402afec0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff018374a00>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[True-100-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_100_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef782ef970>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fef782efdc0>
dask_stats = x 0.017761632
y
id 1000.5
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
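This second failure mode is different: the `dask_stats` series shown above has no median for column `y` (the value is null), and `fit_finalize` then calls `float()` on that missing value. A short reproduction of the underlying behavior (assumption: pandas' `NA` stands in for the cudf `_NAType` named in the error):

```python
import pandas as pd

# float() on a missing statistic raises TypeError, mirroring the
# "float() argument must be a string or a number, not '_NAType'"
# failure in FillMedian.fit_finalize.
try:
    float(pd.NA)
    raised = False
except TypeError:
    raised = True

assert raised
```

So the two tracebacks in this log have distinct root causes: an unsafe float-to-int64 cast at transform time, and a null median reaching `float()` at fit time.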
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff0f60ad790>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff03825e6c0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff0f60ad130>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff0185432e0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0313a5f40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff018543790>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff018543400>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0402b11c0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff0185435b0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff0280bc910>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0f38faac0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff0280bc040>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef784c2940>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff04029aac0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7fef784c2cd0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-100-parquet-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff02817b1c0>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0402c5840>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff02817b6a0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
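A note on what this traceback shows: `FillMedian` computed a float median (`1000.5`) for the int64 `id` column, and cudf's `fillna` rejects the fill value because casting it to int64 would change its value. A minimal sketch of that cast-safety check (the function name `can_safely_cast` is illustrative, not cudf's API) using plain numpy:

```python
import numpy as np

def can_safely_cast(fill_value: float, dtype: np.dtype) -> bool:
    """Sketch of the check cudf applies: a scalar fill value is only
    accepted if it round-trips through the column dtype unchanged."""
    casted = dtype.type(fill_value)
    return np.isnan(fill_value) or casted == fill_value

int64 = np.dtype("int64")
print(can_safely_cast(1000.0, int64))  # round-trips cleanly
print(can_safely_cast(1000.5, int64))  # the median above: lossy, rejected
```

This is why every `FillMedian`-on-int-column test in this run fails the same way: the median of an even-length int column is a non-integral float.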
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff028287c40>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0280a5b40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff0282876a0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________ test_empty_cols[label_name0-cont_names0-cat_names1-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_empty_cols_label_name0_co1')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef58429520>
engine = 'parquet', cat_names = [], cont_names = ['x', 'y', 'id']
label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fef584295b0>
dask_stats = x 0.017761632
y
id 1000.5
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
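This failure is different from the cast error above: here the computed `dask_stats` series contains a missing value for `y`, and `fit_finalize` calls `float()` on pandas' NA sentinel, which raises. A hedged sketch (not the actual fix in this PR) of guarding against that, reproducing the stats series from the traceback with pandas:

```python
import pandas as pd

# Stats series as shown in the traceback: one entry is missing (NA).
dask_stats = pd.Series({"x": 0.017761632, "y": pd.NA, "id": 1000.5})

medians = {}
for col, val in dask_stats.items():
    if pd.isna(val):
        # float(pd.NA) raises "float() argument must be a string or a
        # number, not '_NAType'" -- skip missing medians instead.
        continue
    medians[col] = float(val)

print(medians)  # {'x': 0.017761632, 'id': 1000.5}
```

Whether skipping, defaulting, or erroring early is the right behavior for `FillMedian.fit_finalize` is a design question for the PR; the sketch only shows where the `_NAType` comes from.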
_________ test_empty_cols[label_name1-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_empty_cols_label_name1_co0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff018292610>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff02829f840>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7fef782d4e50>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________ test_empty_cols[label_name1-cont_names0-cat_names1-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_empty_cols_label_name1_co1')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105c96460>
engine = 'parquet', cat_names = [], cont_names = ['x', 'y', 'id']
label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff028081840>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105c964f0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
__________________ test_gpu_dl_break[None-parquet-1000-0.001] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_break_None_parquet0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105d580d0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff040281d40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105d58f70>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
__________________ test_gpu_dl_break[None-parquet-1000-0.06] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_break_None_parquet1')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105c6d7f0>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff038a97440>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105c6d5b0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
___________________ test_gpu_dl_break[0-parquet-1000-0.001] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_break_0_parquet_100')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105f296a0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff033a66a40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105f29490>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_gpu_dl_break[0-parquet-1000-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_break_0_parquet_101')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff018782790>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff038228240>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105f6e7f0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
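
Since the identical cast error repeats for every parametrization, one hedged workaround (an illustration only, not the fix merged in this PR) is to cast the learned median to the column's dtype before calling `fillna`, so the fill value is value-preserving by construction. Sketched here with a pandas nullable-integer series standing in for the cudf `id` column:

```python
import numpy as np
import pandas as pd

# Stand-in for the int64 "id" column with nulls (pandas nullable Int64).
col = pd.Series([996, None, 1040], dtype="Int64")
median = 1000.5  # the float median FillMedian learned for this column

# Hypothetical workaround: round and cast the fill value to the column
# dtype up front, so the fillna cast cannot be rejected as unsafe.
fill = np.int64(round(median))
filled = col.fillna(fill)
print(filled.tolist())  # [996, 1000, 1040]
```

Note that Python's `round` uses banker's rounding (`round(1000.5) == 1000`); whether truncating the median like this is acceptable is a modeling decision, which is why the actual fix belongs in the op rather than in user code.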
_____________________ test_gpu_dl[None-parquet-1000-0.001] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_None_parquet_1000_0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105ed7ac0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:169:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff040284f40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105ed7940>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_gpu_dl[None-parquet-1000-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_None_parquet_1000_1')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef5840f4f0>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:169:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff040284ac0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7fef5840f2e0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
______________________ test_gpu_dl[0-parquet-1000-0.001] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_0_parquet_1000_0_00')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff1059ff880>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:169:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff02829f4c0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff1059ff790>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
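The root cause is visible in the cudf guard shown above: `FillMedian` computes a float median (here `1000.5`) for the int64 `id` column, and cudf rejects the lossy scalar fill. The check can be sketched without a GPU using numpy — `can_safely_fill` is a hypothetical helper that mirrors the `fill_value_casted` logic from the traceback:

```python
import numpy as np

def can_safely_fill(col_dtype: np.dtype, fill_value) -> bool:
    # Mirror cudf's scalar-fill guard: cast the fill value to the column
    # dtype and reject the fill if the round-trip loses information.
    fill_value_casted = col_dtype.type(fill_value)
    return bool(np.isnan(fill_value) or fill_value_casted == fill_value)

# A float median on an int64 column is exactly the failing case in the log:
print(can_safely_fill(np.dtype("int64"), 1000.5))    # -> False (1000.5 truncates to 1000)
print(can_safely_fill(np.dtype("float64"), 1000.5))  # -> True
```

This is why only the integer-typed continuous column trips the error, while float columns fill cleanly.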
_______________________ test_gpu_dl[0-parquet-1000-0.06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_0_parquet_1000_0_01')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff1059fffd0>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:169:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff028091ec0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff1059cfc40>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________________________ test_kill_dl[parquet-0.001] __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_kill_dl_parquet_0_001_0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff0185bccd0>
part_mem_fraction = 0.001, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:238:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0522cce40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff0185bca00>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
__________________________ test_kill_dl[parquet-0.1] ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_kill_dl_parquet_0_1_0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105f735e0>
part_mem_fraction = 0.1, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:238:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff031386740>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105f73670>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
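Given the repeated `FillMedian` failures above, one way to avoid the lossy cast is to coerce the computed median to the column's dtype before filling. The sketch below uses pandas; the helper name `fill_median_safe` and the rounding policy are assumptions for illustration, not the fix adopted for these tests:

```python
import pandas as pd

def fill_median_safe(s: pd.Series) -> pd.Series:
    # Cast the median to the series dtype so integer columns are filled
    # with an integer, sidestepping the non-equivalent-cast error.
    median = s.median()
    if pd.api.types.is_integer_dtype(s.dtype):
        median = s.dtype.type(round(median))
    return s.fillna(median)

# A nullable int column whose median (1000.0 here) would otherwise be a float:
s = pd.Series([1, 1000, 1001, None], dtype="Int64")
print(fill_median_safe(s).tolist())  # -> [1, 1000, 1001, 1000]
```

Rounding changes the fill value slightly for even-length columns (e.g. a true median of `1000.5` becomes `1000` or `1001`), which is usually acceptable for an imputation default.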

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 247 31 114 21 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 884
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 28 114 13 89% 79, 84-85, 113, 130, 137-138, 142-146, 149, 211, 217-221, 227-231, 241, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 12 52 8 87% 52, 60-63, 73, 83, 286, 293, 301-303, 313->317, 346
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4910 1032 2026 231 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.23%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names1-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names1-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[None-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[None-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[0-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[0-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[0-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[0-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-0.001] - Typ...
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-0.1] - TypeE...
============ 26 failed, 729 passed, 14 skipped in 490.99s (0:08:10) ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins292754657294673574.sh

@rjzamora
Collaborator Author

Is it possible that the dataloader CI failures are being caused by this PR?

@benfred
Member

benfred commented Mar 31, 2021

Is it possible that the dataloader CI failures are being caused by this PR?

I don't think so - we've had some flaky tests around this for a while now (#397), but for some reason these errors seem more common now

@benfred
Member

benfred commented Mar 31, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 6503e5069dea0f27dff7643a464d336bb1d752a8, no merge conflicts.
Running as SYSTEM
Setting status of 6503e5069dea0f27dff7643a464d336bb1d752a8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2015/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 6503e5069dea0f27dff7643a464d336bb1d752a8^{commit} # timeout=10
Checking out Revision 6503e5069dea0f27dff7643a464d336bb1d752a8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 40a7f2b1c8e4e6743f499c4904ba1db32fbba0a2 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6076946174219903460.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......F...F..............ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Frank Bob 977 1050 -0.442815 0.69365...h 983 945 0.348971 0.700293
4320 Norbert Zelda 1013 1014 -0.623388 -0.007252

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f1f101b2b50>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f1f103531f0>
dask_stats = x 0.01028773
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_________ test_empty_cols[label_name1-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_empty_cols_label_name1_co0')
df = name-cat name-string id label x y
0 Frank Bob 977 1050 -0.442815 0.69365...h 983 945 0.348971 0.700293
4320 Norbert Zelda 1013 1014 -0.623388 -0.007252

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f1ef07ba2e0>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f1ef00b8730>
dask_stats = x 0.01028773
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
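The two tracebacks above come down to `FillMedian.fit_finalize` calling `float()` on a missing median: column `y` computes to `<NA>` in the Dask statistics, and `float(pd.NA)` raises the `TypeError`. As a minimal sketch of a possible guard (the `fit_finalize_safe` helper below is hypothetical, not the actual fix merged in this PR; it only mirrors the loop shown from `nvtabular/ops/fill.py` while skipping NA entries):

```python
import pandas as pd

def fit_finalize_safe(dask_stats, medians):
    """Hypothetical variant of FillMedian.fit_finalize that skips NA medians.

    `dask_stats` is the computed per-column median Series; columns whose
    median is missing (e.g. an all-null input column) are left out of
    `medians` instead of crashing on float(<NA>).
    """
    index = dask_stats.index
    # cudf indexes expose values_host; pandas indexes use values
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
        val = dask_stats[col]
        if pd.isna(val):  # guard against pandas.NA / NaN medians
            continue
        medians[col] = float(val)
    return medians

# Reproduce the failing shape: column "y" has a missing median
stats = pd.Series({"x": 0.0103, "y": pd.NA, "id": 1000.0})
medians = fit_finalize_safe(stats, {})
# medians -> {"x": 0.0103, "id": 1000.0}
```

Whether skipping the column (as above) or raising a clearer error is the right behavior depends on how downstream transforms consume `self.medians`.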

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 247 31 114 21 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 884
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 286, 301-303, 313->317, 346
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4910 1020 2026 227 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.51%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names0-parquet]
============ 2 failed, 753 passed, 14 skipped in 493.10s (0:08:13) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins274464265043646598.sh

@benfred

benfred commented Mar 31, 2021

rerun tests

@nvidia-merlin-bot

GitHub pull request #677 of commit 6503e5069dea0f27dff7643a464d336bb1d752a8, no merge conflicts.
Running as SYSTEM
Setting status of 6503e5069dea0f27dff7643a464d336bb1d752a8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2016/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 6503e5069dea0f27dff7643a464d336bb1d752a8^{commit} # timeout=10
Checking out Revision 6503e5069dea0f27dff7643a464d336bb1d752a8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins2330873152729625782.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .........................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]


Required test coverage of 70% reached. Total coverage: 76.51%

================= 755 passed, 14 skipped in 493.61s (0:08:13) ==================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1033248889434797742.sh

@nvidia-merlin-bot

GitHub pull request #677 of commit a51acdb4154c37628ec0252a55c1d38e8dde1869, no merge conflicts.
Running as SYSTEM
Setting status of a51acdb4154c37628ec0252a55c1d38e8dde1869 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2017/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse a51acdb4154c37628ec0252a55c1d38e8dde1869^{commit} # timeout=10
Checking out Revision a51acdb4154c37628ec0252a55c1d38e8dde1869 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4907412045382266287.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ..........F..............ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_________ test_empty_cols[label_name1-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_empty_cols_label_name1_co0')
df = name-cat name-string id label x y
0 Michael Norbert 1006 1055 -0.572479 0.016072
...ra 999 989 -0.009399 -0.914909
4320 Jerry Quinn 1008 1012 0.955115 0.879404

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f044c262700>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f044c7dc130>
dask_stats = x -0.015519284
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 247 31 114 21 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 884
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4910 1020 2026 227 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.51%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names0-parquet]
============ 1 failed, 754 passed, 14 skipped in 492.20s (0:08:12) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2995186208006081316.sh

@rjzamora
Collaborator Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit a51acdb4154c37628ec0252a55c1d38e8dde1869, no merge conflicts.
Running as SYSTEM
Setting status of a51acdb4154c37628ec0252a55c1d38e8dde1869 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2018/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse a51acdb4154c37628ec0252a55c1d38e8dde1869^{commit} # timeout=10
Checking out Revision a51acdb4154c37628ec0252a55c1d38e8dde1869 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1242628838666643737.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py ...F...FFFF.F............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .........................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-4/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-4/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7f2b02ef0cd0>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f2b02bd7a60>
dask_stats = x -0.009783547
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing


TOTAL 4910 1020 2026 227 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.51%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
============ 6 failed, 749 passed, 14 skipped in 522.71s (0:08:42) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1434248472016933687.sh

@karlhigley
Contributor

@jperez999 This CI error looks familiar and I think you tried to explain it to me, but I didn't fully understand what's causing it. Thoughts on how to resolve it?

@jperez999
Contributor

@karlhigley OK, so follow me on this: the median stat operator collects a median value of NA for the continuous columns (sometimes for both x and y, sometimes for just one of them). When we go to apply that NA value, we hit the error we are seeing. I think this comes from the way we set up the dataset here: https://github.com/NVIDIA/NVTabular/blob/main/tests/conftest.py#L108-L112 That code inserts the NA values, and I think the randomly chosen positions coincidentally land on the median indexes.
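The failure mode described above can be sketched in isolation: when a column's computed median comes back as pandas' NA scalar, the `float()` call in `FillMedian.fit_finalize` raises exactly the `TypeError` shown in the CI logs. The guard below (the `pd.isna` check and the `fill_value` fallback) is only an illustrative sketch, not NVTabular's actual fix:

```python
import pandas as pd

# Stats series mimicking `dask_stats` from the traceback: the median for
# "y" is pandas' NA scalar, which float() rejects with
# "float() argument must be a string or a number, not '_NAType'".
dask_stats = pd.Series([-0.0098, pd.NA, 1000.0], index=["x", "y", "id"], name=0.5)

def fit_finalize_guarded(stats, fill_value=0.0):
    """Illustrative guard: fall back to `fill_value` when a median is NA."""
    index = stats.index
    # cuDF indexes expose host values via `values_host`; pandas uses `values`.
    vals = index.values_host if hasattr(index, "values_host") else index.values
    medians = {}
    for col in vals:
        val = stats[col]
        medians[col] = fill_value if pd.isna(val) else float(val)
    return medians

print(fit_finalize_guarded(dask_stats))  # {'x': -0.0098, 'y': 0.0, 'id': 1000.0}
```

Whether the right response is to fall back to a default, skip the column, or fix the test fixture so the median can never be NA is a separate question; the sketch just shows where the `TypeError` originates.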

@jperez999
Contributor

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit a51acdb4154c37628ec0252a55c1d38e8dde1869, no merge conflicts.
Running as SYSTEM
Setting status of a51acdb4154c37628ec0252a55c1d38e8dde1869 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2019/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse a51acdb4154c37628ec0252a55c1d38e8dde1869^{commit} # timeout=10
Checking out Revision a51acdb4154c37628ec0252a55c1d38e8dde1869 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3416488090804215825.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......F..................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Patricia Alice 1024 1006 0.903991 0.86151...n 1009 1065 -0.355152 -0.101055
4320 Jerry Dan 944 981 -0.677558 -0.940314

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f921003e5b0>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f9210525a90>
dask_stats = x
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
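The failure above boils down to `float()` being called on a null median: with the empty `cont_names` fixtures, some columns reduce to an `<NA>` median, and `float(pd.NA)` raises exactly this `TypeError`. A minimal defensive sketch of the finalize step, assuming pandas (`finalize_medians` is a hypothetical helper for illustration, not NVTabular's actual `fit_finalize`):

```python
import pandas as pd

def finalize_medians(dask_stats: pd.Series) -> dict:
    """Collect per-column medians, skipping columns whose median is null.

    An all-null column yields an <NA> median, and float(pd.NA) raises
    TypeError("float() argument must be a string or a number, not '_NAType'").
    """
    medians = {}
    for col in dask_stats.index:
        val = dask_stats[col]
        if pd.isna(val):  # guard against _NAType / NaN medians
            continue
        medians[col] = float(val)
    return medians

# "x" has no data, so its median is <NA>; "id" has a real median.
stats = pd.Series([pd.NA, 1000.0], index=["x", "id"], name=0.5)
print(finalize_medians(stats))  # → {'id': 1000.0}
```

Guarding with `pd.isna` before the cast keeps all-null columns out of the medians dict instead of crashing the whole fit.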

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 247 31 114 21 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 884
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4910 1020 2026 227 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.51%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
============ 1 failed, 754 passed, 14 skipped in 490.05s (0:08:10) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins913225217622897003.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2026/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 2d552f6806f843cc0da94110e76e281e1db982b8 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5070192051355054301.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .FFFFFFFFFFFF............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......FF..FF..FFFFFFFFFF.ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7b284670a0>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7ac25d5140>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7b28524f10>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
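The cast error above comes from filling an int64 column (`id`) with a non-integral median (`1002.5`), which cudf refuses to cast silently. A minimal sketch of one workaround, promoting integer columns to float before the fill, assuming pandas (`safe_fillna` is a hypothetical helper for illustration, not the NVTabular fix itself):

```python
import pandas as pd

def safe_fillna(s: pd.Series, fill_value: float) -> pd.Series:
    """Fill nulls, promoting integer dtypes to float64 when the fill
    value is not exactly representable in the column dtype (e.g. a
    median of 1002.5 into int64, which cudf rejects with a cast error).
    """
    if pd.api.types.is_integer_dtype(s.dtype) and fill_value != int(fill_value):
        s = s.astype("float64")
    return s.fillna(fill_value)

s = pd.Series([1000, None, 1005], dtype="Int64")
print(safe_fillna(s, 1002.5).tolist())  # the null becomes 1002.5
```

The alternative — casting `fill_value` down to the column dtype — is exactly the unsafe cast cudf is guarding against here, so widening the column is the safer direction.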
_____________________ test_tf_gpu_dl[True-1-parquet-0.06] ______________________
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________
_____________________ test_tf_gpu_dl[True-10-parquet-0.06] _____________________

(Same failure for each of these parametrizations: TypeError: Cannot safely cast non-equivalent float to int64, with an identical traceback through nvtabular/ops/fill.py:98 and cudf/core/column/numerical.py:487.)
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
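The failure mode is the same in every parametrization below: `FillMedian` computes a fractional median (1002.5) for the integer `id` column, and cudf's `fillna` refuses a scalar fill value that cannot round-trip through the column dtype. A minimal sketch of that safe-cast check, using plain NumPy (no cudf required) with a hypothetical helper name:

```python
import numpy as np

def can_safely_fill(fill_value, dtype):
    """Mimic the check in cudf's NumericalColumn.fillna: a scalar fill
    value is accepted only if casting it to the column dtype does not
    change its value (NaN is always allowed)."""
    casted = dtype.type(fill_value)
    return bool(np.isnan(fill_value) or casted == fill_value)

# median([1001, 1004]) == 1002.5 -> cannot be represented in int64
print(can_safely_fill(1002.5, np.dtype("int64")))  # False -> cudf raises TypeError
print(can_safely_fill(1002.0, np.dtype("int64")))  # True  -> lossless cast
```

This is why the error only appears for continuous columns that happen to have an integer dtype: a float median on a float column always passes the check.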
____________________ test_tf_gpu_dl[True-100-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_True_100_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a9444da30>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7be42b80c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a9444dd60>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[True-100-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_True_100_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a946f0d00>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7be41f28c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a946f0250>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7ab8436190>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b9c0760c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7ab85ffdc0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7ab855e3d0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7bfaceb7c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7ab855e5e0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7ab8277940>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b536ddb40>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7ab8277d60>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
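One plausible way to make the `FillMedian.transform` step robust to this (a hedged sketch, not the actual NVTabular fix; `fill_median` is a hypothetical helper and pandas stands in for cudf here) is to detect a fractional fill value on an integer column and round it to a representable value before calling `fillna`:

```python
import numpy as np
import pandas as pd  # stand-in for cudf in this sketch

def fill_median(df, col, median):
    """Fill nulls in df[col] with `median`, guarding against the
    'non-equivalent float to int' cast: if the column is integer-typed
    and the median is fractional, round it to the nearest representable
    integer instead of letting the cast fail."""
    dtype = df[col].dtype
    if pd.api.types.is_integer_dtype(dtype) and median != int(median):
        median = int(round(median))
    df[col] = df[col].fillna(median)
    return df

df = pd.DataFrame({"id": pd.array([1001, None, 1004], dtype="Int64")})
df = fill_median(df, "id", 1002.5)  # would raise in cudf without the guard
```

An alternative design choice would be to upcast the column to float so the exact median survives, at the cost of changing the output dtype; rounding keeps the schema stable, which matters for downstream dataloaders.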
____________________ test_tf_gpu_dl[False-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a94388ca0>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b9c0760c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a94388070>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a942fc6d0>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7bf7ee96c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a942fc4c0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-100-parquet-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a7459dcd0>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b9e66d040>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a7459ddf0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a94660160>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28a32340>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a94660910>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________ test_empty_cols[label_name0-cont_names0-cat_names1-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_empty_cols_label_name0_co1')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a6c71f3d0>
engine = 'parquet', cat_names = [], cont_names = ['x', 'y', 'id']
label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28e75c40>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7b28492df0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________ test_empty_cols[label_name1-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_empty_cols_label_name1_co0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a74190130>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b53645d40>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a74691400>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
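All of these failures share one root cause: `FillMedian` computes a floating-point median (1002.5) for the integer `id` column, and cudf's `fillna` refuses to fill an `int64` column with a value that would be truncated by the cast. The safe-cast check can be reproduced with NumPy alone (a minimal sketch; the dtype and fill value are taken from the traceback above):

```python
import numpy as np

# The median of the int64 "id" column is fractional (1002.5).
col_dtype = np.dtype("int64")
fill_value = 1002.5

# This mirrors cudf's check in NumericalColumn.fillna: cast the fill
# value to the column dtype and compare against the original value.
fill_value_casted = col_dtype.type(fill_value)  # truncated to 1002
unsafe = (not np.isnan(fill_value)) and (fill_value_casted != fill_value)
print(unsafe)  # True -> cudf raises "Cannot safely cast non-equivalent float to int64"
```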
_________ test_empty_cols[label_name1-cont_names0-cat_names1-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_empty_cols_label_name1_co1')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09cf1a00>
engine = 'parquet', cat_names = [], cont_names = ['x', 'y', 'id']
label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28d08b40>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7c09cf16d0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
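One possible remedy (a hypothetical sketch, not necessarily the patch that landed for this failure) is to coerce the stored median to the column's dtype before calling `fillna`, so integer columns are filled with an integer value. Shown here with pandas nullable integers for portability; `safe_fill` is an illustrative helper, not an NVTabular API:

```python
import pandas as pd

def safe_fill(series, median):
    # Coerce a fractional median to the column's integer dtype before
    # filling, sidestepping the "Cannot safely cast" error above.
    if pd.api.types.is_integer_dtype(series.dtype):
        median = int(median)
    return series.fillna(median)

s = pd.Series([1, 2, None, 4], dtype="Int64")  # nullable integer column
print(safe_fill(s, 1002.5).tolist())  # [1, 2, 1002, 4]
```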
__________________ test_gpu_dl_break[None-parquet-1000-0.001] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_break_None_parquet0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7ab817a1f0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28fe56c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a9470e130>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
__________________ test_gpu_dl_break[None-parquet-1000-0.06] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_break_None_parquet1')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09ef5e20>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28fef040>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7c09ef5ca0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
___________________ test_gpu_dl_break[0-parquet-1000-0.001] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_break_0_parquet_100')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7bfab41670>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28fd1040>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7c09d7da00>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_gpu_dl_break[0-parquet-1000-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_break_0_parquet_101')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09da6dc0>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28b70dc0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7c09da6a90>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
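All of the failures in this run reduce to the same root cause: `FillMedian` learns a float median (`1002.5`) for the int64 column `id`, and cudf's `fillna` refuses to down-cast it. A minimal numpy-only sketch of that safety check (a hypothetical simplification of the cudf source shown in the traceback; `check_safe_cast` is an illustrative name, not a cudf API):

```python
import numpy as np


def check_safe_cast(dtype: np.dtype, fill_value):
    """Raise TypeError if fill_value cannot round-trip through dtype.

    Hypothetical, numpy-only simplification of the check in cudf's
    NumericalColumn.fillna shown in the traceback above.
    """
    casted = dtype.type(fill_value)  # e.g. int64: 1002.5 -> 1002
    if not np.isnan(fill_value) and casted != fill_value:
        raise TypeError(
            f"Cannot safely cast non-equivalent "
            f"{type(fill_value).__name__} to {dtype.name}"
        )
    return casted
```

With the median from the log, `check_safe_cast(np.dtype("int64"), 1002.5)` raises exactly this `TypeError`, while an integral value such as `1002.0` passes the check.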
_____________________ test_gpu_dl[None-parquet-1000-0.001] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_None_parquet_1000_0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7ab8627cd0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:169:


E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
_____________________ test_gpu_dl[None-parquet-1000-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_None_parquet_1000_1')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a94698400>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = None

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
______________________ test_gpu_dl[0-parquet-1000-0.001] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_0_parquet_1000_0_00')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a944ce2e0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = 0

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
_______________________ test_gpu_dl[0-parquet-1000-0.06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_0_parquet_1000_0_01')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09b149a0>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = 0

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
_________________________ test_kill_dl[parquet-0.001] __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_kill_dl_parquet_0_001_0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09ce1e20>
part_mem_fraction = 0.001, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:238:


E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
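One possible workaround for the failures above, sketched with pandas (illustrative only; `fillna_with_medians` is a hypothetical helper, not the fix this PR adopts): coerce each learned median to the column's dtype before calling `fillna`, so an int64 column like `id` receives an integer fill value instead of `1002.5`.

```python
import numpy as np
import pandas as pd


def fillna_with_medians(df: pd.DataFrame, medians: dict) -> pd.DataFrame:
    """Hypothetical workaround: coerce each median to the column dtype first.

    Note the coercion truncates (1002.5 -> 1002 for int64), which biases the
    filled statistic; casting such columns to float before filling is an
    alternative that preserves the median exactly.
    """
    out = df.copy()
    for col, median in medians.items():
        fill = out[col].dtype.type(median)  # match the column's dtype
        out[col] = out[col].fillna(fill)
    return out
```

Applied to a nullable-int column, this fills with the truncated median rather than raising the cast error seen in the log.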
__________________________ test_kill_dl[parquet-0.1] ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_kill_dl_parquet_0_1_0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09b5fe50>
part_mem_fraction = 0.1, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:238:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7bf7ee94c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
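For context on this failure mode: the cudf excerpt above rejects `FillMedian`'s computed median (1002.5) because it cannot round-trip through the `int64` dtype of the `id` column. Here is a minimal NumPy sketch of that safe-cast check (the helper name `check_safe_fill` is made up for illustration; it just mirrors the guard shown in `numerical.py` above):

```python
import numpy as np

def check_safe_fill(fill_value, dtype):
    # Mirror of the cudf guard: cast the scalar to the column dtype
    # and verify it round-trips without losing information.
    casted = dtype.type(fill_value)
    if not np.isnan(fill_value) and casted != fill_value:
        raise TypeError(
            f"Cannot safely cast non-equivalent "
            f"{type(fill_value).__name__} to {dtype.name}"
        )
    return casted

# A median of 1002.5 cannot be represented exactly as int64 -> TypeError
try:
    check_safe_fill(1002.5, np.dtype("int64"))
except TypeError as e:
    print(e)  # Cannot safely cast non-equivalent float to int64

# An integral-valued float passes the check
print(check_safe_fill(1000.0, np.dtype("int64")))
```

Any integer column whose median lands between two integers (an even-length sample, as here) trips this guard, which is why the failures depend on how the data happens to be partitioned.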
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7c09b5fd60>
Traceback (most recent call last):
  File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
    df = column_group.op.transform(column_group.input_column_names, df)
  File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
    df[col] = df[col].fillna(self.medians[col])
  File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
    return super().fillna(
  File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
    copy_data[name] = copy_data[name].fillna(value[name], method)
  File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
    raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 28 114 13 89% 79, 84-85, 113, 130, 137-138, 142-146, 149, 211, 217-221, 227-231, 241, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 12 52 8 87% 52, 60-63, 73, 83, 288, 295, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1033 2028 232 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.21%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names1-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names1-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[None-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[None-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[0-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[0-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[0-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[0-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-0.001] - Typ...
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-0.1] - TypeE...
============ 26 failed, 729 passed, 14 skipped in 503.04s (0:08:23) ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2439974284389972750.sh

@benfred
Member

benfred commented Mar 31, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2027/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3656656512144996904.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ...............F.........ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
__________________ test_gpu_dl_break[None-parquet-1000-0.06] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-12/test_gpu_dl_break_None_parquet1')
df = name-cat name-string id label x y
0 Ray Alice 1007 1009 -0.193376 -0.84207...e 984 985 -0.402427 -0.005272
4320 Tim Gary 986 950 -0.625596 -0.488068

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7faf8df8df40>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fafe2c8c520>
dask_stats = x 0.034901721
y <NA>
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
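The `y` median in `dask_stats` above came back null, and `fit_finalize` calls `float()` on every entry, so it crashes on the missing value. A small pandas sketch of the mechanism (the `None` guard is only an illustrative workaround, not the fix adopted by the project):

```python
import pandas as pd

# A quantile result like the one in the failing run: the "y" median is null
dask_stats = pd.Series({"x": 0.002405407, "y": pd.NA, "id": 1001.0}, name=0.5)

# float(pd.NA) raises TypeError ("float() argument must be ... not 'NAType'"),
# which is exactly the crash in fit_finalize above.
medians = {}
for col in dask_stats.index.values:
    val = dask_stats[col]
    # Illustrative guard: record a None instead of calling float() blindly
    medians[col] = None if pd.isna(val) else float(val)

print(medians)
```

The real question, as noted in the follow-up comment, is why the median computation produced a null for `y` in the first place.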

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[None-parquet-1000-0.06]
============ 1 failed, 754 passed, 14 skipped in 504.12s (0:08:24) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8202843174893632191.sh

@benfred
Member

benfred commented Apr 1, 2021

@karlhigley @jperez999 @rjzamora I think the NA failure in the unittests is unrelated to this PR - I spent some time debugging this and left my thoughts here #687 (comment) . That PR has a 'fix' - but would like to figure out why this is happening

@benfred
Member

benfred commented Apr 1, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2030/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 78cea240d94600c01b749619aaaa33154ad88555 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1600729026063402115.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py ..FF....F................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .........................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-15/test_tf_gpu_dl_True_1_parquet_1')
paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fd7d86292b0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fd7d8629940>
dask_stats = x 0.002405407
y <NA>
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
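The traceback above (repeated across the parametrized cases in this log) has a single root cause: `FillMedian.fit_finalize` calls `float()` on every entry of `dask_stats`, and when the approximate median for a column comes back missing (cuDF's NA scalar, printed as `_NAType`), `float()` raises. A minimal sketch of a guard is below; `fit_finalize_safe` is a hypothetical standalone helper, not NVTabular's actual fix, and defaulting a missing median to `0.0` is an assumption made only for illustration.

```python
import pandas as pd


def fit_finalize_safe(medians: dict, dask_stats: pd.Series) -> None:
    """Copy computed medians into `medians`, guarding against missing values.

    `dask_stats` is a Series indexed by column name, mirroring the
    structure shown in the traceback above.
    """
    index = dask_stats.index
    # cuDF indexes expose `values_host`; pandas indexes expose `values`.
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
        stat = dask_stats[col]
        # float(pd.NA) raises TypeError, so check for missingness first.
        # Falling back to 0.0 here is an illustrative assumption.
        medians[col] = 0.0 if pd.isna(stat) else float(stat)


medians = {}
fit_finalize_safe(medians, pd.Series({"x": 0.0024, "y": pd.NA, "id": 1001.0}))
# "y" falls back to 0.0 instead of raising TypeError
```

`pd.isna` also recognizes cuDF's NA scalar in practice, which keeps the guard backend-agnostic.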
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-15/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fd7d850aaf0>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fd7d86c3640>
dask_stats = x 0.002405407
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-15/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fd7d8638130>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fd79c14a160>
dask_stats = x 0.002405407
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
============ 3 failed, 752 passed, 14 skipped in 501.99s (0:08:21) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5394768993438524972.sh

@benfred
Member

benfred commented Apr 1, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2032/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 4a11db2df80cadc1a37a6e53798d2aed3855e556 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5352641028358161637.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py ...F.....F...............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......F..................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-17/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-17/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-17/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7f17ba38c5b0>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f17ba38c400>
dask_stats = x
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-17/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-17/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-17/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f17ba3d6c40>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f17ba43bd30>
dask_stats = x
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-17/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Zelda Frank 1036 981 0.512298 0.349215
...er 989 1017 0.568488 0.312415
4320 Alice Hannah 1009 1056 -0.233545 -0.972603

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f1903937bb0>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f190393b220>
dask_stats = x
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
============ 3 failed, 752 passed, 14 skipped in 506.96s (0:08:26) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins9009968696599090121.sh

@benfred
Member

benfred commented Apr 1, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2038/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk d3a6f11b0464ef8d4e1ccbffd2d8900bbc8309d0 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins4894229412186291544.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......F..................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-21/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Michael Xavier 1034 1053 0.740279 -0.619412
...ah 994 1014 -0.397221 0.085056
4320 Jerry Oliver 975 994 -0.300738 0.687318

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7faa0c789a00>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7faad2149580>
dask_stats = x
y 0.020540627
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
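The traceback above shows `FillMedian.fit_finalize` calling `float()` on a per-column median that came back as a missing value (`_NAType`), which raises `TypeError`. A minimal sketch of a defensive version is below; it assumes `dask_stats` is a pandas/cudf Series of per-column medians, and the `fit_finalize`/`medians` names mirror `nvtabular/ops/fill.py` only for illustration. The `0.0` fallback for all-null columns is a hypothetical placeholder, not the fix adopted in the repo.

```python
import pandas as pd

def fit_finalize(medians, dask_stats):
    """Copy per-column medians out of a Series, skipping NA values.

    medians: dict to fill, column name -> float
    dask_stats: Series indexed by column name (pandas or cudf)
    """
    index = dask_stats.index
    # cudf indexes expose values_host; pandas indexes expose values
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
        val = dask_stats[col]
        # float(pd.NA) raises TypeError, so guard before converting
        if pd.isna(val):
            medians[col] = 0.0  # placeholder default for an all-null column
        else:
            medians[col] = float(val)
    return medians
```

In the failing runs only one of `x`/`y` has a valid median, so a guard like this (or an upstream fix ensuring the median computation never yields NA) avoids the `TypeError`.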

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
============ 1 failed, 754 passed, 14 skipped in 504.98s (0:08:24) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1590525323433494328.sh

@benfred
Member

benfred commented Apr 1, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2039/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7351562382516827366.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py ...FF..FF.FFF............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .........................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7efbec676d30>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efbac557370>
dask_stats = x
y -0.001688915
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[True-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_True_10_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7efc02719310>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efc02719a60>
dask_stats = x
y -0.001688915
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7efc047bc2e0>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efb885da5e0>
dask_stats = x -0.010991114
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7efc045e5fd0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efc045c2040>
dask_stats = x -0.010991114
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7efc045b4400>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efc048f55b0>
dask_stats = x
y -0.001688915
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7efac9ffee20>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efac9ffe5e0>
dask_stats = x
y -0.001688915
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7efac9fc15e0>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efac9f12400>
dask_stats = x
y -0.001688915
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
============ 7 failed, 748 passed, 14 skipped in 500.16s (0:08:20) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2215583304916106267.sh

@benfred
Member

benfred commented Apr 1, 2021

Rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2040/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins5011005984566441076.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .........................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%

================= 755 passed, 14 skipped in 503.80s (0:08:23) ==================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1511800520641781225.sh

@karlhigley karlhigley merged commit 50a2f46 into NVIDIA-Merlin:main Apr 1, 2021
@rjzamora rjzamora deleted the hive-partitioning branch April 1, 2021 14:12
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
* add Dataset.shuffle_by_keys

* support npartitions

* adding partition_on option to to_parquet

* fix _metadata creation for partitioned data

* expand testing and fix bug

* avoid shuffle when we don't need it
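The `partition_on=` option added in these commits writes each unique combination of the partition-column values into its own hive-style directory (e.g. `month=Mar/day=30/part.0.parquet`, as described in the PR summary). As a rough, stdlib-only sketch of that path layout — `hive_partition_paths` is a hypothetical illustrative helper, not part of the NVTabular or Dask API:

```python
from collections import defaultdict

def hive_partition_paths(rows, partition_on):
    """Group rows under hive-style paths like 'month=Mar/day=30/part.0.parquet'.

    One output file path is produced per unique combination of the
    ``partition_on`` column values; directories nest in column order.
    """
    groups = defaultdict(list)
    for row in rows:
        # Build the nested directory key, e.g. "month=Mar/day=30"
        key = "/".join(f"{col}={row[col]}" for col in partition_on)
        groups[key].append(row)
    # Each unique key gets its own part file inside that directory
    return {f"{key}/part.0.parquet": grp for key, grp in groups.items()}

rows = [
    {"month": "Mar", "day": 30, "x": 1},
    {"month": "Mar", "day": 30, "x": 2},
    {"month": "Apr", "day": 1, "x": 3},
]
paths = hive_partition_paths(rows, ["month", "day"])
```

With two unique `(month, day)` combinations above, two output paths are produced; the real `Dataset.to_parquet(partition_on=...)` performs the analogous grouping as a shuffle at IO time, matching the `partition_on=` semantics of `dask.dataframe.to_parquet`.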
Development

Successfully merging this pull request may close these issues.

[FEA] Partition output parquet files by a column
6 participants