
Handle hive-partitioning in NVTabular.dataset.Dataset #677

Merged
merged 11 commits into from
Apr 1, 2021

Conversation

rjzamora
Collaborator

Closes #642
Addresses global shuffle component of #641

The purpose of this PR is to improve the handling of hive-partitioned parquet data in NVTabular. Since the Dataset API already uses dask.dataframe.read_parquet, there is currently no "correctness" issue with reading hive-partitioned data. However, (1) there is no convenient mechanism to write hive-partitioned data, and (2) the read stage typically results in many small partitions (rather than a single partition for each input directory).

  • Solution to (1): The Dataset.to_parquet method now supports a partition_on= argument. This is designed to match the same option in dask.dataframe/dask_cudf. If the user passes a list of 1+ columns with this argument, the output data will be shuffled at IO time into a distinct directory for each unique combination of those partition_on column values. When multiple columns are used for partitioning (e.g. ["month", "day"]), the directory structure is nested (so that the full path for an output file will look something like "/month=Mar/day=30/part.0.parquet").
  • Solution to (2): Since [FEA] Sequential / Session-based recommendation and time series support - Group by sorting values by timestamp #641 will need a mechanism to ensure a unique mapping between specified column groups and ddf partitions, this PR adds a Dataset.shuffle_by_keys method to perform a global shuffle on the specified column group (keys) and return a new (shuffled) Dataset. For general Dataset objects, this method will simply call ddf.shuffle() under the hood. For Dataset objects that are backed by hive-partitioned data, however, we use the metadata stored in the file paths to avoid a full shuffle. In the future, this optimization can be pushed even further by directly aggregating all IO tasks within the same hive-partition. However, I suspect that such an optimization should be implemented in dask.dataframe.
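For intuition, hive partitioning maps each unique combination of the partition_on column values to a nested directory path. A minimal sketch of that path construction (pure Python; the helper name and file layout are illustrative, not NVTabular's implementation):

```python
import os

def hive_partition_path(row, partition_on, root="out.parquet"):
    """Build the nested hive-style directory for one row's key values,
    e.g. out.parquet/month=Mar/day=30 for partition_on=["month", "day"]."""
    parts = [f"{col}={row[col]}" for col in partition_on]
    return os.path.join(root, *parts)

row = {"month": "Mar", "day": 30, "x": 1.5}
path = hive_partition_path(row, ["month", "day"])
```

All rows sharing the same key values land in the same directory, which is what lets the read stage later recover the partitioning from the file paths alone.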

Example Usage

import pandas as pd
import dask.dataframe as dd
import dask
import nvtabular as nvt

path = "fake.data"

# Create a sample ddf
ddf = dask.datasets.timeseries(
    start="2000-01-01",
    end="2000-01-03",
    freq="600s",
    partition_freq="6h",
    seed=42,
).reset_index()
ddf['timestamp'] = ddf['timestamp'].dt.round('D').dt.day

# Convert to a Dataset and write out hive-partitioned data to disk
keys = ["timestamp", "name"]
nvt.Dataset(ddf).to_parquet(path, partition_on=keys)

This will produce a directory structure like:

$ find fake.data/ -type d -print
fake.data/
fake.data/timestamp=1
fake.data/timestamp=1/name=Alice
fake.data/timestamp=1/name=Frank
fake.data/timestamp=1/name=Victor
fake.data/timestamp=1/name=George
fake.data/timestamp=1/name=Quinn
fake.data/timestamp=1/name=Kevin
fake.data/timestamp=1/name=Ursula
...

Then, you can read the data back in with NVT, and ensure that the ddf partitions are shuffled by keys:

ds = nvt.Dataset(path, engine="parquet").shuffle_by_keys(keys)
ds.to_ddf().compute()
      id         x         y timestamp    name
0    991 -0.750009 -0.587392         1   Alice
1   1022  0.866823 -0.682096         1   Alice
2    991  0.467775  0.683211         1   Alice
3    967  0.534984 -0.931405         1     Bob
4    991 -0.149716 -0.651939         1     Bob
..   ...       ...       ...       ...     ...
25   964  0.843602  0.598580         3  Yvonne
26   961  0.853070 -0.987596         3  Yvonne
27   947  0.934162  0.190069         3  Yvonne
28  1024 -0.107280  0.662606         3  Yvonne
29  1006  0.169090 -0.784889         3   Zelda

[288 rows x 5 columns]
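The guarantee shuffle_by_keys provides, that every unique key combination maps to exactly one ddf partition, can be illustrated with a toy hash-based shuffle (pure Python sketch, not the NVTabular/dask implementation):

```python
from collections import defaultdict

def shuffle_by_keys(rows, keys, npartitions=2):
    """Assign every row with the same key tuple to the same partition."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in keys)
        partitions[hash(key) % npartitions].append(row)
    return list(partitions.values())

rows = [
    {"timestamp": 1, "name": "Alice", "x": 0.1},
    {"timestamp": 1, "name": "Bob", "x": 0.2},
    {"timestamp": 2, "name": "Alice", "x": 0.3},
    {"timestamp": 1, "name": "Alice", "x": 0.4},
]
parts = shuffle_by_keys(rows, ["timestamp", "name"])
# Every (timestamp, name) pair appears in exactly one partition.
```

For hive-partitioned input, the same mapping can be read off the directory names, which is why the full shuffle can be skipped.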

@rjzamora
Collaborator Author

cc @gabrielspmoreira @benfred

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3, no merge conflicts.
Running as SYSTEM
Setting status of 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1989/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3^{commit} # timeout=10
Checking out Revision 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 # timeout=10
Commit message: "expand testing and fix bug"
 > git rev-list --no-walk 1f60ba950f935d104c7c8fa21742158698eba3eb # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins427843502843120314.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
93 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 763 items / 2 skipped / 761 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 38%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 45%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py ................... [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 76%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......F..........FF..........s [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-41/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Patricia Michael 990 1002 -0.005722 0.54568...n 1004 1075 0.821929 -0.615211
4320 Ray Xavier 1007 986 -0.643196 0.034432

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7cbc73d070>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:73:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f7cbc73dbe0>
dask_stats = x
y -0.020037743
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
______________________ test_gpu_dl[None-parquet-10-0.06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-41/test_gpu_dl_None_parquet_10_0_1')
df = name-cat name-string id label x y
0 Patricia Michael 990 1002 -0.005722 0.54568...n 1004 1075 0.821929 -0.615211
4320 Ray Xavier 1007 986 -0.643196 0.034432

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7d92066eb0>
batch_size = 10, part_mem_fraction = 0.06, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, GPU_DEVICE_IDS[:2]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:106:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f7da2135e50>
dask_stats = x
y -0.020037743
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_gpu_dl[None-parquet-100-0.001] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-41/test_gpu_dl_None_parquet_100_00')
df = name-cat name-string id label x y
0 Patricia Michael 990 1002 -0.005722 0.54568...n 1004 1075 0.821929 -0.615211
4320 Ray Xavier 1007 986 -0.643196 0.034432

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7da21238e0>
batch_size = 100, part_mem_fraction = 0.001, engine = 'parquet', devices = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("devices", [None, GPU_DEVICE_IDS[:2]])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, devices):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:106:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f7d920667f0>
dask_stats = x
y -0.020037743
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 2 30 6 91% 44, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 245 30 112 20 85% 254, 256, 269, 278, 296-310, 409->474, 414-417, 422->432, 427-428, 439->437, 453->457, 503, 604->606, 606->615, 616, 623-624, 630, 636, 731-732, 844-849, 855, 880
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 27 150 12 93% 83-91, 136-143, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 269 15 112 9 94% 77-78, 107, 124, 131-132, 143, 203->205, 220, 243-244, 283->287, 358, 362-363, 457, 464
nvtabular/loader/tensorflow.py 120 8 52 7 90% 52, 60-63, 73, 83, 282, 309->313, 342
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4898 1012 2022 224 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.62%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-0.001]
============ 3 failed, 750 passed, 12 skipped in 587.86s (0:09:47) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7476407945782677737.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 89f29ce11125543cc1b4ca94b74cbbe0d4583adc, no merge conflicts.
Running as SYSTEM
Setting status of 89f29ce11125543cc1b4ca94b74cbbe0d4583adc to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1990/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 89f29ce11125543cc1b4ca94b74cbbe0d4583adc^{commit} # timeout=10
Checking out Revision 89f29ce11125543cc1b4ca94b74cbbe0d4583adc (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 89f29ce11125543cc1b4ca94b74cbbe0d4583adc # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 9fb2dd5d8fdc927f73405b5819e1ad46d6d9a4a3 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1838599612458303638.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
94 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 770 items / 2 skipped / 768 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 67%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 76%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .............................s [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 245 30 112 20 85% 254, 256, 269, 278, 296-310, 409->474, 414-417, 422->432, 427-428, 439->437, 453->457, 503, 604->606, 606->615, 616, 623-624, 630, 636, 731-732, 844-849, 855, 880
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 278 17 114 10 93% 84-85, 113, 130, 137-138, 149, 214->216, 224-228, 238, 261-262, 301->305, 376, 380-381, 475, 482
nvtabular/loader/tensorflow.py 120 11 52 7 88% 52, 60-63, 73, 83, 287, 302-304, 314->318, 347
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4907 1019 2024 226 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.53%

================= 759 passed, 13 skipped in 596.36s (0:09:56) ==================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5509550321025932911.sh

@gabrielspmoreira
Member

Sounds great @rjzamora! This PR will allow incremental training and evaluation of sequential recommender models and time series, as it allows splitting the data by time windows.

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 0eedde3f05679f9948ceda0136fbe06b704e6c6b, no merge conflicts.
Running as SYSTEM
Setting status of 0eedde3f05679f9948ceda0136fbe06b704e6c6b to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/1999/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 0eedde3f05679f9948ceda0136fbe06b704e6c6b^{commit} # timeout=10
Checking out Revision 0eedde3f05679f9948ceda0136fbe06b704e6c6b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0eedde3f05679f9948ceda0136fbe06b704e6c6b # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 2dcc4a9ed07a2ff8254449e1291ebc2e4281ddbf # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5306605500818761616.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
94 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 770 items / 2 skipped / 768 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 67%]
tests/unit/test_tf_dataloader.py ..FFFFFFFFFFF............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 76%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .............................s [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_True_1_parquet_1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7febf0c63e50>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7febf0c63310>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fec887571f0>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7febf0c630a0>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[True-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_True_10_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fecbafaa310>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecbafaa7c0>
dask_stats = x 0.017360521
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[True-100-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_True_100_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fecbb15c2e0>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecbb0f15e0>
dask_stats = x
y 0.00956418
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[True-100-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_True_100_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fecbb067370>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecbb067c10>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fecbafb3a90>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecbb04f700>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fec886ae6d0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fec886aec40>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fec88757a90>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecbaee6430>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7febf0dd63d0>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7febf0d823d0>
dask_stats = x
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7febf0d714c0>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fecb90ed910>
dask_stats = x
y 0.00956418
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-49/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-49/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fec002e6a90>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fec002e63a0>
dask_stats = x 0.017360521
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 245 30 112 20 85% 254, 256, 269, 278, 296-310, 409->474, 414-417, 422->432, 427-428, 439->437, 453->457, 503, 604->606, 606->615, 616, 623-624, 630, 636, 731-732, 844-849, 855, 880
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 278 25 114 11 90% 79, 84-85, 113, 130, 137-138, 142-146, 149, 214-218, 224-228, 238, 261-262, 301->305, 376, 380-381, 475, 482
nvtabular/loader/tensorflow.py 120 12 52 8 87% 52, 60-63, 73, 83, 287, 294, 302-304, 314->318, 347
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4907 1028 2024 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.31%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
============ 11 failed, 748 passed, 13 skipped in 587.95s (0:09:47) ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5032331935929203304.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 6503e5069dea0f27dff7643a464d336bb1d752a8, no merge conflicts.
Running as SYSTEM
Setting status of 6503e5069dea0f27dff7643a464d336bb1d752a8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2011/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 6503e5069dea0f27dff7643a464d336bb1d752a8^{commit} # timeout=10
Checking out Revision 6503e5069dea0f27dff7643a464d336bb1d752a8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 42f58af7c1b8c1b29f31c482329dbf6bdd410c24 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5873751992607622620.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .FFFFFFFFFFFF............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......FF..FF..FFFFFFFFFF.ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff031327790>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0312d2540>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff033a150d0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
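For context on this failure mode: `FillMedian` computes a float median (`1000.5`) for the int64 `id` column, and cudf's `fillna` refuses the fill value because casting it to the column dtype changes it. The check cudf applies can be sketched in plain Python (using builtin `int` as a stand-in for `np.int64`; `can_safely_cast` is a hypothetical name, not a cudf API):

```python
def can_safely_cast(fill_value, dtype):
    # Mirrors the check in cudf's NumericalColumn.fillna: cast the fill
    # value to the column dtype, then compare against the original value.
    # If the round-trip is lossy, cudf raises TypeError instead of filling.
    casted = dtype(fill_value)
    return casted == fill_value

# A float median of 1000.5 truncates to 1000 as int64 -> not equivalent,
# which is exactly the "Cannot safely cast non-equivalent float to int64"
# error raised in the tests above.
assert not can_safely_cast(1000.5, int)

# An integral-valued float round-trips losslessly, so it would be accepted.
assert can_safely_cast(1000.0, int)
```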
_____________________ test_tf_gpu_dl[True-1-parquet-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_1_parquet_1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef78727a90>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0286909c0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7fef78727ee0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef787a6e80>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0313a5c40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7fef787a62b0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_tf_gpu_dl[True-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_10_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff028119880>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff031361840>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff028119a00>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[True-100-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_100_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff018374580>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0402afec0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff018374a00>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[True-100-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_True_100_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef782ef970>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fef782efdc0>
dask_stats = x 0.017761632
y
id 1000.5
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
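This second failure mode is different: the `dask_stats` series shown above has no median for column `y` (the value is null), and `fit_finalize` then calls `float()` on that missing value. A short reproduction of the underlying behavior (assumption: pandas' `NA` stands in for the cudf `_NAType` named in the error):

```python
import pandas as pd

# float() on a missing statistic raises TypeError, mirroring the
# "float() argument must be a string or a number, not '_NAType'"
# failure in FillMedian.fit_finalize.
try:
    float(pd.NA)
    raised = False
except TypeError:
    raised = True

assert raised
```

So the two tracebacks in this log have distinct root causes: an unsafe float-to-int64 cast at transform time, and a null median reaching `float()` at fit time.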
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff0f60ad790>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff03825e6c0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff0f60ad130>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff0185432e0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0313a5f40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff018543790>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff018543400>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0402b11c0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff0185435b0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff0280bc910>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0f38faac0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff0280bc040>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef784c2940>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff04029aac0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7fef784c2cd0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-100-parquet-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-55/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff02817b1c0>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0402c5840>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff02817b6a0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
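A note on what this traceback shows: `FillMedian` computed a float median (`1000.5`) for the int64 `id` column, and cudf's `fillna` rejects the fill value because casting it to int64 would change its value. A minimal sketch of that cast-safety check (the function name `can_safely_cast` is illustrative, not cudf's API) using plain numpy:

```python
import numpy as np

def can_safely_cast(fill_value: float, dtype: np.dtype) -> bool:
    """Sketch of the check cudf applies: a scalar fill value is only
    accepted if it round-trips through the column dtype unchanged."""
    casted = dtype.type(fill_value)
    return np.isnan(fill_value) or casted == fill_value

int64 = np.dtype("int64")
print(can_safely_cast(1000.0, int64))  # round-trips cleanly
print(can_safely_cast(1000.5, int64))  # the median above: lossy, rejected
```

This is why every `FillMedian`-on-int-column test in this run fails the same way: the median of an even-length int column is a non-integral float.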
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff028287c40>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0280a5b40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff0282876a0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________ test_empty_cols[label_name0-cont_names0-cat_names1-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_empty_cols_label_name0_co1')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef58429520>
engine = 'parquet', cat_names = [], cont_names = ['x', 'y', 'id']
label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fef584295b0>
dask_stats = x 0.017761632
y
id 1000.5
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
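This failure is different from the cast error above: here the computed `dask_stats` series contains a missing value for `y`, and `fit_finalize` calls `float()` on pandas' NA sentinel, which raises. A hedged sketch (not the actual fix in this PR) of guarding against that, reproducing the stats series from the traceback with pandas:

```python
import pandas as pd

# Stats series as shown in the traceback: one entry is missing (NA).
dask_stats = pd.Series({"x": 0.017761632, "y": pd.NA, "id": 1000.5})

medians = {}
for col, val in dask_stats.items():
    if pd.isna(val):
        # float(pd.NA) raises "float() argument must be a string or a
        # number, not '_NAType'" -- skip missing medians instead.
        continue
    medians[col] = float(val)

print(medians)  # {'x': 0.017761632, 'id': 1000.5}
```

Whether skipping, defaulting, or erroring early is the right behavior for `FillMedian.fit_finalize` is a design question for the PR; the sketch only shows where the `_NAType` comes from.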
_________ test_empty_cols[label_name1-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_empty_cols_label_name1_co0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff018292610>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff02829f840>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7fef782d4e50>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________ test_empty_cols[label_name1-cont_names0-cat_names1-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_empty_cols_label_name1_co1')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105c96460>
engine = 'parquet', cat_names = [], cont_names = ['x', 'y', 'id']
label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff028081840>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105c964f0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
__________________ test_gpu_dl_break[None-parquet-1000-0.001] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_break_None_parquet0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105d580d0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff040281d40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105d58f70>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
__________________ test_gpu_dl_break[None-parquet-1000-0.06] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_break_None_parquet1')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105c6d7f0>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff038a97440>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105c6d5b0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
___________________ test_gpu_dl_break[0-parquet-1000-0.001] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_break_0_parquet_100')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105f296a0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff033a66a40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105f29490>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_gpu_dl_break[0-parquet-1000-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_break_0_parquet_101')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff018782790>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff038228240>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105f6e7f0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
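
Since the identical cast error repeats for every parametrization, one hedged workaround (an illustration only, not the fix merged in this PR) is to cast the learned median to the column's dtype before calling `fillna`, so the fill value is value-preserving by construction. Sketched here with a pandas nullable-integer series standing in for the cudf `id` column:

```python
import numpy as np
import pandas as pd

# Stand-in for the int64 "id" column with nulls (pandas nullable Int64).
col = pd.Series([996, None, 1040], dtype="Int64")
median = 1000.5  # the float median FillMedian learned for this column

# Hypothetical workaround: round and cast the fill value to the column
# dtype up front, so the fillna cast cannot be rejected as unsafe.
fill = np.int64(round(median))
filled = col.fillna(fill)
print(filled.tolist())  # [996, 1000, 1040]
```

Note that Python's `round` uses banker's rounding (`round(1000.5) == 1000`); whether truncating the median like this is acceptable is a modeling decision, which is why the actual fix belongs in the op rather than in user code.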
_____________________ test_gpu_dl[None-parquet-1000-0.001] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_None_parquet_1000_0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105ed7ac0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:169:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff040284f40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105ed7940>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_gpu_dl[None-parquet-1000-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_None_parquet_1000_1')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7fef5840f4f0>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:169:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff040284ac0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7fef5840f2e0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
______________________ test_gpu_dl[0-parquet-1000-0.001] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_0_parquet_1000_0_00')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff1059ff880>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:169:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff02829f4c0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff1059ff790>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
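The root cause is visible in the cudf guard shown above: `FillMedian` computes a float median (here `1000.5`) for the int64 `id` column, and cudf rejects the lossy scalar fill. The check can be sketched without a GPU using numpy — `can_safely_fill` is a hypothetical helper that mirrors the `fill_value_casted` logic from the traceback:

```python
import numpy as np

def can_safely_fill(col_dtype: np.dtype, fill_value) -> bool:
    # Mirror cudf's scalar-fill guard: cast the fill value to the column
    # dtype and reject the fill if the round-trip loses information.
    fill_value_casted = col_dtype.type(fill_value)
    return bool(np.isnan(fill_value) or fill_value_casted == fill_value)

# A float median on an int64 column is exactly the failing case in the log:
print(can_safely_fill(np.dtype("int64"), 1000.5))    # -> False (1000.5 truncates to 1000)
print(can_safely_fill(np.dtype("float64"), 1000.5))  # -> True
```

This is why only the integer-typed continuous column trips the error, while float columns fill cleanly.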
_______________________ test_gpu_dl[0-parquet-1000-0.06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_gpu_dl_0_parquet_1000_0_01')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff1059fffd0>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:169:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff028091ec0>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff1059cfc40>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________________________ test_kill_dl[parquet-0.001] __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_kill_dl_parquet_0_001_0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff0185bccd0>
part_mem_fraction = 0.001, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:238:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff0522cce40>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff0185bca00>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
__________________________ test_kill_dl[parquet-0.1] ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-55/test_kill_dl_parquet_0_1_0')
df = name-cat name-string id label x y
0 Quinn Gary 1020 996 0.672772 -0.59399...t 1027 1048 0.668489 -0.437601
4320 Xavier Michael 996 1040 0.471682 0.617638

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7ff105f735e0>
part_mem_fraction = 0.1, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:238:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7ff031386740>
fill_value = 1000.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7ff105f73670>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
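Given the repeated `FillMedian` failures above, one way to avoid the lossy cast is to coerce the computed median to the column's dtype before filling. The sketch below uses pandas; the helper name `fill_median_safe` and the rounding policy are assumptions for illustration, not the fix adopted for these tests:

```python
import pandas as pd

def fill_median_safe(s: pd.Series) -> pd.Series:
    # Cast the median to the series dtype so integer columns are filled
    # with an integer, sidestepping the non-equivalent-cast error.
    median = s.median()
    if pd.api.types.is_integer_dtype(s.dtype):
        median = s.dtype.type(round(median))
    return s.fillna(median)

# A nullable int column whose median (1000.0 here) would otherwise be a float:
s = pd.Series([1, 1000, 1001, None], dtype="Int64")
print(fill_median_safe(s).tolist())  # -> [1, 1000, 1001, 1000]
```

Rounding changes the fill value slightly for even-length columns (e.g. a true median of `1000.5` becomes `1000` or `1001`), which is usually acceptable for an imputation default.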

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 247 31 114 21 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 884
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 28 114 13 89% 79, 84-85, 113, 130, 137-138, 142-146, 149, 211, 217-221, 227-231, 241, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 12 52 8 87% 52, 60-63, 73, 83, 286, 293, 301-303, 313->317, 346
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4910 1032 2026 231 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.23%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names1-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names1-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[None-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[None-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[0-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[0-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[0-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[0-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-0.001] - Typ...
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-0.1] - TypeE...
============ 26 failed, 729 passed, 14 skipped in 490.99s (0:08:10) ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins292754657294673574.sh

@rjzamora
Collaborator Author

Is it possible that the dataloader CI failures are being caused by this PR?

@benfred
Member

benfred commented Mar 31, 2021

Is it possible that the dataloader CI failures are being caused by this PR?

I don't think so - we've had some flaky tests around this for a while now (#397), but for some reason these errors seem more common now

@benfred
Member

benfred commented Mar 31, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 6503e5069dea0f27dff7643a464d336bb1d752a8, no merge conflicts.
Running as SYSTEM
Setting status of 6503e5069dea0f27dff7643a464d336bb1d752a8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2015/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 6503e5069dea0f27dff7643a464d336bb1d752a8^{commit} # timeout=10
Checking out Revision 6503e5069dea0f27dff7643a464d336bb1d752a8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 40a7f2b1c8e4e6743f499c4904ba1db32fbba0a2 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6076946174219903460.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......F...F..............ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Frank Bob 977 1050 -0.442815 0.69365...h 983 945 0.348971 0.700293
4320 Norbert Zelda 1013 1014 -0.623388 -0.007252

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f1f101b2b50>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f1f103531f0>
dask_stats = x 0.01028773
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_________ test_empty_cols[label_name1-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_empty_cols_label_name1_co0')
df = name-cat name-string id label x y
0 Frank Bob 977 1050 -0.442815 0.69365...h 983 945 0.348971 0.700293
4320 Norbert Zelda 1013 1014 -0.623388 -0.007252

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f1ef07ba2e0>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f1ef00b8730>
dask_stats = x 0.01028773
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
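The two tracebacks above come down to `FillMedian.fit_finalize` calling `float()` on a missing median: column `y` computes to `<NA>` in the Dask statistics, and `float(pd.NA)` raises the `TypeError`. As a minimal sketch of a possible guard (the `fit_finalize_safe` helper below is hypothetical, not the actual fix merged in this PR; it only mirrors the loop shown from `nvtabular/ops/fill.py` while skipping NA entries):

```python
import pandas as pd

def fit_finalize_safe(dask_stats, medians):
    """Hypothetical variant of FillMedian.fit_finalize that skips NA medians.

    `dask_stats` is the computed per-column median Series; columns whose
    median is missing (e.g. an all-null input column) are left out of
    `medians` instead of crashing on float(<NA>).
    """
    index = dask_stats.index
    # cudf indexes expose values_host; pandas indexes use values
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
        val = dask_stats[col]
        if pd.isna(val):  # guard against pandas.NA / NaN medians
            continue
        medians[col] = float(val)
    return medians

# Reproduce the failing shape: column "y" has a missing median
stats = pd.Series({"x": 0.0103, "y": pd.NA, "id": 1000.0})
medians = fit_finalize_safe(stats, {})
# medians -> {"x": 0.0103, "id": 1000.0}
```

Whether skipping the column (as above) or raising a clearer error is the right behavior depends on how downstream transforms consume `self.medians`.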

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 247 31 114 21 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 884
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 286, 301-303, 313->317, 346
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4910 1020 2026 227 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.51%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names0-parquet]
============ 2 failed, 753 passed, 14 skipped in 493.10s (0:08:13) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins274464265043646598.sh

@benfred

benfred commented Mar 31, 2021

rerun tests

@nvidia-merlin-bot

GitHub pull request #677 of commit 6503e5069dea0f27dff7643a464d336bb1d752a8, no merge conflicts.
Running as SYSTEM
Setting status of 6503e5069dea0f27dff7643a464d336bb1d752a8 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2016/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 6503e5069dea0f27dff7643a464d336bb1d752a8^{commit} # timeout=10
Checking out Revision 6503e5069dea0f27dff7643a464d336bb1d752a8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins2330873152729625782.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .........................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]


Required test coverage of 70% reached. Total coverage: 76.51%

================= 755 passed, 14 skipped in 493.61s (0:08:13) ==================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1033248889434797742.sh

@nvidia-merlin-bot

GitHub pull request #677 of commit a51acdb4154c37628ec0252a55c1d38e8dde1869, no merge conflicts.
Running as SYSTEM
Setting status of a51acdb4154c37628ec0252a55c1d38e8dde1869 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2017/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse a51acdb4154c37628ec0252a55c1d38e8dde1869^{commit} # timeout=10
Checking out Revision a51acdb4154c37628ec0252a55c1d38e8dde1869 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 6503e5069dea0f27dff7643a464d336bb1d752a8 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4907412045382266287.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ..........F..............ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_________ test_empty_cols[label_name1-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_empty_cols_label_name1_co0')
df = name-cat name-string id label x y
0 Michael Norbert 1006 1055 -0.572479 0.016072
...ra 999 989 -0.009399 -0.914909
4320 Jerry Quinn 1008 1012 0.955115 0.879404

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f044c262700>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f044c7dc130>
dask_stats = x -0.015519284
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 247 31 114 21 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 884
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4910 1020 2026 227 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.51%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names0-parquet]
============ 1 failed, 754 passed, 14 skipped in 492.20s (0:08:12) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2995186208006081316.sh

@rjzamora
Collaborator Author

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit a51acdb4154c37628ec0252a55c1d38e8dde1869, no merge conflicts.
Running as SYSTEM
Setting status of a51acdb4154c37628ec0252a55c1d38e8dde1869 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2018/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse a51acdb4154c37628ec0252a55c1d38e8dde1869^{commit} # timeout=10
Checking out Revision a51acdb4154c37628ec0252a55c1d38e8dde1869 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1242628838666643737.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py ...F...FFFF.F............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .........................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-4/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-4/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7f2b02ef0cd0>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f2b02bd7a60>
dask_stats = x -0.009783547
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing


TOTAL 4910 1020 2026 227 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.51%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
============ 6 failed, 749 passed, 14 skipped in 522.71s (0:08:42) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1434248472016933687.sh

@karlhigley
Contributor

@jperez999 This CI error looks familiar and I think you tried to explain it to me, but I didn't fully understand what's causing it. Thoughts on how to resolve it?

@jperez999
Contributor

@karlhigley OK, so follow me on this: the median stat operator collects a median value of NA for the continuous columns (sometimes for both x and y, sometimes for just one of them). When we go to apply that NA value, we hit the error we are seeing. I think this comes from the way we set up the dataset here: https://github.com/NVIDIA/NVTabular/blob/main/tests/conftest.py#L108-L112 That code inserts the NA values, and I think the randomly chosen positions coincidentally land on the median indexes.
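The failure mode described above can be sketched in isolation: when a column's computed median comes back as pandas' NA scalar, the `float()` call in `FillMedian.fit_finalize` raises exactly the `TypeError` shown in the CI logs. The guard below (the `pd.isna` check and the `fill_value` fallback) is only an illustrative sketch, not NVTabular's actual fix:

```python
import pandas as pd

# Stats series mimicking `dask_stats` from the traceback: the median for
# "y" is pandas' NA scalar, which float() rejects with
# "float() argument must be a string or a number, not '_NAType'".
dask_stats = pd.Series([-0.0098, pd.NA, 1000.0], index=["x", "y", "id"], name=0.5)

def fit_finalize_guarded(stats, fill_value=0.0):
    """Illustrative guard: fall back to `fill_value` when a median is NA."""
    index = stats.index
    # cuDF indexes expose host values via `values_host`; pandas uses `values`.
    vals = index.values_host if hasattr(index, "values_host") else index.values
    medians = {}
    for col in vals:
        val = stats[col]
        medians[col] = fill_value if pd.isna(val) else float(val)
    return medians

print(fit_finalize_guarded(dask_stats))  # {'x': -0.0098, 'y': 0.0, 'id': 1000.0}
```

Whether the right response is to fall back to a default, skip the column, or fix the test fixture so the median can never be NA is a separate question; the sketch just shows where the `TypeError` originates.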

@jperez999
Contributor

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit a51acdb4154c37628ec0252a55c1d38e8dde1869, no merge conflicts.
Running as SYSTEM
Setting status of a51acdb4154c37628ec0252a55c1d38e8dde1869 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2019/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse a51acdb4154c37628ec0252a55c1d38e8dde1869^{commit} # timeout=10
Checking out Revision a51acdb4154c37628ec0252a55c1d38e8dde1869 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk a51acdb4154c37628ec0252a55c1d38e8dde1869 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3416488090804215825.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......F..................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Patricia Alice 1024 1006 0.903991 0.86151...n 1009 1065 -0.355152 -0.101055
4320 Jerry Dan 944 981 -0.677558 -0.940314

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f921003e5b0>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f9210525a90>
dask_stats = x
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
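The failure above boils down to `float()` being called on a null median: with the empty `cont_names` fixtures, some columns reduce to an `<NA>` median, and `float(pd.NA)` raises exactly this `TypeError`. A minimal defensive sketch of the finalize step, assuming pandas (`finalize_medians` is a hypothetical helper for illustration, not NVTabular's actual `fit_finalize`):

```python
import pandas as pd

def finalize_medians(dask_stats: pd.Series) -> dict:
    """Collect per-column medians, skipping columns whose median is null.

    An all-null column yields an <NA> median, and float(pd.NA) raises
    TypeError("float() argument must be a string or a number, not '_NAType'").
    """
    medians = {}
    for col in dask_stats.index:
        val = dask_stats[col]
        if pd.isna(val):  # guard against _NAType / NaN medians
            continue
        medians[col] = float(val)
    return medians

# "x" has no data, so its median is <NA>; "id" has a real median.
stats = pd.Series([pd.NA, 1000.0], index=["x", "id"], name=0.5)
print(finalize_medians(stats))  # → {'id': 1000.0}
```

Guarding with `pd.isna` before the cast keeps all-null columns out of the medians dict instead of crashing the whole fit.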

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 247 31 114 21 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 884
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4910 1020 2026 227 77%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.51%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
============ 1 failed, 754 passed, 14 skipped in 490.05s (0:08:10) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins913225217622897003.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2026/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 2d552f6806f843cc0da94110e76e281e1db982b8 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5070192051355054301.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .FFFFFFFFFFFF............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......FF..FF..FFFFFFFFFF.ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7b284670a0>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7ac25d5140>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7b28524f10>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
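The cast error above comes from filling an int64 column (`id`) with a non-integral median (`1002.5`), which cudf refuses to cast silently. A minimal sketch of one workaround, promoting integer columns to float before the fill, assuming pandas (`safe_fillna` is a hypothetical helper for illustration, not the NVTabular fix itself):

```python
import pandas as pd

def safe_fillna(s: pd.Series, fill_value: float) -> pd.Series:
    """Fill nulls, promoting integer dtypes to float64 when the fill
    value is not exactly representable in the column dtype (e.g. a
    median of 1002.5 into int64, which cudf rejects with a cast error).
    """
    if pd.api.types.is_integer_dtype(s.dtype) and fill_value != int(fill_value):
        s = s.astype("float64")
    return s.fillna(fill_value)

s = pd.Series([1000, None, 1005], dtype="Int64")
print(safe_fillna(s, 1002.5).tolist())  # the null becomes 1002.5
```

The alternative — casting `fill_value` down to the column dtype — is exactly the unsafe cast cudf is guarding against here, so widening the column is the safer direction.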
_____________________ test_tf_gpu_dl[True-1-parquet-0.06] ______________________
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________
_____________________ test_tf_gpu_dl[True-10-parquet-0.06] _____________________

(Same failure for each of these parametrizations: TypeError: Cannot safely cast non-equivalent float to int64, with an identical traceback through nvtabular/ops/fill.py:98 and cudf/core/column/numerical.py:487.)
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
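The failure mode is the same in every parametrization below: `FillMedian` computes a fractional median (1002.5) for the integer `id` column, and cudf's `fillna` refuses a scalar fill value that cannot round-trip through the column dtype. A minimal sketch of that safe-cast check, using plain NumPy (no cudf required) with a hypothetical helper name:

```python
import numpy as np

def can_safely_fill(fill_value, dtype):
    """Mimic the check in cudf's NumericalColumn.fillna: a scalar fill
    value is accepted only if casting it to the column dtype does not
    change its value (NaN is always allowed)."""
    casted = dtype.type(fill_value)
    return bool(np.isnan(fill_value) or casted == fill_value)

# median([1001, 1004]) == 1002.5 -> cannot be represented in int64
print(can_safely_fill(1002.5, np.dtype("int64")))  # False -> cudf raises TypeError
print(can_safely_fill(1002.0, np.dtype("int64")))  # True  -> lossless cast
```

This is why the error only appears for continuous columns that happen to have an integer dtype: a float median on a float column always passes the check.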
____________________ test_tf_gpu_dl[True-100-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_True_100_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a9444da30>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7be42b80c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a9444dd60>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[True-100-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_True_100_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a946f0d00>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7be41f28c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a946f0250>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7ab8436190>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b9c0760c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7ab85ffdc0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7ab855e3d0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7bfaceb7c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7ab855e5e0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7ab8277940>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b536ddb40>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7ab8277d60>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
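One plausible way to make the `FillMedian.transform` step robust to this (a hedged sketch, not the actual NVTabular fix; `fill_median` is a hypothetical helper and pandas stands in for cudf here) is to detect a fractional fill value on an integer column and round it to a representable value before calling `fillna`:

```python
import numpy as np
import pandas as pd  # stand-in for cudf in this sketch

def fill_median(df, col, median):
    """Fill nulls in df[col] with `median`, guarding against the
    'non-equivalent float to int' cast: if the column is integer-typed
    and the median is fractional, round it to the nearest representable
    integer instead of letting the cast fail."""
    dtype = df[col].dtype
    if pd.api.types.is_integer_dtype(dtype) and median != int(median):
        median = int(round(median))
    df[col] = df[col].fillna(median)
    return df

df = pd.DataFrame({"id": pd.array([1001, None, 1004], dtype="Int64")})
df = fill_median(df, "id", 1002.5)  # would raise in cudf without the guard
```

An alternative design choice would be to upcast the column to float so the exact median survives, at the cost of changing the output dtype; rounding keeps the schema stable, which matters for downstream dataloaders.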
____________________ test_tf_gpu_dl[False-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a94388ca0>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b9c0760c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a94388070>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a942fc6d0>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7bf7ee96c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a942fc4c0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_tf_gpu_dl[False-100-parquet-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-11/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a7459dcd0>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b9e66d040>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a7459ddf0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a94660160>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28a32340>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a94660910>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________ test_empty_cols[label_name0-cont_names0-cat_names1-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_empty_cols_label_name0_co1')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a6c71f3d0>
engine = 'parquet', cat_names = [], cont_names = ['x', 'y', 'id']
label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28e75c40>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7b28492df0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
_________ test_empty_cols[label_name1-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_empty_cols_label_name1_co0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a74190130>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b53645d40>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a74691400>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
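All of these failures share one root cause: `FillMedian` computes a floating-point median (1002.5) for the integer `id` column, and cudf's `fillna` refuses to fill an `int64` column with a value that would be truncated by the cast. The safe-cast check can be reproduced with NumPy alone (a minimal sketch; the dtype and fill value are taken from the traceback above):

```python
import numpy as np

# The median of the int64 "id" column is fractional (1002.5).
col_dtype = np.dtype("int64")
fill_value = 1002.5

# This mirrors cudf's check in NumericalColumn.fillna: cast the fill
# value to the column dtype and compare against the original value.
fill_value_casted = col_dtype.type(fill_value)  # truncated to 1002
unsafe = (not np.isnan(fill_value)) and (fill_value_casted != fill_value)
print(unsafe)  # True -> cudf raises "Cannot safely cast non-equivalent float to int64"
```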
_________ test_empty_cols[label_name1-cont_names0-cat_names1-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_empty_cols_label_name1_co1')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09cf1a00>
engine = 'parquet', cat_names = [], cont_names = ['x', 'y', 'id']
label_name = []

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28d08b40>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7c09cf16d0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
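One possible remedy (a hypothetical sketch, not necessarily the patch that landed for this failure) is to coerce the stored median to the column's dtype before calling `fillna`, so integer columns are filled with an integer value. Shown here with pandas nullable integers for portability; `safe_fill` is an illustrative helper, not an NVTabular API:

```python
import pandas as pd

def safe_fill(series, median):
    # Coerce a fractional median to the column's integer dtype before
    # filling, sidestepping the "Cannot safely cast" error above.
    if pd.api.types.is_integer_dtype(series.dtype):
        median = int(median)
    return series.fillna(median)

s = pd.Series([1, 2, None, 4], dtype="Int64")  # nullable integer column
print(safe_fill(s, 1002.5).tolist())  # [1, 2, 1002, 4]
```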
__________________ test_gpu_dl_break[None-parquet-1000-0.001] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_break_None_parquet0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7ab817a1f0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28fe56c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7a9470e130>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
__________________ test_gpu_dl_break[None-parquet-1000-0.06] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_break_None_parquet1')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09ef5e20>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28fef040>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7c09ef5ca0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
___________________ test_gpu_dl_break[0-parquet-1000-0.001] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_break_0_parquet_100')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7bfab41670>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28fd1040>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7c09d7da00>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
____________________ test_gpu_dl_break[0-parquet-1000-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_break_0_parquet_101')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09da6dc0>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = 0

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
>   processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7b28b70dc0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
>           raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7c09da6a90>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
df[col] = df[col].fillna(self.medians[col])
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
return super().fillna(
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)
File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64
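All of the failures in this run reduce to the same root cause: `FillMedian` learns a float median (`1002.5`) for the int64 column `id`, and cudf's `fillna` refuses to down-cast it. A minimal numpy-only sketch of that safety check (a hypothetical simplification of the cudf source shown in the traceback; `check_safe_cast` is an illustrative name, not a cudf API):

```python
import numpy as np


def check_safe_cast(dtype: np.dtype, fill_value):
    """Raise TypeError if fill_value cannot round-trip through dtype.

    Hypothetical, numpy-only simplification of the check in cudf's
    NumericalColumn.fillna shown in the traceback above.
    """
    casted = dtype.type(fill_value)  # e.g. int64: 1002.5 -> 1002
    if not np.isnan(fill_value) and casted != fill_value:
        raise TypeError(
            f"Cannot safely cast non-equivalent "
            f"{type(fill_value).__name__} to {dtype.name}"
        )
    return casted
```

With the median from the log, `check_safe_cast(np.dtype("int64"), 1002.5)` raises exactly this `TypeError`, while an integral value such as `1002.0` passes the check.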
_____________________ test_gpu_dl[None-parquet-1000-0.001] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_None_parquet_1000_0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7ab8627cd0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:169:


E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
_____________________ test_gpu_dl[None-parquet-1000-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_None_parquet_1000_1')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a94698400>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = None

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
______________________ test_gpu_dl[0-parquet-1000-0.001] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_0_parquet_1000_0_00')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7a944ce2e0>
batch_size = 1000, part_mem_fraction = 0.001, engine = 'parquet', device = 0

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
_______________________ test_gpu_dl[0-parquet-1000-0.06] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_gpu_dl_0_parquet_1000_0_01')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09b149a0>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = 0

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
_________________________ test_kill_dl[parquet-0.001] __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_kill_dl_parquet_0_001_0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09ce1e20>
part_mem_fraction = 0.001, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:238:


E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
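One possible workaround for the failures above, sketched with pandas (illustrative only; `fillna_with_medians` is a hypothetical helper, not the fix this PR adopts): coerce each learned median to the column's dtype before calling `fillna`, so an int64 column like `id` receives an integer fill value instead of `1002.5`.

```python
import numpy as np
import pandas as pd


def fillna_with_medians(df: pd.DataFrame, medians: dict) -> pd.DataFrame:
    """Hypothetical workaround: coerce each median to the column dtype first.

    Note the coercion truncates (1002.5 -> 1002 for int64), which biases the
    filled statistic; casting such columns to float before filling is an
    alternative that preserves the median exactly.
    """
    out = df.copy()
    for col, median in medians.items():
        fill = out[col].dtype.type(median)  # match the column's dtype
        out[col] = out[col].fillna(fill)
    return out
```

Applied to a nullable-int column, this fills with the truncated median rather than raising the cast error seen in the log.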
__________________________ test_kill_dl[parquet-0.1] ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-11/test_kill_dl_parquet_0_1_0')
df = name-cat name-string id label x y
0 Kevin Laura 968 999 0.526043 0.837925
...er 1025 1028 -0.103355 0.852146
4320 Laura Sarah 988 1001 0.645996 -0.767400

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f7c09b5fe50>
part_mem_fraction = 0.1, engine = 'parquet'

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
def test_kill_dl(tmpdir, df, dataset, part_mem_fraction, engine):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:238:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:144: in fit
results = dask.compute(stats, scheduler="synchronous")[0]
/conda/envs/rapids/lib/python3.8/site-packages/dask/base.py:563: in compute
results = schedule(dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:528: in get_sync
return get_async(apply_sync, 1, dsk, keys, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:495: in get_async
fire_task()
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:457: in fire_task
apply_async(
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:517: in apply_sync
res = func(*args, **kwds)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:227: in execute_task
result = pack_exception(e, dumps)
/conda/envs/rapids/lib/python3.8/site-packages/dask/local.py:222: in execute_task
result = _execute_task(task, data)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/optimization.py:963: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:151: in get
result = _execute_task(task, cache)
/conda/envs/rapids/lib/python3.8/site-packages/dask/core.py:121: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/conda/envs/rapids/lib/python3.8/site-packages/dask/utils.py:35: in apply
return func(*args, **kwargs)
/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py:5471: in apply_and_enforce
df = func(*args, **kwargs)
nvtabular/workflow.py:336: in _transform_partition
df = column_group.op.transform(column_group.input_column_names, df)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)
nvtabular/ops/fill.py:98: in transform
df[col] = df[col].fillna(self.medians[col])
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py:1874: in fillna
return super().fillna(
/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py:1475: in fillna
copy_data[name] = copy_data[name].fillna(value[name], method)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f7bf7ee94c0>
fill_value = 1002.5, method = None, dtype = None, fill_nan = True

def fillna(
    self,
    fill_value: Any = None,
    method: str = None,
    dtype: Dtype = None,
    fill_nan: bool = True,
) -> NumericalColumn:
    """
    Fill null values with *fill_value*
    """
    if fill_nan:
        col = self.nans_to_nulls()
    else:
        col = self

    if method is not None:
        return super(NumericalColumn, col).fillna(fill_value, method)

    if fill_value is None:
        raise ValueError("Must specify either 'fill_value' or 'method'")

    if (
        isinstance(fill_value, cudf.Scalar)
        and fill_value.dtype == col.dtype
    ):
        return super(NumericalColumn, col).fillna(fill_value, method)

    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = col.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
          raise TypeError(
                f"Cannot safely cast non-equivalent "
                f"{type(fill_value).__name__} to {col.dtype.name}"
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py:487: TypeError
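For context on this failure mode: the cudf excerpt above rejects `FillMedian`'s computed median (1002.5) because it cannot round-trip through the `int64` dtype of the `id` column. Here is a minimal NumPy sketch of that safe-cast check (the helper name `check_safe_fill` is made up for illustration; it just mirrors the guard shown in `numerical.py` above):

```python
import numpy as np

def check_safe_fill(fill_value, dtype):
    # Mirror of the cudf guard: cast the scalar to the column dtype
    # and verify it round-trips without losing information.
    casted = dtype.type(fill_value)
    if not np.isnan(fill_value) and casted != fill_value:
        raise TypeError(
            f"Cannot safely cast non-equivalent "
            f"{type(fill_value).__name__} to {dtype.name}"
        )
    return casted

# A median of 1002.5 cannot be represented exactly as int64 -> TypeError
try:
    check_safe_fill(1002.5, np.dtype("int64"))
except TypeError as e:
    print(e)  # Cannot safely cast non-equivalent float to int64

# An integral-valued float passes the check
print(check_safe_fill(1000.0, np.dtype("int64")))
```

Any integer column whose median lands between two integers (an even-length sample, as here) trips this guard, which is why the failures depend on how the data happens to be partitioned.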
------------------------------ Captured log call -------------------------------
ERROR nvtabular:workflow.py:338 Failed to transform operator <nvtabular.ops.fill.FillMedian object at 0x7f7c09b5fd60>
Traceback (most recent call last):
  File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow.py", line 336, in _transform_partition
    df = column_group.op.transform(column_group.input_column_names, df)
  File "/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/fill.py", line 98, in transform
    df[col] = df[col].fillna(self.medians[col])
  File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/series.py", line 1874, in fillna
    return super().fillna(
  File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/frame.py", line 1475, in fillna
    copy_data[name] = copy_data[name].fillna(value[name], method)
  File "/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/numerical.py", line 487, in fillna
    raise TypeError(
TypeError: Cannot safely cast non-equivalent float to int64

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 28 114 13 89% 79, 84-85, 113, 130, 137-138, 142-146, 149, 211, 217-221, 227-231, 241, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 12 52 8 87% 52, 60-63, 73, 83, 288, 295, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1033 2028 232 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.21%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names1-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names0-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name1-cont_names0-cat_names1-parquet]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[None-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[None-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[0-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[0-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[0-parquet-1000-0.001]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[0-parquet-1000-0.06]
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-0.001] - Typ...
FAILED tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-0.1] - TypeE...
============ 26 failed, 729 passed, 14 skipped in 503.04s (0:08:23) ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2439974284389972750.sh

@benfred
Member

benfred commented Mar 31, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2027/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins3656656512144996904.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ...............F.........ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
__________________ test_gpu_dl_break[None-parquet-1000-0.06] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-12/test_gpu_dl_break_None_parquet1')
df = name-cat name-string id label x y
0 Ray Alice 1007 1009 -0.193376 -0.84207...e 984 985 -0.402427 -0.005272
4320 Tim Gary 986 950 -0.625596 -0.488068

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7faf8df8df40>
batch_size = 1000, part_mem_fraction = 0.06, engine = 'parquet', device = None

@pytest.mark.parametrize("part_mem_fraction", [0.001, 0.06])
@pytest.mark.parametrize("batch_size", [1000])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("device", [None, 0])
def test_gpu_dl_break(tmpdir, df, dataset, batch_size, part_mem_fraction, engine, device):
    cat_names = ["name-cat", "name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    processor = nvt.Workflow(conts + cats + label_name)

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  processor.fit_transform(dataset).to_parquet(
        shuffle=nvt.io.Shuffle.PER_PARTITION,
        output_path=output_train,
        out_files_per_proc=2,
    )

tests/unit/test_torch_dataloader.py:109:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fafe2c8c520>
dask_stats = x 0.034901721
y <NA>
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
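The `y` median in `dask_stats` above came back null, and `fit_finalize` calls `float()` on every entry, so it crashes on the missing value. A small pandas sketch of the mechanism (the `None` guard is only an illustrative workaround, not the fix adopted by the project):

```python
import pandas as pd

# A quantile result like the one in the failing run: the "y" median is null
dask_stats = pd.Series({"x": 0.002405407, "y": pd.NA, "id": 1001.0}, name=0.5)

# float(pd.NA) raises TypeError ("float() argument must be ... not 'NAType'"),
# which is exactly the crash in fit_finalize above.
medians = {}
for col in dask_stats.index.values:
    val = dask_stats[col]
    # Illustrative guard: record a None instead of calling float() blindly
    medians[col] = None if pd.isna(val) else float(val)

print(medians)
```

The real question, as noted in the follow-up comment, is why the median computation produced a null for `y` in the first place.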

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl_break[None-parquet-1000-0.06]
============ 1 failed, 754 passed, 14 skipped in 504.12s (0:08:24) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8202843174893632191.sh

@benfred
Member

benfred commented Apr 1, 2021

@karlhigley @jperez999 @rjzamora I think the NA failure in the unittests is unrelated to this PR - I spent some time debugging this and left my thoughts here #687 (comment) . That PR has a 'fix' - but would like to figure out why this is happening

@benfred
Member

benfred commented Apr 1, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2030/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 78cea240d94600c01b749619aaaa33154ad88555 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1600729026063402115.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py ..FF....F................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .........................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.06] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-15/test_tf_gpu_dl_True_1_parquet_1')
paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fd7d86292b0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fd7d8629940>
dask_stats = x 0.002405407
y <NA>
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
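The traceback above (repeated across the parametrized cases in this log) has a single root cause: `FillMedian.fit_finalize` calls `float()` on every entry of `dask_stats`, and when the approximate median for a column comes back missing (cuDF's NA scalar, printed as `_NAType`), `float()` raises. A minimal sketch of a guard is below; `fit_finalize_safe` is a hypothetical standalone helper, not NVTabular's actual fix, and defaulting a missing median to `0.0` is an assumption made only for illustration.

```python
import pandas as pd


def fit_finalize_safe(medians: dict, dask_stats: pd.Series) -> None:
    """Copy computed medians into `medians`, guarding against missing values.

    `dask_stats` is a Series indexed by column name, mirroring the
    structure shown in the traceback above.
    """
    index = dask_stats.index
    # cuDF indexes expose `values_host`; pandas indexes expose `values`.
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
        stat = dask_stats[col]
        # float(pd.NA) raises TypeError, so check for missingness first.
        # Falling back to 0.0 here is an illustrative assumption.
        medians[col] = 0.0 if pd.isna(stat) else float(stat)


medians = {}
fit_finalize_safe(medians, pd.Series({"x": 0.0024, "y": pd.NA, "id": 1001.0}))
# "y" falls back to 0.0 instead of raising TypeError
```

`pd.isna` also recognizes cuDF's NA scalar in practice, which keeps the guard backend-agnostic.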
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-15/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7fd7d850aaf0>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fd7d86c3640>
dask_stats = x 0.002405407
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-15/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-15/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7fd7d8638130>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7fd79c14a160>
dask_stats = x 0.002405407
y
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
============ 3 failed, 752 passed, 14 skipped in 501.99s (0:08:21) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5394768993438524972.sh

@benfred
Member

benfred commented Apr 1, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2032/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 4a11db2df80cadc1a37a6e53798d2aed3855e556 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins5352641028358161637.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py ...F.....F...............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......F..................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-17/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-17/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-17/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7f17ba38c5b0>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f17ba38c400>
dask_stats = x
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-17/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-17/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-17/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7f17ba3d6c40>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f17ba43bd30>
dask_stats = x
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-17/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Zelda Frank 1036 981 0.512298 0.349215
...er 989 1017 0.568488 0.312415
4320 Alice Hannah 1009 1056 -0.233545 -0.972603

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7f1903937bb0>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7f190393b220>
dask_stats = x
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
============ 3 failed, 752 passed, 14 skipped in 506.96s (0:08:26) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins9009968696599090121.sh

@benfred
Member

benfred commented Apr 1, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2038/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk d3a6f11b0464ef8d4e1ccbffd2d8900bbc8309d0 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins4894229412186291544.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py ......F..................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_________ test_empty_cols[label_name0-cont_names0-cat_names0-parquet] __________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-21/test_empty_cols_label_name0_co0')
df = name-cat name-string id label x y
0 Michael Xavier 1034 1053 0.740279 -0.619412
...ah 994 1014 -0.397221 0.085056
4320 Jerry Oliver 975 994 -0.300738 0.687318

[4321 rows x 6 columns]
dataset = <nvtabular.io.dataset.Dataset object at 0x7faa0c789a00>
engine = 'parquet', cat_names = ['name-cat', 'name-string']
cont_names = ['x', 'y', 'id'], label_name = ['label']

@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("cat_names", [["name-cat", "name-string"], []])
@pytest.mark.parametrize("cont_names", [["x", "y", "id"], []])
@pytest.mark.parametrize("label_name", [["label"], []])
def test_empty_cols(tmpdir, df, dataset, engine, cat_names, cont_names, label_name):

    features = []
    if cont_names:
        features.append(cont_names >> ops.FillMedian() >> ops.Normalize())
    if cat_names:
        features.append(cat_names >> ops.Categorify())

    # test out https://github.com/NVIDIA/NVTabular/issues/149 making sure we can iterate over
    # empty cats/conts
    graph = sum(features, nvt.ColumnGroup(label_name))
    if not graph.columns:
        # if we don't have conts/cats/labels we're done
        return

    processor = nvt.Workflow(sum(features, nvt.ColumnGroup(label_name)))

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)
  df_out = processor.fit_transform(dataset).to_ddf().compute(scheduler="synchronous")

tests/unit/test_torch_dataloader.py:76:


nvtabular/workflow.py:177: in fit_transform
self.fit(dataset)
nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7faad2149580>
dask_stats = x
y 0.020540627
id 1001.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
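The traceback above shows `FillMedian.fit_finalize` calling `float()` on a per-column median that came back as a missing value (`_NAType`), which raises `TypeError`. A minimal sketch of a defensive version is below; it assumes `dask_stats` is a pandas/cudf Series of per-column medians, and the `fit_finalize`/`medians` names mirror `nvtabular/ops/fill.py` only for illustration. The `0.0` fallback for all-null columns is a hypothetical placeholder, not the fix adopted in the repo.

```python
import pandas as pd

def fit_finalize(medians, dask_stats):
    """Copy per-column medians out of a Series, skipping NA values.

    medians: dict to fill, column name -> float
    dask_stats: Series indexed by column name (pandas or cudf)
    """
    index = dask_stats.index
    # cudf indexes expose values_host; pandas indexes expose values
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
        val = dask_stats[col]
        # float(pd.NA) raises TypeError, so guard before converting
        if pd.isna(val):
            medians[col] = 0.0  # placeholder default for an all-null column
        else:
            medians[col] = float(val)
    return medians
```

In the failing runs only one of `x`/`y` has a valid median, so a guard like this (or an upstream fix ensuring the median computation never yields NA) avoids the `TypeError`.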

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
FAILED tests/unit/test_torch_dataloader.py::test_empty_cols[label_name0-cont_names0-cat_names0-parquet]
============ 1 failed, 754 passed, 14 skipped in 504.98s (0:08:24) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1590525323433494328.sh

@benfred
Member

benfred commented Apr 1, 2021

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2039/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7351562382516827366.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py ...FF..FF.FFF............s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .........................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7efbec676d30>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efbac557370>
dask_stats = x
y -0.001688915
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[True-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_True_10_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = True
dataset = <nvtabular.io.dataset.Dataset object at 0x7efc02719310>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efc02719a60>
dask_stats = x
y -0.001688915
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7efc047bc2e0>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efb885da5e0>
dask_stats = x -0.010991114
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7efc045e5fd0>
batch_size = 1, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efc045c2040>
dask_stats = x -0.010991114
y
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.06] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7efc045b4400>
batch_size = 10, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efc048f55b0>
dask_stats = x
y -0.001688915
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7efac9ffee20>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efac9ffe5e0>
dask_stats = x
y -0.001688915
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.06] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-22/parquet0/dataset-1.parquet']
use_paths = False
dataset = <nvtabular.io.dataset.Dataset object at 0x7efac9fc15e0>
batch_size = 100, gpu_memory_frac = 0.06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.06])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    conts = cont_names >> ops.FillMedian() >> ops.Normalize()
    cats = cat_names >> ops.Categorify()

    workflow = nvt.Workflow(conts + cats + label_name)
  workflow.fit(dataset)

tests/unit/test_tf_dataloader.py:91:


nvtabular/workflow.py:147: in fit
op.fit_finalize(computed_stats)
/conda/envs/rapids/lib/python3.8/contextlib.py:75: in inner
return func(*args, **kwds)


self = <nvtabular.ops.fill.FillMedian object at 0x7efac9f12400>
dask_stats = x
y -0.001688915
id 1000.0
Name: 0.5, dtype: float64

@annotate("FillMedian_finalize", color="green", domain="nvt_python")
def fit_finalize(self, dask_stats):
    index = dask_stats.index
    vals = index.values_host if hasattr(index, "values_host") else index.values
    for col in vals:
      self.medians[col] = float(dask_stats[col])

E TypeError: float() argument must be a string or a number, not '_NAType'

nvtabular/ops/fill.py:112: TypeError

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/init.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/init.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/init.py 0 0 0 0 100%
nvtabular/inference/triton/init.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/init.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/init.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/init.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/init.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.06]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.06]
============ 7 failed, 748 passed, 14 skipped in 500.16s (0:08:20) =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2215583304916106267.sh

@benfred
Member

benfred commented Apr 1, 2021

Rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #677 of commit 60313a14b6240f8c4429e7d1fa168e4346fe51c5, no merge conflicts.
Running as SYSTEM
Setting status of 60313a14b6240f8c4429e7d1fa168e4346fe51c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/2040/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential ghub_token
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials github token setup
 > git fetch --tags --force --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/677/*:refs/remotes/origin/pr/677/* # timeout=10
 > git rev-parse 60313a14b6240f8c4429e7d1fa168e4346fe51c5^{commit} # timeout=10
Checking out Revision 60313a14b6240f8c4429e7d1fa168e4346fe51c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
Commit message: "Merge branch 'main' into hive-partitioning"
 > git rev-list --no-walk 60313a14b6240f8c4429e7d1fa168e4346fe51c5 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins5011005984566441076.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
95 files would be left unchanged.
/conda/envs/rapids/lib/python3.8/site-packages/isort/files.py:30: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
============================= test session starts ==============================
platform linux -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: cov-2.11.1, xdist-2.2.1, forked-1.3.0
collected 767 items / 2 skipped / 765 selected

tests/unit/test_column_group.py . [ 0%]
tests/unit/test_column_similarity.py ...... [ 0%]
tests/unit/test_dask_nvt.py ............................................ [ 6%]
..................................................................... [ 15%]
tests/unit/test_io.py .................................................. [ 22%]
..................................................................ssssss [ 31%]
ss.............................................. [ 37%]
tests/unit/test_notebooks.py ..s.. [ 38%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 54%]
........................................................................ [ 63%]
................................... [ 68%]
tests/unit/test_tf_dataloader.py .........................s [ 71%]
tests/unit/test_tf_layers.py ........................................... [ 77%]
................................... [ 81%]
tests/unit/test_tools.py ...................... [ 84%]
tests/unit/test_torch_dataloader.py .........................ss [ 88%]
tests/unit/test_workflow.py ............................................ [ 93%]
............................................... [100%]

----------- coverage: platform linux, python 3.8.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 12 0 0 0 100%
nvtabular/column_group.py 149 18 80 5 86% 54, 87, 128, 151-164, 191, 278
nvtabular/dispatch.py 81 11 38 5 83% 35, 45->47, 69, 94, 111, 118, 135-138, 167-170
nvtabular/framework_utils/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 146 137 96 0 4% 28-32, 69-303
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 14 89 7 87% 56, 64->45, 100, 108, 188, 240-248, 251, 344->352, 366->369, 372-373, 376
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 20 1 43% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 10 0 15% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 27 12 12 1 51% 47, 71-79, 82-88
nvtabular/framework_utils/torch/models.py 41 6 22 7 76% 56, 59, 64, 85, 87-88, 91->94, 98->100
nvtabular/framework_utils/torch/utils.py 31 7 10 3 76% 52, 56-58, 67-69
nvtabular/inference/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/__init__.py 218 211 90 0 2% 24-494
nvtabular/inference/triton/model.py 56 56 22 0 0% 27-142
nvtabular/inference/triton/model_hugectr.py 44 44 14 0 0% 27-120
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/avro.py 88 88 30 0 0% 16-189
nvtabular/io/csv.py 54 4 20 5 88% 95, 99->103, 104, 106, 120
nvtabular/io/dask.py 178 7 68 11 93% 109, 112, 148, 223, 380->378, 408->411, 419, 423->425, 425->421, 430, 432
nvtabular/io/dataframe_engine.py 58 3 30 6 90% 44, 63, 82->86, 86->91, 88->91, 91->110, 119
nvtabular/io/dataset.py 249 32 116 22 84% 254, 256, 269, 278, 296-310, 409->478, 414-417, 422->432, 427-428, 439->437, 453->457, 468, 507, 608->610, 610->619, 620, 627-628, 634, 640, 735-736, 848-853, 859, 879, 886
nvtabular/io/dataset_engine.py 23 1 0 0 96% 45
nvtabular/io/hugectr.py 45 2 24 2 91% 34, 74->97, 101
nvtabular/io/parquet.py 472 28 150 13 93% 83-91, 136-143, 155, 184-186, 311-316, 354-359, 475->482, 545->550, 551-552, 672, 676, 680, 718, 735, 739, 746->748, 866->871, 876->886, 913
nvtabular/io/shuffle.py 30 7 12 2 69% 41-48
nvtabular/io/writer.py 169 12 64 5 92% 31, 48, 76, 122, 125, 202, 211, 214, 257, 278-280
nvtabular/io/writer_factory.py 18 2 8 2 85% 35, 60
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 280 17 114 10 93% 84-85, 113, 130, 137-138, 217->219, 227-231, 247-248, 266-267, 306->310, 381, 385-386, 480, 487
nvtabular/loader/tensorflow.py 119 11 52 7 88% 52, 60-63, 73, 83, 288, 303-305, 315->319, 348
nvtabular/loader/tf_utils.py 55 10 20 5 80% 29->32, 32->34, 39->41, 43, 50-51, 58-60, 66-70
nvtabular/loader/torch.py 43 10 8 0 69% 25-27, 30-36
nvtabular/ops/__init__.py 19 0 0 0 100%
nvtabular/ops/bucketize.py 25 4 16 2 76% 46, 49-52
nvtabular/ops/categorify.py 513 69 298 47 83% 223, 240, 244, 252, 260, 262, 284, 303-304, 336-337, 399-401, 459-461, 466->468, 541, 579, 608->611, 612-614, 621-622, 635-637, 638->606, 654, 664, 666, 672, 688-689, 694, 697->700, 710, 734-736, 739, 741->743, 755-758, 784, 788, 790, 802-805, 920, 922, 964->985, 970->985, 986-991, 1028, 1044->1049, 1048, 1058->1055, 1063->1055, 1071, 1079-1089
nvtabular/ops/clip.py 19 2 6 3 80% 45, 53->55, 56
nvtabular/ops/column_similarity.py 88 22 32 5 69% 84, 154-155, 164-166, 174-190, 205->215, 207->210, 211, 221
nvtabular/ops/data_stats.py 57 1 24 4 94% 88->90, 90->92, 93->84, 96->84, 101
nvtabular/ops/difference_lag.py 26 0 8 1 97% 67->69
nvtabular/ops/dropna.py 9 0 0 0 100%
nvtabular/ops/fill.py 58 2 20 1 96% 93, 119
nvtabular/ops/filter.py 21 1 6 1 93% 44
nvtabular/ops/hash_bucket.py 32 2 18 2 88% 73, 102
nvtabular/ops/hashed_cross.py 29 3 13 4 83% 51, 66, 80->exit, 81
nvtabular/ops/join_external.py 69 5 28 6 89% 96, 98, 116, 129->133, 153, 168
nvtabular/ops/join_groupby.py 80 5 28 2 94% 106, 109->116, 183-184, 187-188
nvtabular/ops/lambdaop.py 27 3 10 3 84% 61, 65, 78
nvtabular/ops/logop.py 9 0 0 0 100%
nvtabular/ops/moments.py 65 0 20 0 100%
nvtabular/ops/normalize.py 64 6 14 2 87% 61->60, 67-68, 100-101, 123-124
nvtabular/ops/operator.py 15 1 2 1 88% 24
nvtabular/ops/rename.py 18 3 10 3 71% 41, 54, 58
nvtabular/ops/stat_operator.py 11 0 0 0 100%
nvtabular/ops/target_encoding.py 151 11 66 5 91% 143, 163->167, 170->179, 222-223, 226-227, 236-242, 335->338
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 235 1 62 4 98% 183->185, 230->234, 320->319, 322
nvtabular/tools/dataset_inspector.py 49 9 18 0 75% 29-38
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 42 16 18 7 55% 20-21, 25-26, 35, 39, 48-51, 53-55, 58, 61, 67, 73
nvtabular/worker.py 68 1 30 2 97% 73, 83->98
nvtabular/workflow.py 140 9 62 4 93% 39, 125, 137-139, 242, 270-271, 341

TOTAL 4912 1021 2028 228 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%

================= 755 passed, 14 skipped in 503.80s (0:08:23) ==================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1511800520641781225.sh

@karlhigley karlhigley merged commit 50a2f46 into NVIDIA-Merlin:main Apr 1, 2021
@rjzamora rjzamora deleted the hive-partitioning branch April 1, 2021 14:12
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
* add Dataset.shuffle_by_keys

* support npartitions

* adding partition_on option to to_parquet

* fix _metadata creation for partitioned data

* expand testing and fix bug

* avoid shuffle when we don't need it
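The `partition_on=` option added in these commits writes each unique combination of the partition-column values into its own hive-style directory (e.g. `month=Mar/day=30/part.0.parquet`, as described in the PR summary). As a rough, stdlib-only sketch of that path layout — `hive_partition_paths` is a hypothetical illustrative helper, not part of the NVTabular or Dask API:

```python
from collections import defaultdict

def hive_partition_paths(rows, partition_on):
    """Group rows under hive-style paths like 'month=Mar/day=30/part.0.parquet'.

    One output file path is produced per unique combination of the
    ``partition_on`` column values; directories nest in column order.
    """
    groups = defaultdict(list)
    for row in rows:
        # Build the nested directory key, e.g. "month=Mar/day=30"
        key = "/".join(f"{col}={row[col]}" for col in partition_on)
        groups[key].append(row)
    # Each unique key gets its own part file inside that directory
    return {f"{key}/part.0.parquet": grp for key, grp in groups.items()}

rows = [
    {"month": "Mar", "day": 30, "x": 1},
    {"month": "Mar", "day": 30, "x": 2},
    {"month": "Apr", "day": 1, "x": 3},
]
paths = hive_partition_paths(rows, ["month", "day"])
```

With two unique `(month, day)` combinations above, two output paths are produced; the real `Dataset.to_parquet(partition_on=...)` performs the analogous grouping as a shuffle at IO time, matching the `partition_on=` semantics of `dask.dataframe.to_parquet`.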
Development

Successfully merging this pull request may close these issues.

[FEA] Partition output parquet files by a column
6 participants