
Fix for working with cudf 0.15 #159

Merged: 1 commit, merged Jul 20, 2020

Conversation

@benfred (Member) commented Jul 17, 2020

cudf 0.15 recently disabled iterating over the values of an Index, which broke
a number of ops in NVTabular (see rapidsai/cudf#5340).
Fix by using values_host instead.
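
For context, a minimal sketch of the pattern that broke and the workaround (illustrative only; the real call sites are spread across the nvtabular ops):

```python
import cudf

s = cudf.Series([1.0, 2.0, 3.0], index=[10, 20, 30])

# Before rapidsai/cudf#5340, code could iterate an Index directly:
#     for v in s.index.values: ...
# cudf 0.15 disallows iterating device-backed values, so copy them
# to host memory first and iterate the resulting numpy array.
for v in s.index.values_host:
    print(v)
```

values_host returns a numpy copy, so the per-value Python loop runs on the host instead of pulling elements one at a time from the GPU.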
@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/286/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 0b1a93b4dcae8509937a6ffb105f1911bafb5f58 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins2413563116796183923.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins3587705493963020004.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
ln: failed to create symbolic link '/lib/libcuda.so': File exists

@benfred (Member, Author) commented Jul 17, 2020

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/287/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins904949477396641404.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins9087851760922571547.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
ln: failed to create symbolic link '/lib/libcuda.so': File exists

@jperez999 (Contributor)

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/288/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins885494632642895385.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins933950288347073525.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
ln: failed to create symbolic link '/lib/libcuda.so': File exists

@jperez999 (Contributor)

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/289/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8359743667228744018.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins8732341273937868876.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
ln: failed to create symbolic link '/lib/libcuda.so': File exists

@jperez999 (Contributor)

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/290/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1998524369983926254.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins2701989822688980317.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
ln: failed to create symbolic link '/lib/libcuda.so': File exists

@jperez999 (Contributor)

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/291/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7087152540603580233.sh
/tmp/jenkins7087152540603580233.sh: line 5: black: command not found
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins1548896684945165098.sh

@jperez999 (Contributor)

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/292/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8953532366241069577.sh
No Path provided. Nothing to do 😴
             _                 _
            (_) ___  ___  _ __| |_
            | |/ _/ / _ \/ '__  _/
            | |\__ \/\_\/| |  | |_
            |_|\___/\___/\_/   \_/

  isort your imports, so you don't have to.

                VERSION 5.1.3

Nothing to do: no files or paths have been passed in!

Try one of the following:

`isort .` - sort all Python files, starting from the current directory, recursively.
`isort . --interactive` - Do the same, but ask before making any changes.
`isort . --check --diff` - Check to see if imports are correctly sorted within this project.
`isort --help` - In-depth information about isort's available command-line options.

Visit https://timothycrosley.github.io/isort/ for complete information about how to use isort.

============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular, inifile: setup.cfg
plugins: hypothesis-5.20.2, cov-2.10.0
collected 321 items / 1 skipped / 320 selected

tests/unit/test_dask_nvt.py ............................................ [ 13%]
.......... [ 16%]
tests/unit/test_io.py ......................... [ 24%]
tests/unit/test_notebooks.py s.. [ 25%]
tests/unit/test_ops.py ................................................. [ 40%]
............................................ [ 54%]
tests/unit/test_tf_dataloader.py FFFFFFFFFFFF [ 58%]
tests/unit/test_torch_dataloader.py ............FFFFFFFFFFFFFFFFFFFFFFFF [ 69%]
FFFFFF [ 71%]
tests/unit/test_workflow.py ............................................ [ 85%]
................................................ [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f1b3565dd90>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b355f2f80>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
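
The failure above comes from a separate cudf 0.15 behavior change rather than the index-iteration issue this PR addresses: fillna now refuses scalar fill values that do not cast losslessly to the column dtype. A minimal repro sketch (hypothetical values, assuming cudf >= 0.15):

```python
import cudf

s = cudf.Series([1, 2, None, 4], dtype="int64")

# 999.5 cannot round-trip through int64, so cudf 0.15 raises:
# TypeError: Cannot safely cast non-equivalent float to int64
try:
    s.fillna(999.5)
except TypeError as e:
    print(e)

# One possible workaround (illustrative, not necessarily what
# nvtabular ends up doing): fill on a float view of the column.
print(s.astype("float64").fillna(999.5))
```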
______________________ test_tf_gpu_dl[True-1-parquet-0.1] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_1_parquet_1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f1b325e3690>
batch_size = 1, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b324cfc20>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f1b3575a690>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357400e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[True-10-parquet-0.1] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_10_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f1b3572f390>
batch_size = 10, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35778b00>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_tf_gpu_dl[True-100-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_100_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f1b32468450>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3571add0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[True-100-parquet-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_100_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f20188cfdd0>
batch_size = 100, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356c1290>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1fc5e85810>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1fbc71a4d0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.1] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1b35724610>
batch_size = 1, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1fbc728d40>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1b32443a90>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35708440>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[False-10-parquet-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1b3233cb50>
batch_size = 10, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578fd40>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1b51a473d0>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356c1170>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1b32177810>
batch_size = 100, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1fbc71ad40>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
___________________ test_gpu_preproc[True-True-parquet-0.01] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_par0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1fbc75dcd0>, dump = True
gpu_memory_frac = 0.01, engine = 'parquet', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578f560>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
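One possible caller-side guard (a sketch only, not necessarily the fix applied in this PR; `fillna_median_safe` is a hypothetical helper): cast the fill value to the column dtype when the cast is lossless, and upcast the column to float otherwise:

```python
import cudf

def fillna_median_safe(series: cudf.Series, median: float) -> cudf.Series:
    """Hypothetical helper: fill nulls with a median without tripping
    cudf 0.15's safe-cast check on integer columns."""
    casted = series.dtype.type(median)
    if casted == median:
        # Lossless (e.g. 1000.0 into int64): fill with the casted value.
        return series.fillna(casted)
    # Lossy (e.g. 999.5 into int64): upcast the column before filling.
    return series.astype("float64").fillna(median)
```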
___________________ test_gpu_preproc[True-True-parquet-0.1] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_par1')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b321eda10>, dump = True
gpu_memory_frac = 0.1, engine = 'parquet', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35789c20>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_gpu_preproc[True-True-csv-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_csv0')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3564f1d0>, dump = True
gpu_memory_frac = 0.01, engine = 'csv', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35740cb0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_gpu_preproc[True-True-csv-0.1] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_csv1')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b32314450>, dump = True
gpu_memory_frac = 0.1, engine = 'csv', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356c15f0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________ test_gpu_preproc[True-True-csv-no-header-0.01] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_csv2')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b51a53810>, dump = True
gpu_memory_frac = 0.01, engine = 'csv-no-header', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3567ec20>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________ test_gpu_preproc[True-True-csv-no-header-0.1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_csv3')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b32367a10>, dump = True
gpu_memory_frac = 0.1, engine = 'csv-no-header', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356e5cb0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
__________________ test_gpu_preproc[True-False-parquet-0.01] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_pa0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f20188cfdd0>, dump = False
gpu_memory_frac = 0.01, engine = 'parquet', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357405f0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
___________________ test_gpu_preproc[True-False-parquet-0.1] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_pa1')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b32308490>, dump = False
gpu_memory_frac = 0.1, engine = 'parquet', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578fb90>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_gpu_preproc[True-False-csv-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_cs0')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b35759fd0>, dump = False
gpu_memory_frac = 0.01, engine = 'csv', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356bd3b0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_gpu_preproc[True-False-csv-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_cs1')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3221ad10>, dump = False
gpu_memory_frac = 0.1, engine = 'csv', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356ca7a0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_______________ test_gpu_preproc[True-False-csv-no-header-0.01] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_cs2')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1fbc709490>, dump = False
gpu_memory_frac = 0.01, engine = 'csv-no-header', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357899e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________ test_gpu_preproc[True-False-csv-no-header-0.1] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_cs3')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3572d750>, dump = False
gpu_memory_frac = 0.1, engine = 'csv-no-header', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357409e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
__________________ test_gpu_preproc[False-True-parquet-0.01] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_pa0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b35704a90>, dump = True
gpu_memory_frac = 0.01, engine = 'parquet', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356bdcb0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
___________________ test_gpu_preproc[False-True-parquet-0.1] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_pa1')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b321eb9d0>, dump = True
gpu_memory_frac = 0.1, engine = 'parquet', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35713830>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_gpu_preproc[False-True-csv-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_cs0')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b35763dd0>, dump = True
gpu_memory_frac = 0.01, engine = 'csv', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3571aa70>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_gpu_preproc[False-True-csv-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_cs1')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b321712d0>, dump = True
gpu_memory_frac = 0.1, engine = 'csv', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578f9e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_______________ test_gpu_preproc[False-True-csv-no-header-0.01] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_cs2')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b324e8d10>, dump = True
gpu_memory_frac = 0.01, engine = 'csv-no-header', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357139e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________ test_gpu_preproc[False-True-csv-no-header-0.1] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_cs3')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3233bbd0>, dump = True
gpu_memory_frac = 0.1, engine = 'csv-no-header', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35734c20>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
__________________ test_gpu_preproc[False-False-parquet-0.01] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_p0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1fbc711450>, dump = False
gpu_memory_frac = 0.01, engine = 'parquet', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578fb90>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
__________________ test_gpu_preproc[False-False-parquet-0.1] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_p1')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b323386d0>, dump = False
gpu_memory_frac = 0.1, engine = 'parquet', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356bdef0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_gpu_preproc[False-False-csv-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_c0')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b35758350>, dump = False
gpu_memory_frac = 0.01, engine = 'csv', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1fbc7287a0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_gpu_preproc[False-False-csv-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_c1')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b32402650>, dump = False
gpu_memory_frac = 0.1, engine = 'csv', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356e5320>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_______________ test_gpu_preproc[False-False-csv-no-header-0.01] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_c2')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b355dcfd0>, dump = False
gpu_memory_frac = 0.01, engine = 'csv-no-header', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578fd40>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_______________ test_gpu_preproc[False-False-csv-no-header-0.1] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_c3')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b1658d190>, dump = False
gpu_memory_frac = 0.1, engine = 'csv-no-header', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3571a710>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_________________________ test_gpu_dl[1-parquet-1e-06] _________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_1_parquet_1e_06_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3578c3d0>, batch_size = 1
gpu_memory_frac = 1e-06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35708c20>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
__________________________ test_gpu_dl[1-parquet-0.1] __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_1_parquet_0_1_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b32047450>, batch_size = 1
gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35762ef0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________________ test_gpu_dl[10-parquet-1e-06] _________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_10_parquet_1e_06_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b321d1f10>, batch_size = 10
gpu_memory_frac = 1e-06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357890e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_________________________ test_gpu_dl[10-parquet-0.1] __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_10_parquet_0_1_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b1661e510>, batch_size = 10
gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578f4d0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________________ test_gpu_dl[100-parquet-1e-06] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_100_parquet_1e_06_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3203ebd0>, batch_size = 100
gpu_memory_frac = 1e-06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356e58c0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_________________________ test_gpu_dl[100-parquet-0.1] _________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_100_parquet_0_1_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3236b290>, batch_size = 100
gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3571a680>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_notebooks.py: 374 tests with warnings
/var/jenkins_home/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_util.py:523: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
tensor_proto.tensor_content = nparray.tostring()

tests/unit/test_notebooks.py::test_rossman_example
/var/jenkins_home/.local/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/data_structures.py:720: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
if not isinstance(wrapped_dict, collections.Mapping):

tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-1e-06]
/var/jenkins_home/workspace/nvtabular/nvtabular/io.py:843: UserWarning: Row group size 144306 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-0.1]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-0.1]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-0.1]
/var/jenkins_home/workspace/nvtabular/nvtabular/workflow.py:730: UserWarning: num_out_files is deprecated. Use out_files_per_proc
warnings.warn("num_out_files is deprecated. Use out_files_per_proc")

-- Docs: https://docs.pytest.org/en/latest/warnings.html
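
The num_out_files deprecation warning above comes from the processor.apply(...) calls in the quoted tests; the migration is a keyword rename. A short sketch of the updated call, reusing the processor, dataset and output_train objects defined in the test code quoted earlier:

# Same call as in test_gpu_dl above, with the deprecated keyword renamed;
# `processor`, `dataset` and `output_train` come from the quoted tests.
processor.apply(
    dataset,
    apply_offline=True,
    record_stats=True,
    shuffle=True,
    output_path=output_train,
    out_files_per_proc=2,  # was: num_out_files=2 (deprecated)
)
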

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 213 42 108 27 75% 37->38, 38, 44->60, 45->46, 46, 49->55, 55->58, 58->59, 59, 81->82, 82-84, 100->103, 103, 112->115, 115, 123->126, 129->132, 132->133, 133-135, 137->138, 138-154, 156->160, 160, 167->169, 171->175, 175-177, 186->188, 190->199, 193->197, 199-201, 222->223, 223, 226->227, 227, 228->231, 231-233, 312->313, 313, 314->315, 315, 332->343, 363->368, 366->367, 367
nvtabular/ds_writer.py 82 2 34 5 94% 69->77, 94->95, 95-96, 99->101, 104->102, 113->112
nvtabular/io.py 539 43 188 24 89% 71->72, 72, 73->74, 74, 76->78, 78-81, 117, 139->145, 160->162, 162-163, 166->exit, 170, 193->196, 213->224, 221->223, 310->311, 311-312, 328, 334, 360->361, 361, 390, 416, 512->513, 513, 540->541, 541, 547-555, 597-599, 608->611, 612-614, 673->678, 737->739, 739, 744->745, 745, 752->753, 753, 761->773, 766->771, 771-773, 865->867, 867-869, 877->879, 879, 900->901, 901, 976->977, 977
nvtabular/ops.py 409 28 96 19 90% 51->50, 77-81, 103->104, 104, 121->122, 122, 216, 274, 324, 351->352, 352, 407->408, 408, 423->424, 424-426, 427->430, 430, 476->477, 477, 484->483, 512, 517->518, 518, 527->529, 529-530, 566->567, 567, 596->597, 597, 600->602, 602-603, 677->678, 678, 700, 892->893, 893, 908->909, 909, 968->969, 969, 970->971, 971
nvtabular/tf_dataloader.py 122 17 48 9 84% 16->18, 27-28, 31->34, 34-36, 187->188, 188-189, 197->198, 198, 207->210, 210, 220->223, 223, 232->237, 256, 264-267, 270-271, 279->280, 280, 291, 318->321
nvtabular/torch_dataloader.py 152 49 56 11 62% 72->73, 73, 75->76, 76-77, 80->81, 81, 91-92, 96->97, 97, 111->114, 114-115, 120-124, 144->146, 146->148, 148->exit, 153->154, 154, 172->173, 173, 175->178, 209-215, 218, 221-225, 230-234, 248-249, 265-277, 280-282, 285
nvtabular/worker.py 29 11 14 1 53% 39-43, 56->57, 57, 62-67
nvtabular/workflow.py 450 21 250 28 92% 104->108, 108, 111->113, 114->115, 115-119, 148->exit, 164->exit, 180->exit, 196->exit, 238->241, 241, 246->248, 265->267, 314->315, 315, 338->335, 374->375, 375, 393->396, 396, 426->429, 429, 534->535, 535-537, 551->550, 612->613, 613, 648->649, 649, 724->725, 725-727, 731->735, 738->739, 739, 753->754, 754-756, 757->768, 763->768, 776->777, 777, 778->exit, 797->788
setup.py 2 2 0 0 0% 18-20

TOTAL 2004 215 794 124 86%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 85.74%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.1]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.1]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.1]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.1]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.1]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-parquet-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-parquet-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-csv-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-csv-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-csv-no-header-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-csv-no-header-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-parquet-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-parquet-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-csv-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-csv-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-csv-no-header-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-csv-no-header-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-parquet-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-parquet-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-csv-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-csv-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-csv-no-header-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-csv-no-header-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-parquet-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-parquet-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-csv-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-csv-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-csv-no-header-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-csv-no-header-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-1e-06] - Ty...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-0.1] - Type...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-1e-06] - T...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-0.1] - Typ...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-1e-06] - ...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-0.1] - Ty...
===== 42 failed, 278 passed, 2 skipped, 385 warnings in 233.65s (0:03:53) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7029286289148722756.sh

@benfred
Member Author

benfred commented Jul 20, 2020

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/295/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 0b1a93b4dcae8509937a6ffb105f1911bafb5f58 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7404385790885753295.sh
No Path provided. Nothing to do 😴
Skipped 2 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular, inifile: setup.cfg
plugins: hypothesis-5.20.2, cov-2.10.0
collected 321 items / 1 skipped / 320 selected

tests/unit/test_dask_nvt.py ............................................ [ 13%]
.......... [ 16%]
tests/unit/test_io.py ......................... [ 24%]
tests/unit/test_notebooks.py s.. [ 25%]
tests/unit/test_ops.py ................................................. [ 40%]
............................................ [ 54%]
tests/unit/test_tf_dataloader.py ............ [ 58%]
tests/unit/test_torch_dataloader.py .................................... [ 69%]
...... [ 71%]
tests/unit/test_workflow.py ............................................ [ 85%]
................................................ [100%]

=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_notebooks.py: 374 tests with warnings
/var/jenkins_home/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_util.py:523: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
tensor_proto.tensor_content = nparray.tostring()

tests/unit/test_notebooks.py::test_rossman_example
/var/jenkins_home/.local/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/data_structures.py:720: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
if not isinstance(wrapped_dict, collections.Mapping):

tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-1e-06]
/var/jenkins_home/workspace/nvtabular/nvtabular/io.py:843: UserWarning: Row group size 143831 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-0.1]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-0.1]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-0.1]
/var/jenkins_home/workspace/nvtabular/nvtabular/workflow.py:730: UserWarning: num_out_files is deprecated. Use out_files_per_proc
warnings.warn("num_out_files is deprecated. Use out_files_per_proc")

-- Docs: https://docs.pytest.org/en/latest/warnings.html

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 213 42 108 27 75% 37->38, 38, 44->60, 45->46, 46, 49->55, 55->58, 58->59, 59, 81->82, 82-84, 100->103, 103, 112->115, 115, 123->126, 129->132, 132->133, 133-135, 137->138, 138-154, 156->160, 160, 167->169, 171->175, 175-177, 186->188, 190->199, 193->197, 199-201, 222->223, 223, 226->227, 227, 228->231, 231-233, 312->313, 313, 314->315, 315, 332->343, 363->368, 366->367, 367
nvtabular/ds_writer.py 82 2 34 5 94% 69->77, 94->95, 95-96, 99->101, 104->102, 113->112
nvtabular/io.py 539 37 188 21 91% 73->74, 74, 78->81, 81, 117, 139->145, 193->196, 213->224, 221->223, 310->311, 311-312, 328, 334, 360->361, 361, 390, 416, 512->513, 513, 540->541, 541, 547-555, 597-599, 608->611, 612-614, 673->678, 737->739, 739, 744->745, 745, 752->753, 753, 761->773, 766->771, 771-773, 865->867, 867-869, 877->879, 879, 900->901, 901, 976->977, 977
nvtabular/ops.py 409 25 96 17 91% 51->50, 77-81, 121->122, 122, 216, 274, 324, 351->352, 352, 407->408, 408, 423->424, 424-426, 427->430, 430, 476->477, 477, 484->483, 512, 517->518, 518, 527->529, 529-530, 566->567, 567, 596->597, 597, 677->678, 678, 700, 892->893, 893, 908->909, 909, 968->969, 969, 970->971, 971
nvtabular/tf_dataloader.py 122 13 48 8 88% 16->18, 27-28, 31->34, 34-36, 187->188, 188-189, 197->198, 198, 207->210, 210, 220->223, 223, 232->237, 256, 264->265, 265, 270-271
nvtabular/torch_dataloader.py 152 13 56 14 85% 72->73, 73, 75->76, 76-77, 80->81, 81, 91-92, 96->97, 97, 114->116, 144->146, 146->148, 148->exit, 153->154, 154, 172->173, 173, 175->178, 218, 265->267, 267->268, 268, 269->270, 270, 285
nvtabular/worker.py 29 11 14 1 53% 39-43, 56->57, 57, 62-67
nvtabular/workflow.py 450 21 250 28 92% 104->108, 108, 111->113, 114->115, 115-119, 148->exit, 164->exit, 180->exit, 196->exit, 238->241, 241, 246->248, 265->267, 314->315, 315, 338->335, 374->375, 375, 393->396, 396, 426->429, 429, 534->535, 535-537, 551->550, 612->613, 613, 648->649, 649, 724->725, 725-727, 731->735, 738->739, 739, 753->754, 754-756, 757->768, 763->768, 776->777, 777, 778->exit, 797->788
setup.py 2 2 0 0 0% 18-20

TOTAL 2004 166 794 121 88%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.46%
=========== 320 passed, 2 skipped, 385 warnings in 390.95s (0:06:30) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1033407326286460975.sh

@benfred benfred merged commit 57ee6de into NVIDIA-Merlin:master Jul 20, 2020
@benfred benfred deleted the cudf_iter_fix branch July 14, 2021 04:37
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
Cudf 0.15 recently disabled iterating over the values of an index, which broke
a number of ops in nvtabular. (change rapidsai/cudf#5340)
Fix by using values_host.
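
The change itself is small: wherever nvtabular iterated a cudf index directly, the index is copied to host memory first. A hedged sketch of the pattern, with an illustrative DataFrame rather than the actual patched nvtabular code:

import cudf

gdf = cudf.DataFrame({"count": [3, 1, 2]}, index=[10, 20, 30])

# cudf 0.15 (rapidsai/cudf#5340) disabled direct iteration over cudf
# objects, so `for key in gdf.index: ...` now raises TypeError.

# values_host copies the index into a host-side NumPy array, which
# iterates exactly as before:
for key in gdf.index.values_host:
    print(key)
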