
Fix for working with cudf 0.15 #159

Merged: 1 commit, merged Jul 20, 2020

Conversation

@benfred (Member) commented Jul 17, 2020

cudf 0.15 recently disabled iterating over the values of an Index, which broke
a number of ops in NVTabular (see rapidsai/cudf#5340).
Fix by using values_host instead.
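
For context, a minimal sketch of the pattern that broke and the workaround (illustrative only; the real call sites are spread across the nvtabular ops):

```python
import cudf

s = cudf.Series([1.0, 2.0, 3.0], index=[10, 20, 30])

# Before rapidsai/cudf#5340, code could iterate an Index directly:
#     for v in s.index.values: ...
# cudf 0.15 disallows iterating device-backed values, so copy them
# to host memory first and iterate the resulting numpy array.
for v in s.index.values_host:
    print(v)
```

values_host returns a numpy copy, so the per-value Python loop runs on the host instead of pulling elements one at a time from the GPU.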
@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/286/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 0b1a93b4dcae8509937a6ffb105f1911bafb5f58 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins2413563116796183923.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins3587705493963020004.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
ln: failed to create symbolic link '/lib/libcuda.so': File exists

@benfred (Member, Author) commented Jul 17, 2020

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/287/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins904949477396641404.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins9087851760922571547.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
ln: failed to create symbolic link '/lib/libcuda.so': File exists

@jperez999 (Contributor)

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/288/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins885494632642895385.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins933950288347073525.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
ln: failed to create symbolic link '/lib/libcuda.so': File exists

@jperez999 (Contributor)

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/289/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8359743667228744018.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins8732341273937868876.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
ln: failed to create symbolic link '/lib/libcuda.so': File exists

@jperez999 (Contributor)

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/290/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1998524369983926254.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins2701989822688980317.sh
mv: cannot move '/lib/libcuda.so' to '/lib/libcuda.so-conda-nvcc-backup': Permission denied
ln: failed to create symbolic link '/lib/libcuda.so': File exists

@jperez999 (Contributor)

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/291/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7087152540603580233.sh
/tmp/jenkins7087152540603580233.sh: line 5: black: command not found
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[nvtabular_tests] $ /bin/bash /tmp/jenkins1548896684945165098.sh

@jperez999 (Contributor)

rerun tests

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/292/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8953532366241069577.sh
No Path provided. Nothing to do 😴
             _                 _
            (_) ___  ___  _ __| |_
            | |/ _/ / _ \/ '__  _/
            | |\__ \/\_\/| |  | |_
            |_|\___/\___/\_/   \_/

  isort your imports, so you don't have to.

                VERSION 5.1.3

Nothing to do: no files or paths have been passed in!

Try one of the following:

`isort .` - sort all Python files, starting from the current directory, recursively.
`isort . --interactive` - Do the same, but ask before making any changes.
`isort . --check --diff` - Check to see if imports are correctly sorted within this project.
`isort --help` - In-depth information about isort's available command-line options.

Visit https://timothycrosley.github.io/isort/ for complete information about how to use isort.

============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular, inifile: setup.cfg
plugins: hypothesis-5.20.2, cov-2.10.0
collected 321 items / 1 skipped / 320 selected

tests/unit/test_dask_nvt.py ............................................ [ 13%]
.......... [ 16%]
tests/unit/test_io.py ......................... [ 24%]
tests/unit/test_notebooks.py s.. [ 25%]
tests/unit/test_ops.py ................................................. [ 40%]
............................................ [ 54%]
tests/unit/test_tf_dataloader.py FFFFFFFFFFFF [ 58%]
tests/unit/test_torch_dataloader.py ............FFFFFFFFFFFFFFFFFFFFFFFF [ 69%]
FFFFFF [ 71%]
tests/unit/test_workflow.py ............................................ [ 85%]
................................................ [100%]

=================================== FAILURES ===================================
_____________________ test_tf_gpu_dl[True-1-parquet-0.01] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_1_parquet_0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f1b3565dd90>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b355f2f80>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
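
The failure above comes from a separate cudf 0.15 behavior change rather than the index-iteration issue this PR addresses: fillna now refuses scalar fill values that do not cast losslessly to the column dtype. A minimal repro sketch (hypothetical values, assuming cudf >= 0.15):

```python
import cudf

s = cudf.Series([1, 2, None, 4], dtype="int64")

# 999.5 cannot round-trip through int64, so cudf 0.15 raises:
# TypeError: Cannot safely cast non-equivalent float to int64
try:
    s.fillna(999.5)
except TypeError as e:
    print(e)

# One possible workaround (illustrative, not necessarily what
# nvtabular ends up doing): fill on a float view of the column.
print(s.astype("float64").fillna(999.5))
```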
______________________ test_tf_gpu_dl[True-1-parquet-0.1] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_1_parquet_1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f1b325e3690>
batch_size = 1, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b324cfc20>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[True-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_10_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f1b3575a690>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357400e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[True-10-parquet-0.1] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_10_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f1b3572f390>
batch_size = 10, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35778b00>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_tf_gpu_dl[True-100-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_100_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f1b32468450>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3571add0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[True-100-parquet-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_True_100_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = True, dataset = <nvtabular.io.Dataset object at 0x7f20188cfdd0>
batch_size = 100, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356c1290>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_1_parquet0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1fc5e85810>
batch_size = 1, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1fbc71a4d0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[False-1-parquet-0.1] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_1_parquet1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1b35724610>
batch_size = 1, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1fbc728d40>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_tf_gpu_dl[False-10-parquet-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_10_parque0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1b32443a90>
batch_size = 10, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35708440>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_tf_gpu_dl[False-10-parquet-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_10_parque1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1b3233cb50>
batch_size = 10, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578fd40>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_100_parqu0')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1b51a473d0>
batch_size = 100, gpu_memory_frac = 0.01, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356c1170>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_tf_gpu_dl[False-100-parquet-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_tf_gpu_dl_False_100_parqu1')
paths = ['/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-0/parquet0/dataset-1.parquet']
use_paths = False, dataset = <nvtabular.io.Dataset object at 0x7f1b32177810>
batch_size = 100, gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
@pytest.mark.parametrize("use_paths", [True, False])
def test_tf_gpu_dl(tmpdir, paths, use_paths, dataset, batch_size, gpu_memory_frac, engine):
    cont_names = ["x", "y", "id"]
    cat_names = ["name-string"]
    label_name = ["label"]
    if engine == "parquet":
        cat_names.append("name-cat")

    columns = cont_names + cat_names

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)
    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()

    data_itr = tf_dataloader.KerasSequenceDataset(
        paths if use_paths else dataset,
        columns=columns,
        batch_size=batch_size,
        buffer_size=gpu_memory_frac,
        label_name=label_name[0],
        engine=engine,
        shuffle=False,
    )
  processor.update_stats(dataset, record_stats=True)

tests/unit/test_tf_dataloader.py:57:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1fbc71ad40>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # castsafely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
___________________ test_gpu_preproc[True-True-parquet-0.01] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_par0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1fbc75dcd0>, dump = True
gpu_memory_frac = 0.01, engine = 'parquet', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578f560>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
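One possible caller-side guard (a sketch only, not necessarily the fix applied in this PR; `fillna_median_safe` is a hypothetical helper): cast the fill value to the column dtype when the cast is lossless, and upcast the column to float otherwise:

```python
import cudf

def fillna_median_safe(series: cudf.Series, median: float) -> cudf.Series:
    """Hypothetical helper: fill nulls with a median without tripping
    cudf 0.15's safe-cast check on integer columns."""
    casted = series.dtype.type(median)
    if casted == median:
        # Lossless (e.g. 1000.0 into int64): fill with the casted value.
        return series.fillna(casted)
    # Lossy (e.g. 999.5 into int64): upcast the column before filling.
    return series.astype("float64").fillna(median)
```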
___________________ test_gpu_preproc[True-True-parquet-0.1] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_par1')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b321eda10>, dump = True
gpu_memory_frac = 0.1, engine = 'parquet', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35789c20>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_gpu_preproc[True-True-csv-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_csv0')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3564f1d0>, dump = True
gpu_memory_frac = 0.01, engine = 'csv', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35740cb0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_gpu_preproc[True-True-csv-0.1] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_csv1')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b32314450>, dump = True
gpu_memory_frac = 0.1, engine = 'csv', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356c15f0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________ test_gpu_preproc[True-True-csv-no-header-0.01] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_csv2')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b51a53810>, dump = True
gpu_memory_frac = 0.01, engine = 'csv-no-header', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3567ec20>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________ test_gpu_preproc[True-True-csv-no-header-0.1] _________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_True_csv3')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b32367a10>, dump = True
gpu_memory_frac = 0.1, engine = 'csv-no-header', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356e5cb0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
__________________ test_gpu_preproc[True-False-parquet-0.01] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_pa0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f20188cfdd0>, dump = False
gpu_memory_frac = 0.01, engine = 'parquet', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357405f0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
___________________ test_gpu_preproc[True-False-parquet-0.1] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_pa1')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b32308490>, dump = False
gpu_memory_frac = 0.1, engine = 'parquet', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578fb90>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_gpu_preproc[True-False-csv-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_cs0')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b35759fd0>, dump = False
gpu_memory_frac = 0.01, engine = 'csv', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356bd3b0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_gpu_preproc[True-False-csv-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_cs1')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3221ad10>, dump = False
gpu_memory_frac = 0.1, engine = 'csv', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356ca7a0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_______________ test_gpu_preproc[True-False-csv-no-header-0.01] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_cs2')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1fbc709490>, dump = False
gpu_memory_frac = 0.01, engine = 'csv-no-header', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357899e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________ test_gpu_preproc[True-False-csv-no-header-0.1] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_True_False_cs3')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3572d750>, dump = False
gpu_memory_frac = 0.1, engine = 'csv-no-header', preprocessing = True

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357409e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
__________________ test_gpu_preproc[False-True-parquet-0.01] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_pa0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b35704a90>, dump = True
gpu_memory_frac = 0.01, engine = 'parquet', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356bdcb0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
___________________ test_gpu_preproc[False-True-parquet-0.1] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_pa1')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b321eb9d0>, dump = True
gpu_memory_frac = 0.1, engine = 'parquet', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35713830>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_gpu_preproc[False-True-csv-0.01] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_cs0')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b35763dd0>, dump = True
gpu_memory_frac = 0.01, engine = 'csv', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3571aa70>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_____________________ test_gpu_preproc[False-True-csv-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_cs1')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b321712d0>, dump = True
gpu_memory_frac = 0.1, engine = 'csv', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578f9e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_______________ test_gpu_preproc[False-True-csv-no-header-0.01] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_cs2')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b324e8d10>, dump = True
gpu_memory_frac = 0.01, engine = 'csv-no-header', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357139e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________ test_gpu_preproc[False-True-csv-no-header-0.1] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_True_cs3')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3233bbd0>, dump = True
gpu_memory_frac = 0.1, engine = 'csv-no-header', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35734c20>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
__________________ test_gpu_preproc[False-False-parquet-0.01] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_p0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1fbc711450>, dump = False
gpu_memory_frac = 0.01, engine = 'parquet', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578fb90>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
__________________ test_gpu_preproc[False-False-parquet-0.1] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_p1')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b323386d0>, dump = False
gpu_memory_frac = 0.1, engine = 'parquet', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356bdef0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                    type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_gpu_preproc[False-False-csv-0.01] ____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_c0')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b35758350>, dump = False
gpu_memory_frac = 0.01, engine = 'csv', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1fbc7287a0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
____________________ test_gpu_preproc[False-False-csv-0.1] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_c1')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b32402650>, dump = False
gpu_memory_frac = 0.1, engine = 'csv', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356e5320>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_______________ test_gpu_preproc[False-False-csv-no-header-0.01] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_c2')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b355dcfd0>, dump = False
gpu_memory_frac = 0.01, engine = 'csv-no-header', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578fd40>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_______________ test_gpu_preproc[False-False-csv-no-header-0.1] ________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_preproc_False_False_c3')
df = name-string id label x y
0 Zelda 1005 997 0.758334 -0.947663
1 Yvonne ... Norbert 1016 971 0.461613 0.235278
2160 Michael 1042 1046 0.764492 0.102107

[4321 rows x 5 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b1658d190>, dump = False
gpu_memory_frac = 0.1, engine = 'csv-no-header', preprocessing = False

@pytest.mark.parametrize("gpu_memory_frac", [0.01, 0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("preprocessing", [True, False])
def test_gpu_preproc(tmpdir, df, dataset, dump, gpu_memory_frac, engine, preprocessing):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian(), ops.LogOp(preprocessing=preprocessing)])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())
    processor.finalize()
  processor.update_stats(dataset)

tests/unit/test_torch_dataloader.py:97:


nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3571a710>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_________________________ test_gpu_dl[1-parquet-1e-06] _________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_1_parquet_1e_06_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3578c3d0>, batch_size = 1
gpu_memory_frac = 1e-06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35708c20>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
__________________________ test_gpu_dl[1-parquet-0.1] __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_1_parquet_0_1_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b32047450>, batch_size = 1
gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b35762ef0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________________ test_gpu_dl[10-parquet-1e-06] _________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_10_parquet_1e_06_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b321d1f10>, batch_size = 10
gpu_memory_frac = 1e-06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b357890e0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_________________________ test_gpu_dl[10-parquet-0.1] __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_10_parquet_0_1_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b1661e510>, batch_size = 10
gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3578f4d0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
________________________ test_gpu_dl[100-parquet-1e-06] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_100_parquet_1e_06_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3203ebd0>, batch_size = 100
gpu_memory_frac = 1e-06, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b356e58c0>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
_________________________ test_gpu_dl[100-parquet-0.1] _________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_gpu_dl_100_parquet_0_1_0')
df = name-cat name-string id label x y
0 Ursula Zelda 1005 997 0.758334 -0.947663
...da 1016 971 0.461613 0.235278
2160 Michael Charlie 1042 1046 0.764492 0.102107

[4321 rows x 6 columns]
dataset = <nvtabular.io.Dataset object at 0x7f1b3236b290>, batch_size = 100
gpu_memory_frac = 0.1, engine = 'parquet'

@pytest.mark.parametrize("gpu_memory_frac", [0.000001, 0.1])
@pytest.mark.parametrize("engine", ["parquet"])
@pytest.mark.parametrize("batch_size", [1, 10, 100])
def test_gpu_dl(tmpdir, df, dataset, batch_size, gpu_memory_frac, engine):
    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    processor = nvt.Workflow(cat_names=cat_names, cont_names=cont_names, label_name=label_name)

    processor.add_feature([ops.FillMedian()])
    processor.add_preprocess(ops.Normalize())
    processor.add_preprocess(ops.Categorify())

    output_train = os.path.join(tmpdir, "train/")
    os.mkdir(output_train)

    processor.apply(
        dataset,
        apply_offline=True,
        record_stats=True,
        shuffle=True,
        output_path=output_train,
      num_out_files=2,
    )

tests/unit/test_torch_dataloader.py:204:


nvtabular/workflow.py:748: in apply
out_files_per_proc=out_files_per_proc,
nvtabular/workflow.py:825: in update_stats
self.exec_phase(idx, record_stats=record_stats)
nvtabular/workflow.py:653: in exec_phase
self._aggregated_dask_transform(transforms)
nvtabular/workflow.py:632: in _aggregated_dask_transform
meta = logic(meta, columns_ctx, cols_grp, target_cols, stats_context)
nvtabular/ops.py:115: in apply_op
new_gdf = self.op_logic(gdf, target_columns, stats_context=stats_context)
/opt/conda/lib/python3.7/contextlib.py:74: in inner
return func(*args, **kwds)
nvtabular/ops.py:601: in op_logic
new_gdf[col] = gdf[col].fillna(stats_context["medians"][col])
/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/series.py:1724: in fillna
data = self._column.fillna(value)


self = <cudf.core.column.numerical.NumericalColumn object at 0x7f1b3571a680>
fill_value = 999.5

def fillna(self, fill_value):
    """
    Fill null values with *fill_value*
    """
    if np.isscalar(fill_value):
        # cast safely to the same dtype as self
        fill_value_casted = self.dtype.type(fill_value)
        if not np.isnan(fill_value) and (fill_value_casted != fill_value):
            raise TypeError(
                "Cannot safely cast non-equivalent {} to {}".format(
                  type(fill_value).__name__, self.dtype.name
                )
            )

E TypeError: Cannot safely cast non-equivalent float to int64

/opt/conda/lib/python3.7/site-packages/cudf-0.15.0a0+2961.gce826c57c.dirty-py3.7-linux-x86_64.egg/cudf/core/column/numerical.py:299: TypeError
=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_notebooks.py: 374 tests with warnings
/var/jenkins_home/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_util.py:523: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
tensor_proto.tensor_content = nparray.tostring()

tests/unit/test_notebooks.py::test_rossman_example
/var/jenkins_home/.local/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/data_structures.py:720: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
if not isinstance(wrapped_dict, collections.Mapping):

tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-1e-06]
/var/jenkins_home/workspace/nvtabular/nvtabular/io.py:843: UserWarning: Row group size 144306 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-0.1]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-0.1]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-0.1]
/var/jenkins_home/workspace/nvtabular/nvtabular/workflow.py:730: UserWarning: num_out_files is deprecated. Use out_files_per_proc
warnings.warn("num_out_files is deprecated. Use out_files_per_proc")

-- Docs: https://docs.pytest.org/en/latest/warnings.html
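
The num_out_files deprecation warning above comes from the processor.apply(...) calls in the quoted tests; the migration is a keyword rename. A short sketch of the updated call, reusing the processor, dataset and output_train objects defined in the test code quoted earlier:

# Same call as in test_gpu_dl above, with the deprecated keyword renamed;
# `processor`, `dataset` and `output_train` come from the quoted tests.
processor.apply(
    dataset,
    apply_offline=True,
    record_stats=True,
    shuffle=True,
    output_path=output_train,
    out_files_per_proc=2,  # was: num_out_files=2 (deprecated)
)
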

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 213 42 108 27 75% 37->38, 38, 44->60, 45->46, 46, 49->55, 55->58, 58->59, 59, 81->82, 82-84, 100->103, 103, 112->115, 115, 123->126, 129->132, 132->133, 133-135, 137->138, 138-154, 156->160, 160, 167->169, 171->175, 175-177, 186->188, 190->199, 193->197, 199-201, 222->223, 223, 226->227, 227, 228->231, 231-233, 312->313, 313, 314->315, 315, 332->343, 363->368, 366->367, 367
nvtabular/ds_writer.py 82 2 34 5 94% 69->77, 94->95, 95-96, 99->101, 104->102, 113->112
nvtabular/io.py 539 43 188 24 89% 71->72, 72, 73->74, 74, 76->78, 78-81, 117, 139->145, 160->162, 162-163, 166->exit, 170, 193->196, 213->224, 221->223, 310->311, 311-312, 328, 334, 360->361, 361, 390, 416, 512->513, 513, 540->541, 541, 547-555, 597-599, 608->611, 612-614, 673->678, 737->739, 739, 744->745, 745, 752->753, 753, 761->773, 766->771, 771-773, 865->867, 867-869, 877->879, 879, 900->901, 901, 976->977, 977
nvtabular/ops.py 409 28 96 19 90% 51->50, 77-81, 103->104, 104, 121->122, 122, 216, 274, 324, 351->352, 352, 407->408, 408, 423->424, 424-426, 427->430, 430, 476->477, 477, 484->483, 512, 517->518, 518, 527->529, 529-530, 566->567, 567, 596->597, 597, 600->602, 602-603, 677->678, 678, 700, 892->893, 893, 908->909, 909, 968->969, 969, 970->971, 971
nvtabular/tf_dataloader.py 122 17 48 9 84% 16->18, 27-28, 31->34, 34-36, 187->188, 188-189, 197->198, 198, 207->210, 210, 220->223, 223, 232->237, 256, 264-267, 270-271, 279->280, 280, 291, 318->321
nvtabular/torch_dataloader.py 152 49 56 11 62% 72->73, 73, 75->76, 76-77, 80->81, 81, 91-92, 96->97, 97, 111->114, 114-115, 120-124, 144->146, 146->148, 148->exit, 153->154, 154, 172->173, 173, 175->178, 209-215, 218, 221-225, 230-234, 248-249, 265-277, 280-282, 285
nvtabular/worker.py 29 11 14 1 53% 39-43, 56->57, 57, 62-67
nvtabular/workflow.py 450 21 250 28 92% 104->108, 108, 111->113, 114->115, 115-119, 148->exit, 164->exit, 180->exit, 196->exit, 238->241, 241, 246->248, 265->267, 314->315, 315, 338->335, 374->375, 375, 393->396, 396, 426->429, 429, 534->535, 535-537, 551->550, 612->613, 613, 648->649, 649, 724->725, 725-727, 731->735, 738->739, 739, 753->754, 754-756, 757->768, 763->768, 776->777, 777, 778->exit, 797->788
setup.py 2 2 0 0 0% 18-20

TOTAL 2004 215 794 124 86%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 85.74%
=========================== short test summary info ============================
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-1-parquet-0.1]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-10-parquet-0.1]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[True-100-parquet-0.1]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-1-parquet-0.1]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-10-parquet-0.1]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.01]
FAILED tests/unit/test_tf_dataloader.py::test_tf_gpu_dl[False-100-parquet-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-parquet-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-parquet-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-csv-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-csv-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-csv-no-header-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-True-csv-no-header-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-parquet-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-parquet-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-csv-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-csv-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-csv-no-header-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[True-False-csv-no-header-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-parquet-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-parquet-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-csv-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-csv-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-csv-no-header-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-True-csv-no-header-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-parquet-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-parquet-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-csv-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-csv-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-csv-no-header-0.01]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_preproc[False-False-csv-no-header-0.1]
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-1e-06] - Ty...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-0.1] - Type...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-1e-06] - T...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-0.1] - Typ...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-1e-06] - ...
FAILED tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-0.1] - Ty...
===== 42 failed, 278 passed, 2 skipped, 385 warnings in 233.65s (0:03:53) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7029286289148722756.sh

@benfred
Member Author

benfred commented Jul 20, 2020

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #159 of commit 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba, no merge conflicts.
Running as SYSTEM
Setting status of 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/295/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/159/*:refs/remotes/origin/pr/159/* # timeout=10
 > git rev-parse 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba^{commit} # timeout=10
Checking out Revision 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a8e8f74cf03a1399a9cb0be3f1de984fc1403ba # timeout=10
Commit message: "Fix for working with cudf 0.15"
 > git rev-list --no-walk 0b1a93b4dcae8509937a6ffb105f1911bafb5f58 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7404385790885753295.sh
No Path provided. Nothing to do 😴
Skipped 2 files
============================= test session starts ==============================
platform linux -- Python 3.7.4, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /var/jenkins_home/workspace/nvtabular, inifile: setup.cfg
plugins: hypothesis-5.20.2, cov-2.10.0
collected 321 items / 1 skipped / 320 selected

tests/unit/test_dask_nvt.py ............................................ [ 13%]
.......... [ 16%]
tests/unit/test_io.py ......................... [ 24%]
tests/unit/test_notebooks.py s.. [ 25%]
tests/unit/test_ops.py ................................................. [ 40%]
............................................ [ 54%]
tests/unit/test_tf_dataloader.py ............ [ 58%]
tests/unit/test_torch_dataloader.py .................................... [ 69%]
...... [ 71%]
tests/unit/test_workflow.py ............................................ [ 85%]
................................................ [100%]

=============================== warnings summary ===============================
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_notebooks.py: 374 tests with warnings
/var/jenkins_home/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_util.py:523: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
tensor_proto.tensor_content = nparray.tostring()

tests/unit/test_notebooks.py::test_rossman_example
/var/jenkins_home/.local/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/data_structures.py:720: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
if not isinstance(wrapped_dict, collections.Mapping):

tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-1e-06]
/var/jenkins_home/workspace/nvtabular/nvtabular/io.py:843: UserWarning: Row group size 143831 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[1-parquet-0.1]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[10-parquet-0.1]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-1e-06]
tests/unit/test_torch_dataloader.py::test_gpu_dl[100-parquet-0.1]
/var/jenkins_home/workspace/nvtabular/nvtabular/workflow.py:730: UserWarning: num_out_files is deprecated. Use out_files_per_proc
warnings.warn("num_out_files is deprecated. Use out_files_per_proc")

-- Docs: https://docs.pytest.org/en/latest/warnings.html

----------- coverage: platform linux, python 3.7.4-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 6 0 0 0 100%
nvtabular/categorify.py 213 42 108 27 75% 37->38, 38, 44->60, 45->46, 46, 49->55, 55->58, 58->59, 59, 81->82, 82-84, 100->103, 103, 112->115, 115, 123->126, 129->132, 132->133, 133-135, 137->138, 138-154, 156->160, 160, 167->169, 171->175, 175-177, 186->188, 190->199, 193->197, 199-201, 222->223, 223, 226->227, 227, 228->231, 231-233, 312->313, 313, 314->315, 315, 332->343, 363->368, 366->367, 367
nvtabular/ds_writer.py 82 2 34 5 94% 69->77, 94->95, 95-96, 99->101, 104->102, 113->112
nvtabular/io.py 539 37 188 21 91% 73->74, 74, 78->81, 81, 117, 139->145, 193->196, 213->224, 221->223, 310->311, 311-312, 328, 334, 360->361, 361, 390, 416, 512->513, 513, 540->541, 541, 547-555, 597-599, 608->611, 612-614, 673->678, 737->739, 739, 744->745, 745, 752->753, 753, 761->773, 766->771, 771-773, 865->867, 867-869, 877->879, 879, 900->901, 901, 976->977, 977
nvtabular/ops.py 409 25 96 17 91% 51->50, 77-81, 121->122, 122, 216, 274, 324, 351->352, 352, 407->408, 408, 423->424, 424-426, 427->430, 430, 476->477, 477, 484->483, 512, 517->518, 518, 527->529, 529-530, 566->567, 567, 596->597, 597, 677->678, 678, 700, 892->893, 893, 908->909, 909, 968->969, 969, 970->971, 971
nvtabular/tf_dataloader.py 122 13 48 8 88% 16->18, 27-28, 31->34, 34-36, 187->188, 188-189, 197->198, 198, 207->210, 210, 220->223, 223, 232->237, 256, 264->265, 265, 270-271
nvtabular/torch_dataloader.py 152 13 56 14 85% 72->73, 73, 75->76, 76-77, 80->81, 81, 91-92, 96->97, 97, 114->116, 144->146, 146->148, 148->exit, 153->154, 154, 172->173, 173, 175->178, 218, 265->267, 267->268, 268, 269->270, 270, 285
nvtabular/worker.py 29 11 14 1 53% 39-43, 56->57, 57, 62-67
nvtabular/workflow.py 450 21 250 28 92% 104->108, 108, 111->113, 114->115, 115-119, 148->exit, 164->exit, 180->exit, 196->exit, 238->241, 241, 246->248, 265->267, 314->315, 315, 338->335, 374->375, 375, 393->396, 396, 426->429, 429, 534->535, 535-537, 551->550, 612->613, 613, 648->649, 649, 724->725, 725-727, 731->735, 738->739, 739, 753->754, 754-756, 757->768, 763->768, 776->777, 777, 778->exit, 797->788
setup.py 2 2 0 0 0% 18-20

TOTAL 2004 166 794 121 88%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.46%
=========== 320 passed, 2 skipped, 385 warnings in 390.95s (0:06:30) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1033407326286460975.sh

@benfred benfred merged commit 57ee6de into NVIDIA-Merlin:master Jul 20, 2020
@benfred benfred deleted the cudf_iter_fix branch July 14, 2021 04:37
mikemckiernan pushed a commit that referenced this pull request Nov 24, 2022
Cudf 0.15 recently disabled iterating over the values of an index, which broke
a number of ops in nvtabular. (change rapidsai/cudf#5340)
Fix by using values_host.
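
The change itself is small: wherever nvtabular iterated a cudf index directly, the index is copied to host memory first. A hedged sketch of the pattern, with an illustrative DataFrame rather than the actual patched nvtabular code:

import cudf

gdf = cudf.DataFrame({"count": [3, 1, 2]}, index=[10, 20, 30])

# cudf 0.15 (rapidsai/cudf#5340) disabled direct iteration over cudf
# objects, so `for key in gdf.index: ...` now raises TypeError.

# values_host copies the index into a host-side NumPy array, which
# iterates exactly as before:
for key in gdf.index.values_host:
    print(key)
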