Fix nightly container builds #518

benfred · 2022-08-05T23:10:22Z

Our nightly container builds are failing because all the integration tests are skipped
(faiss and feast are not installed in the containers). In this case pytest returns exit code 5
(no tests collected), which fails the container build.

Use the workaround suggested in a comment on pytest-dev/pytest#2393.
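For reference, here is a minimal sketch of the kind of `conftest.py` hook that workaround refers to. It assumes the file sits in a `conftest.py` that pytest loads for `tests/integration`, and the actual change in this PR may differ. `pytest_sessionfinish` is a standard pytest hook. The `NO_TESTS_COLLECTED` constant name is illustrative and mirrors `pytest.ExitCode.NO_TESTS_COLLECTED`.

```python
# conftest.py (sketch): make pytest exit with 0 instead of 5 when every test
# module is skipped at collection time and nothing is collected.

# Illustrative name; mirrors pytest.ExitCode.NO_TESTS_COLLECTED (value 5).
NO_TESTS_COLLECTED = 5


def pytest_sessionfinish(session, exitstatus):
    """Standard pytest hook, called once after the whole test run has finished."""
    if exitstatus == NO_TESTS_COLLECTED:
        # pytest returns session.exitstatus after this hook runs,
        # so overwriting it changes the process exit code to 0 (OK).
        session.exitstatus = 0
```

This also hides runs where tests are missing by mistake (for example, a wrong test path), so it may be worth limiting it to the container test jobs.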

github-actions · 2022-08-05T23:12:26Z

Documentation preview

https://nvidia-merlin.github.io/Merlin/review/pr-518

benfred · 2022-08-05T23:12:28Z

The problem on the nightly TF build:

root@93e5a5ef4e09:/Merlin# pytest -rxs tests/integration
================================================= test session starts ==================================================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /Merlin
plugins: anyio-3.6.1
collected 0 items / 1 skipped                                                                                          

=============================================== short test summary info ================================================
SKIPPED [1] tests/integration/examples/test_ci_building_deploying_multi_stage_RecSys.py:10: could not import 'feast': No module named 'feast'
================================================== 1 skipped in 2.11s ==================================================
root@93e5a5ef4e09:/Merlin# echo $?
5

With this fix, pytest exits with code 0 instead of 5.
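If editing `conftest.py` is not an option, the same mapping could be done in the CI step instead. Below is a minimal sketch that assumes the container test step can call a small Python wrapper. `run_pytest` and `normalize_pytest_exit` are hypothetical helper names, not part of pytest.

```python
import subprocess
import sys

# pytest exit codes used here (mirrors pytest.ExitCode.OK / NO_TESTS_COLLECTED).
EXIT_OK = 0
EXIT_NO_TESTS_COLLECTED = 5


def normalize_pytest_exit(code):
    """Treat 'no tests collected' as success; pass every other code through."""
    return EXIT_OK if code == EXIT_NO_TESTS_COLLECTED else code


def run_pytest(args):
    """Run pytest in a subprocess with the given CLI args; return the normalized exit code."""
    completed = subprocess.run([sys.executable, "-m", "pytest", *args])
    return normalize_pytest_exit(completed.returncode)
```

For example, `sys.exit(run_pytest(["-rxs", "tests/integration"]))` would exit with 0 for the all-skipped run above, while real test failures (exit code 1) would still fail the build.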

nvidia-merlin-bot · 2022-08-05T23:14:28Z

GitHub pull request #518 of commit d91e4503c0d5b79e6337c955fe44ab77c9f5ab22, no merge conflicts.
Running as SYSTEM
Setting status of d91e4503c0d5b79e6337c955fe44ab77c9f5ab22 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/318/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/518/*:refs/remotes/origin/pr/518/* # timeout=10
 > git rev-parse d91e4503c0d5b79e6337c955fe44ab77c9f5ab22^{commit} # timeout=10
Checking out Revision d91e4503c0d5b79e6337c955fe44ab77c9f5ab22 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d91e4503c0d5b79e6337c955fe44ab77c9f5ab22 # timeout=10
Commit message: "Fix nightly container builds"
 > git rev-list --no-walk 550aeb40387b2e4ff05da58b0905360fcb34dd70 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins15632308793292706780.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items
tests/unit/test_version.py .                                             [ 33%]

tests/unit/examples/test_building_deploying_multi_stage_RecSys.py F      [ 66%]

tests/unit/examples/test_scaling_criteo_merlin_models.py .               [100%]
=================================== FAILURES ===================================

__________________________________ test_func ___________________________________
self = <testbook.client.TestbookNotebookClient object at 0x7f85a0cc6b20>

cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53
def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
>               cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)

/usr/local/lib/python3.8/dist-packages/testbook/client.py:133:

args = (<testbook.client.TestbookNotebookClient object at 0x7f85a0cc6b20>, {'id': '413733ab', 'cell_type': 'code', 'metadata'...ast.py, line 299 in transform>]"\n\nAt:\n  /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}, 53)

kwargs = {}
def wrapped(*args, **kwargs):
>       return just_run(coro(*args, **kwargs))

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:85:

coro = <coroutine object NotebookClient.async_execute_cell at 0x7f84c84e13c0>
def just_run(coro: Awaitable) -> Any:
    """Make the coroutine run, even if there is an event loop running (using nest_asyncio)"""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None
    if loop is None:
        had_running_loop = False
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    else:
        had_running_loop = True
    if had_running_loop:
        # if there is a running loop, we patch using nest_asyncio
        # to have reentrant event loops
        check_ipython()
        import nest_asyncio

        nest_asyncio.apply()
        check_patch_tornado()
>       return loop.run_until_complete(coro)

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:60:

self = <_UnixSelectorEventLoop running=False closed=False debug=False>

future = <Task finished name='Task-369' coro=<NotebookClient.async_execute_cell() done, defined at /usr/local/lib/python3.8/dis...ps/feast.py, line 299 in transform>]"\n\nAt:\n  /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n\n')>
def run_until_complete(self, future):
    """Run until the Future is done.

    If the argument is a coroutine, it is wrapped in a Task.

    WARNING: It would be disastrous to call run_until_complete()
    with the same coroutine twice -- it would wrap it in two
    different Tasks and that can't be good.

    Return the Future's result, or raise its exception.
    """
    self._check_closed()
    self._check_running()

    new_task = not futures.isfuture(future)
    future = tasks.ensure_future(future, loop=self)
    if new_task:
        # An exception is raised if the future didn't complete, so there
        # is no need to log the "destroy pending task" message
        future._log_destroy_pending = False

    future.add_done_callback(_run_until_complete_cb)
    try:
        self.run_forever()
    except:
        if new_task and future.done() and not future.cancelled():
            # The coroutine raised a BaseException. Consume the exception
            # to not log a warning, the caller doesn't have access to the
            # local task.
            future.exception()
        raise
    finally:
        future.remove_done_callback(_run_until_complete_cb)
    if not future.done():
        raise RuntimeError('Event loop stopped before Future completed.')
>       return future.result()

/usr/lib/python3.8/asyncio/base_events.py:616:

self = <testbook.client.TestbookNotebookClient object at 0x7f85a0cc6b20>

cell = {'id': '413733ab', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-05T23:12:15.815629Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n  /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}

cell_index = 53, execution_count = None, store_history = True
async def async_execute_cell(
    self,
    cell: NotebookNode,
    cell_index: int,
    execution_count: t.Optional[int] = None,
    store_history: bool = True,
) -> NotebookNode:
    """
    Executes a single code cell.

    To execute all cells see :meth:`execute`.

    Parameters
    ----------
    cell : nbformat.NotebookNode
        The cell which is currently being processed.
    cell_index : int
        The position of the cell within the notebook object.
    execution_count : int
        The execution count to be assigned to the cell (default: Use kernel response)
    store_history : bool
        Determines if history should be stored in the kernel (default: False).
        Specific to ipython kernels, which can store command histories.

    Returns
    -------
    output : dict
        The execution output payload (or None for no output).

    Raises
    ------
    CellExecutionError
        If execution failed and should raise an exception, this will be raised
        with defaults about the failure.

    Returns
    -------
    cell : NotebookNode
        The cell which was just processed.
    """
    assert self.kc is not None

    await run_hook(self.on_cell_start, cell=cell, cell_index=cell_index)

    if cell.cell_type != 'code' or not cell.source.strip():
        self.log.debug("Skipping non-executing cell %s", cell_index)
        return cell

    if self.skip_cells_with_tag in cell.metadata.get("tags", []):
        self.log.debug("Skipping tagged cell %s", cell_index)
        return cell

    if self.record_timing:  # clear execution metadata prior to execution
        cell['metadata']['execution'] = {}

    self.log.debug("Executing cell:\n%s", cell.source)

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors or "raises-exception" in cell.metadata.get("tags", [])
    )

    await run_hook(self.on_cell_execute, cell=cell, cell_index=cell_index)
    parent_msg_id = await ensure_async(
        self.kc.execute(
            cell.source, store_history=store_history, stop_on_error=not cell_allows_errors
        )
    )
    await run_hook(self.on_cell_complete, cell=cell, cell_index=cell_index)
    # We launched a code cell to execute
    self.code_cells_executed += 1
    exec_timeout = self._get_timeout(cell)

    cell.outputs = []
    self.clear_before_next_output = False

    task_poll_kernel_alive = asyncio.ensure_future(self._async_poll_kernel_alive())
    task_poll_output_msg = asyncio.ensure_future(
        self._async_poll_output_msg(parent_msg_id, cell, cell_index)
    )
    self.task_poll_for_reply = asyncio.ensure_future(
        self._async_poll_for_reply(
            parent_msg_id, cell, exec_timeout, task_poll_output_msg, task_poll_kernel_alive
        )
    )
    try:
        exec_reply = await self.task_poll_for_reply
    except asyncio.CancelledError:
        # can only be cancelled by task_poll_kernel_alive when the kernel is dead
        task_poll_output_msg.cancel()
        raise DeadKernelError("Kernel died")
    except Exception as e:
        # Best effort to cancel request if it hasn't been resolved
        try:
            # Check if the task_poll_output is doing the raising for us
            if not isinstance(e, CellControlSignal):
                task_poll_output_msg.cancel()
        finally:
            raise

    if execution_count:
        cell['execution_count'] = execution_count
    await run_hook(
        self.on_cell_executed, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
>       await self._check_raise_for_error(cell, cell_index, exec_reply)

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:1022:

self = <testbook.client.TestbookNotebookClient object at 0x7f85a0cc6b20>

cell = {'id': '413733ab', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-05T23:12:15.815629Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n  /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}

cell_index = 53

exec_reply = {'buffers': [], 'content': {'ename': 'InferenceServerException', 'engine_info': {'engine_id': -1, 'engine_uuid': '7cd0...e, 'engine': '7cd0f8e4-7186-43c5-a98f-41b193e6d322', 'started': '2022-08-05T23:12:15.815919Z', 'status': 'error'}, ...}
async def _check_raise_for_error(
    self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
) -> None:

    if exec_reply is None:
        return None

    exec_reply_content = exec_reply['content']
    if exec_reply_content['status'] != 'error':
        return None

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors
        or exec_reply_content.get('ename') in self.allow_error_names
        or "raises-exception" in cell.metadata.get("tags", [])
    )
    await run_hook(
        self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
    if not cell_allows_errors:
>           raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
E           nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E           ------------------
E
E           import shutil
E           from merlin.models.loader.tf_utils import configure_tensorflow
E           configure_tensorflow()
E           from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E           response = run_ensemble_on_tritonserver(
E               "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E           )
E           response = [x.tolist()[0] for x in response["ordered_ids"]]
E           shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E           ------------------
E
E           ---------------------------------------------------------------------------
E           InferenceServerException                  Traceback (most recent call last)
E           Input In [32], in <cell line: 5>()
E                 3 configure_tensorflow()
E                 4 from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E           ----> 5 response = run_ensemble_on_tritonserver(
E                 6     "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E                 7 )
E                 8 response = [x.tolist()[0] for x in response["ordered_ids"]]
E                 9 shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E           File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:93, in run_ensemble_on_tritonserver(tmpdir, output_columns, df, model_name)
E                91 response = None
E                92 with run_triton_server(tmpdir) as client:
E           ---> 93     response = send_triton_request(df, output_columns, client=client, triton_model=model_name)
E                95 return response
E
E           File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:141, in send_triton_request(df, outputs_list, client, endpoint, request_id, triton_model)
E               139 outputs = [grpcclient.InferRequestedOutput(col) for col in outputs_list]
E               140 with client:
E           --> 141     response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)
E               143 results = {}
E               144 for col in outputs_list:
E
E           File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm)
E              1320     return result
E              1321 except grpc.RpcError as rpc_error:
E           -> 1322     raise_error_grpc(rpc_error)
E
E           File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
E                61 def raise_error_grpc(rpc_error):
E           ---> 62     raise get_error_grpc(rpc_error) from None
E
E           InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E               1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E           Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E           At:
E             /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
E
E           InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E               1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E           Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E           At:
E             /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
/usr/local/lib/python3.8/dist-packages/nbclient/client.py:916: CellExecutionError
During handling of the above exception, another exception occurred:
def test_func():
    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "01-Building-Recommender-Systems-with-Merlin.ipynb",
        execute=False,
    ) as tb1:
        tb1.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["NUM_ROWS"] = "10000"
            os.system("mkdir -p /tmp/examples")
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        tb1.execute()
        assert os.path.isdir("/tmp/examples/dlrm")
        assert os.path.isdir("/tmp/examples/feature_repo")
        assert os.path.isdir("/tmp/examples/query_tower")
        assert os.path.isfile("/tmp/examples/item_embeddings.parquet")
        assert os.path.isfile("/tmp/examples/feature_repo/user_features.py")
        assert os.path.isfile("/tmp/examples/feature_repo/item_features.py")

    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "02-Deploying-multi-stage-RecSys-with-Merlin-Systems.ipynb",
        execute=False,
    ) as tb2:
        tb2.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        NUM_OF_CELLS = len(tb2.cells)
        tb2.execute_cell(list(range(0, NUM_OF_CELLS - 3)))
        top_k = tb2.ref("top_k")
        outputs = tb2.ref("outputs")
        assert outputs[0] == "ordered_ids"
>       tb2.inject(
            """
            import shutil
            from merlin.models.loader.tf_utils import configure_tensorflow
            configure_tensorflow()
            from merlin.systems.triton.utils import run_ensemble_on_tritonserver
            response = run_ensemble_on_tritonserver(
                "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
            )
            response = [x.tolist()[0] for x in response["ordered_ids"]]
            shutil.rmtree("/tmp/examples/", ignore_errors=True)
            """
        )

tests/unit/examples/test_building_deploying_multi_stage_RecSys.py:57:

/usr/local/lib/python3.8/dist-packages/testbook/client.py:237: in inject

cell = TestbookNode(self.execute_cell(inject_idx)) if run else TestbookNode(code_cell)

self = <testbook.client.TestbookNotebookClient object at 0x7f85a0cc6b20>

cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53
def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
            cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)
        except CellExecutionError as ce:
>               raise TestbookRuntimeError(ce.evalue, ce, self._get_error_class(ce.ename))
E               testbook.exceptions.TestbookRuntimeError: An error occurred while executing the following cell:
E               ------------------
E
E               import shutil
E               from merlin.models.loader.tf_utils import configure_tensorflow
E               configure_tensorflow()
E               from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E               response = run_ensemble_on_tritonserver(
E                   "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E               )
E               response = [x.tolist()[0] for x in response["ordered_ids"]]
E               shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E               ------------------
E
E               ---------------------------------------------------------------------------
E               InferenceServerException                  Traceback (most recent call last)
E               Input In [32], in <cell line: 5>()
E                     3 configure_tensorflow()
E                     4 from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E               ----> 5 response = run_ensemble_on_tritonserver(
E                     6     "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E                     7 )
E                     8 response = [x.tolist()[0] for x in response["ordered_ids"]]
E                     9 shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E               File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:93, in run_ensemble_on_tritonserver(tmpdir, output_columns, df, model_name)
E                    91 response = None
E                    92 with run_triton_server(tmpdir) as client:
E               ---> 93     response = send_triton_request(df, output_columns, client=client, triton_model=model_name)
E                    95 return response
E
E               File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:141, in send_triton_request(df, outputs_list, client, endpoint, request_id, triton_model)
E                   139 outputs = [grpcclient.InferRequestedOutput(col) for col in outputs_list]
E                   140 with client:
E               --> 141     response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)
E                   143 results = {}
E                   144 for col in outputs_list:
E
E               File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm)
E                  1320     return result
E                  1321 except grpc.RpcError as rpc_error:
E               -> 1322     raise_error_grpc(rpc_error)
E
E               File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
E                    61 def raise_error_grpc(rpc_error):
E               ---> 62     raise get_error_grpc(rpc_error) from None
E
E               InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E                   1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E               Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E               At:
E                 /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
E
E               InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E                   1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E               Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E               At:
E                 /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
/usr/local/lib/python3.8/dist-packages/testbook/client.py:135: TestbookRuntimeError

----------------------------- Captured stdout call -----------------------------

Signal (2) received.

----------------------------- Captured stderr call -----------------------------

2022-08-05 23:10:42.188316: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

2022-08-05 23:10:44.182840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory:  -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0

2022-08-05 23:10:44.183618: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory:  -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/__init__.py", line 2127, in shutdown
    h.close()
  File "/usr/local/lib/python3.8/dist-packages/absl/logging/__init__.py", line 934, in close
    self.stream.close()
  File "/usr/local/lib/python3.8/dist-packages/ipykernel/iostream.py", line 438, in close
    self.watch_fd_thread.join()
AttributeError: 'OutStream' object has no attribute 'watch_fd_thread'

WARNING clustering 243 points to 32 centroids: please provide at least 1248 training points

2022-08-05 23:12:08.950670: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

2022-08-05 23:12:10.919191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory:  -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0

2022-08-05 23:12:10.919921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory:  -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0

I0805 23:12:16.094757 4624 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f4f36000000' with size 268435456

I0805 23:12:16.095503 4624 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864

I0805 23:12:16.102823 4624 model_repository_manager.cc:1191] loading: 0_queryfeast:1

I0805 23:12:16.203128 4624 model_repository_manager.cc:1191] loading: 1_predicttensorflow:1

I0805 23:12:16.210430 4624 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_queryfeast (GPU device 0)

I0805 23:12:16.303554 4624 model_repository_manager.cc:1191] loading: 2_queryfaiss:1

I0805 23:12:16.403807 4624 model_repository_manager.cc:1191] loading: 3_queryfeast:1

I0805 23:12:16.504059 4624 model_repository_manager.cc:1191] loading: 4_unrollfeatures:1

I0805 23:12:16.604287 4624 model_repository_manager.cc:1191] loading: 5_predicttensorflow:1

I0805 23:12:16.704504 4624 model_repository_manager.cc:1191] loading: 6_softmaxsampling:1

I0805 23:12:18.503843 4624 model_repository_manager.cc:1345] successfully loaded '0_queryfeast' version 1

I0805 23:12:18.771593 4624 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow

I0805 23:12:18.771630 4624 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9

I0805 23:12:18.771637 4624 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9

I0805 23:12:18.771643 4624 tensorflow.cc:2221] backend configuration:

{"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}}

I0805 23:12:18.771680 4624 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 1_predicttensorflow (version 1)

I0805 23:12:18.775562 4624 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 5_predicttensorflow (version 1)

I0805 23:12:18.776809 4624 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 1_predicttensorflow (GPU device 0)

2022-08-05 23:12:19.117786: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel

2022-08-05 23:12:19.122133: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }

2022-08-05 23:12:19.122162: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel

2022-08-05 23:12:19.122269: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

2022-08-05 23:12:19.168999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12648 MB memory:  -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0

2022-08-05 23:12:19.210615: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.

2022-08-05 23:12:19.291077: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel

2022-08-05 23:12:19.315202: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 197436 microseconds.

I0805 23:12:19.315320 4624 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)

I0805 23:12:19.315413 4624 model_repository_manager.cc:1345] successfully loaded '1_predicttensorflow' version 1

I0805 23:12:21.660778 4624 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 3_queryfeast (GPU device 0)

I0805 23:12:21.662508 4624 model_repository_manager.cc:1345] successfully loaded '2_queryfaiss' version 1

I0805 23:12:23.991166 4624 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 4_unrollfeatures (GPU device 0)

I0805 23:12:23.991392 4624 model_repository_manager.cc:1345] successfully loaded '3_queryfeast' version 1

I0805 23:12:26.035157 4624 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 6_softmaxsampling (GPU device 0)

I0805 23:12:26.035397 4624 model_repository_manager.cc:1345] successfully loaded '4_unrollfeatures' version 1

I0805 23:12:28.135215 4624 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 5_predicttensorflow (GPU device 0)

I0805 23:12:28.135496 4624 model_repository_manager.cc:1345] successfully loaded '6_softmaxsampling' version 1

2022-08-05 23:12:28.136207: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel

2022-08-05 23:12:28.155797: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }

2022-08-05 23:12:28.155839: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel

2022-08-05 23:12:28.157942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12648 MB memory:  -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0

2022-08-05 23:12:28.180790: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.

2022-08-05 23:12:28.337046: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel

2022-08-05 23:12:28.389237: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 253043 microseconds.

I0805 23:12:28.389466 4624 model_repository_manager.cc:1345] successfully loaded '5_predicttensorflow' version 1

I0805 23:12:28.392304 4624 model_repository_manager.cc:1191] loading: ensemble_model:1

I0805 23:12:28.493138 4624 model_repository_manager.cc:1345] successfully loaded 'ensemble_model' version 1

I0805 23:12:28.493324 4624 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0805 23:12:28.493428 4624 server.cc:583]
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend    | Path                                                            | Config                                                                                                                                                                       |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python     | /opt/tritonserver/backends/python/libtriton_python.so           | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}               |
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}} |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0805 23:12:28.493541 4624 server.cc:626]
+---------------------+---------+--------+
| Model               | Version | Status |
+---------------------+---------+--------+
| 0_queryfeast        | 1       | READY  |
| 1_predicttensorflow | 1       | READY  |
| 2_queryfaiss        | 1       | READY  |
| 3_queryfeast        | 1       | READY  |
| 4_unrollfeatures    | 1       | READY  |
| 5_predicttensorflow | 1       | READY  |
| 6_softmaxsampling   | 1       | READY  |
| ensemble_model      | 1       | READY  |
+---------------------+---------+--------+
I0805 23:12:28.556064 4624 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB

I0805 23:12:28.556958 4624 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                        |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                       |
| server_version                   | 2.22.0                                                                                                                                                                                       |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0]         | /tmp/examples/poc_ensemble                                                                                                                                                                   |
| model_control_mode               | MODE_NONE                                                                                                                                                                                    |
| strict_model_config              | 1                                                                                                                                                                                            |
| rate_limit                       | OFF                                                                                                                                                                                          |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                    |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                     |
| response_cache_byte_size         | 0                                                                                                                                                                                            |
| min_supported_compute_capability | 6.0                                                                                                                                                                                          |
| strict_readiness                 | 1                                                                                                                                                                                            |
| exit_timeout                     | 30                                                                                                                                                                                           |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0805 23:12:28.558093 4624 grpc_server.cc:4589] Started GRPCInferenceService at 0.0.0.0:8001

I0805 23:12:28.558409 4624 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000

I0805 23:12:28.599311 4624 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002

W0805 23:12:29.576267 4624 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0

W0805 23:12:29.576308 4624 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0

W0805 23:12:30.576452 4624 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0

W0805 23:12:30.576500 4624 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0

W0805 23:12:31.605377 4624 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0

W0805 23:12:31.605426 4624 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0

0805 23:12:32.361713 4881 pb_stub.cc:749] Failed to process the request(s) for model '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)

Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"

At:
  /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
I0805 23:12:32.366281 4624 server.cc:257] Waiting for in-flight requests to complete.

I0805 23:12:32.366312 4624 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences

I0805 23:12:32.366322 4624 model_repository_manager.cc:1223] unloading: ensemble_model:1

I0805 23:12:32.366385 4624 model_repository_manager.cc:1223] unloading: 6_softmaxsampling:1

I0805 23:12:32.366423 4624 model_repository_manager.cc:1223] unloading: 5_predicttensorflow:1

I0805 23:12:32.366459 4624 model_repository_manager.cc:1223] unloading: 4_unrollfeatures:1

I0805 23:12:32.366553 4624 model_repository_manager.cc:1223] unloading: 3_queryfeast:1

I0805 23:12:32.366558 4624 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state

I0805 23:12:32.366581 4624 model_repository_manager.cc:1223] unloading: 2_queryfaiss:1

I0805 23:12:32.366607 4624 model_repository_manager.cc:1328] successfully unloaded 'ensemble_model' version 1

I0805 23:12:32.366719 4624 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state

I0805 23:12:32.366723 4624 model_repository_manager.cc:1223] unloading: 1_predicttensorflow:1

I0805 23:12:32.366807 4624 model_repository_manager.cc:1223] unloading: 0_queryfeast:1

I0805 23:12:32.366855 4624 server.cc:288] All models are stopped, unloading models

I0805 23:12:32.366877 4624 server.cc:295] Timeout 30: Found 7 live models and 0 in-flight non-inference requests

I0805 23:12:32.366919 4624 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state

I0805 23:12:32.367028 4624 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state

I0805 23:12:32.378196 4624 model_repository_manager.cc:1328] successfully unloaded '1_predicttensorflow' version 1

I0805 23:12:32.386997 4624 model_repository_manager.cc:1328] successfully unloaded '5_predicttensorflow' version 1

I0805 23:12:33.367056 4624 server.cc:295] Timeout 29: Found 5 live models and 0 in-flight non-inference requests

I0805 23:12:33.698330 4624 model_repository_manager.cc:1328] successfully unloaded '4_unrollfeatures' version 1

I0805 23:12:33.906298 4624 model_repository_manager.cc:1328] successfully unloaded '2_queryfaiss' version 1

I0805 23:12:33.936709 4624 model_repository_manager.cc:1328] successfully unloaded '6_softmaxsampling' version 1

I0805 23:12:34.367224 4624 server.cc:295] Timeout 28: Found 2 live models and 0 in-flight non-inference requests

I0805 23:12:35.367375 4624 server.cc:295] Timeout 27: Found 2 live models and 0 in-flight non-inference requests

I0805 23:12:36.367510 4624 server.cc:295] Timeout 26: Found 2 live models and 0 in-flight non-inference requests

I0805 23:12:37.367644 4624 server.cc:295] Timeout 25: Found 2 live models and 0 in-flight non-inference requests

I0805 23:12:38.367781 4624 server.cc:295] Timeout 24: Found 2 live models and 0 in-flight non-inference requests

I0805 23:12:39.367915 4624 server.cc:295] Timeout 23: Found 2 live models and 0 in-flight non-inference requests

I0805 23:12:40.368064 4624 server.cc:295] Timeout 22: Found 2 live models and 0 in-flight non-inference requests

I0805 23:12:41.368202 4624 server.cc:295] Timeout 21: Found 2 live models and 0 in-flight non-inference requests

/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  ValueType.FLOAT: (np.float, False, False),

I0805 23:12:42.368338 4624 server.cc:295] Timeout 20: Found 2 live models and 0 in-flight non-inference requests

I0805 23:12:42.619487 4624 model_repository_manager.cc:1328] successfully unloaded '0_queryfeast' version 1

I0805 23:12:43.368467 4624 server.cc:295] Timeout 19: Found 1 live models and 0 in-flight non-inference requests

I0805 23:12:44.368597 4624 server.cc:295] Timeout 18: Found 1 live models and 0 in-flight non-inference requests

/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  ValueType.FLOAT: (np.float, False, False),

I0805 23:12:45.368792 4624 server.cc:295] Timeout 17: Found 1 live models and 0 in-flight non-inference requests

I0805 23:12:45.592674 4624 model_repository_manager.cc:1328] successfully unloaded '3_queryfeast' version 1

I0805 23:12:46.368927 4624 server.cc:295] Timeout 16: Found 0 live models and 0 in-flight non-inference requests

=========================== short test summary info ============================

FAILED tests/unit/examples/test_building_deploying_multi_stage_RecSys.py::test_func

=================== 1 failed, 2 passed in 233.16s (0:03:53) ====================

Build step 'Execute shell' marked build as failure

Performing Post build task...

Match found for : : True

Logical operation result is TRUE

Running script  : #!/bin/bash

cd /var/jenkins_home/

CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"

[merlin_merlin] $ /bin/bash /tmp/jenkins12880330643410221480.sh
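Reading the `3_queryfeast` failure in the log above: the feast op's `transform` hit an `int(None)` `TypeError`, and the error path in the generated `model.py` then called the response constructor with `tensors=[]` and a plain-string `error`, while the log says the constructor only accepts `output_tensors` and a `TritonError`. The second `TypeError` is what surfaced. The plain-Python sketch below reproduces that double failure. `InferenceResponse`, `lookup_feature`, `execute` and `execute_fixed` are hypothetical stand-ins modelled on the signature printed in the log; they are not the real `c_python_backend_utils` or Merlin Systems code, and a real Triton Python backend would expect a `TritonError` object rather than a string.

```python
class InferenceResponse:
    """Hypothetical stand-in mirroring the signature printed in the log:
    InferenceResponse(output_tensors, error=None)."""

    def __init__(self, output_tensors, error=None):
        self.output_tensors = output_tensors
        self.error = error


def lookup_feature(value):
    # Stand-in for the failing feast lookup: int(None) raises TypeError.
    return int(value)


def execute(value):
    """Error path shaped like the log's 'Invoked with: kwargs: tensors=[], error=...'."""
    try:
        return InferenceResponse(output_tensors=[lookup_feature(value)])
    except TypeError as exc:
        # Wrong keyword name: this raises a second TypeError, which becomes the
        # error that surfaces; the original is only kept as its __context__.
        return InferenceResponse(tensors=[], error=str(exc))


def execute_fixed(value):
    """Same logic, but using the keyword the constructor actually accepts."""
    try:
        return InferenceResponse(output_tensors=[lookup_feature(value)])
    except TypeError as exc:
        # A real Triton Python backend would wrap this message in a TritonError.
        return InferenceResponse(output_tensors=[], error=str(exc))
```

Calling `execute(None)` raises the constructor `TypeError` (with the `int()` error attached as `__context__`), while `execute_fixed(None)` returns an empty response whose `error` carries the original `NoneType` message.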

benfred · 2022-08-05T23:15:33Z

rerun tests

nvidia-merlin-bot · 2022-08-05T23:19:43Z


GitHub pull request #518 of commit d91e4503c0d5b79e6337c955fe44ab77c9f5ab22, no merge conflicts.
Running as SYSTEM
Setting status of d91e4503c0d5b79e6337c955fe44ab77c9f5ab22 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/319/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/518/*:refs/remotes/origin/pr/518/* # timeout=10
 > git rev-parse d91e4503c0d5b79e6337c955fe44ab77c9f5ab22^{commit} # timeout=10
Checking out Revision d91e4503c0d5b79e6337c955fe44ab77c9f5ab22 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d91e4503c0d5b79e6337c955fe44ab77c9f5ab22 # timeout=10
Commit message: "Fix nightly container builds"
 > git rev-list --no-walk d91e4503c0d5b79e6337c955fe44ab77c9f5ab22 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins12620503681835000123.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items
tests/unit/test_version.py .                                             [ 33%]

tests/unit/examples/test_building_deploying_multi_stage_RecSys.py .      [ 66%]

tests/unit/examples/test_scaling_criteo_merlin_models.py .               [100%]
======================== 3 passed in 237.98s (0:03:57) =========================

Performing Post build task...

Match found for : : True

Logical operation result is TRUE

Running script  : #!/bin/bash

cd /var/jenkins_home/

CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"

[merlin_merlin] $ /bin/bash /tmp/jenkins17673517722023837454.sh

benfred added the bug Something isn't working label Aug 5, 2022

EvenOldridge approved these changes Aug 5, 2022


benfred merged commit 562f1bf into main Aug 5, 2022

benfred deleted the fix_integration_ci branch August 5, 2022 23:34
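Per the PR description, the nightly containers lack faiss/feast, so every integration test is skipped at collection time and pytest exits with status 5 (`NO_TESTS_COLLECTED`), which fails the container build. The following is a minimal `conftest.py` sketch of the kind of workaround discussed in pytest-dev/pytest#2393. It is an assumption about the shape of the fix, not the exact diff merged in 562f1bf. `pytest_sessionfinish` is a standard pytest hook; the constant name `NO_TESTS_COLLECTED` is ours and mirrors `pytest.ExitCode.NO_TESTS_COLLECTED`.

```python
# conftest.py: sketch of a "no tests collected is not a failure" workaround.
# Assumption: this mirrors the approach discussed in pytest-dev/pytest#2393,
# not necessarily the exact change merged in this PR.

# Same value as pytest.ExitCode.NO_TESTS_COLLECTED (pytest >= 5.0); kept as a
# plain int so this file does not need to import pytest.
NO_TESTS_COLLECTED = 5


def pytest_sessionfinish(session, exitstatus):
    """pytest hook called after the whole run, before pytest returns its exit status."""
    if exitstatus == NO_TESTS_COLLECTED:
        # Every test was skipped at collection time (e.g. "could not import
        # 'feast'"), so report success instead of failing the container build.
        session.exitstatus = 0
```

With a hook like this in a `conftest.py` that pytest loads for the run, `pytest -rxs tests/integration` on a container without feast should exit with 0 rather than 5, while genuine test failures (exit status 1) are left untouched.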
