
[WIP] Add filtering step to the multi-stage recsys building and deployment notebooks #496

Closed
wants to merge 7 commits

Conversation

Contributor

@rnyak rnyak commented Aug 1, 2022

Currently the PoC notebook does not have a filtering step; this PR tries to add one with a hacky workaround. However, I get an error (see below) when I try to export the ensemble graph.

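For context, the filtering step is meant to drop candidates the user has already seen (hence the item_id_seen column in the error below) before they reach the ranking model. A plain-Python sketch of that intent, with made-up values and deliberately independent of Merlin's operator API:

# Plain-Python sketch of what the filtering step should do; the candidate
# IDs and the per-user "seen" history below are hypothetical values.
candidate_ids = [101, 102, 103, 104, 105]  # e.g. output of the retrieval stage
item_id_seen = {102, 104}                  # e.g. items in the user's history

# keep only candidates the user has not interacted with yet
filtered_candidates = [c for c in candidate_ids if c not in item_id_seen]
print(filtered_candidates)  # [101, 103, 105]
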
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [22], in <cell line: 4>()
      1 # define the path where all the models and config files exported to
      2 export_path = os.path.join(BASE_DIR + 'poc_ensemble')
----> 4 ensemble = Ensemble(ordering, request_schema)
      5 ens_config, node_configs = ensemble.export(export_path)

File /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ensemble.py:51, in Ensemble.__init__(self, ops, schema, name, label_columns)
     37 """_summary_
     38 
     39 Parameters
   (...)
     48     List of strings representing label columns, by default None
     49 """
     50 self.graph = Graph(ops)
---> 51 self.graph.construct_schema(schema)
     52 self.name = name
     53 self.label_columns = label_columns or []

File /usr/local/lib/python3.8/dist-packages/merlin/dag/graph.py:73, in Graph.construct_schema(self, root_schema, preserve_dtypes)
     70 def construct_schema(self, root_schema: Schema, preserve_dtypes=False) -> "Graph":
     71     nodes = list(postorder_iter_nodes(self.output_node))
---> 73     self._compute_node_schemas(root_schema, nodes, preserve_dtypes)
     74     self._validate_node_schemas(root_schema, nodes, preserve_dtypes)
     76     return self

File /usr/local/lib/python3.8/dist-packages/merlin/dag/graph.py:80, in Graph._compute_node_schemas(self, root_schema, nodes, preserve_dtypes)
     78 def _compute_node_schemas(self, root_schema, nodes, preserve_dtypes=False):
     79     for node in nodes:
---> 80         node.compute_schemas(root_schema, preserve_dtypes=preserve_dtypes)

File /usr/local/lib/python3.8/dist-packages/merlin/dag/node.py:179, in Node.compute_schemas(self, root_schema, preserve_dtypes)
    176     if not self.selector and self.parents[0].selector and (self.parents[0].selector.names):
    177         self.selector = parents_selector
--> 179 self.input_schema = self.op.compute_input_schema(
    180     root_schema, parents_schema, deps_schema, self.selector
    181 )
    183 self.selector = self.op.compute_selector(
    184     self.input_schema, self.selector, parents_selector, dependencies_selector
    185 )
    187 prev_output_schema = self.output_schema if preserve_dtypes else None

File /usr/local/lib/python3.8/dist-packages/merlin/dag/base_operator.py:79, in BaseOperator.compute_input_schema(self, root_schema, parents_schema, deps_schema, selector)
     55 def compute_input_schema(
     56     self,
     57     root_schema: Schema,
   (...)
     60     selector: ColumnSelector,
     61 ) -> Schema:
     62     """Given the schemas coming from upstream sources and a column selector for the
     63     input columns, returns a set of schemas for the input columns this operator will use
     64     Parameters
   (...)
     77         The schemas of the columns used by this operator
     78     """
---> 79     self._validate_matching_cols(
     80         parents_schema + deps_schema, selector, self.compute_input_schema.__name__
     81     )
     83     return parents_schema + deps_schema

File /usr/local/lib/python3.8/dist-packages/merlin/dag/base_operator.py:199, in BaseOperator._validate_matching_cols(self, schema, selector, method_name)
    197 missing_cols = [name for name in selector.names if name not in schema.column_names]
    198 if missing_cols:
--> 199     raise ValueError(
    200         f"Missing columns {missing_cols} found in operator"
    201         f"{self.__class__.__name__} during {method_name}."
    202     )

ValueError: Missing columns ['item_id_seen'] found in operatorSubsetColumns during compute_input_schema.
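
The raising check is quoted above from merlin/dag/base_operator.py: each operator's column selector is validated against the schema assembled from its parents and dependencies. A standalone rerun of that logic with hypothetical upstream columns reproduces the message, including the missing space between "operator" and the class name, which comes from the library's f-string:

# Standalone rerun of the _validate_matching_cols check quoted above;
# the upstream column names are hypothetical, and only 'item_id_seen'
# is missing from them.
upstream_columns = ["user_id", "candidate_ids", "ordered_ids"]
selector_names = ["candidate_ids", "item_id_seen"]

missing_cols = [name for name in selector_names if name not in upstream_columns]
if missing_cols:
    raise ValueError(
        f"Missing columns {missing_cols} found in operator"  # no trailing space here,
        f"SubsetColumns during compute_input_schema."        # matching the error above
    )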

@rnyak rnyak added the enhancement (New feature or request) and examples (Adding new examples) labels Aug 1, 2022
@review-notebook-app

Check out this pull request on ReviewNB

@github-actions

github-actions bot commented Aug 1, 2022

Documentation preview

https://nvidia-merlin.github.io/Merlin/review/pr-496

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #496 of commit c68297f2870235ee7d247eab35ed5264579fc194, no merge conflicts.
Running as SYSTEM
Setting status of c68297f2870235ee7d247eab35ed5264579fc194 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/297/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/496/*:refs/remotes/origin/pr/496/* # timeout=10
 > git rev-parse c68297f2870235ee7d247eab35ed5264579fc194^{commit} # timeout=10
Checking out Revision c68297f2870235ee7d247eab35ed5264579fc194 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c68297f2870235ee7d247eab35ed5264579fc194 # timeout=10
Commit message: "add filtering step"
 > git rev-list --no-walk 33650ff3be4f27c39d99f0e06a983d521bd7fcff # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins8071722894646148979.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 2 items

tests/unit/test_version.py . [ 50%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py F [100%]

=================================== FAILURES ===================================
__________________________________ test_func ___________________________________

self = <testbook.client.TestbookNotebookClient object at 0x7f913ddaeb80>
cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53

def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
          cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)

/usr/local/lib/python3.8/dist-packages/testbook/client.py:133:


args = (<testbook.client.TestbookNotebookClient object at 0x7f913ddaeb80>, {'id': '9d1419f5', 'cell_type': 'code', 'metadata'...ast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}, 53)
kwargs = {}

def wrapped(*args, **kwargs):
  return just_run(coro(*args, **kwargs))

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:85:


coro = <coroutine object NotebookClient.async_execute_cell at 0x7f913dca22c0>

def just_run(coro: Awaitable) -> Any:
    """Make the coroutine run, even if there is an event loop running (using nest_asyncio)"""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None
    if loop is None:
        had_running_loop = False
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    else:
        had_running_loop = True
    if had_running_loop:
        # if there is a running loop, we patch using nest_asyncio
        # to have reentrant event loops
        check_ipython()
        import nest_asyncio

        nest_asyncio.apply()
        check_patch_tornado()
  return loop.run_until_complete(coro)

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:60:


self = <_UnixSelectorEventLoop running=False closed=False debug=False>
future = <Task finished name='Task-369' coro=<NotebookClient.async_execute_cell() done, defined at /usr/local/lib/python3.8/dis...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n\n')>

def run_until_complete(self, future):
    """Run until the Future is done.

    If the argument is a coroutine, it is wrapped in a Task.

    WARNING: It would be disastrous to call run_until_complete()
    with the same coroutine twice -- it would wrap it in two
    different Tasks and that can't be good.

    Return the Future's result, or raise its exception.
    """
    self._check_closed()
    self._check_running()

    new_task = not futures.isfuture(future)
    future = tasks.ensure_future(future, loop=self)
    if new_task:
        # An exception is raised if the future didn't complete, so there
        # is no need to log the "destroy pending task" message
        future._log_destroy_pending = False

    future.add_done_callback(_run_until_complete_cb)
    try:
        self.run_forever()
    except:
        if new_task and future.done() and not future.cancelled():
            # The coroutine raised a BaseException. Consume the exception
            # to not log a warning, the caller doesn't have access to the
            # local task.
            future.exception()
        raise
    finally:
        future.remove_done_callback(_run_until_complete_cb)
    if not future.done():
        raise RuntimeError('Event loop stopped before Future completed.')
  return future.result()

/usr/lib/python3.8/asyncio/base_events.py:616:


self = <testbook.client.TestbookNotebookClient object at 0x7f913ddaeb80>
cell = {'id': '9d1419f5', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-01T19:47:42.301289Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}
cell_index = 53, execution_count = None, store_history = True

async def async_execute_cell(
    self,
    cell: NotebookNode,
    cell_index: int,
    execution_count: t.Optional[int] = None,
    store_history: bool = True,
) -> NotebookNode:
    """
    Executes a single code cell.

    To execute all cells see :meth:`execute`.

    Parameters
    ----------
    cell : nbformat.NotebookNode
        The cell which is currently being processed.
    cell_index : int
        The position of the cell within the notebook object.
    execution_count : int
        The execution count to be assigned to the cell (default: Use kernel response)
    store_history : bool
        Determines if history should be stored in the kernel (default: False).
        Specific to ipython kernels, which can store command histories.

    Returns
    -------
    output : dict
        The execution output payload (or None for no output).

    Raises
    ------
    CellExecutionError
        If execution failed and should raise an exception, this will be raised
        with defaults about the failure.

    Returns
    -------
    cell : NotebookNode
        The cell which was just processed.
    """
    assert self.kc is not None

    await run_hook(self.on_cell_start, cell=cell, cell_index=cell_index)

    if cell.cell_type != 'code' or not cell.source.strip():
        self.log.debug("Skipping non-executing cell %s", cell_index)
        return cell

    if self.skip_cells_with_tag in cell.metadata.get("tags", []):
        self.log.debug("Skipping tagged cell %s", cell_index)
        return cell

    if self.record_timing:  # clear execution metadata prior to execution
        cell['metadata']['execution'] = {}

    self.log.debug("Executing cell:\n%s", cell.source)

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors or "raises-exception" in cell.metadata.get("tags", [])
    )

    await run_hook(self.on_cell_execute, cell=cell, cell_index=cell_index)
    parent_msg_id = await ensure_async(
        self.kc.execute(
            cell.source, store_history=store_history, stop_on_error=not cell_allows_errors
        )
    )
    await run_hook(self.on_cell_complete, cell=cell, cell_index=cell_index)
    # We launched a code cell to execute
    self.code_cells_executed += 1
    exec_timeout = self._get_timeout(cell)

    cell.outputs = []
    self.clear_before_next_output = False

    task_poll_kernel_alive = asyncio.ensure_future(self._async_poll_kernel_alive())
    task_poll_output_msg = asyncio.ensure_future(
        self._async_poll_output_msg(parent_msg_id, cell, cell_index)
    )
    self.task_poll_for_reply = asyncio.ensure_future(
        self._async_poll_for_reply(
            parent_msg_id, cell, exec_timeout, task_poll_output_msg, task_poll_kernel_alive
        )
    )
    try:
        exec_reply = await self.task_poll_for_reply
    except asyncio.CancelledError:
        # can only be cancelled by task_poll_kernel_alive when the kernel is dead
        task_poll_output_msg.cancel()
        raise DeadKernelError("Kernel died")
    except Exception as e:
        # Best effort to cancel request if it hasn't been resolved
        try:
            # Check if the task_poll_output is doing the raising for us
            if not isinstance(e, CellControlSignal):
                task_poll_output_msg.cancel()
        finally:
            raise

    if execution_count:
        cell['execution_count'] = execution_count
    await run_hook(
        self.on_cell_executed, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
  await self._check_raise_for_error(cell, cell_index, exec_reply)

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:1022:


self = <testbook.client.TestbookNotebookClient object at 0x7f913ddaeb80>
cell = {'id': '9d1419f5', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-01T19:47:42.301289Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}
cell_index = 53
exec_reply = {'buffers': [], 'content': {'ename': 'InferenceServerException', 'engine_info': {'engine_id': -1, 'engine_uuid': '9e99...e, 'engine': '9e9999e2-952c-4aca-942b-67d09de885a3', 'started': '2022-08-01T19:47:42.301679Z', 'status': 'error'}, ...}

async def _check_raise_for_error(
    self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
) -> None:

    if exec_reply is None:
        return None

    exec_reply_content = exec_reply['content']
    if exec_reply_content['status'] != 'error':
        return None

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors
        or exec_reply_content.get('ename') in self.allow_error_names
        or "raises-exception" in cell.metadata.get("tags", [])
    )
    await run_hook(
        self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
    if not cell_allows_errors:
      raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)

E nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E ------------------
E
E import shutil
E from merlin.models.loader.tf_utils import configure_tensorflow
E configure_tensorflow()
E from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E response = run_ensemble_on_tritonserver(
E "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E )
E response = [x.tolist()[0] for x in response["ordered_ids"]]
E shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E ------------------
E
E ---------------------------------------------------------------------------
E InferenceServerException                 Traceback (most recent call last)
E Input In [32], in <cell line: 5>()
E       3 configure_tensorflow()
E       4 from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E ----> 5 response = run_ensemble_on_tritonserver(
E       6     "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E       7 )
E       8 response = [x.tolist()[0] for x in response["ordered_ids"]]
E       9 shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:93, in run_ensemble_on_tritonserver(tmpdir, output_columns, df, model_name)
E      91 response = None
E      92 with run_triton_server(tmpdir) as client:
E ---> 93     response = send_triton_request(df, output_columns, client=client, triton_model=model_name)
E      95 return response
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:141, in send_triton_request(df, outputs_list, client, endpoint, request_id, triton_model)
E     139 outputs = [grpcclient.InferRequestedOutput(col) for col in outputs_list]
E     140 with client:
E --> 141     response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)
E     143 results = {}
E     144 for col in outputs_list:
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm)
E    1320     return result
E    1321 except grpc.RpcError as rpc_error:
E -> 1322     raise_error_grpc(rpc_error)
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
E      61 def raise_error_grpc(rpc_error):
E ---> 62     raise get_error_grpc(rpc_error) from None
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:916: CellExecutionError

During handling of the above exception, another exception occurred:

def test_func():
    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "01-Building-Recommender-Systems-with-Merlin.ipynb",
        execute=False,
    ) as tb1:
        tb1.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["NUM_ROWS"] = "10000"
            os.system("mkdir -p /tmp/examples")
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        tb1.execute()
        assert os.path.isdir("/tmp/examples/dlrm")
        assert os.path.isdir("/tmp/examples/feature_repo")
        assert os.path.isdir("/tmp/examples/query_tower")
        assert os.path.isfile("/tmp/examples/item_embeddings.parquet")
        assert os.path.isfile("/tmp/examples/feature_repo/user_features.py")
        assert os.path.isfile("/tmp/examples/feature_repo/item_features.py")

    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "02-Deploying-multi-stage-RecSys-with-Merlin-Systems.ipynb",
        execute=False,
    ) as tb2:
        tb2.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        NUM_OF_CELLS = len(tb2.cells)
        tb2.execute_cell(list(range(0, NUM_OF_CELLS - 3)))
        top_k = tb2.ref("top_k")
        outputs = tb2.ref("outputs")
        assert outputs[0] == "ordered_ids"
      tb2.inject(
            """
            import shutil
            from merlin.models.loader.tf_utils import configure_tensorflow
            configure_tensorflow()
            from merlin.systems.triton.utils import run_ensemble_on_tritonserver
            response = run_ensemble_on_tritonserver(
                "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
            )
            response = [x.tolist()[0] for x in response["ordered_ids"]]
            shutil.rmtree("/tmp/examples/", ignore_errors=True)
            """
        )

tests/unit/examples/test_building_deploying_multi_stage_RecSys.py:57:


/usr/local/lib/python3.8/dist-packages/testbook/client.py:237: in inject
cell = TestbookNode(self.execute_cell(inject_idx)) if run else TestbookNode(code_cell)


self = <testbook.client.TestbookNotebookClient object at 0x7f913ddaeb80>
cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53

def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
            cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)
        except CellExecutionError as ce:
          raise TestbookRuntimeError(ce.evalue, ce, self._get_error_class(ce.ename))

E testbook.exceptions.TestbookRuntimeError: An error occurred while executing the following cell:
E ------------------
E (same failing cell and InferenceServerException traceback as shown above)

/usr/local/lib/python3.8/dist-packages/testbook/client.py:135: TestbookRuntimeError
----------------------------- Captured stdout call -----------------------------
Signal (2) received.
----------------------------- Captured stderr call -----------------------------
2022-08-01 19:46:14.966870: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-01 19:46:16.951397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-01 19:46:16.952132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python3.8/logging/init.py", line 2127, in shutdown
h.close()
File "/usr/local/lib/python3.8/dist-packages/absl/logging/init.py", line 934, in close
self.stream.close()
File "/usr/local/lib/python3.8/dist-packages/ipykernel/iostream.py", line 438, in close
self.watch_fd_thread.join()
AttributeError: 'OutStream' object has no attribute 'watch_fd_thread'
WARNING clustering 243 points to 32 centroids: please provide at least 1248 training points
2022-08-01 19:47:35.482680: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-01 19:47:37.453885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-01 19:47:37.454614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
I0801 19:47:42.578465 13358 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f0cf4000000' with size 268435456
I0801 19:47:42.579210 13358 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0801 19:47:42.586504 13358 model_repository_manager.cc:1191] loading: 1_predicttensorflow:1
I0801 19:47:42.686830 13358 model_repository_manager.cc:1191] loading: 0_queryfeast:1
I0801 19:47:42.787130 13358 model_repository_manager.cc:1191] loading: 2_queryfaiss:1
I0801 19:47:42.887435 13358 model_repository_manager.cc:1191] loading: 3_queryfeast:1
I0801 19:47:42.970931 13358 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0801 19:47:42.970971 13358 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0801 19:47:42.970978 13358 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0801 19:47:42.970984 13358 tensorflow.cc:2221] backend configuration:
{"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}}
I0801 19:47:42.971017 13358 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 1_predicttensorflow (version 1)
I0801 19:47:42.975879 13358 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 1_predicttensorflow (GPU device 0)
I0801 19:47:42.987723 13358 model_repository_manager.cc:1191] loading: 4_unrollfeatures:1
I0801 19:47:43.088021 13358 model_repository_manager.cc:1191] loading: 5_predicttensorflow:1
I0801 19:47:43.188348 13358 model_repository_manager.cc:1191] loading: 6_softmaxsampling:1
2022-08-01 19:47:43.321751: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-01 19:47:43.326233: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-08-01 19:47:43.326286: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-01 19:47:43.326392: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-01 19:47:43.363618: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12901 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-01 19:47:43.408267: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-01 19:47:43.485822: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-01 19:47:43.510601: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 188868 microseconds.
I0801 19:47:43.510850 13358 model_repository_manager.cc:1345] successfully loaded '1_predicttensorflow' version 1
I0801 19:47:43.514915 13358 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 5_predicttensorflow (version 1)
I0801 19:47:43.516941 13358 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_queryfeast (GPU device 0)
I0801 19:47:45.830072 13358 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)
I0801 19:47:45.830321 13358 model_repository_manager.cc:1345] successfully loaded '0_queryfeast' version 1
I0801 19:47:48.209479 13358 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 3_queryfeast (GPU device 0)
I0801 19:47:48.212104 13358 model_repository_manager.cc:1345] successfully loaded '2_queryfaiss' version 1
I0801 19:47:50.523682 13358 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 4_unrollfeatures (GPU device 0)
I0801 19:47:50.523962 13358 model_repository_manager.cc:1345] successfully loaded '3_queryfeast' version 1
I0801 19:47:52.604495 13358 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 5_predicttensorflow (GPU device 0)
I0801 19:47:52.604738 13358 model_repository_manager.cc:1345] successfully loaded '4_unrollfeatures' version 1
2022-08-01 19:47:52.605814: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-01 19:47:52.624015: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-08-01 19:47:52.624061: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-01 19:47:52.626129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12901 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-01 19:47:52.647996: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-01 19:47:52.804828: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-01 19:47:52.856505: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 250699 microseconds.
I0801 19:47:52.856644 13358 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 6_softmaxsampling (GPU device 0)
I0801 19:47:52.856723 13358 model_repository_manager.cc:1345] successfully loaded '5_predicttensorflow' version 1
I0801 19:47:54.915295 13358 model_repository_manager.cc:1345] successfully loaded '6_softmaxsampling' version 1
I0801 19:47:54.917999 13358 model_repository_manager.cc:1191] loading: ensemble_model:1
I0801 19:47:55.018802 13358 model_repository_manager.cc:1345] successfully loaded 'ensemble_model' version 1
I0801 19:47:55.018977 13358 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0801 19:47:55.019086 13358 server.cc:583]
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}} |
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0801 19:47:55.019197 13358 server.cc:626]
+---------------------+---------+--------+
| Model | Version | Status |
+---------------------+---------+--------+
| 0_queryfeast | 1 | READY |
| 1_predicttensorflow | 1 | READY |
| 2_queryfaiss | 1 | READY |
| 3_queryfeast | 1 | READY |
| 4_unrollfeatures | 1 | READY |
| 5_predicttensorflow | 1 | READY |
| 6_softmaxsampling | 1 | READY |
| ensemble_model | 1 | READY |
+---------------------+---------+--------+

I0801 19:47:55.081627 13358 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0801 19:47:55.082478 13358 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/examples/poc_ensemble |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0801 19:47:55.083273 13358 grpc_server.cc:4589] Started GRPCInferenceService at 0.0.0.0:8001
I0801 19:47:55.083817 13358 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
I0801 19:47:55.125012 13358 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
W0801 19:47:56.103417 13358 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0801 19:47:56.103482 13358 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0801 19:47:57.103637 13358 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0801 19:47:57.103689 13358 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0801 19:47:58.123238 13358 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0801 19:47:58.123286 13358 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
E0801 19:47:58.836168 13615 pb_stub.cc:749] Failed to process the request(s) for model '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)

Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"

At:
/tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

I0801 19:47:58.840756 13358 server.cc:257] Waiting for in-flight requests to complete.
I0801 19:47:58.840808 13358 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0801 19:47:58.840826 13358 model_repository_manager.cc:1223] unloading: ensemble_model:1
I0801 19:47:58.840927 13358 model_repository_manager.cc:1223] unloading: 6_softmaxsampling:1
I0801 19:47:58.840997 13358 model_repository_manager.cc:1223] unloading: 5_predicttensorflow:1
I0801 19:47:58.841108 13358 model_repository_manager.cc:1223] unloading: 4_unrollfeatures:1
I0801 19:47:58.841128 13358 model_repository_manager.cc:1328] successfully unloaded 'ensemble_model' version 1
I0801 19:47:58.841169 13358 model_repository_manager.cc:1223] unloading: 3_queryfeast:1
I0801 19:47:58.841197 13358 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0801 19:47:58.841236 13358 model_repository_manager.cc:1223] unloading: 2_queryfaiss:1
I0801 19:47:58.841300 13358 model_repository_manager.cc:1223] unloading: 1_predicttensorflow:1
I0801 19:47:58.841360 13358 model_repository_manager.cc:1223] unloading: 0_queryfeast:1
I0801 19:47:58.841411 13358 server.cc:288] All models are stopped, unloading models
I0801 19:47:58.841435 13358 server.cc:295] Timeout 30: Found 7 live models and 0 in-flight non-inference requests
I0801 19:47:58.841442 13358 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state
I0801 19:47:58.841515 13358 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0801 19:47:58.841735 13358 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state
I0801 19:47:58.854147 13358 model_repository_manager.cc:1328] successfully unloaded '1_predicttensorflow' version 1
I0801 19:47:58.864042 13358 model_repository_manager.cc:1328] successfully unloaded '5_predicttensorflow' version 1
I0801 19:47:59.841572 13358 server.cc:295] Timeout 29: Found 5 live models and 0 in-flight non-inference requests
I0801 19:48:00.182062 13358 model_repository_manager.cc:1328] successfully unloaded '6_softmaxsampling' version 1
I0801 19:48:00.243280 13358 model_repository_manager.cc:1328] successfully unloaded '4_unrollfeatures' version 1
I0801 19:48:00.401753 13358 model_repository_manager.cc:1328] successfully unloaded '2_queryfaiss' version 1
I0801 19:48:00.841729 13358 server.cc:295] Timeout 28: Found 2 live models and 0 in-flight non-inference requests
I0801 19:48:01.841848 13358 server.cc:295] Timeout 27: Found 2 live models and 0 in-flight non-inference requests
I0801 19:48:02.841987 13358 server.cc:295] Timeout 26: Found 2 live models and 0 in-flight non-inference requests
I0801 19:48:03.842125 13358 server.cc:295] Timeout 25: Found 2 live models and 0 in-flight non-inference requests
I0801 19:48:04.842258 13358 server.cc:295] Timeout 24: Found 2 live models and 0 in-flight non-inference requests
I0801 19:48:05.842395 13358 server.cc:295] Timeout 23: Found 2 live models and 0 in-flight non-inference requests
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ValueType.FLOAT: (np.float, False, False),
I0801 19:48:06.239124 13358 model_repository_manager.cc:1328] successfully unloaded '3_queryfeast' version 1
I0801 19:48:06.842529 13358 server.cc:295] Timeout 22: Found 1 live models and 0 in-flight non-inference requests
I0801 19:48:07.842682 13358 server.cc:295] Timeout 21: Found 1 live models and 0 in-flight non-inference requests
I0801 19:48:08.842822 13358 server.cc:295] Timeout 20: Found 1 live models and 0 in-flight non-inference requests
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ValueType.FLOAT: (np.float, False, False),
I0801 19:48:09.280242 13358 model_repository_manager.cc:1328] successfully unloaded '0_queryfeast' version 1
I0801 19:48:09.843023 13358 server.cc:295] Timeout 19: Found 0 live models and 0 in-flight non-inference requests
=========================== short test summary info ============================
FAILED tests/unit/examples/test_building_deploying_multi_stage_RecSys.py::test_func
=================== 1 failed, 1 passed in 128.53s (0:02:08) ====================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins12257471342168228237.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #496 of commit 75b4ee4b4ab80a5a032a3eaaf88b71ea93ee21b4, no merge conflicts.
Running as SYSTEM
Setting status of 75b4ee4b4ab80a5a032a3eaaf88b71ea93ee21b4 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/298/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/496/*:refs/remotes/origin/pr/496/* # timeout=10
 > git rev-parse 75b4ee4b4ab80a5a032a3eaaf88b71ea93ee21b4^{commit} # timeout=10
Checking out Revision 75b4ee4b4ab80a5a032a3eaaf88b71ea93ee21b4 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 75b4ee4b4ab80a5a032a3eaaf88b71ea93ee21b4 # timeout=10
Commit message: "fix unrolled feats"
 > git rev-list --no-walk c68297f2870235ee7d247eab35ed5264579fc194 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins15538796526900657261.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 2 items

tests/unit/test_version.py . [ 50%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py . [100%]

======================== 2 passed in 135.18s (0:02:15) =========================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins11878219302774522311.sh

@rnyak rnyak added the help wanted (Extra attention is needed) label Aug 3, 2022
@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #496 of commit 4c808d607d4e78a20aa4e43138a6c87eafe0274d, no merge conflicts.
Running as SYSTEM
Setting status of 4c808d607d4e78a20aa4e43138a6c87eafe0274d to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/305/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/496/*:refs/remotes/origin/pr/496/* # timeout=10
 > git rev-parse 4c808d607d4e78a20aa4e43138a6c87eafe0274d^{commit} # timeout=10
Checking out Revision 4c808d607d4e78a20aa4e43138a6c87eafe0274d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4c808d607d4e78a20aa4e43138a6c87eafe0274d # timeout=10
Commit message: "Merge branch 'main' into poc_with_filtering"
 > git rev-list --no-walk 24ec50b91c077343ae97b54287bf4bb06584db7c # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins16905772863289261204.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py . [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py F [100%]

=================================== FAILURES ===================================
__________________________________ test_func ___________________________________

def test_func():
    with testbook(
        REPO_ROOT / "examples" / "scaling-criteo" / "02-ETL-with-NVTabular.ipynb",
        execute=False,
        timeout=180,
    ) as tb1:
        tb1.inject(
            """
            import os
            os.environ["BASE_DIR"] = "/tmp/input/criteo/"
            os.environ["INPUT_DATA_DIR"] = "/tmp/input/criteo/"
            os.environ["OUTPUT_DATA_DIR"] = "/tmp/output/criteo/"
            os.system("mkdir -p /tmp/input/criteo")
            os.system("mkdir -p /tmp/output/criteo")

            from merlin.datasets.synthetic import generate_data

            train, valid = generate_data("criteo", int(1000000), set_sizes=(0.7, 0.3))

            train.to_ddf().compute().to_parquet('/tmp/input/criteo/day_0.parquet')
            valid.to_ddf().compute().to_parquet('/tmp/input/criteo/day_1.parquet')
            """
        )
      tb1.execute()

tests/unit/examples/test_scaling_criteo_merlin_models.py:31:


/usr/local/lib/python3.8/dist-packages/testbook/client.py:147: in execute
super().execute_cell(cell, index)
/usr/local/lib/python3.8/dist-packages/nbclient/util.py:85: in wrapped
return just_run(coro(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/nbclient/util.py:60: in just_run
return loop.run_until_complete(coro)
/usr/lib/python3.8/asyncio/base_events.py:616: in run_until_complete
return future.result()
/usr/local/lib/python3.8/dist-packages/nbclient/client.py:1022: in async_execute_cell
await self._check_raise_for_error(cell, cell_index, exec_reply)


self = <testbook.client.TestbookNotebookClient object at 0x7f8bfc6a7790>
cell = {'id': '941f8b23', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-03T14:46:17.043133Z',...ry: CUDA error at: /usr/include/rmm/mr/device/cuda_memory_resource.hpp:70: cudaErrorMemoryAllocation out of memory']}]}
cell_index = 27
exec_reply = {'buffers': [], 'content': {'ename': 'MemoryError', 'engine_info': {'engine_id': -1, 'engine_uuid': '211856b1-e407-4e4...e, 'engine': '211856b1-e407-4e47-80fa-8b3e546a200a', 'started': '2022-08-03T14:46:17.043313Z', 'status': 'error'}, ...}

async def _check_raise_for_error(
    self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
) -> None:

    if exec_reply is None:
        return None

    exec_reply_content = exec_reply['content']
    if exec_reply_content['status'] != 'error':
        return None

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors
        or exec_reply_content.get('ename') in self.allow_error_names
        or "raises-exception" in cell.metadata.get("tags", [])
    )
    await run_hook(
        self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
    if not cell_allows_errors:
      raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)

E nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E ------------------
E
E import os
E os.environ["BASE_DIR"] = "/tmp/input/criteo/"
E os.environ["INPUT_DATA_DIR"] = "/tmp/input/criteo/"
E os.environ["OUTPUT_DATA_DIR"] = "/tmp/output/criteo/"
E os.system("mkdir -p /tmp/input/criteo")
E os.system("mkdir -p /tmp/output/criteo")
E
E from merlin.datasets.synthetic import generate_data
E
E train, valid = generate_data("criteo", int(1000000), set_sizes=(0.7, 0.3))
E
E train.to_ddf().compute().to_parquet('/tmp/input/criteo/day_0.parquet')
E valid.to_ddf().compute().to_parquet('/tmp/input/criteo/day_1.parquet')
E
E ------------------
E
E ---------------------------------------------------------------------------
E MemoryError                               Traceback (most recent call last)
E Input In [16], in <cell line: 12>()
E       8 from merlin.datasets.synthetic import generate_data
E      10 train, valid = generate_data("criteo", int(1000000), set_sizes=(0.7, 0.3))
E ---> 12 train.to_ddf().compute().to_parquet('/tmp/input/criteo/day_0.parquet')
E      13 valid.to_ddf().compute().to_parquet('/tmp/input/criteo/day_1.parquet')
E
E File /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:5535, in DataFrame.to_parquet(self, path, *args, **kwargs)
E    5532 """{docstring}"""
E    5533 from cudf.io import parquet as pq
E -> 5535 return pq.to_parquet(self, path, *args, **kwargs)
E
E File /usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101, in annotate.__call__.<locals>.inner(*args, **kwargs)
E      98 @wraps(func)
E      99 def inner(*args, **kwargs):
E     100     libnvtx_push_range(self.attributes, self.domain.handle)
E --> 101     result = func(*args, **kwargs)
E     102     libnvtx_pop_range(self.domain.handle)
E     103     return result
E
E File /usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:609, in to_parquet(df, path, engine, compression, index, partition_cols, partition_file_name, partition_offsets, statistics, metadata_file_path, int96_timestamps, row_group_size_bytes, row_group_size_rows, *args, **kwargs)
E     601 if partition_offsets:
E     602     kwargs["partitions_info"] = list(
E     603         zip(
E     604             partition_offsets,
E     605             np.roll(partition_offsets, -1) - partition_offsets,
E     606         )
E     607     )[:-1]
E --> 609 return _write_parquet(
E     610     df,
E     611     paths=path if is_list_like(path) else [path],
E     612     compression=compression,
E     613     index=index,
E     614     statistics=statistics,
E     615     metadata_file_path=metadata_file_path,
E     616     int96_timestamps=int96_timestamps,
E     617     row_group_size_bytes=row_group_size_bytes,
E     618     row_group_size_rows=row_group_size_rows,
E     619     **kwargs,
E     620 )
E     622 else:
E     623     if partition_offsets is not None:
E
E File /usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101, in annotate.__call__.<locals>.inner(*args, **kwargs)
E      98 @wraps(func)
E      99 def inner(*args, **kwargs):
E     100     libnvtx_push_range(self.attributes, self.domain.handle)
E --> 101     result = func(*args, **kwargs)
E     102     libnvtx_pop_range(self.domain.handle)
E     103     return result
E
E File /usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:69, in _write_parquet(df, paths, compression, index, statistics, metadata_file_path, int96_timestamps, row_group_size_bytes, row_group_size_rows, partitions_info, **kwargs)
E      65     write_parquet_res = libparquet.write_parquet(
E      66         df, filepaths_or_buffers=file_objs, **common_args
E      67     )
E      68 else:
E ---> 69     write_parquet_res = libparquet.write_parquet(
E      70         df, filepaths_or_buffers=paths_or_bufs, **common_args
E      71     )
E      73 return write_parquet_res
E
E File cudf/_lib/parquet.pyx:287, in cudf._lib.parquet.write_parquet()
E
E File cudf/_lib/parquet.pyx:397, in cudf._lib.parquet.write_parquet()
E
E MemoryError: std::bad_alloc: out_of_memory: CUDA error at: /usr/include/rmm/mr/device/cuda_memory_resource.hpp:70: cudaErrorMemoryAllocation out of memory
E MemoryError: std::bad_alloc: out_of_memory: CUDA error at: /usr/include/rmm/mr/device/cuda_memory_resource.hpp:70: cudaErrorMemoryAllocation out of memory

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:916: CellExecutionError
----------------------------- Captured stderr call -----------------------------
2022-08-03 14:46:09,021 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-08-03 14:46:09,125 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Process Dask Worker process (from Nanny):
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.8/dist-packages/distributed/process.py", line 175, in _run
target(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/distributed/nanny.py", line 918, in _run
loop.run_sync(do_stop)
File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 524, in run_sync
self.start()
File "/usr/local/lib/python3.8/dist-packages/tornado/platform/asyncio.py", line 199, in start
self.asyncio_loop.run_forever()
File "/usr/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/usr/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 688, in
lambda f: self._run_callback(functools.partial(callback, future))
File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 741, in _run_callback
ret = callback()
File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 765, in _discard_future_result
future.result()
File "/usr/local/lib/python3.8/dist-packages/distributed/nanny.py", line 911, in _run
loop.run_sync(run)
File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 524, in run_sync
self.start()
File "/usr/local/lib/python3.8/dist-packages/tornado/platform/asyncio.py", line 199, in start
self.asyncio_loop.run_forever()
File "/usr/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/usr/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
handle._run()
File "/usr/lib/python3.8/asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.8/dist-packages/distributed/worker.py", line 1116, in heartbeat
response = await retry_operation(
File "/usr/local/lib/python3.8/dist-packages/distributed/utils_comm.py", line 386, in retry_operation
return await retry(
File "/usr/local/lib/python3.8/dist-packages/distributed/utils_comm.py", line 371, in retry
return await coro()
File "/usr/local/lib/python3.8/dist-packages/distributed/core.py", line 922, in send_recv_from_rpc
return await send_recv(comm=comm, op=key, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/distributed/core.py", line 691, in send_recv
response = await comm.read(deserializers=deserializers)
File "/usr/local/lib/python3.8/dist-packages/distributed/comm/tcp.py", line 234, in read
n = await stream.read_into(chunk)
File "/usr/local/lib/python3.8/dist-packages/tornado/iostream.py", line 471, in read_into
self._read_bytes = n
KeyboardInterrupt
/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 24 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
=========================== short test summary info ============================
FAILED tests/unit/examples/test_scaling_criteo_merlin_models.py::test_func - ...
=================== 1 failed, 2 passed in 160.20s (0:02:40) ====================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins11606507128745558063.sh

@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #496 of commit 550aeb40387b2e4ff05da58b0905360fcb34dd70, no merge conflicts.
Running as SYSTEM
Setting status of 550aeb40387b2e4ff05da58b0905360fcb34dd70 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/317/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/496/*:refs/remotes/origin/pr/496/* # timeout=10
 > git rev-parse 550aeb40387b2e4ff05da58b0905360fcb34dd70^{commit} # timeout=10
Checking out Revision 550aeb40387b2e4ff05da58b0905360fcb34dd70 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 550aeb40387b2e4ff05da58b0905360fcb34dd70 # timeout=10
Commit message: "Merge branch 'main' into poc_with_filtering"
 > git rev-list --no-walk 32fc61dbf99f58fc4d0d65fb5b65a7291b2f757a # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins2973867071528776025.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py . [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py . [100%]

======================== 3 passed in 224.21s (0:03:44) =========================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins3356946998286665610.sh

"source": [
"# Filter out anything that was in the user's current session\n",
"filtering = retrieval[\"candidate_ids\"] >> FilterCandidates(\n",
" filter_out=user_features[\"item_id_seen\"]\n",
Contributor

Try making this filter_out=user_features["item_id_seen_1"]?

Contributor Author

Still doesn't work, unfortunately.
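
For context, a minimal sketch of the filtering wiring under discussion. The import path below is an assumption (merlin-systems ships a session-filter operator, assumed here to live in merlin.systems.dag.ops.session_filter); retrieval and user_features are the upstream ensemble nodes referenced in the diff snippet above:

# Hedged sketch, not the notebook's final code: the import path is an
# assumption; retrieval / user_features come from the diff snippet above.
from merlin.systems.dag.ops.session_filter import FilterCandidates

# Drop every candidate item id that already appears in the user's
# current session before the ranking stage sees it.
filtering = retrieval["candidate_ids"] >> FilterCandidates(
    filter_out=user_features["item_id_seen"]
)

# Downstream ranking/ordering nodes then consume `filtering` in place of
# the raw candidate ids when the ensemble graph is exported.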

@viswa-nvidia added this to the Merlin 22.08 milestone on Aug 5, 2022
@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #496 of commit 9f4c0f6634920117dd7b044e481575b49fb55d75, no merge conflicts.
Running as SYSTEM
Setting status of 9f4c0f6634920117dd7b044e481575b49fb55d75 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/323/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/496/*:refs/remotes/origin/pr/496/* # timeout=10
 > git rev-parse 9f4c0f6634920117dd7b044e481575b49fb55d75^{commit} # timeout=10
Checking out Revision 9f4c0f6634920117dd7b044e481575b49fb55d75 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9f4c0f6634920117dd7b044e481575b49fb55d75 # timeout=10
Commit message: "Merge branch 'main' into poc_with_filtering"
 > git rev-list --no-walk 3830d193ebeee1c9be697465a1d160b000e17a58 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins13759369557583878690.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py . [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py . [100%]

======================== 3 passed in 234.33s (0:03:54) =========================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins14961743522442213556.sh

@nvidia-merlin-bot
Contributor

CI Results
GitHub pull request #496 of commit 5bbe4aa591d1c1e4eaa00ccb9ea025167586b66c, no merge conflicts.
Running as SYSTEM
Setting status of 5bbe4aa591d1c1e4eaa00ccb9ea025167586b66c to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/327/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/496/*:refs/remotes/origin/pr/496/* # timeout=10
 > git rev-parse 5bbe4aa591d1c1e4eaa00ccb9ea025167586b66c^{commit} # timeout=10
Checking out Revision 5bbe4aa591d1c1e4eaa00ccb9ea025167586b66c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5bbe4aa591d1c1e4eaa00ccb9ea025167586b66c # timeout=10
Commit message: "Merge branch 'main' into poc_with_filtering"
 > git rev-list --no-walk 792f14a5f1b0ef07690ba865698bd1a675dab632 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins13636343195833768746.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py F [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py . [100%]

=================================== FAILURES ===================================
__________________________________ test_func ___________________________________

self = <testbook.client.TestbookNotebookClient object at 0x7fb0a3590b80>
cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53

def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
          cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)

/usr/local/lib/python3.8/dist-packages/testbook/client.py:133:


args = (<testbook.client.TestbookNotebookClient object at 0x7fb0a3590b80>, {'id': 'fc65d78d', 'cell_type': 'code', 'metadata'...ast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}, 53)
kwargs = {}

def wrapped(*args, **kwargs):
  return just_run(coro(*args, **kwargs))

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:85:


coro = <coroutine object NotebookClient.async_execute_cell at 0x7fb0a34863c0>

def just_run(coro: Awaitable) -> Any:
    """Make the coroutine run, even if there is an event loop running (using nest_asyncio)"""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None
    if loop is None:
        had_running_loop = False
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    else:
        had_running_loop = True
    if had_running_loop:
        # if there is a running loop, we patch using nest_asyncio
        # to have reentrant event loops
        check_ipython()
        import nest_asyncio

        nest_asyncio.apply()
        check_patch_tornado()
  return loop.run_until_complete(coro)

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:60:


self = <_UnixSelectorEventLoop running=False closed=False debug=False>
future = <Task finished name='Task-369' coro=<NotebookClient.async_execute_cell() done, defined at /usr/local/lib/python3.8/dis...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n\n')>

def run_until_complete(self, future):
    """Run until the Future is done.

    If the argument is a coroutine, it is wrapped in a Task.

    WARNING: It would be disastrous to call run_until_complete()
    with the same coroutine twice -- it would wrap it in two
    different Tasks and that can't be good.

    Return the Future's result, or raise its exception.
    """
    self._check_closed()
    self._check_running()

    new_task = not futures.isfuture(future)
    future = tasks.ensure_future(future, loop=self)
    if new_task:
        # An exception is raised if the future didn't complete, so there
        # is no need to log the "destroy pending task" message
        future._log_destroy_pending = False

    future.add_done_callback(_run_until_complete_cb)
    try:
        self.run_forever()
    except:
        if new_task and future.done() and not future.cancelled():
            # The coroutine raised a BaseException. Consume the exception
            # to not log a warning, the caller doesn't have access to the
            # local task.
            future.exception()
        raise
    finally:
        future.remove_done_callback(_run_until_complete_cb)
    if not future.done():
        raise RuntimeError('Event loop stopped before Future completed.')
  return future.result()

/usr/lib/python3.8/asyncio/base_events.py:616:


self = <testbook.client.TestbookNotebookClient object at 0x7fb0a3590b80>
cell = {'id': 'fc65d78d', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-08T18:59:30.419537Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}
cell_index = 53, execution_count = None, store_history = True

async def async_execute_cell(
    self,
    cell: NotebookNode,
    cell_index: int,
    execution_count: t.Optional[int] = None,
    store_history: bool = True,
) -> NotebookNode:
    """
    Executes a single code cell.

    To execute all cells see :meth:`execute`.

    Parameters
    ----------
    cell : nbformat.NotebookNode
        The cell which is currently being processed.
    cell_index : int
        The position of the cell within the notebook object.
    execution_count : int
        The execution count to be assigned to the cell (default: Use kernel response)
    store_history : bool
        Determines if history should be stored in the kernel (default: False).
        Specific to ipython kernels, which can store command histories.

    Returns
    -------
    output : dict
        The execution output payload (or None for no output).

    Raises
    ------
    CellExecutionError
        If execution failed and should raise an exception, this will be raised
        with defaults about the failure.

    Returns
    -------
    cell : NotebookNode
        The cell which was just processed.
    """
    assert self.kc is not None

    await run_hook(self.on_cell_start, cell=cell, cell_index=cell_index)

    if cell.cell_type != 'code' or not cell.source.strip():
        self.log.debug("Skipping non-executing cell %s", cell_index)
        return cell

    if self.skip_cells_with_tag in cell.metadata.get("tags", []):
        self.log.debug("Skipping tagged cell %s", cell_index)
        return cell

    if self.record_timing:  # clear execution metadata prior to execution
        cell['metadata']['execution'] = {}

    self.log.debug("Executing cell:\n%s", cell.source)

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors or "raises-exception" in cell.metadata.get("tags", [])
    )

    await run_hook(self.on_cell_execute, cell=cell, cell_index=cell_index)
    parent_msg_id = await ensure_async(
        self.kc.execute(
            cell.source, store_history=store_history, stop_on_error=not cell_allows_errors
        )
    )
    await run_hook(self.on_cell_complete, cell=cell, cell_index=cell_index)
    # We launched a code cell to execute
    self.code_cells_executed += 1
    exec_timeout = self._get_timeout(cell)

    cell.outputs = []
    self.clear_before_next_output = False

    task_poll_kernel_alive = asyncio.ensure_future(self._async_poll_kernel_alive())
    task_poll_output_msg = asyncio.ensure_future(
        self._async_poll_output_msg(parent_msg_id, cell, cell_index)
    )
    self.task_poll_for_reply = asyncio.ensure_future(
        self._async_poll_for_reply(
            parent_msg_id, cell, exec_timeout, task_poll_output_msg, task_poll_kernel_alive
        )
    )
    try:
        exec_reply = await self.task_poll_for_reply
    except asyncio.CancelledError:
        # can only be cancelled by task_poll_kernel_alive when the kernel is dead
        task_poll_output_msg.cancel()
        raise DeadKernelError("Kernel died")
    except Exception as e:
        # Best effort to cancel request if it hasn't been resolved
        try:
            # Check if the task_poll_output is doing the raising for us
            if not isinstance(e, CellControlSignal):
                task_poll_output_msg.cancel()
        finally:
            raise

    if execution_count:
        cell['execution_count'] = execution_count
    await run_hook(
        self.on_cell_executed, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
  await self._check_raise_for_error(cell, cell_index, exec_reply)

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:1022:


self = <testbook.client.TestbookNotebookClient object at 0x7fb0a3590b80>
cell = {'id': 'fc65d78d', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-08T18:59:30.419537Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}
cell_index = 53
exec_reply = {'buffers': [], 'content': {'ename': 'InferenceServerException', 'engine_info': {'engine_id': -1, 'engine_uuid': '4ef2...e, 'engine': '4ef26c54-25ac-4e00-b478-75fd2bdd6078', 'started': '2022-08-08T18:59:30.420255Z', 'status': 'error'}, ...}

async def _check_raise_for_error(
    self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
) -> None:

    if exec_reply is None:
        return None

    exec_reply_content = exec_reply['content']
    if exec_reply_content['status'] != 'error':
        return None

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors
        or exec_reply_content.get('ename') in self.allow_error_names
        or "raises-exception" in cell.metadata.get("tags", [])
    )
    await run_hook(
        self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
    if not cell_allows_errors:
      raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)

E nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E ------------------
E
E import shutil
E from merlin.models.loader.tf_utils import configure_tensorflow
E configure_tensorflow()
E from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E response = run_ensemble_on_tritonserver(
E "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E )
E response = [x.tolist()[0] for x in response["ordered_ids"]]
E shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E ------------------
E
E ---------------------------------------------------------------------------
E InferenceServerException                  Traceback (most recent call last)
E Input In [32], in <cell line: 5>()
E       3 configure_tensorflow()
E       4 from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E ----> 5 response = run_ensemble_on_tritonserver(
E       6     "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E       7 )
E       8 response = [x.tolist()[0] for x in response["ordered_ids"]]
E       9 shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:93, in run_ensemble_on_tritonserver(tmpdir, output_columns, df, model_name)
E      91 response = None
E      92 with run_triton_server(tmpdir) as client:
E ---> 93     response = send_triton_request(df, output_columns, client=client, triton_model=model_name)
E      95 return response
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:141, in send_triton_request(df, outputs_list, client, endpoint, request_id, triton_model)
E     139 outputs = [grpcclient.InferRequestedOutput(col) for col in outputs_list]
E     140 with client:
E --> 141     response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)
E     143 results = {}
E     144 for col in outputs_list:
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm)
E    1320     return result
E    1321 except grpc.RpcError as rpc_error:
E -> 1322     raise_error_grpc(rpc_error)
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
E      61 def raise_error_grpc(rpc_error):
E ---> 62     raise get_error_grpc(rpc_error) from None
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E     1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E   /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E     1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E   /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:916: CellExecutionError

During handling of the above exception, another exception occurred:

def test_func():
    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "01-Building-Recommender-Systems-with-Merlin.ipynb",
        execute=False,
    ) as tb1:
        tb1.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["NUM_ROWS"] = "10000"
            os.system("mkdir -p /tmp/examples")
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        tb1.execute()
        assert os.path.isdir("/tmp/examples/dlrm")
        assert os.path.isdir("/tmp/examples/feature_repo")
        assert os.path.isdir("/tmp/examples/query_tower")
        assert os.path.isfile("/tmp/examples/item_embeddings.parquet")
        assert os.path.isfile("/tmp/examples/feature_repo/user_features.py")
        assert os.path.isfile("/tmp/examples/feature_repo/item_features.py")

    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "02-Deploying-multi-stage-RecSys-with-Merlin-Systems.ipynb",
        execute=False,
    ) as tb2:
        tb2.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        NUM_OF_CELLS = len(tb2.cells)
        tb2.execute_cell(list(range(0, NUM_OF_CELLS - 3)))
        top_k = tb2.ref("top_k")
        outputs = tb2.ref("outputs")
        assert outputs[0] == "ordered_ids"
      tb2.inject(
            """
            import shutil
            from merlin.models.loader.tf_utils import configure_tensorflow
            configure_tensorflow()
            from merlin.systems.triton.utils import run_ensemble_on_tritonserver
            response = run_ensemble_on_tritonserver(
                "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
            )
            response = [x.tolist()[0] for x in response["ordered_ids"]]
            shutil.rmtree("/tmp/examples/", ignore_errors=True)
            """
        )

tests/unit/examples/test_building_deploying_multi_stage_RecSys.py:57:


/usr/local/lib/python3.8/dist-packages/testbook/client.py:237: in inject
cell = TestbookNode(self.execute_cell(inject_idx)) if run else TestbookNode(code_cell)


self = <testbook.client.TestbookNotebookClient object at 0x7fb0a3590b80>
cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53

def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
            cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)
        except CellExecutionError as ce:
          raise TestbookRuntimeError(ce.evalue, ce, self._get_error_class(ce.ename))

E testbook.exceptions.TestbookRuntimeError: An error occurred while executing the following cell:
E ------------------
E
E import shutil
E from merlin.models.loader.tf_utils import configure_tensorflow
E configure_tensorflow()
E from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E response = run_ensemble_on_tritonserver(
E "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E )
E response = [x.tolist()[0] for x in response["ordered_ids"]]
E shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E ------------------
E
E ---------------------------------------------------------------------------
E InferenceServerException                  Traceback (most recent call last)
E Input In [32], in <cell line: 5>()
E       3 configure_tensorflow()
E       4 from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E ----> 5 response = run_ensemble_on_tritonserver(
E       6     "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E       7 )
E       8 response = [x.tolist()[0] for x in response["ordered_ids"]]
E       9 shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:93, in run_ensemble_on_tritonserver(tmpdir, output_columns, df, model_name)
E      91 response = None
E      92 with run_triton_server(tmpdir) as client:
E ---> 93     response = send_triton_request(df, output_columns, client=client, triton_model=model_name)
E      95 return response
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:141, in send_triton_request(df, outputs_list, client, endpoint, request_id, triton_model)
E     139 outputs = [grpcclient.InferRequestedOutput(col) for col in outputs_list]
E     140 with client:
E --> 141     response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)
E     143 results = {}
E     144 for col in outputs_list:
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm)
E    1320     return result
E    1321 except grpc.RpcError as rpc_error:
E -> 1322     raise_error_grpc(rpc_error)
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
E      61 def raise_error_grpc(rpc_error):
E ---> 62     raise get_error_grpc(rpc_error) from None
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E     1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E   /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E     1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E   /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

/usr/local/lib/python3.8/dist-packages/testbook/client.py:135: TestbookRuntimeError
----------------------------- Captured stdout call -----------------------------
Signal (2) received.
----------------------------- Captured stderr call -----------------------------
2022-08-08 18:57:53.709028: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-08 18:57:55.715355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-08 18:57:55.716094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python3.8/logging/init.py", line 2127, in shutdown
h.close()
File "/usr/local/lib/python3.8/dist-packages/absl/logging/init.py", line 934, in close
self.stream.close()
File "/usr/local/lib/python3.8/dist-packages/ipykernel/iostream.py", line 438, in close
self.watch_fd_thread.join()
AttributeError: 'OutStream' object has no attribute 'watch_fd_thread'
WARNING clustering 251 points to 32 centroids: please provide at least 1248 training points
2022-08-08 18:59:23.509067: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-08 18:59:25.501653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-08 18:59:25.502395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
I0808 18:59:30.696669 3304 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f12d6000000' with size 268435456
I0808 18:59:30.697395 3304 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0808 18:59:30.704341 3304 model_repository_manager.cc:1191] loading: 0_queryfeast:1
I0808 18:59:30.804641 3304 model_repository_manager.cc:1191] loading: 1_predicttensorflow:1
I0808 18:59:30.810448 3304 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_queryfeast (GPU device 0)
I0808 18:59:30.904960 3304 model_repository_manager.cc:1191] loading: 2_queryfaiss:1
I0808 18:59:31.005244 3304 model_repository_manager.cc:1191] loading: 3_queryfeast:1
I0808 18:59:31.105509 3304 model_repository_manager.cc:1191] loading: 4_unrollfeatures:1
I0808 18:59:31.205811 3304 model_repository_manager.cc:1191] loading: 5_predicttensorflow:1
I0808 18:59:31.306120 3304 model_repository_manager.cc:1191] loading: 6_softmaxsampling:1
I0808 18:59:33.157809 3304 model_repository_manager.cc:1345] successfully loaded '0_queryfeast' version 1
I0808 18:59:33.437744 3304 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0808 18:59:33.437777 3304 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0808 18:59:33.437785 3304 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0808 18:59:33.437791 3304 tensorflow.cc:2221] backend configuration:
{"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}}
I0808 18:59:33.437826 3304 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 1_predicttensorflow (version 1)
I0808 18:59:33.439990 3304 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 5_predicttensorflow (version 1)
I0808 18:59:33.442875 3304 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 1_predicttensorflow (GPU device 0)
2022-08-08 18:59:33.787680: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-08 18:59:33.791053: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-08-08 18:59:33.791081: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-08 18:59:33.791191: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-08 18:59:33.826993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12648 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-08 18:59:33.872406: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-08 18:59:33.954231: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-08 18:59:33.984633: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 196972 microseconds.
I0808 18:59:33.984756 3304 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)
I0808 18:59:33.984832 3304 model_repository_manager.cc:1345] successfully loaded '1_predicttensorflow' version 1
I0808 18:59:36.354033 3304 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 3_queryfeast (GPU device 0)
I0808 18:59:36.355476 3304 model_repository_manager.cc:1345] successfully loaded '2_queryfaiss' version 1
I0808 18:59:38.655828 3304 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 5_predicttensorflow (GPU device 0)
I0808 18:59:38.656067 3304 model_repository_manager.cc:1345] successfully loaded '3_queryfeast' version 1
2022-08-08 18:59:38.656571: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-08 18:59:38.675039: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-08-08 18:59:38.675082: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-08 18:59:38.677182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12648 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-08 18:59:38.699505: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-08 18:59:38.857152: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-08 18:59:38.910981: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 254424 microseconds.
I0808 18:59:38.911136 3304 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 6_softmaxsampling (GPU device 0)
I0808 18:59:38.911212 3304 model_repository_manager.cc:1345] successfully loaded '5_predicttensorflow' version 1
I0808 18:59:40.988698 3304 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 4_unrollfeatures (GPU device 0)
I0808 18:59:40.988968 3304 model_repository_manager.cc:1345] successfully loaded '6_softmaxsampling' version 1
I0808 18:59:43.037045 3304 model_repository_manager.cc:1345] successfully loaded '4_unrollfeatures' version 1
I0808 18:59:43.039937 3304 model_repository_manager.cc:1191] loading: ensemble_model:1
I0808 18:59:43.140784 3304 model_repository_manager.cc:1345] successfully loaded 'ensemble_model' version 1
I0808 18:59:43.140950 3304 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0808 18:59:43.141063 3304 server.cc:583]
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}} |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0808 18:59:43.141177 3304 server.cc:626]
+---------------------+---------+--------+
| Model | Version | Status |
+---------------------+---------+--------+
| 0_queryfeast | 1 | READY |
| 1_predicttensorflow | 1 | READY |
| 2_queryfaiss | 1 | READY |
| 3_queryfeast | 1 | READY |
| 4_unrollfeatures | 1 | READY |
| 5_predicttensorflow | 1 | READY |
| 6_softmaxsampling | 1 | READY |
| ensemble_model | 1 | READY |
+---------------------+---------+--------+

I0808 18:59:43.206275 3304 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0808 18:59:43.207121 3304 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/examples/poc_ensemble |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0808 18:59:43.207936 3304 grpc_server.cc:4589] Started GRPCInferenceService at 0.0.0.0:8001
I0808 18:59:43.208420 3304 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
I0808 18:59:43.249593 3304 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
W0808 18:59:44.234058 3304 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0808 18:59:44.234122 3304 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0808 18:59:45.234281 3304 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0808 18:59:45.234330 3304 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0808 18:59:46.257599 3304 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0808 18:59:46.257658 3304 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
E0808 18:59:47.947784 3561 pb_stub.cc:749] Failed to process the request(s) for model '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)

Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"

At:
/tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

I0808 18:59:47.952453 3304 server.cc:257] Waiting for in-flight requests to complete.
I0808 18:59:47.952501 3304 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0808 18:59:47.952520 3304 model_repository_manager.cc:1223] unloading: ensemble_model:1
I0808 18:59:47.952624 3304 model_repository_manager.cc:1223] unloading: 6_softmaxsampling:1
I0808 18:59:47.952699 3304 model_repository_manager.cc:1223] unloading: 5_predicttensorflow:1
I0808 18:59:47.952805 3304 model_repository_manager.cc:1223] unloading: 4_unrollfeatures:1
I0808 18:59:47.952819 3304 model_repository_manager.cc:1328] successfully unloaded 'ensemble_model' version 1
I0808 18:59:47.952854 3304 model_repository_manager.cc:1223] unloading: 3_queryfeast:1
I0808 18:59:47.952922 3304 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0808 18:59:47.952993 3304 model_repository_manager.cc:1223] unloading: 2_queryfaiss:1
I0808 18:59:47.953045 3304 model_repository_manager.cc:1223] unloading: 1_predicttensorflow:1
I0808 18:59:47.953112 3304 model_repository_manager.cc:1223] unloading: 0_queryfeast:1
I0808 18:59:47.953170 3304 server.cc:288] All models are stopped, unloading models
I0808 18:59:47.953205 3304 server.cc:295] Timeout 30: Found 7 live models and 0 in-flight non-inference requests
I0808 18:59:47.953223 3304 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state
I0808 18:59:47.953309 3304 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0808 18:59:47.953487 3304 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state
I0808 18:59:47.969044 3304 model_repository_manager.cc:1328] successfully unloaded '1_predicttensorflow' version 1
I0808 18:59:47.981520 3304 model_repository_manager.cc:1328] successfully unloaded '5_predicttensorflow' version 1
I0808 18:59:48.953413 3304 server.cc:295] Timeout 29: Found 5 live models and 0 in-flight non-inference requests
I0808 18:59:49.511825 3304 model_repository_manager.cc:1328] successfully unloaded '2_queryfaiss' version 1
I0808 18:59:49.541047 3304 model_repository_manager.cc:1328] successfully unloaded '4_unrollfeatures' version 1
I0808 18:59:49.559879 3304 model_repository_manager.cc:1328] successfully unloaded '6_softmaxsampling' version 1
I0808 18:59:49.953673 3304 server.cc:295] Timeout 28: Found 2 live models and 0 in-flight non-inference requests
I0808 18:59:50.953809 3304 server.cc:295] Timeout 27: Found 2 live models and 0 in-flight non-inference requests
I0808 18:59:51.953940 3304 server.cc:295] Timeout 26: Found 2 live models and 0 in-flight non-inference requests
I0808 18:59:52.954069 3304 server.cc:295] Timeout 25: Found 2 live models and 0 in-flight non-inference requests
I0808 18:59:53.954197 3304 server.cc:295] Timeout 24: Found 2 live models and 0 in-flight non-inference requests
I0808 18:59:54.954329 3304 server.cc:295] Timeout 23: Found 2 live models and 0 in-flight non-inference requests
I0808 18:59:55.954454 3304 server.cc:295] Timeout 22: Found 2 live models and 0 in-flight non-inference requests
I0808 18:59:56.954571 3304 server.cc:295] Timeout 21: Found 2 live models and 0 in-flight non-inference requests
I0808 18:59:57.954697 3304 server.cc:295] Timeout 20: Found 2 live models and 0 in-flight non-inference requests
I0808 18:59:58.954827 3304 server.cc:295] Timeout 19: Found 2 live models and 0 in-flight non-inference requests
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ValueType.FLOAT: (np.float, False, False),
I0808 18:59:59.398262 3304 model_repository_manager.cc:1328] successfully unloaded '0_queryfeast' version 1
I0808 18:59:59.954958 3304 server.cc:295] Timeout 18: Found 1 live models and 0 in-flight non-inference requests
I0808 19:00:00.955091 3304 server.cc:295] Timeout 17: Found 1 live models and 0 in-flight non-inference requests
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ValueType.FLOAT: (np.float, False, False),
I0808 19:00:01.646321 3304 model_repository_manager.cc:1328] successfully unloaded '3_queryfeast' version 1
I0808 19:00:01.955225 3304 server.cc:295] Timeout 16: Found 0 live models and 0 in-flight non-inference requests
=========================== short test summary info ============================
FAILED tests/unit/examples/test_building_deploying_multi_stage_RecSys.py::test_func
=================== 1 failed, 2 passed in 237.58s (0:03:57) ====================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins18276713338415054018.sh
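The root cause, per the stack trace above, sits in the Feast op's transform (merlin/systems/dag/ops/feast.py, line 299): int() is invoked on a value that came back as None, which is what Feast's get_online_features() returns for entity keys missing from the online store. A minimal sketch of the failure mode and a defensive cast, with a hypothetical helper name (illustrative only, not Merlin code):

import numpy as np

def to_int64(values, default=0):
    # Feast returns None for entity keys absent from the online store; a bare
    # int(None) raises exactly the TypeError quoted in the log above, so we
    # substitute a default before casting.
    return np.array([int(v) if v is not None else default for v in values], dtype="int64")

assert to_int64([3, None, 7]).tolist() == [3, 0, 7]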

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #496 of commit 8a5d7799725b1b558b0e3b06a2219b2deb2ad56f, no merge conflicts.
Running as SYSTEM
Setting status of 8a5d7799725b1b558b0e3b06a2219b2deb2ad56f to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/332/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/496/*:refs/remotes/origin/pr/496/* # timeout=10
 > git rev-parse 8a5d7799725b1b558b0e3b06a2219b2deb2ad56f^{commit} # timeout=10
Checking out Revision 8a5d7799725b1b558b0e3b06a2219b2deb2ad56f (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8a5d7799725b1b558b0e3b06a2219b2deb2ad56f # timeout=10
Commit message: "Merge branch 'main' into poc_with_filtering"
 > git rev-list --no-walk 829a495f8ece5ecf14b891fdfed41a861a3d6433 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins2944827041103249446.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py F [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py . [100%]

=================================== FAILURES ===================================
__________________________________ test_func ___________________________________

self = <testbook.client.TestbookNotebookClient object at 0x7f16e562dcd0>
cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53

def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
          cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)

/usr/local/lib/python3.8/dist-packages/testbook/client.py:133:


args = (<testbook.client.TestbookNotebookClient object at 0x7f16e562dcd0>, {'id': '9c9a5310', 'cell_type': 'code', 'metadata'...ast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}, 53)
kwargs = {}

def wrapped(*args, **kwargs):
  return just_run(coro(*args, **kwargs))

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:85:


coro = <coroutine object NotebookClient.async_execute_cell at 0x7f160ce513c0>

def just_run(coro: Awaitable) -> Any:
    """Make the coroutine run, even if there is an event loop running (using nest_asyncio)"""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None
    if loop is None:
        had_running_loop = False
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    else:
        had_running_loop = True
    if had_running_loop:
        # if there is a running loop, we patch using nest_asyncio
        # to have reentrant event loops
        check_ipython()
        import nest_asyncio

        nest_asyncio.apply()
        check_patch_tornado()
  return loop.run_until_complete(coro)

/usr/local/lib/python3.8/dist-packages/nbclient/util.py:60:


self = <_UnixSelectorEventLoop running=False closed=False debug=False>
future = <Task finished name='Task-369' coro=<NotebookClient.async_execute_cell() done, defined at /usr/local/lib/python3.8/dis...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n\n')>

def run_until_complete(self, future):
    """Run until the Future is done.

    If the argument is a coroutine, it is wrapped in a Task.

    WARNING: It would be disastrous to call run_until_complete()
    with the same coroutine twice -- it would wrap it in two
    different Tasks and that can't be good.

    Return the Future's result, or raise its exception.
    """
    self._check_closed()
    self._check_running()

    new_task = not futures.isfuture(future)
    future = tasks.ensure_future(future, loop=self)
    if new_task:
        # An exception is raised if the future didn't complete, so there
        # is no need to log the "destroy pending task" message
        future._log_destroy_pending = False

    future.add_done_callback(_run_until_complete_cb)
    try:
        self.run_forever()
    except:
        if new_task and future.done() and not future.cancelled():
            # The coroutine raised a BaseException. Consume the exception
            # to not log a warning, the caller doesn't have access to the
            # local task.
            future.exception()
        raise
    finally:
        future.remove_done_callback(_run_until_complete_cb)
    if not future.done():
        raise RuntimeError('Event loop stopped before Future completed.')
  return future.result()

/usr/lib/python3.8/asyncio/base_events.py:616:


self = <testbook.client.TestbookNotebookClient object at 0x7f16e562dcd0>
cell = {'id': '9c9a5310', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-08T21:38:49.791784Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}
cell_index = 53, execution_count = None, store_history = True

async def async_execute_cell(
    self,
    cell: NotebookNode,
    cell_index: int,
    execution_count: t.Optional[int] = None,
    store_history: bool = True,
) -> NotebookNode:
    """
    Executes a single code cell.

    To execute all cells see :meth:`execute`.

    Parameters
    ----------
    cell : nbformat.NotebookNode
        The cell which is currently being processed.
    cell_index : int
        The position of the cell within the notebook object.
    execution_count : int
        The execution count to be assigned to the cell (default: Use kernel response)
    store_history : bool
        Determines if history should be stored in the kernel (default: False).
        Specific to ipython kernels, which can store command histories.

    Returns
    -------
    output : dict
        The execution output payload (or None for no output).

    Raises
    ------
    CellExecutionError
        If execution failed and should raise an exception, this will be raised
        with defaults about the failure.

    Returns
    -------
    cell : NotebookNode
        The cell which was just processed.
    """
    assert self.kc is not None

    await run_hook(self.on_cell_start, cell=cell, cell_index=cell_index)

    if cell.cell_type != 'code' or not cell.source.strip():
        self.log.debug("Skipping non-executing cell %s", cell_index)
        return cell

    if self.skip_cells_with_tag in cell.metadata.get("tags", []):
        self.log.debug("Skipping tagged cell %s", cell_index)
        return cell

    if self.record_timing:  # clear execution metadata prior to execution
        cell['metadata']['execution'] = {}

    self.log.debug("Executing cell:\n%s", cell.source)

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors or "raises-exception" in cell.metadata.get("tags", [])
    )

    await run_hook(self.on_cell_execute, cell=cell, cell_index=cell_index)
    parent_msg_id = await ensure_async(
        self.kc.execute(
            cell.source, store_history=store_history, stop_on_error=not cell_allows_errors
        )
    )
    await run_hook(self.on_cell_complete, cell=cell, cell_index=cell_index)
    # We launched a code cell to execute
    self.code_cells_executed += 1
    exec_timeout = self._get_timeout(cell)

    cell.outputs = []
    self.clear_before_next_output = False

    task_poll_kernel_alive = asyncio.ensure_future(self._async_poll_kernel_alive())
    task_poll_output_msg = asyncio.ensure_future(
        self._async_poll_output_msg(parent_msg_id, cell, cell_index)
    )
    self.task_poll_for_reply = asyncio.ensure_future(
        self._async_poll_for_reply(
            parent_msg_id, cell, exec_timeout, task_poll_output_msg, task_poll_kernel_alive
        )
    )
    try:
        exec_reply = await self.task_poll_for_reply
    except asyncio.CancelledError:
        # can only be cancelled by task_poll_kernel_alive when the kernel is dead
        task_poll_output_msg.cancel()
        raise DeadKernelError("Kernel died")
    except Exception as e:
        # Best effort to cancel request if it hasn't been resolved
        try:
            # Check if the task_poll_output is doing the raising for us
            if not isinstance(e, CellControlSignal):
                task_poll_output_msg.cancel()
        finally:
            raise

    if execution_count:
        cell['execution_count'] = execution_count
    await run_hook(
        self.on_cell_executed, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
  await self._check_raise_for_error(cell, cell_index, exec_reply)

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:1022:


self = <testbook.client.TestbookNotebookClient object at 0x7f16e562dcd0>
cell = {'id': '9c9a5310', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-08T21:38:49.791784Z',...ps/feast.py, line 299 in transform>]"\n\nAt:\n /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute\n']}]}
cell_index = 53
exec_reply = {'buffers': [], 'content': {'ename': 'InferenceServerException', 'engine_info': {'engine_id': -1, 'engine_uuid': '96db...e, 'engine': '96dba1d3-f2b4-4521-8553-f22d8100b50d', 'started': '2022-08-08T21:38:49.792077Z', 'status': 'error'}, ...}

async def _check_raise_for_error(
    self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
) -> None:

    if exec_reply is None:
        return None

    exec_reply_content = exec_reply['content']
    if exec_reply_content['status'] != 'error':
        return None

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors
        or exec_reply_content.get('ename') in self.allow_error_names
        or "raises-exception" in cell.metadata.get("tags", [])
    )
    await run_hook(
        self.on_cell_error, cell=cell, cell_index=cell_index, execute_reply=exec_reply
    )
    if not cell_allows_errors:
      raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)

E nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E ------------------
E
E import shutil
E from merlin.models.loader.tf_utils import configure_tensorflow
E configure_tensorflow()
E from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E response = run_ensemble_on_tritonserver(
E "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E )
E response = [x.tolist()[0] for x in response["ordered_ids"]]
E shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E ------------------
E
E ---------------------------------------------------------------------------
E InferenceServerException                Traceback (most recent call last)
E Input In [32], in <cell line: 5>()
E       3 configure_tensorflow()
E       4 from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E ----> 5 response = run_ensemble_on_tritonserver(
E       6     "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E       7 )
E       8 response = [x.tolist()[0] for x in response["ordered_ids"]]
E       9 shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:93, in run_ensemble_on_tritonserver(tmpdir, output_columns, df, model_name)
E      91 response = None
E      92 with run_triton_server(tmpdir) as client:
E ---> 93     response = send_triton_request(df, output_columns, client=client, triton_model=model_name)
E      95 return response
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:141, in send_triton_request(df, outputs_list, client, endpoint, request_id, triton_model)
E     139 outputs = [grpcclient.InferRequestedOutput(col) for col in outputs_list]
E     140 with client:
E --> 141     response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)
E     143 results = {}
E     144 for col in outputs_list:
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm)
E    1320     return result
E    1321 except grpc.RpcError as rpc_error:
E -> 1322     raise_error_grpc(rpc_error)
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
E      61 def raise_error_grpc(rpc_error):
E ---> 62     raise get_error_grpc(rpc_error) from None
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:916: CellExecutionError

During handling of the above exception, another exception occurred:

def test_func():
    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "01-Building-Recommender-Systems-with-Merlin.ipynb",
        execute=False,
    ) as tb1:
        tb1.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["NUM_ROWS"] = "10000"
            os.system("mkdir -p /tmp/examples")
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        tb1.execute()
        assert os.path.isdir("/tmp/examples/dlrm")
        assert os.path.isdir("/tmp/examples/feature_repo")
        assert os.path.isdir("/tmp/examples/query_tower")
        assert os.path.isfile("/tmp/examples/item_embeddings.parquet")
        assert os.path.isfile("/tmp/examples/feature_repo/user_features.py")
        assert os.path.isfile("/tmp/examples/feature_repo/item_features.py")

    with testbook(
        REPO_ROOT
        / "examples"
        / "Building-and-deploying-multi-stage-RecSys"
        / "02-Deploying-multi-stage-RecSys-with-Merlin-Systems.ipynb",
        execute=False,
    ) as tb2:
        tb2.inject(
            """
            import os
            os.environ["DATA_FOLDER"] = "/tmp/data/"
            os.environ["BASE_DIR"] = "/tmp/examples/"
            """
        )
        NUM_OF_CELLS = len(tb2.cells)
        tb2.execute_cell(list(range(0, NUM_OF_CELLS - 3)))
        top_k = tb2.ref("top_k")
        outputs = tb2.ref("outputs")
        assert outputs[0] == "ordered_ids"
      tb2.inject(
            """
            import shutil
            from merlin.models.loader.tf_utils import configure_tensorflow
            configure_tensorflow()
            from merlin.systems.triton.utils import run_ensemble_on_tritonserver
            response = run_ensemble_on_tritonserver(
                "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
            )
            response = [x.tolist()[0] for x in response["ordered_ids"]]
            shutil.rmtree("/tmp/examples/", ignore_errors=True)
            """
        )

tests/unit/examples/test_building_deploying_multi_stage_RecSys.py:57:


/usr/local/lib/python3.8/dist-packages/testbook/client.py:237: in inject
cell = TestbookNode(self.execute_cell(inject_idx)) if run else TestbookNode(code_cell)


self = <testbook.client.TestbookNotebookClient object at 0x7f16e562dcd0>
cell = [53], kwargs = {}, cell_indexes = [53], executed_cells = [], idx = 53

def execute_cell(self, cell, **kwargs) -> Union[Dict, List[Dict]]:
    """
    Executes a cell or list of cells
    """
    if isinstance(cell, slice):
        start, stop = self._cell_index(cell.start), self._cell_index(cell.stop)
        if cell.step is not None:
            raise TestbookError('testbook does not support step argument')

        cell = range(start, stop + 1)
    elif isinstance(cell, str) or isinstance(cell, int):
        cell = [cell]

    cell_indexes = cell

    if all(isinstance(x, str) for x in cell):
        cell_indexes = [self._cell_index(tag) for tag in cell]

    executed_cells = []
    for idx in cell_indexes:
        try:
            cell = super().execute_cell(self.nb['cells'][idx], idx, **kwargs)
        except CellExecutionError as ce:
          raise TestbookRuntimeError(ce.evalue, ce, self._get_error_class(ce.ename))

E testbook.exceptions.TestbookRuntimeError: An error occurred while executing the following cell:
E ------------------
E
E import shutil
E from merlin.models.loader.tf_utils import configure_tensorflow
E configure_tensorflow()
E from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E response = run_ensemble_on_tritonserver(
E "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E )
E response = [x.tolist()[0] for x in response["ordered_ids"]]
E shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E ------------------
E
E ---------------------------------------------------------------------------
E InferenceServerException                Traceback (most recent call last)
E Input In [32], in <cell line: 5>()
E       3 configure_tensorflow()
E       4 from merlin.systems.triton.utils import run_ensemble_on_tritonserver
E ----> 5 response = run_ensemble_on_tritonserver(
E       6     "/tmp/examples/poc_ensemble", outputs, request, "ensemble_model"
E       7 )
E       8 response = [x.tolist()[0] for x in response["ordered_ids"]]
E       9 shutil.rmtree("/tmp/examples/", ignore_errors=True)
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:93, in run_ensemble_on_tritonserver(tmpdir, output_columns, df, model_name)
E      91 response = None
E      92 with run_triton_server(tmpdir) as client:
E ---> 93     response = send_triton_request(df, output_columns, client=client, triton_model=model_name)
E      95 return response
E
E File /usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:141, in send_triton_request(df, outputs_list, client, endpoint, request_id, triton_model)
E     139 outputs = [grpcclient.InferRequestedOutput(col) for col in outputs_list]
E     140 with client:
E --> 141     response = client.infer(triton_model, inputs, request_id=request_id, outputs=outputs)
E     143 results = {}
E     144 for col in outputs_list:
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1322, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm)
E    1320     return result
E    1321 except grpc.RpcError as rpc_error:
E -> 1322     raise_error_grpc(rpc_error)
E
E File /usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:62, in raise_error_grpc(rpc_error)
E      61 def raise_error_grpc(rpc_error):
E ---> 62     raise get_error_grpc(rpc_error) from None
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute
E
E InferenceServerException: [StatusCode.INTERNAL] in ensemble 'ensemble_model', Failed to process the request(s) for model instance '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
E 1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)
E
E Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"
E
E At:
E /tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

/usr/local/lib/python3.8/dist-packages/testbook/client.py:135: TestbookRuntimeError
----------------------------- Captured stdout call -----------------------------
Signal (2) received.
----------------------------- Captured stderr call -----------------------------
2022-08-08 21:37:08.670176: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-08 21:37:10.666847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-08 21:37:10.667618: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python3.8/logging/init.py", line 2127, in shutdown
h.close()
File "/usr/local/lib/python3.8/dist-packages/absl/logging/init.py", line 934, in close
self.stream.close()
File "/usr/local/lib/python3.8/dist-packages/ipykernel/iostream.py", line 438, in close
self.watch_fd_thread.join()
AttributeError: 'OutStream' object has no attribute 'watch_fd_thread'
WARNING clustering 258 points to 32 centroids: please provide at least 1248 training points
2022-08-08 21:38:42.920470: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-08 21:38:44.907515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-08 21:38:44.908268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
I0808 21:38:50.068342 20998 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f0fb4000000' with size 268435456
I0808 21:38:50.069137 20998 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0808 21:38:50.076559 20998 model_repository_manager.cc:1191] loading: 1_predicttensorflow:1
I0808 21:38:50.176887 20998 model_repository_manager.cc:1191] loading: 0_queryfeast:1
I0808 21:38:50.277132 20998 model_repository_manager.cc:1191] loading: 2_queryfaiss:1
I0808 21:38:50.377379 20998 model_repository_manager.cc:1191] loading: 3_queryfeast:1
I0808 21:38:50.464521 20998 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0808 21:38:50.464560 20998 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0808 21:38:50.464567 20998 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0808 21:38:50.464573 20998 tensorflow.cc:2221] backend configuration:
{"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}}
I0808 21:38:50.464608 20998 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 1_predicttensorflow (version 1)
I0808 21:38:50.469518 20998 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 1_predicttensorflow (GPU device 0)
I0808 21:38:50.477607 20998 model_repository_manager.cc:1191] loading: 4_unrollfeatures:1
I0808 21:38:50.577852 20998 model_repository_manager.cc:1191] loading: 5_predicttensorflow:1
I0808 21:38:50.678148 20998 model_repository_manager.cc:1191] loading: 6_softmaxsampling:1
2022-08-08 21:38:50.817297: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-08 21:38:50.820575: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-08-08 21:38:50.820626: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-08 21:38:50.820730: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-08 21:38:50.857495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12901 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-08 21:38:50.900741: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-08 21:38:50.984629: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-08 21:38:51.008357: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 191081 microseconds.
I0808 21:38:51.008600 20998 model_repository_manager.cc:1345] successfully loaded '1_predicttensorflow' version 1
I0808 21:38:51.012597 20998 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: 5_predicttensorflow (version 1)
I0808 21:38:51.014672 20998 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_queryfeast (GPU device 0)
I0808 21:38:53.346836 20998 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)
I0808 21:38:53.347071 20998 model_repository_manager.cc:1345] successfully loaded '0_queryfeast' version 1
I0808 21:38:55.761972 20998 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 3_queryfeast (GPU device 0)
I0808 21:38:55.763719 20998 model_repository_manager.cc:1345] successfully loaded '2_queryfaiss' version 1
I0808 21:38:58.056697 20998 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 4_unrollfeatures (GPU device 0)
I0808 21:38:58.056954 20998 model_repository_manager.cc:1345] successfully loaded '3_queryfeast' version 1
I0808 21:39:00.131090 20998 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: 5_predicttensorflow (GPU device 0)
I0808 21:39:00.131347 20998 model_repository_manager.cc:1345] successfully loaded '4_unrollfeatures' version 1
2022-08-08 21:39:00.132644: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-08 21:39:00.149784: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-08-08 21:39:00.149820: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-08 21:39:00.151921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12901 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-08 21:39:00.173533: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-08 21:39:00.330981: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/examples/poc_ensemble/5_predicttensorflow/1/model.savedmodel
2022-08-08 21:39:00.382942: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 250310 microseconds.
I0808 21:39:00.383097 20998 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 6_softmaxsampling (GPU device 0)
I0808 21:39:00.383183 20998 model_repository_manager.cc:1345] successfully loaded '5_predicttensorflow' version 1
I0808 21:39:02.483220 20998 model_repository_manager.cc:1345] successfully loaded '6_softmaxsampling' version 1
I0808 21:39:02.486125 20998 model_repository_manager.cc:1191] loading: ensemble_model:1
I0808 21:39:02.586946 20998 model_repository_manager.cc:1345] successfully loaded 'ensemble_model' version 1
I0808 21:39:02.587133 20998 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0808 21:39:02.587249 20998 server.cc:583]
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}} |
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0808 21:39:02.587363 20998 server.cc:626]
+---------------------+---------+--------+
| Model | Version | Status |
+---------------------+---------+--------+
| 0_queryfeast | 1 | READY |
| 1_predicttensorflow | 1 | READY |
| 2_queryfaiss | 1 | READY |
| 3_queryfeast | 1 | READY |
| 4_unrollfeatures | 1 | READY |
| 5_predicttensorflow | 1 | READY |
| 6_softmaxsampling | 1 | READY |
| ensemble_model | 1 | READY |
+---------------------+---------+--------+

I0808 21:39:02.650352 20998 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0808 21:39:02.651232 20998 tritonserver.cc:2138]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.22.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/examples/poc_ensemble |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0808 21:39:02.652184 20998 grpc_server.cc:4589] Started GRPCInferenceService at 0.0.0.0:8001
I0808 21:39:02.652717 20998 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
I0808 21:39:02.693995 20998 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
W0808 21:39:03.675097 20998 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0808 21:39:03.675171 20998 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0808 21:39:04.675332 20998 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0808 21:39:04.675397 20998 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
W0808 21:39:05.695812 20998 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W0808 21:39:05.695865 20998 metrics.cc:507] Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0
E0808 21:39:07.325537 21255 pb_stub.cc:749] Failed to process the request(s) for model '3_queryfeast', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. c_python_backend_utils.InferenceResponse(output_tensors: List[c_python_backend_utils.Tensor], error: c_python_backend_utils.TritonError = None)

Invoked with: kwargs: tensors=[], error="<class 'TypeError'>, int() argument must be a string, a bytes-like object or a number, not 'NoneType', [<FrameSummary file /tmp/examples/poc_ensemble/3_queryfeast/1/model.py, line 105 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py, line 38 in execute>, <FrameSummary file /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py, line 299 in transform>]"

At:
/tmp/examples/poc_ensemble/3_queryfeast/1/model.py(122): execute

I0808 21:39:07.330154 20998 server.cc:257] Waiting for in-flight requests to complete.
I0808 21:39:07.330205 20998 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0808 21:39:07.330224 20998 model_repository_manager.cc:1223] unloading: ensemble_model:1
I0808 21:39:07.330321 20998 model_repository_manager.cc:1223] unloading: 6_softmaxsampling:1
I0808 21:39:07.330387 20998 model_repository_manager.cc:1223] unloading: 5_predicttensorflow:1
I0808 21:39:07.330471 20998 model_repository_manager.cc:1223] unloading: 4_unrollfeatures:1
I0808 21:39:07.330479 20998 model_repository_manager.cc:1328] successfully unloaded 'ensemble_model' version 1
I0808 21:39:07.330521 20998 model_repository_manager.cc:1223] unloading: 3_queryfeast:1
I0808 21:39:07.330563 20998 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0808 21:39:07.330584 20998 model_repository_manager.cc:1223] unloading: 2_queryfaiss:1
I0808 21:39:07.330657 20998 model_repository_manager.cc:1223] unloading: 1_predicttensorflow:1
I0808 21:39:07.330796 20998 model_repository_manager.cc:1223] unloading: 0_queryfeast:1
I0808 21:39:07.330840 20998 server.cc:288] All models are stopped, unloading models
I0808 21:39:07.330851 20998 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state
I0808 21:39:07.330864 20998 server.cc:295] Timeout 30: Found 7 live models and 0 in-flight non-inference requests
I0808 21:39:07.330910 20998 tensorflow.cc:2368] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0808 21:39:07.331033 20998 tensorflow.cc:2307] TRITONBACKEND_ModelFinalize: delete model state
I0808 21:39:07.342999 20998 model_repository_manager.cc:1328] successfully unloaded '1_predicttensorflow' version 1
I0808 21:39:07.356662 20998 model_repository_manager.cc:1328] successfully unloaded '5_predicttensorflow' version 1
I0808 21:39:08.330996 20998 server.cc:295] Timeout 29: Found 5 live models and 0 in-flight non-inference requests
I0808 21:39:08.671414 20998 model_repository_manager.cc:1328] successfully unloaded '4_unrollfeatures' version 1
I0808 21:39:08.851315 20998 model_repository_manager.cc:1328] successfully unloaded '2_queryfaiss' version 1
I0808 21:39:08.906193 20998 model_repository_manager.cc:1328] successfully unloaded '6_softmaxsampling' version 1
I0808 21:39:09.331138 20998 server.cc:295] Timeout 28: Found 2 live models and 0 in-flight non-inference requests
I0808 21:39:10.331277 20998 server.cc:295] Timeout 27: Found 2 live models and 0 in-flight non-inference requests
I0808 21:39:11.331398 20998 server.cc:295] Timeout 26: Found 2 live models and 0 in-flight non-inference requests
I0808 21:39:12.331525 20998 server.cc:295] Timeout 25: Found 2 live models and 0 in-flight non-inference requests
I0808 21:39:13.331639 20998 server.cc:295] Timeout 24: Found 2 live models and 0 in-flight non-inference requests
I0808 21:39:14.331765 20998 server.cc:295] Timeout 23: Found 2 live models and 0 in-flight non-inference requests
I0808 21:39:15.331891 20998 server.cc:295] Timeout 22: Found 2 live models and 0 in-flight non-inference requests
I0808 21:39:16.332031 20998 server.cc:295] Timeout 21: Found 2 live models and 0 in-flight non-inference requests
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ValueType.FLOAT: (np.float, False, False),
I0808 21:39:16.975295 20998 model_repository_manager.cc:1328] successfully unloaded '0_queryfeast' version 1
I0808 21:39:17.332154 20998 server.cc:295] Timeout 20: Found 1 live models and 0 in-flight non-inference requests
I0808 21:39:18.332333 20998 server.cc:295] Timeout 19: Found 1 live models and 0 in-flight non-inference requests
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/feast.py:15: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
ValueType.FLOAT: (np.float, False, False),
I0808 21:39:19.332448 20998 server.cc:295] Timeout 18: Found 1 live models and 0 in-flight non-inference requests
I0808 21:39:19.358114 20998 model_repository_manager.cc:1328] successfully unloaded '3_queryfeast' version 1
I0808 21:39:20.332576 20998 server.cc:295] Timeout 17: Found 0 live models and 0 in-flight non-inference requests
=========================== short test summary info ============================
FAILED tests/unit/examples/test_building_deploying_multi_stage_RecSys.py::test_func
=================== 1 failed, 2 passed in 240.06s (0:04:00) ====================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins17012005835215531495.sh
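Both runs also emit a DeprecationWarning from merlin/systems/dag/ops/feast.py line 15, where the Feast value-type map still uses the np.float alias that NumPy 1.20 deprecated. The fix the warning itself suggests is a one-liner; a sketch, assuming the dict entry shape shown in the warning (the map name here is illustrative, not the real module-level variable):

import numpy as np
from feast import ValueType

# Before (warns under NumPy >= 1.20): ValueType.FLOAT: (np.float, False, False)
# After -- np.float64 is the explicit scalar type the deprecated alias pointed to:
FEAST_VALUE_TYPE_MAP = {
    ValueType.FLOAT: (np.float64, False, False),
}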

@rnyak
Contributor Author

rnyak commented Sep 13, 2022

Closing for now since this is waiting on NVIDIA-Merlin/systems#173.
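
For context, the filtering step this PR set out to add would slot into the ensemble between retrieval and ranking. A sketch of the intended wiring, assuming the FilterCandidates op from merlin.systems and the candidate_ids / movie_ids column names used in the example notebooks:

from merlin.systems.dag.ops.session_filter import FilterCandidates

# Drop items the user has already interacted with from the retrieved
# candidates before they are sent to the ranking model; `retrieval` and
# `user_features` are upstream nodes of the ensemble graph.
filtering = retrieval["candidate_ids"] >> FilterCandidates(
    filter_out=user_features["movie_ids"]
)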

@rnyak rnyak closed this Sep 13, 2022
@rnyak rnyak deleted the poc_with_filtering branch September 14, 2022 15:21
Labels: enhancement (New feature or request), examples (Adding new examples), help wanted (Extra attention is needed)