Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python binding of Triton server C API #265

Merged
merged 10 commits into from
Sep 19, 2023
Merged

Python binding of Triton server C API #265

merged 10 commits into from
Sep 19, 2023

Conversation

GuanLuo
Copy link
Contributor

@GuanLuo GuanLuo commented Sep 13, 2023

Duplicated from #248 and developed on top of it, because the original PR is targeting the Python wrapper branch which is not finalized and merged into main branch. Please refer to #248 for context of the discussing below.

Remaining item for this PR:

  • setup build script for Python binding wheel file and generate wheel file as part of CMake build
  • (server PR) include the wheel file in docker image so user may install the binding. Setup CI to run the test.

Most of the comment in #248 has been addressed, but there are unresolved discussions and they will be resolved in separated PRs:

  • Simplify ownership management within PyXXX objects (discussion), because for most of the objects, the ownership is fixed.
  • Separate single large source file into per-object files (discussion), this requires separation of declarations and definitions, it will be done in separated PR to fit the timeline.
  • Add CMake option to control whether the Python package should be built

New issues to be resolved in separated PRs:

  • Fix callback resources management for request re-use. Currently callback resources are released as part of the internal PyTritonXXXCallback, which doesn't take into the account that the request can be re-used without re-configuring the callback setting. In that case, the callback resource is not valid for the sequential inferences.
  • Explicit response management. In Triton C API, there is implicit dependency between the server object and response object, which requires extra care when using the Python binding (i.e. explicit del res in testing to make sure server exit properly). Ideally such dependency is handled internally like the dependency between request and server.

Copy link
Member

@Tabrizian Tabrizian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It mostly looks good. Will take a closer look at the testing/bindings in a follow up review.

CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
python/binding/tritonserver_pybind.cc Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
Copy link
Member

@Tabrizian Tabrizian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall!

It doesn't have to be part of this PR but It would be great to have a valgrind test to make sure that the common functions (e.g., async_infer, model_load, tracing) do not have a memory leak.

if trace.id not in user_object:
user_object[trace.id] = []

# not owning trace, so must read property out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that if the user stores the trace object it will become invalid?

For example, if I used user_object[trace.id]=trace would it result in segfault if I access trace after returning from callback?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the current implementation it will be invalid as the py::object is passed as reference to a local variable (wrapper of Triton trace).

To make your example valid, what we can do is to store the trace wrapper in the trace callback resource (seen_traces) when seen and remove from it when release.

server.infer_async(request, trace)

# [FIXME] WAR due to trace lifecycle is tied to response in Triton core,
# trace reference should drop on response send..
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, this WAR is mainly for testing purposes (i.e., to make sure that the trace object really gets deleted)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the coupling is within core logic and we need to make change there to decouple the lifecycle. This is not obvious in direct C API usage because users typically delete response on response complete callback which triggers trace release callback, in Python the time of deletion is less-deterministic and can cause confusion when trace release is being expected.

self.assertEqual(len(res.output_classification_label(0, 1)), 0)
# [FIXME] keep alive behavior is not established between response
# and server, so must explicitly handle the destruction order for now.
del res
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if response is not deleted here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Triton response holds a shared pointer of the model and will cause server not exiting properly (waiting for model to fully unload). Not establishing keep_alive makes Python garbage collects server before response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving the PR due to time constraints but I think we should address this since it affects basic usage of the low-level Python bindings.

version=VERSION,
author="NVIDIA Inc.",
author_email="[email protected]",
description="Python API of the Triton In-Process Server",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Python bindings for TritonServer C-API"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this describes the Python package tritonserver? Python binding is meant to be lower level and not directly exposed.

python/setup.py Outdated Show resolved Hide resolved
python/test/test_binding.py Outdated Show resolved Hide resolved
# wrap in lambda function to avoid early evaluation that raises
# exception before assert
self.assertRaises(triton_bindings.TritonError,
lambda: request.correlation_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this raising an exception because request.correlation_id_string was assigned before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in core it checks the type of stored correlation id and returns error if mismatch

python/tritonserver/_c/tritonserver_pybind.cc Show resolved Hide resolved
@GuanLuo GuanLuo requested a review from Tabrizian September 18, 2023 19:06
@GuanLuo GuanLuo merged commit e3524bb into main Sep 19, 2023
@GuanLuo GuanLuo deleted the gluo-binding-main branch September 19, 2023 00:58
mc-nv pushed a commit that referenced this pull request Sep 19, 2023
* Add Python binding for Triton server API

* Use uintptr_t for pointer. Address exception name. Interact with GIL

* Fix lifetime dependency between request and server

* Better variable name. Pass by reference to avoid extra copy construct

* Add wheel build script. Address comment

* Format

* Avoid core referring to PyBind in backend build

* Using generator expression to build on Windows

* Specify rpath for runtime library lookup. Fix bug

* Address comment. Format
mc-nv pushed a commit that referenced this pull request Sep 19, 2023
* Add Python binding for Triton server API

* Use uintptr_t for pointer. Address exception name. Interact with GIL

* Fix lifetime dependency between request and server

* Better variable name. Pass by reference to avoid extra copy construct

* Add wheel build script. Address comment

* Format

* Avoid core referring to PyBind in backend build

* Using generator expression to build on Windows

* Specify rpath for runtime library lookup. Fix bug

* Address comment. Format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants