-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ragged batching interface for SonicTriton #40814
Conversation
…ged batching (WIP)
+heterogeneous |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
@rappoccio I suppose the related external and data PRs need to be merged too. |
Yes, all the PRs listed in #40814 (comment) need to be merged together or the next IB will fail. |
@rappoccio @perrotta , this PR requires the following PRs too |
@kpedro88 tests show a couple of modified user-floats changes for slimmed pat photons in the single gamma workflow 10805.31. Does it have anything to do with the modifications implemented with cms-data/RecoEgamma-EgammaPhotonProducers#2, which was supposed to be only a "minor syntactic change"? |
@perrotta thanks for pointing these out, I had missed them. It may be due to the data update. This special workflow 10805.31 isn't yet run in production. @ssrothman can you take a look? |
Thank you @kpedro88. |
@perrotta let's merge the externals now if that's okay. These will just go into the first prerelease, and we will definitely fix any unintended regressions before 13_1_X is final. Other development needs to start on top of the code and external changes here. |
Uhm, wouldn't it be easier to identify such a possible regression if there is a "clean" baseline to compare with? |
For two photon userfloats, it's easy enough to run tests in an older release and compare by hand. (And actually I see that pre1 is already out, so this can be done with pre1 vs. pre2.) |
Well, those two photon userfloats could point to some deeper unintended regression which is not limited to them... I think it is better to have this reverted. A revert of the revert can be submitted at the same time, and it can be merged as soon as we are satisfied with the checks. We still have some time before pre2 for it. |
If you insist on reverting, it's ultimately your prerogative. But I have seen these reversion/unreversion cycles introduce their own confusion and regressions. Historically, our policy was that regressions in prereleases were allowed in order to facilitate ongoing development. |
@kpedro88 we discussed it with Sal. Even if not optimal, let have this merged for now, together with all needed externals. |
@kpedro88 we can do the merge, but will make an issue marked as "urgent" #40938. Can you please fix it ASAP (before the next pre release) so we can ensure that there is nothing awry? Also, @cms-sw/reconstruction-l2, we will bypass signatures on but assign to take a look in the issue. |
many thanks @perrotta @rappoccio ! we are actively working on fixing any regressions and should have another PR very soon. |
PR description:
Triton finally added a client-side friendly interface for ragged batching (processing requests with different shapes together). This PR adapts SonicTriton internally to switch between rectangular batching and ragged batching automatically, depending on how shapes are specified, without the user (module developer) having to worry about it. An option to switch manually is also provided. The changes are backwards-compatible: new functionality is available, but existing code that does not use the new functionality will continue to work as before. The README is updated to explain these different cases and the new features. The default Triton server image is also moved to a newer version. This PR may resolve several of the occasional IB failures (etc.) that have been noted in past months; these will be rechecked once it is merged.
Technical details: in Triton, a "request" consists of data that all have the same shape. Therefore, to send multiple inputs with different shapes at the same time, multiple requests must be created. Triton now provides
InferMulti()
andAsyncInferMulti()
interfaces that take a vector of requests and loop over them, in order to handle this more general case.Contingent details: this interface was actually implemented in Summer 2022, specifically intended to be used with ParticleNet (where we currently pad all inputs to achieve a uniform shape). However, because of various limitations in both PyTorch and ONNX, the performance in the ParticleNet case currently does not improve when using ragged batching. We expect this will eventually be resolved. In the meantime, we want to merge this updated interface so further development can continue on top of these rather involved changes.
PR validation: