First PyTorch tests for TorchScript inference CPU/CUDA #43475

valsdav · 2023-12-01T16:59:56Z

PR description:

This PR adds a new subpackage PhysicsTools/PyTorch, including the first tests for TorchScript
inference on CPU and CUDA.

The PR is built on top of #41162 and the cmsdist PR cms-sw/cmsdist#8388, which adds C++ PyTorch library support.

PR validation:

A simple model is exported to TorchScript and saved to disk. The model is then loaded in CMSSW and run on CPU and CUDA.
The code to generate the model is included in the PR, but cannot be run directly in CMSSW, as we are not including the Python interface to PyTorch in cmsdist for the moment.

@smuzaffar @makortel @iarspider @wpmccormack

cmsbuild · 2023-12-01T17:07:47Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43475/38037

This PR adds an extra 32KB to repository
Found files with invalid states:
- PhysicsTools/PythonAnalysis/test/test_torch.py:
  - Added: 43e8f5f
  - Deleted: c1b0013
- PhysicsTools/PyTorch/test/testTorchSimpleDnnCuda.cc:
  - Added: baa5bc9
  - Deleted: 590cdd2
There are other open Pull requests which might conflict with changes you have proposed:
- File PhysicsTools/PythonAnalysis/test/BuildFile.xml modified in PR(s): Add test for (py3-)torch #41162
- File PhysicsTools/PythonAnalysis/test/testTorch.cc modified in PR(s): Add test for (py3-)torch #41162
- File PhysicsTools/PythonAnalysis/test/time_serie_prediction.cpp modified in PR(s): Add test for (py3-)torch #41162

cmsbuild · 2023-12-01T17:08:11Z

A new Pull Request was created by @valsdav (Davide Valsecchi) for master.

It involves the following packages:

PhysicsTools/PyTorch (****)
PhysicsTools/PythonAnalysis (analysis)

The following packages do not have a category, yet:

PhysicsTools/PyTorch
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbuild, @tvami can you please review it and eventually sign? Thanks.
@antoniovilela, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

makortel · 2023-12-01T21:29:56Z

PhysicsTools/PyTorch/test/testBase.h

+  }
+}
+
+std::string testBasePyTorch::cmsswPath(std::string path) {


Would it be feasible to use edm::FileInPath instead?
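
The idea behind that suggestion can be sketched without CMSSW. The snippet assumes that edm::FileInPath resolves a relative path against the colon-separated directories in CMSSW_SEARCH_PATH. findInSearchPath is a hypothetical stand-in written only for illustration, not the real class.

```cpp
#include <filesystem>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>

// Hypothetical stand-in for the lookup edm::FileInPath is assumed to perform:
// try each directory of a colon-separated search path in order and return the
// first location where the relative path exists.
std::optional<std::filesystem::path> findInSearchPath(const std::string& relative,
                                                      const std::string& searchPath) {
  std::istringstream dirs(searchPath);
  std::string dir;
  while (std::getline(dirs, dir, ':')) {
    if (dir.empty()) {
      continue;  // skip empty entries such as "a::b"
    }
    const std::filesystem::path candidate = std::filesystem::path(dir) / relative;
    std::error_code ec;  // non-throwing overload: an unreadable directory counts as "not found"
    if (std::filesystem::exists(candidate, ec)) {
      return candidate;
    }
  }
  return std::nullopt;  // nothing matched in any directory
}
```

With such a lookup the test would refer to the model by its package-relative path (for example PhysicsTools/PyTorch/test/simple_dnn.pt) instead of building an absolute path by hand in cmsswPath().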

makortel · 2023-12-01T21:30:35Z

PhysicsTools/PyTorch/test/simple_dnn.pt

@smuzaffar Should this binary file be placed in cms-data?

We should avoid adding binary files to cmssw. It is better to move it to cms-data.

I'm changing this to use the cmsml docker image to generate the binary files for tests on the fly.

makortel · 2023-12-01T21:32:37Z

PhysicsTools/PyTorch/test/testBase.h

This header file seems to be very tied to the testTorchSimpleDnn.cc. Would it be feasible to just include the contents of the header in the source file? Or is the header expected to be used by multiple source files in the future?

Same question for testBaseCUDA.h and testTorchSimpleDnnCUDA.cc.

I would like to follow the same approach as in the TensorFlow tests (https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/TensorFlow/test/testBaseCUDA.h), and I'm planning to add more tests with a similar structure.

makortel · 2023-12-01T21:33:29Z

PhysicsTools/PyTorch/test/testTorchSimpleDnn.cc

+//   runModel("/data/user/dvalsecc/simple_dnn.pt", cuda);
+
+//   return 0;
+// }


Is this commented out code still useful?

makortel · 2023-12-01T21:35:26Z

PhysicsTools/PyTorch/test/testTorchSimpleDnnCUDA.cc

+//   runModel("/data/user/dvalsecc/simple_dnn.pt", cuda);
+
+//   return 0;
+// }


Is this commented out code still useful?

makortel · 2023-12-01T21:36:49Z

PhysicsTools/PythonAnalysis/test/BuildFile.xml

+<bin name="testTorchTimeSeries" file="time_serie_prediction.cpp">
+  <use name="pytorch"/>
+  <use name="pytorch-cuda"/>
+</bin>


@smuzaffar Does this test binary definition need any of the <iftool name="cuda">, or does the binary get ignored implicitly when cuda is not available?

These tests came from @iarspider's PR #41162.
I think we can move them into the new PyTorch subpackage.

makortel · 2023-12-01T21:37:30Z

PhysicsTools/PyTorch/test/testBaseCUDA.h

+#include "FWCore/Utilities/interface/Exception.h"
+#include "FWCore/Utilities/interface/ResourceInformation.h"
+#include "HeterogeneousCore/CUDAServices/interface/CUDAInterface.h"
+#include "HeterogeneousCore/CUDAUtilities/interface/requireDevices.h"


This #include seems to be unused.

It is used to test for the presence of devices: if (!cms::cudatest::testDevices())

Thanks. I see the call to cms::cudatest::testDevices() is in testTorchSimpleDnnCUDA.cc, so this #include should be moved there.

On the other hand, that check is not really needed, because (nowadays) scram b runtests will skip the test if the node has no GPU. And in the case of USER_UNIT_TESTS=cuda scram b runtests we want the test program to fail. Therefore, I'd suggest either removing the call to cms::cudatest::testDevices() or, if you want a clearer error message when no devices are available, using the negated result (!testDevices()) to print that message and fail the test program.
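
The second option could look roughly like the sketch below. It is only an illustration: devicesAvailable() is a hypothetical stand-in for cms::cudatest::testDevices() so that the snippet compiles outside CMSSW, and requireDevicesOrFail() is a made-up helper name. The point is that a missing GPU produces a clear message and a non-zero exit code instead of a silent skip.

```cpp
#include <cstdlib>
#include <functional>
#include <iostream>

// Hypothetical stand-in for cms::cudatest::testDevices(): in CMSSW the real
// function would be called instead. Here it always reports "no usable device".
bool devicesAvailable() { return false; }

// Returns EXIT_SUCCESS when the device check passes. Otherwise prints a clear
// error message and returns EXIT_FAILURE so that the test program fails.
int requireDevicesOrFail(const std::function<bool()>& check, std::ostream& err) {
  if (!check()) {
    err << "No usable CUDA device found: failing the test instead of skipping it\n";
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}
```

In the test program this would run before any model is loaded, returning the result of requireDevicesOrFail(devicesAvailable, std::cerr) from main() whenever it is not EXIT_SUCCESS.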

makortel · 2023-12-01T21:38:36Z

PhysicsTools/PyTorch/test/BuildFile.xml

+    <use name="FWCore/ServiceRegistry"/>
+    <use name="FWCore/Utilities"/>
+    <use name="HeterogeneousCore/CUDAServices"/>
+    <use name="HeterogeneousCore/CUDAUtilities"/>


Looks like the dependence on CUDAUtilities is not really needed.

makortel · 2023-12-01T21:40:20Z

test parameters:

pull_request = Add recipe for pytorch (C++ interface only) cmsdist#8388
enable = gpu

makortel · 2023-12-01T21:40:26Z

@cmsbuild, please test

cmsbuild · 2023-12-02T08:15:33Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c749ee/36272/summary.html
COMMIT: 590cdd2
CMSSW: CMSSW_14_0_X_2023-12-01-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43475/36272/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c749ee/36272/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c749ee/36272/git-merge-result

Comparison Summary

Summary:

You potentially added 271 lines to the logs
Reco comparison results: 20 differences found in the comparisons
DQMHistoTests: Total files compared: 50
DQMHistoTests: Total histograms compared: 3370032
DQMHistoTests: Total failures: 1196
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3368814
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
Checked 214 log files, 167 edm output root files, 50 DQM output files
TriggerResults: no differences found

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 48 differences found in the comparisons
DQMHistoTests: Total files compared: 3
DQMHistoTests: Total histograms compared: 39740
DQMHistoTests: Total failures: 1588
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 38152
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 2 files compared)
Checked 8 log files, 10 edm output root files, 3 DQM output files
TriggerResults: no differences found

fwyzard · 2023-12-02T09:10:02Z

@valsdav could you add a README.md file for the new package, with a short description and the instructions for re-creating the model binary?

valsdav · 2023-12-04T12:50:00Z

@valsdav could you add a README.md file for the new package, with a short description and the instructions for re-creating the model binary?

Hi @fwyzard, sure. I will add a README and also improve the generation of the model binaries by using apptainer and the cmsml docker image.

cmsbuild · 2023-12-08T10:02:03Z

Pull request #43475 was updated. @wpmccormack, @valsdav, @tvami can you please check and sign again.

cmsbuild · 2024-07-29T13:33:23Z

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c749ee/40662/summary.html
COMMIT: 2d2c5ff
CMSSW: CMSSW_14_1_X_2024-07-28-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43475/40662/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially added 20 lines to the logs
Reco comparison results: 92 differences found in the comparisons
DQMHistoTests: Total files compared: 45
DQMHistoTests: Total histograms compared: 3423961
DQMHistoTests: Total failures: 2467
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3421474
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 44 files compared)
Checked 196 log files, 165 edm output root files, 45 DQM output files
TriggerResults: no differences found

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 2 differences found in the comparisons
DQMHistoTests: Total files compared: 6
DQMHistoTests: Total histograms compared: 37022
DQMHistoTests: Total failures: 1307
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 35715
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 5 files compared)
Checked 20 log files, 25 edm output root files, 6 DQM output files
TriggerResults: no differences found

valsdav · 2024-07-30T07:16:33Z

+ml

tvami · 2024-07-30T14:38:38Z

+1

PR adds tests for PyTorch
tests pass

cmsbuild · 2024-07-30T14:39:06Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

valsdav · 2024-07-30T15:24:40Z

Just wanted to add that a README with docs and a more refined user-facing interface for PyTorch models will be added in a later PR. We wanted to merge this one first, since it adds basic tests for the PyTorch inference functionality.

mandrenguyen · 2024-07-31T19:35:01Z

assign core
It's not immediately clear that all of Matti's comments were addressed, so let's let core sign, and then hopefully we can merge this long-standing PR.

cmsbuild · 2024-07-31T19:35:25Z

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel · 2024-08-01T13:44:26Z

+Core

cmsbuild · 2024-08-01T13:44:53Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)

mandrenguyen · 2024-08-01T18:31:43Z

+1

mandrenguyen · 2024-08-03T13:57:53Z

Are these unit test failures coming from this PR?
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_aarch64_gcc12/CMSSW_14_1_X_2024-08-03-1100/unitTestLogs/PhysicsTools/PyTorch#/

makortel · 2024-08-05T14:11:47Z

Are these unit test failures coming from this PR?

Yes. The cause is in

===== Test "testTorchSimpleDnn" ====
Running .cmd: apptainer exec -B /data/cmsbld/jenkins_a/workspace/ib-run-qa/CMSSW_14_1_X_2024-08-03-1100  /cvmfs/unpacked.cern.ch/registry.hub.docker.com/cmsml/cmsml:3.11  python /data/cmsbld/jenkins_a/workspace/ib-run-qa/CMSSW_14_1_X_2024-08-03-1100/src/PhysicsTools/PyTorch/test/create_simple_dnn.py /data/cmsbld/jenkins_a/workspace/ib-run-qa/CMSSW_14_1_X_2024-08-03-1100/test/el8_aarch64_gcc12/52de-5310-c164-e523
FATAL:   image targets 'amd64', cannot run on 'arm64'

i.e. we either need the container for aarch64, or restrict these tests to be run only on amd64.
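
For the second option, one possible guard is sketched below. It assumes the architecture can be read from a SCRAM_ARCH-style string (for example el8_amd64_gcc12 or el8_aarch64_gcc12, as in the logs above). The helper name isAmd64 is hypothetical, and the actual fix might instead live in the BuildFile or in the test driver.

```cpp
#include <string>

// Hypothetical helper: returns true if a SCRAM_ARCH-style string such as
// "el8_amd64_gcc12" names an amd64 build. The architecture is assumed to be
// the second underscore-separated field.
bool isAmd64(const std::string& scramArch) {
  const auto first = scramArch.find('_');
  if (first == std::string::npos) {
    return false;  // no separator at all: not a SCRAM_ARCH-style string
  }
  const auto second = scramArch.find('_', first + 1);
  if (second == std::string::npos) {
    return false;  // only one separator: the architecture field is incomplete
  }
  return scramArch.substr(first + 1, second - first - 1) == "amd64";
}
```

A test wrapper could read SCRAM_ARCH with std::getenv and skip the container-based model generation step when isAmd64 returns false.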

Other tests in this package show a printout

[W803 13:27:40.025787660 CUDAFunctions.cpp:108] Warning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library (function operator())

Does PyTorch try to initialize CUDA every time? Is there a way to control the CUDA initialization explicitly?

cmsbuild added this to the CMSSW_14_0_X milestone Dec 1, 2023

cmsbuild added analysis-pending pending-signatures tests-pending orp-pending new-package-pending code-checks-pending labels Dec 1, 2023

valsdav force-pushed the pytorch-tests branch from 0bfcc4a to 590cdd2 Compare December 1, 2023 17:02

cmsbuild added code-checks-approved and removed code-checks-pending labels Dec 1, 2023

valsdav mentioned this pull request Dec 1, 2023

Added PhysicsTools/PyTorch and new category for ML tools cms-sw/cms-bot#2121

Merged

makortel reviewed Dec 1, 2023

cmsbuild added tests-started requires-external and removed tests-pending labels Dec 1, 2023

cmsbuild added tests-approved and removed tests-started labels Dec 2, 2023

smuzaffar mentioned this pull request Dec 4, 2023

Add recipe for pytorch (C++ interface only) cms-sw/cmsdist#8388

Merged

cmsbuild added ml-pending and removed new-package-pending labels Dec 8, 2023

valsdav force-pushed the pytorch-tests branch from 590cdd2 to 07c40c0 Compare December 19, 2023 14:56

cmsbuild removed the tests-approved label Dec 19, 2023

cmsbuild added tests-approved and removed tests-started labels Jul 29, 2024

cmsbuild added ml-approved and removed ml-pending labels Jul 30, 2024

cmsbuild added analysis-approved fully-signed and removed analysis-pending pending-signatures labels Jul 30, 2024

cmsbuild added core-pending pending-signatures and removed fully-signed labels Jul 31, 2024

cmsbuild added core-approved fully-signed and removed core-pending pending-signatures labels Aug 1, 2024

cmsbuild added orp-approved and removed orp-pending labels Aug 1, 2024

cmsbuild merged commit a00ad1a into cms-sw:master Aug 1, 2024
15 checks passed
