Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyTorch custom operators for graph operations #9094

Merged

Conversation

valsdav
Copy link
Contributor

@valsdav valsdav commented Mar 22, 2024

The PR add to the externals 3 small libraries of pytorch operators for optimized graph clustering and sparse tensors operations:

These libraries are used in graph neural networks to perform clustering operations, i.e. in the DRN model cms-sw/cmssw#37134

A pytorch-custom-ops tool is defined to link all of them.
Only the CPU ops are compiled for the moment. Kernels for CUDA are also available.

PR validation

I have linked successfully a CMSSW test, loading a DRN TorchScript model. The PyTorch engine finds the operators and does not complain. I will create a PR to CMSSW with tests in the next days.

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @valsdav for branch IB/CMSSW_14_1_X/master.

@iarspider, @smuzaffar, @aandvalenzuela, @cmsbuild can you please review it and eventually sign? Thanks.
@antoniovilela, @sextonkennedy, @rappoccio you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Copy link
Contributor

cmsbuild commented Mar 22, 2024

cms-bot internal usage

@valsdav valsdav changed the title Pytorch custom operators for graph operations PyTorch custom operators for graph operations Mar 22, 2024
@smuzaffar
Copy link
Contributor

@valsdav , can you please apply the following changes

  • Update the comment # Make sure the default c++sdt stand is c++11 to # Make sure the default c++sdt stand is c++14 (as the grep command after this comment is looking for 14
  • Remove the empty %post sections
  • pytorch-custom-ops.xml tool file is not going to be part of cmssw as there is no pytorch-custom-ops.spec. I would suggest to create a meta package pytorch-custom-ops.spec which does nothing in %prep, %build, %install sections but it depends on other pytorch-* tools e.g something like [a]
  • Add pytorch-custom-ops dependency in cmssw-tool-conf.spec .

[a]

### RPM external pytorch-custom-ops 1.0
Source: none
Requires: pytorch-scatter
Requires: pytorch-cluster
Requires: pytorch-sparse
%prep
%build
%install

@cmsbuild
Copy link
Contributor

Pull request #9094 was updated.

@smuzaffar
Copy link
Contributor

please test

@@ -0,0 +1,10 @@
<tool name="pytorch-cluster" version="@TOOL_VERSION@">
<client>
<environment name="TORCH_CLUSTER_BASE" default="@TOOL_ROOT@"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@valsdav , can you please update all the newly added xml files to have PYTORCH_ instead of TORCH_ e.g. replace TORCH_CLUSTER_BASE with PYTORCH_CLUSTER_BASE every where. The name of this variable should match the name of tool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 492ec08

@cmsbuild
Copy link
Contributor

Pull request #9094 was updated.

@smuzaffar
Copy link
Contributor

please test

@smuzaffar
Copy link
Contributor

please test for el8_aarch64_gcc12

@smuzaffar
Copy link
Contributor

please test for el8_ppc64le_gcc12

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f196af/38419/summary.html
COMMIT: 492ec08
CMSSW: CMSSW_14_1_X_2024-03-25-2300/el8_ppc64le_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9094/38419/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f196af/38419/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f196af/38419/git-merge-result

@cmsbuild
Copy link
Contributor

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f196af/38418/summary.html
COMMIT: 492ec08
CMSSW: CMSSW_14_1_X_2024-03-25-2300/el8_aarch64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9094/38418/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f196af/38418/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f196af/38418/git-merge-result

RelVals

The relvals timed out after 4 hours.

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f196af/38403/summary.html
COMMIT: 492ec08
CMSSW: CMSSW_14_1_X_2024-03-25-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9094/38403/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f196af/38403/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f196af/38403/git-merge-result

Comparison Summary

Summary:

  • You potentially added 38 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 51890 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3297537
  • DQMHistoTests: Total failures: 305653
  • DQMHistoTests: Total nulls: 255
  • DQMHistoTests: Total successes: 2991609
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.686 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): 0.596 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 13034.0 ): 0.357 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 250202.181 ): 0.234 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): -0.225 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 7.3 ): -0.276 KiB SiStrip/MechanicalView
  • Checked 202 log files, 165 edm output root files, 48 DQM output files
  • TriggerResults: found differences in 20 / 46 workflows

@smuzaffar
Copy link
Contributor

+externals

@smuzaffar smuzaffar merged commit c8952e8 into cms-sw:IB/CMSSW_14_1_X/master Mar 27, 2024
20 of 21 checks passed
@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_14_1_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @rappoccio, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants