
Add TF AOT interface. #43941

Merged
merged 6 commits into from
Mar 14, 2024

@riga (Contributor) commented Feb 12, 2024

PR description

This PR adds the interface for using ahead-of-time (AOT) compiled TensorFlow models, as presented in the core software meeting on Dec 12 (@Bogdan-Wiederspan).

The accompanying cmsdist PR is cms-sw/cmsdist#9005.

As discussed before (see talk above), we would like to propose a 2-stage integration process:

  1. This PR includes two main components.
    • A general interface for working with statically batched AOT models, which do not support dynamic batching out of the box. AOT models are only accessible after the AOT compilation, which produces one header and one object file per batch size. The compilation is based on saved_model_cli, a command-line tool shipped with TF, which we wrapped in the external cmsml package that is added through the cmsdist PR linked above. The interface handles multiple batch sizes and provides a convenience layer on top of the rather low-level AOT objects (for instance, to evaluate a model with n inputs and m outputs of different types).
    • As models now technically become software dependencies (as they are used through compiled code), we included a "development workflow" that allows groups to test their models locally without having to go through the cms-external → cmsdist → toolfile steps (see stage 2 below). This workflow can be triggered through a simple script, PhysicsTools/TensorFlowAOT/scripts/compile_model.py, that mimics the integration workflow and provides a tool file that plugins can <use ....
  2. In a second stage, we would include test models (and maybe a first port of an existing production model) as a cms-external with a spec file in cmsdist that (AOT) compiles them as a software dependency, including a tool file that CMSSW plugins could use (for instance, <use name="tfaot-model-btv-deepflavor"/>).
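Because each compiled object handles exactly one batch size, an interface like the one described above has to map an arbitrary batch onto the available compiled sizes. The following is a minimal standalone sketch of one way to do this, assuming a greedy split with zero padding. `runStaticBatched` and `CompiledFn` are hypothetical names, the compiled functions are plain stand-ins, and the default strategy used by PhysicsTools/TensorFlowAOT may well differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one AOT-compiled model function: it only accepts
// exactly batchSize * nFeatures inputs and returns batchSize * nOut outputs.
using CompiledFn = std::function<std::vector<float>(const std::vector<float>&)>;

// Evaluate a batch of arbitrary size with statically batched functions keyed by
// their compiled batch size. Greedy: take the largest compiled size that fits;
// if the remainder is smaller than every compiled size, zero-pad it up to the
// smallest one and drop the outputs that belong to padded events.
std::vector<float> runStaticBatched(const std::map<std::size_t, CompiledFn>& compiled,
                                    const std::vector<float>& input,
                                    std::size_t nFeatures,
                                    std::size_t nOut,
                                    std::vector<std::size_t>* chunksUsed = nullptr) {
  if (compiled.empty() || compiled.begin()->first == 0 || nFeatures == 0 ||
      input.size() % nFeatures != 0) {
    throw std::invalid_argument("no usable compiled batch sizes or malformed input");
  }
  std::size_t remaining = input.size() / nFeatures;  // events still to evaluate
  std::size_t offset = 0;                            // index of the next event
  std::vector<float> output;
  output.reserve(remaining * nOut);

  while (remaining > 0) {
    // std::map is sorted, so upper_bound yields the first compiled size > remaining
    auto it = compiled.upper_bound(remaining);
    if (it != compiled.begin()) {
      --it;  // largest compiled size <= remaining
    }        // else: all compiled sizes are larger, so pad up to the smallest one
    const std::size_t size = it->first;
    const std::size_t used = std::min(size, remaining);

    std::vector<float> chunk(size * nFeatures, 0.0f);  // zero padding by default
    const auto first = input.begin() + static_cast<std::ptrdiff_t>(offset * nFeatures);
    std::copy(first, first + static_cast<std::ptrdiff_t>(used * nFeatures), chunk.begin());

    const std::vector<float> out = it->second(chunk);
    if (out.size() != size * nOut) {
      throw std::runtime_error("compiled function returned an unexpected output size");
    }
    // keep only the outputs of real (non-padded) events
    output.insert(output.end(), out.begin(), out.begin() + static_cast<std::ptrdiff_t>(used * nOut));
    if (chunksUsed != nullptr) {
      chunksUsed->push_back(size);
    }
    offset += used;
    remaining -= used;
  }
  return output;
}
```

With compiled sizes {1, 2}, a batch of 3 is split as 2+1 by this greedy rule; with only size 2 compiled, a batch of 1 is padded to 2 and the padded output is discarded.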

PR validation

We added test cases for the development workflow and the general compilation procedure. A third test case is implemented but disabled for now, as it requires header and object files of previously compiled models. We would enable this test in the second stage of the integration process.

@valsdav @ml-l2

@cmsbuild (Contributor) commented Feb 12, 2024

cms-bot internal usage

@cmsbuild (Contributor)

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43941/38811

  • This PR adds an extra 28KB to repository

@cmsbuild (Contributor)

A new Pull Request was created by @riga (Marcel Rieger) for master.

It involves the following packages:

  • PhysicsTools/TensorFlowAOT (****)

The following packages do not have a category, yet:

PhysicsTools/TensorFlowAOT
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbuild can you please review it and eventually sign? Thanks.
@makortel this is something you requested to watch as well.
@sextonkennedy, @rappoccio, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@cmsbuild (Contributor)

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43941/38813

  • This PR adds an extra 28KB to repository

@cmsbuild (Contributor)

Pull request #43941 was updated. @cmsbuild can you please check and sign again.

@makortel (Contributor)

assign ml

@cmsbuild (Contributor)

New categories assigned: ml

@valsdav,@wpmccormack you have been requested to review this Pull request/Issue and eventually sign? Thanks

// #define EIGEN_USE_THREADS
// #define EIGEN_USE_CUSTOM_THREAD_POOL

#include "FWCore/Framework/interface/Frameworkfwd.h"
Contributor

Is this #include needed?

Suggested change
#include "FWCore/Framework/interface/Frameworkfwd.h"

Contributor Author

Done in 2e0dad2.

Contributor

What is the purpose of this header? I see it used only in PhysicsTools/TensorFlowAOT/test/testInterface.cc. We generally don't have headers that only #include other headers.

@riga (Contributor Author) Feb 15, 2024

We thought it could be useful for future plugins to include just a single file, instead of - as is the case right now - Batching.h and Model.h. But if the recommendation is to do this differently (i.e., remove this header), we will of course do so.

Contributor

Are Batching.h and Model.h something users (i.e. human written code) need to #include?

Are the two always used together? (well, Model.h already includes Batching.h, so I guess the question is if there is any use for including Batching.h alone?)

Or maybe the question should be, are users expected to interact only with Model (from Model.h), or also directly with the classes in Batching.h? (I'm wondering if Model.h alone would be sufficient to provide the user-facing interface)

Contributor Author

Users have to interact with Model.h in any case.
In case they want to configure custom batch rules (e.g. for a batch size of 3 they could instruct the model to use "2+1" instead of the default "1+1+1"), they would also have to include Batching.h.
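To make the "2+1" versus "1+1+1" example concrete, here is a minimal sketch of how custom batch rules could be registered and validated. `BatchRuleRegistry`, `setRule` and `ruleFor` are hypothetical names and do not reflect the actual API in Batching.h; the fallback to size-1 chunks is an assumption modeled on the "1+1+1" default mentioned above.

```cpp
#include <cstddef>
#include <map>
#include <numeric>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry of custom batch rules; NOT the tfaot BatchRule/BatchStrategy API.
class BatchRuleRegistry {
public:
  explicit BatchRuleRegistry(std::set<std::size_t> compiledSizes) : compiledSizes_(std::move(compiledSizes)) {}

  // Register a rule such as "2+1" for a given batch size. Every chunk size must
  // have been compiled, and the chunks must add up exactly to the batch size.
  void setRule(std::size_t batchSize, const std::string& spec) {
    std::vector<std::size_t> sizes;
    std::istringstream in(spec);
    std::string token;
    while (std::getline(in, token, '+')) {
      const std::size_t size = std::stoul(token);  // throws std::invalid_argument on non-numbers
      if (compiledSizes_.count(size) == 0) {
        throw std::invalid_argument("batch size " + token + " was not compiled");
      }
      sizes.push_back(size);
    }
    if (sizes.empty() || std::accumulate(sizes.begin(), sizes.end(), std::size_t{0}) != batchSize) {
      throw std::invalid_argument("rule '" + spec + "' does not add up to the batch size");
    }
    rules_[batchSize] = std::move(sizes);
  }

  // Return the custom rule if one was registered; otherwise fall back to
  // size-1 chunks (assumed default, e.g. "1+1+1" for a batch size of 3).
  std::vector<std::size_t> ruleFor(std::size_t batchSize) const {
    if (auto it = rules_.find(batchSize); it != rules_.end()) {
      return it->second;
    }
    if (compiledSizes_.count(1) == 0) {
      throw std::logic_error("no custom rule and batch size 1 was not compiled");
    }
    return std::vector<std::size_t>(batchSize, 1);
  }

private:
  std::set<std::size_t> compiledSizes_;
  std::map<std::size_t, std::vector<std::size_t>> rules_;
};
```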

Contributor

Thanks. Then I would have all users #include Model.h, and those who need custom batch rules (and want to be pedantic) would #include Batching.h in addition.

@riga (Contributor Author) Feb 15, 2024

Ok, then we go ahead and remove the AOT.h header.
Done in 1e5643c.

Comment on lines 23 to 24
// destructor
~BatchRule() {}
Contributor

Either leave it out or:

Suggested change
// destructor
~BatchRule() {}
// destructor
~BatchRule() = default;

Contributor Author

Done in 2e0dad2.
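The difference behind this suggestion can be checked at compile time: a destructor with an empty body `{}` is user-provided and therefore never trivial, whereas one defaulted on its first declaration stays trivial as long as all members are. Omitting it entirely (rule of zero) has the further advantage that the implicit move operations are not suppressed, which happens with any user-declared destructor. The types below are illustrative only and are not the classes from this PR.

```cpp
#include <type_traits>

// Illustrative types only; they are not the tfaot classes from this PR.
struct EmptyBodyDtor {
  int n = 0;
  ~EmptyBodyDtor() {}  // user-provided: the destructor is never trivial
};

struct DefaultedDtor {
  int n = 0;
  ~DefaultedDtor() = default;  // defaulted on first declaration: trivial if all members are
};

struct NoDtor {
  int n = 0;  // rule of zero: nothing declared, everything implicit
};

static_assert(!std::is_trivially_destructible_v<EmptyBodyDtor>, "{} makes it non-trivial");
static_assert(std::is_trivially_destructible_v<DefaultedDtor>, "= default keeps it trivial");
static_assert(std::is_trivially_destructible_v<NoDtor>, "the implicit destructor is trivial");

// Trivial copyability requires a trivial destructor, so it follows the same pattern.
static_assert(!std::is_trivially_copyable_v<EmptyBodyDtor>, "not trivially copyable");
static_assert(std::is_trivially_copyable_v<DefaultedDtor>, "trivially copyable");
```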

};

// stream operator
std::ostream& operator<<(std::ostream& out, const BatchRule& rule);
Contributor

This function would imply the need to #include <ostream>

Contributor Author

Added in 2e0dad2.
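For reference, here is a minimal sketch of the declaration together with a matching definition, using a hypothetical simplified rule type (`SimpleBatchRule`) instead of the actual tfaot class. `<ostream>` provides `std::ostream` and the insertion operators used in the definition. Strictly speaking, a header that only declares the operator could include the lighter `<iosfwd>` and leave `<ostream>` to the .cc file.

```cpp
#include <cstddef>
#include <ostream>  // std::ostream and operator<< for built-in types
#include <sstream>  // std::ostringstream, only for the toString helper
#include <string>
#include <vector>

// Hypothetical simplified rule: a target batch size and the chunk sizes used for it.
struct SimpleBatchRule {
  std::size_t batchSize;
  std::vector<std::size_t> sizes;
};

// Declaration, as it might appear in a header.
std::ostream& operator<<(std::ostream& out, const SimpleBatchRule& rule);

// Definition, e.g. in the .cc file; prints rules such as "3 -> 2+1".
std::ostream& operator<<(std::ostream& out, const SimpleBatchRule& rule) {
  out << rule.batchSize << " -> ";
  for (std::size_t i = 0; i < rule.sizes.size(); ++i) {
    if (i > 0) {
      out << '+';
    }
    out << rule.sizes[i];
  }
  return out;
}

// Convenience helper, e.g. for log messages.
std::string toString(const SimpleBatchRule& rule) {
  std::ostringstream ss;
  ss << rule;
  return ss.str();
}
```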

class BatchStrategy {
public:
// constructor
explicit BatchStrategy(){};
Contributor

Suggested change
explicit BatchStrategy(){};
explicit BatchStrategy() = default;

would be better

Contributor Author

Done in 2e0dad2.

Contributor

Does this test do anything else than run python commands and check existence of files? If this is the case, I'd find it easier to understand as a shell script (or as a python script).

Contributor Author

Translated to python in e08b8ba.

std::cout << "testing simplemodel" << std::endl;

// initialize the model
auto model = std::make_unique<tfaot::Model<PhysicsTools_TensorFlowAOT::simplemodel>>();
Contributor

Does this need to be a heap allocation as opposed to

Suggested change
auto model = std::make_unique<tfaot::Model<PhysicsTools_TensorFlowAOT::simplemodel>>();
tfaot::Model<PhysicsTools_TensorFlowAOT::simplemodel> model;

?

Contributor Author

Changed in e08b8ba.

Contributor

Will the generated header file end up committed in git? If yes, how about we (eventually) add a test that checks that the generated header passes clang-format and clang-tidy (I'd be tempted to add the static analyzer as well)? We have such a test in FWCore/Skeletons for the mk* generation scripts.

Contributor Author

Generated headers will not end up in cmssw.
In the development workflow, header files are created only for local tests.
Once the production mechanism is established, generated headers will be part of externals provided through cmsdist.

Contributor

Ok. Maybe a visual inspection of the header template would be sufficient then.

Comment on lines 10 to 12
// disable eigen thread pool (therefore explicitly commented out)
// #define EIGEN_USE_THREADS
// #define EIGEN_USE_CUSTOM_THREAD_POOL
Contributor

Does the comment mean that commenting out the #define EIGEN_USE_... lines disables the use of the Eigen thread pool?

Contributor Author

This was rather meant as a reminder, but we removed the comment in e08b8ba. By default, no thread pool or multithreading is used.

Comment on lines 4 to 5
<use name="FWCore/MessageLogger"/>
<use name="FWCore/ServiceRegistry"/>
Contributor

These two dependencies seem unused?

Suggested change
<use name="FWCore/MessageLogger"/>
<use name="FWCore/ServiceRegistry"/>

Contributor Author

Removed in 2e0dad2.

@makortel (Contributor)

@cmsbuild, please test

@cmsbuild (Contributor)

-1

Failed Tests: Build HeaderConsistency ClangBuild
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-01b342/37387/summary.html
COMMIT: 2551773
CMSSW: CMSSW_14_1_X_2024-02-12-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43941/37387/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

------- copying files from src/PhysicsTools/TensorFlowAOT/scripts -------
Entering library rule at PhysicsTools/TensorFlowAOT
>> Compiling  src/PhysicsTools/TensorFlowAOT/src/Batching.cc
>> Compiling  src/PhysicsTools/TensorFlowAOT/src/Wrapper.cc
In file included from src/PhysicsTools/TensorFlowAOT/src/Wrapper.cc:10:
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_1_X_2024-02-12-1100/src/PhysicsTools/TensorFlowAOT/interface/Wrapper.h:12:10: fatal error: tensorflow/compiler/tf2xla/xla_compiled_cpu_function.h: No such file or directory
   12 | #include "tensorflow/compiler/tf2xla/xla_compiled_cpu_function.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.


Clang Build

I found compilation error while trying to compile with clang. Command used:

USER_CUDA_FLAGS='--expt-relaxed-constexpr' USER_CXXFLAGS='-Wno-register -fsyntax-only' scram build -k -j 32 COMPILER='llvm compile'

>> Local Products Rules ..... done
****WARNING: Invalid tool tensorflow-xla-runtime. Please fix src/PhysicsTools/TensorFlowAOT/BuildFile.xml file.
>> Creating project symlinks
>> Entering Package PhysicsTools/TensorFlowAOT
>> Compile sequence completed for CMSSW CMSSW_14_1_X_2024-02-12-1100
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 1
+ eval scram build outputlog '&&' '(python3' /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cms-bot/buildLogAnalyzer.py --logDir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_1_X_2024-02-12-1100/tmp/el8_amd64_gcc12/cache/log/src '||' 'true)'
++ scram build outputlog
>> Entering Package PhysicsTools/TensorFlowAOT
------- copying files from src/PhysicsTools/TensorFlowAOT/scripts -------
Entering library rule at PhysicsTools/TensorFlowAOT


@cmsbuild (Contributor)

Pull request #43941 was updated. @wpmccormack, @valsdav, @cmsbuild can you please check and sign again.

@smuzaffar (Contributor)

please test

@cmsbuild (Contributor)

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-01b342/37516/summary.html
COMMIT: 1e5643c
CMSSW: CMSSW_14_1_X_2024-02-15-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43941/37516/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 40 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3248554
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3248529
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 200 log files, 161 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@makortel (Contributor) commented Mar 8, 2024

@cmsbuild, please test

Sorry, where are we with this PR? :)

I guess PhysicsTools/TensorFlowAOT should be added for ml to sign.

Has @cms-sw/ml-l2 taken a look at this PR yet?

@cmsbuild (Contributor) commented Mar 9, 2024

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-01b342/38003/summary.html
COMMIT: 1e5643c
CMSSW: CMSSW_14_1_X_2024-03-08-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43941/38003/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-01b342/38003/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-01b342/38003/git-merge-result

Comparison Summary

Summary:

@riga (Contributor Author) commented Mar 11, 2024

> Sorry, where are we with this PR? :)

We addressed all review comments so nothing to add from our side 👍

@valsdav (Contributor) commented Mar 13, 2024

+1

from @cms-sw/ml-l2

The TF AOT implementation has been discussed in the ML production group, and we are looking forward to testing it on the production models! Thanks @riga

@smuzaffar (Contributor)

cms-sw/cms-bot#2195 adds this new package under @cms-sw/ml-l2

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @rappoccio, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)
Notice This PR was tested with additional Pull Request(s), please also merge them if necessary: cms-sw/cmsdist#9005

@rappoccio (Contributor)

+1

@cmsbuild merged commit 16ae21f into cms-sw:master on Mar 14, 2024
12 checks passed

6 participants