-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TF AOT interface. #43941
Add TF AOT interface. #43941
Conversation
cms-bot internal usage |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43941/38811
|
A new Pull Request was created by @riga (Marcel Rieger) for master. It involves the following packages:
The following packages do not have a category, yet: PhysicsTools/TensorFlowAOT @cmsbuild can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43941/38813
|
assign ml |
New categories assigned: ml @valsdav,@wpmccormack you have been requested to review this Pull request/Issue and eventually sign? Thanks |
// #define EIGEN_USE_THREADS | ||
// #define EIGEN_USE_CUSTOM_THREAD_POOL | ||
|
||
#include "FWCore/Framework/interface/Frameworkfwd.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this #include
needed?
#include "FWCore/Framework/interface/Frameworkfwd.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 2e0dad2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose if this header? I see it used only in PhysicsTools/TensorFlowAOT/test/testInterface.cc
. We generally don't have headers that only #include
other headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We thought it could be useful for future plugins to include just a single file, instead of - as is the case right now - Batching.h
and Model.h
. But if the recommendation is to do this differently (i.e., remove this header), we will of course do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are Batching.h
and Model.h
something users (i.e. human written code) need to #include
?
Are the two always used together? (well, Model.h
already includes Batching.h
, so I guess the question is if there is any use for including Batching.h
alone?)
Or maybe the question should be, are users expected to interact only with Model
(from Model.h
), or also directly with the classes in Batching.h
? (I'm wondering if Model.h
alone would be sufficient to provide the user-facing interface)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users have to interact with Model.h
in any case.
In case they want to configure custom batch rules (e.g. for a batch size of 3 they could instruct the model to use "2+1" instead of the default "1+1+1"), they would also have to include Batching.h
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Then I would have all users to #include
the Model.h
, and those who need the custom batch rules (and want to be pedantic) would #include
Batching.h
in addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then we go ahead and remove the AOT.h header.
Done in 1e5643c.
// destructor | ||
~BatchRule() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either leave out or
// destructor | |
~BatchRule() {} | |
// destructor | |
~BatchRule() = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 2e0dad2.
}; | ||
|
||
// stream operator | ||
std::ostream& operator<<(std::ostream& out, const BatchRule& rule); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function would imply the need to #include <ostream>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 2e0dad2.
class BatchStrategy { | ||
public: | ||
// constructor | ||
explicit BatchStrategy(){}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicit BatchStrategy(){}; | |
explicit BatchStrategy() = default; |
would be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 2e0dad2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this test do anything else than run python commands and check existence of files? If this is the case, I'd find it easier to understand as a shell script (or as a python script).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Translated to python in e08b8ba.
std::cout << "tesing simplemodel" << std::endl; | ||
|
||
// initialize the model | ||
auto model = std::make_unique<tfaot::Model<PhysicsTools_TensorFlowAOT::simplemodel>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a heap allocation as opposed to
auto model = std::make_unique<tfaot::Model<PhysicsTools_TensorFlowAOT::simplemodel>>(); | |
tfaot::Model<PhysicsTools_TensorFlowAOT::simplemodel model; |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in e08b8ba.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the generated header file end up committed in git? If yes, how about we (eventually) add a test that checks the generated header passes the clang-format
and clang-tidy
(I'd be tempted with static analyzer as well)? We have such test in FWCore/Skeletons
for the mk*
generation scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generated headers will not end up in cmssw.
In the development workflow, header files are created only for local tests.
Once the production mechanism is established, generated headers will be part of externals provided through cmsdist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Maybe a visual inspection of the header template would be sufficient then.
// disable eigen thread pool (therefore explicitly commented out) | ||
// #define EIGEN_USE_THREADS | ||
// #define EIGEN_USE_CUSTOM_THREAD_POOL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the comment mean the #define EIGEN_USE_...
being commented out disables the use Eigen thread pool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was rather meant as a reminder, but we removed the comment in e08b8ba. The default is that no threadpool / multithreading is used.
<use name="FWCore/MessageLogger"/> | ||
<use name="FWCore/ServiceRegistry"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two dependencies seem unused?
<use name="FWCore/MessageLogger"/> | |
<use name="FWCore/ServiceRegistry"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 2e0dad2.
@cmsbuild, please test |
-1 Failed Tests: Build HeaderConsistency ClangBuild BuildI found compilation error when building: ------- copying files from src/PhysicsTools/TensorFlowAOT/scripts ------- Entering library rule at PhysicsTools/TensorFlowAOT >> Compiling src/PhysicsTools/TensorFlowAOT/src/Batching.cc >> Compiling src/PhysicsTools/TensorFlowAOT/src/Wrapper.cc In file included from src/PhysicsTools/TensorFlowAOT/src/Wrapper.cc:10: /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_1_X_2024-02-12-1100/src/PhysicsTools/TensorFlowAOT/interface/Wrapper.h:12:10: fatal error: tensorflow/compiler/tf2xla/xla_compiled_cpu_function.h: No such file or directory 12 | #include "tensorflow/compiler/tf2xla/xla_compiled_cpu_function.h" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. In file included from src/PhysicsTools/TensorFlowAOT/src/Wrapper.cc:10: /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_1_X_2024-02-12-1100/src/PhysicsTools/TensorFlowAOT/interface/Wrapper.h:12:10: fatal error: tensorflow/compiler/tf2xla/xla_compiled_cpu_function.h: No such file or directory Clang BuildI found compilation error while trying to compile with clang. Command used:
>> Local Products Rules ..... done ****WARNING: Invalid tool tensorflow-xla-runtime. Please fix src/PhysicsTools/TensorFlowAOT/BuildFile.xml file. >> Creating project symlinks >> Entering Package PhysicsTools/TensorFlowAOT >> Compile sequence completed for CMSSW CMSSW_14_1_X_2024-02-12-1100 gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 1 + eval scram build outputlog '&&' '(python3' /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cms-bot/buildLogAnalyzer.py --logDir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_14_1_X_2024-02-12-1100/tmp/el8_amd64_gcc12/cache/log/src '||' 'true)' ++ scram build outputlog >> Entering Package PhysicsTools/TensorFlowAOT ------- copying files from src/PhysicsTools/TensorFlowAOT/scripts ------- Entering library rule at PhysicsTools/TensorFlowAOT |
Pull request #43941 was updated. @wpmccormack, @valsdav, @cmsbuild can you please check and sign again. |
please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-01b342/37516/summary.html Comparison SummarySummary:
|
@cmsbuild, please test Sorry, where are we with this PR? :) I guess Has @cms-sw/ml-l2 taken a look of this PR yet? |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-01b342/38003/summary.html The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: Comparison SummarySummary:
|
We addressed all review comments so nothing to add from our side 👍 |
+1 from @cms-sw/ml-l2 The TF AOT implementation has been discussed in the ML production group and we are looking forward to testing in on the production models! Thanks @riga |
cms-sw/cms-bot#2195 adds this new package under @cms-sw/ml-l2 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @rappoccio, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
PR description
This PR adds the interface for using ahead-of-time (AOT) compiled TensorFlow models, as presented in the core software meeting on Dec 12 (@Bogdan-Wiederspan).
The accompanying cmsdist PR is cms-sw/cmsdist#9005.
As discussed before (see talk above), we would like to propose a 2-stage integration process:
saved_model_cli
that we wrapped through the externalcmsml
package that is included through the cmsdist PR linked above. The interface wraps functionality to handle multiple batch sizes and provides a convenience layer on top of the rather low-level AOT objects (for instance to evaluate a model with n inputs and m outputs of different types).cms-external → cmsdist → toolfile
steps (see stage 2 below). This workflow can be triggered through a simple script,PhysicsTools/TensorFlowAOT/scripts/compile_model.py
, that mimics the integration workflow and provides a tool file that plugins can<use ...
.<use name="tfaot-model-btv-deepflavor"/>
).PR validation
We added test cases for the development workflow and the general compilation procedure. A third test case is implemented but disabled for now, as it requires header and object files of previously compiled models. We would enable this test in the second stage of the integration process.
@valsdav @ml-l2