-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable TF XLA, add tensorflow-xla-runtime. #9005
Enable TF XLA, add tensorflow-xla-runtime. #9005
Conversation
A new Pull Request was created by @riga (Marcel Rieger) for branch IB/CMSSW_14_1_X/master. @cmsbuild, @aandvalenzuela, @smuzaffar, @iarspider can you please review it and eventually sign? Thanks. |
cms-bot internal usage |
cmssw-vectorization.file
Outdated
@@ -1,2 +1,2 @@ | |||
%define vectorized_packages zlib fastjet tensorflow-sources tensorflow OpenBLAS rivet gbl | |||
%define vectorized_packages zlib fastjet tensorflow-sources tensorflow tensorflow-xla-runtime OpenBLAS rivet gbl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riga, please also update https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_1_X/master/vectorization/cmsdist_packages.py#L14-L22 and include tensorflow-xla-runtime
in the list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah sorry, I forgot to mention, please also add scram-tools.file/tools/tensorflow-xla-runtime/vectorized.tmpl
file same as https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_1_X/master/scram-tools.file/tools/tensorflow/vectorized.tmpl except that the tool name
should be tensorflow-xla-runtime_@TOOL_VECTORIZATION@
Pull request #9005 was updated. |
tensorflow-xla-runtime.spec
Outdated
export CPATH="${CPATH}:%{i}/tensorflow/include/third_party/eigen3" | ||
|
||
pushd tensorflow/xla_aot_runtime_src | ||
cmake . -DCMAKE_CXX_FLAGS="-fPIC" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to properly build for multi-vectorization please add -msse3
here e.g. -DCMAKE_CXX_FLAGS="-fPIC -msse3"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added 👍
please test lets see the external build step |
Pull request #9005 was updated. |
@@ -0,0 +1,32 @@ | |||
### RPM external tensorflow-xla-runtime 2.12.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riga, we need tensorflow-xla-runtime
to be part of cmssw dependency chain, so please add it in https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_1_X/master/cmssw-tool-conf.spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added 👍
Pull request #9005 was updated. |
please test |
pip/tensorflow.file
Outdated
@@ -1,7 +1,7 @@ | |||
Requires: py3-tensorboard py3-opt-einsum py3-tensorflow-estimator py3-wrapt py3-google-pasta py3-scipy | |||
Requires: tensorflow-sources | |||
%define PipPreBuildPy PIPFILE=${TENSORFLOW_SOURCES_ROOT}/tensorflow-%{realversion}-cp%{cms_python3_major_minor}-cp%{cms_python3_major_minor}-linux_%{_arch}.whl | |||
%define PipPostBuild rm -f %{i}/bin/tensorboard* ; ls %{i}/bin/* | xargs -i mv '{}' '{}3' | |||
%define PipPostBuild rm -f %{i}/bin/tensorboard* ; rm -r %{i}/lib/python%{cms_python3_major_minor_version}/site-packages/tensorflow/xla_aot_runtime_src ; ls %{i}/bin/* | xargs -i mv '{}' '{}3' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riga any specific reason to delete it? Looks like this is what you get out of tensorflow-sources wheel file. May be if we do not delete it here then you can just add dependency on py3-tensorflow
and copy $PY3_TENSORFLOW_ROOT/$PYTHON3_LIB_SITE_PACKAGES/tensorflow/xla_aot_runtime_src to build tensorflow-xla-runtime
... right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. We changed this in f77258c.
tensorflow-xla-runtime.spec
Outdated
export CPATH="${CPATH}:%{i}/tensorflow/include/third_party/eigen3" | ||
|
||
pushd tensorflow/xla_aot_runtime_src | ||
cmake . -DCMAKE_CXX_FLAGS="-fPIC -msse3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can add -DBUILD_SHARED_LIBS=OFF
to disable shared library and directly build the archive lib. In this case you can avoid running gcc -shared ....
command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also please add -DCMAKE_CXX_STANDARD=%{cms_cxx_standard}
just to make sure to use the correct c++std. Ypu need to include cpp-standard.file
too see https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_1_X/master/tbb.spec#L2C12-L2C24
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both added in f77258c.
tensorflow-xla-runtime.spec
Outdated
|
||
%install | ||
|
||
mv tensorflow/include %{i} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riga, the contents of tensorflow/include
are same as the one in $PY3_TENSORFLOW_ROOT/$PYTHON3_LIB_SITE_PACKAGES/tensorflow/include
. I would suggest that instead of duplicating this why not we get the include from py3-tensorflow ( we can add an extra toolfile so that these headers are accessible in cmssw)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In f77258c we changed the dependency of tensorflow-xla-runtime
from tensorflow-sources
to py3-tensorflow
and added a separate tool file py3-tensorflow/tensorflow-includes.xml
that we use
d in the toolfile of the xla runtime.
tensorflow-xla-runtime.spec
Outdated
|
||
mkdir -p %{i}/lib/archive | ||
mv tensorflow/xla_aot_runtime_src/libtf_xla_runtime.a %{i}/lib/archive/libtf_xla_runtime-static.a | ||
mv tensorflow/xla_aot_runtime_src/libtf_xla_runtime.so %{i}/lib/libtf_xla_runtime.so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the shared lib?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for now. We removed it in f77258c.
-1 Failed Tests: UnitTests Unit TestsI found 5 errors in the following unit tests: ---> test testTFConstSession had ERRORS ---> test testTFGraphLoading had ERRORS ---> test testTFThreadPools had ERRORS and more ... Comparison SummarySummary:
|
-1 Failed Tests: AddOn The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: AddOn Tests
Comparison SummarySummary:
|
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-656fa9/38038/summary.html The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: Comparison SummarySummary:
|
REMINDER @antoniovilela, @rappoccio, @sextonkennedy: This PR was tested with cms-sw/cmssw#43941, please check if they should be merged together |
Pull request #9005 was updated. |
@riga, |
Does this mean that in order to fully utilize the multi-arch build (i.e. wider vector instructions) for the AOT-compiled models we need some further development? (like making |
yes @makortel
currently the multi-arch build works by linking the shared lib and then loading the correct micro-arch library at runtime. For archive libs we need to change build rules to use the correct micro-arch library at build time. It is doable but need some development
I do not know if this will work for xla cmssw build. @riga , remind me why we have this archive libs instead of shared. |
@smuzaffar The shared library had some symbols missing when linked to certain AOT models. However, we did not fully understand whether this is the intended behavior and the library is only meant to be used as a archive, or whether there is a way to build it correctly. This is unfortunately not documented. If having only the static archive is a show stopper, we can look deeper in the *.so build.
Should this be reverted in any case? |
My take is that it is not a showstopper, but something that probably should be worked on in the (near) future. |
Ok. Since we plan a second PR with the production workflow for the compilation of models anyway, we would work on the shared library build then. |
Pull request #9005 was updated. |
+1 Tests are already successful in the sibling PR in cmssw. |
merge |
Hi!
This PR enables TensorFlow XLA and adds a new tool
tensorflow-xla-runtime
, which provides the library needed by AOT compiled ML models.Changes in particular:
tf2xla_supported_ops
to bazel targets, providing an executable that lists XLA and AOT compatible TF ops for the current env.tensorflow-xla-runtime
spec, which is based ontensorflow-source
which contains everything necessary to compile the runtime via cmake.wrapt
is downgraded to from 1.15.0 to 1.14.1 since 1.15 introduces a bug in TF's SavedModel interface (issue linked next toNO_AUTO_UPDATE
instruction)The PR is required by cms-sw/cmssw#43941.