Cannot register 2 metrics with same name #28

Closed
daspk04 opened this issue Jul 16, 2020 · 30 comments
Labels
bug Something isn't working

Comments

@daspk04
Contributor

daspk04 commented Jul 16, 2020

Hello @remicres ,

I was trying to import OTB and TensorFlow in Python. It looks like the two cannot be imported or used at the same time: I have to use either OTB or TensorFlow separately. Is it because OTB uses the same library that TensorFlow uses?

As I understand it, I can write separate Python programs for the OTB-related tasks and the TensorFlow-related tasks and run them separately. Or should I import TensorFlow and call the OTB applications via the command line (with the subprocess module)? I haven't tested this one, though.

Any suggestions?

image

image
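
The command-line option raised above could be sketched roughly as follows. This is a minimal sketch, assuming the OTB `otbcli_<ApplicationName>` launchers are on `PATH`; the helper names `build_otbcli_command` and `run_otb_app` are hypothetical and not part of OTB or otbtf. Because the OTB application runs in a child process, its libraries are never loaded into the interpreter that imported TensorFlow.

```python
import subprocess


def build_otbcli_command(app_name, params):
    """Build the argument list for an OTB command-line launcher.

    Assumes the usual `otbcli_<AppName> -key value ...` convention.
    """
    cmd = [f"otbcli_{app_name}"]
    for key, value in params.items():
        cmd.append(f"-{key}")
        # A parameter may take several values (e.g. a list of input images).
        values = value if isinstance(value, (list, tuple)) else [value]
        cmd.extend(str(v) for v in values)
    return cmd


def run_otb_app(app_name, params):
    """Run the OTB application in a separate process and return its stdout."""
    result = subprocess.run(
        build_otbcli_command(app_name, params),
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return result.stdout
```

With this in place, the main script could import tensorflow as usual and call something like `run_otb_app("BandMath", {"il": ["a.tif"], "out": "b.tif", "exp": "im1b1"})` for the OTB steps; the file names here are placeholders.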

@remicres
Owner

remicres commented Jul 17, 2020

This reminds me of something. I encountered this issue in the early versions of otbtf. I never really found the cause, but I did find a workaround at the time.

OTB needs an environment variable for the applications path (i.e. the installed/lib/otb/applications/* library files: .so under Linux, .dll under Windows I guess). In the early days of OTB, I believe the environment variable used was ITK_AUTOLOAD_PATH, but this later changed to OTB_APPLICATION_PATH. I noticed that when I used ITK_AUTOLOAD_PATH, the same error as yours happened, but after using only OTB_APPLICATION_PATH, I avoided the error. I just checked with otbtf1.X: python -c "import tensorflow; import otbApplication" works fine.

I think that this error happens in otbtf2.X. Can you confirm that?
I bet that if you unset OTB_APPLICATION_PATH, you will be able to import both tf and otb (...but that's useless because you won't be able to use the OTB apps).
I don't know what is happening: it's as if tf tries to initialize some component twice.
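
A minimal sketch of checking which of the two variables mentioned above is set before importing otbApplication. The helper names `report_otb_env` and `env_without_legacy_path` are hypothetical, and dropping ITK_AUTOLOAD_PATH is only the workaround recalled above for early otbtf versions, not a confirmed fix.

```python
import os

# The two variables discussed above: the legacy one and the current one.
OTB_VARS = ("ITK_AUTOLOAD_PATH", "OTB_APPLICATION_PATH")


def report_otb_env(environ=None):
    """Return the OTB-related path variables that are set and non-empty."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in OTB_VARS if environ.get(name)}


def env_without_legacy_path(environ=None):
    """Copy the environment, dropping the legacy ITK_AUTOLOAD_PATH variable."""
    environ = os.environ if environ is None else environ
    cleaned = dict(environ)
    cleaned.pop("ITK_AUTOLOAD_PATH", None)
    return cleaned
```

The cleaned mapping could be passed as the `env` argument of `subprocess.run` to test the import in a fresh interpreter without touching the current shell.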

@remicres remicres added the bug Something isn't working label Jul 17, 2020
@remicres
Owner

I just gave it a try inside otbtf2.0 and I can't reproduce your issue.

root@61276408e13f:/home/otbuser# python -c "import otbApplication; import tensorflow"
2020-07-17 09:58:05.757528: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
root@61276408e13f:/home/otbuser#

What is the version used?

@daspk04
Contributor Author

daspk04 commented Jul 17, 2020

I have tried this on otbtf1.7:gpu as well as otbtf2.0:gpu. This one works for me as well: python -c "import tensorflow; import otbApplication". But the problem is that when I call OTB modules or TensorFlow modules after the import, it fails. Once any OTB module has been called, I cannot import TensorFlow, and vice versa.

Can you try these?

python -c "import otbApplication; print(otbApplication.Registry_GetAvailableApplications()); import tensorflow; print(tensorflow.__version__)"

python -c " import tensorflow; import otbApplication; print(otbApplication.Registry_GetAvailableApplications())"
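
To compare the two orders side by side, each command can be run in its own fresh interpreter. The following is a minimal sketch; the names `SNIPPETS`, `run_snippet` and `reproduce` are hypothetical, and the snippets are the two commands quoted above.

```python
import subprocess
import sys

# The two import orders reported above, each run in its own interpreter.
SNIPPETS = {
    "otb_then_tf": (
        "import otbApplication; "
        "print(otbApplication.Registry_GetAvailableApplications()); "
        "import tensorflow; print(tensorflow.__version__)"
    ),
    "tf_then_otb": (
        "import tensorflow; import otbApplication; "
        "print(otbApplication.Registry_GetAvailableApplications())"
    ),
}


def run_snippet(code):
    """Run `code` in a fresh interpreter; return (exit code, last stderr line)."""
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    stderr_lines = proc.stderr.strip().splitlines()
    return proc.returncode, (stderr_lines[-1] if stderr_lines else "")


def reproduce():
    """Run every snippet and collect its outcome by name."""
    return {name: run_snippet(code) for name, code in SNIPPETS.items()}
```

Calling `reproduce()` in an environment where both packages are installed should show whether each order fails and with which final error line.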

@remicres
Owner

Damn, I can reproduce this bug. Indeed, in projects where I use both, I don't import tf and otb in the same .py file. I think that otbtf1.X is also impacted.

@remicres
Owner

According to this tensorflow issue, building TF using --config=monolithic could fix it.
Here are the steps I followed to try this fix:

pip install 'setuptools>=41.0.0'
pip install 'numpy<1.19.0'
bazel clean
bazel build -c opt --copt=-march=native --copt=-mfpmath=both //tensorflow:libtensorflow_framework.so //tensorflow:libtensorflow_cc.so //tensorflow/tools/pip_package:build_pip_package --noincompatible_do_not_split_linking_cmdline --config=monolithic

The error message is now different... but it still doesn't work

The following:

python -c "import otbApplication; print(otbApplication.Registry_GetAvailableApplications()); import tensorflow; print(tensorflow.__version__)"

Throws:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 101, in <module>
    from tensorflow_core import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/__init__.py", line 46, in <module>
    from . _api.v2 import compat
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/__init__.py", line 39, in <module>
    from . import v1
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/__init__.py", line 32, in <module>
    from . import compat
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/compat/__init__.py", line 39, in <module>
    from . import v1
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/compat/v1/__init__.py", line 29, in <module>
    from tensorflow._api.v2.compat.v1 import app
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/__init__.py", line 39, in <module>
    from . import v1
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/__init__.py", line 32, in <module>
    from . import compat
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/compat/__init__.py", line 39, in <module>
    from . import v1
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/compat/v1/__init__.py", line 49, in <module>
    from tensorflow._api.v2.compat.v1 import lite
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/lite/__init__.py", line 11, in <module>
    from . import experimental
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/lite/experimental/__init__.py", line 10, in <module>
    from . import nn
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/_api/v2/compat/v1/lite/experimental/nn/__init__.py", line 10, in <module>
    from tensorflow.lite.python.lite import TFLiteLSTMCell
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/python/lite.py", line 34, in <module>
    from tensorflow.lite.experimental.microfrontend.python.ops import audio_microfrontend_op  # pylint: disable=unused-import
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/experimental/microfrontend/python/ops/audio_microfrontend_op.py", line 30, in <module>
    resource_loader.get_path_to_datafile("_audio_microfrontend_op.so"))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/load_library.py", line 57, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.AlreadyExistsError: Op with name _Arg

And:

python -c "import tensorflow; import otbApplication; print(otbApplication.Registry_GetAvailableApplications())"

Results in:

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/descriptor_database.cc:118] File already exists in database: google/protobuf/any.proto
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/descriptor.cc:1367] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size): 
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size): 

@lifeiteng

Any progress? I hit the same error: google-deepmind/reverb#24

@remicres
Owner

remicres commented Jan 4, 2021

Hi @lifeiteng , unfortunately not yet

@lifeiteng

> Hi @lifeiteng , unfortunately not yet

I have fixed it. Check the link options.

@remicres
Owner

remicres commented Jan 4, 2021

Many thanks!

I am not sure I fully understand your fix, though... I don't see what you do with PYTHON_LIB_PATH; could you please give a bit more detail?

@lifeiteng

lifeiteng commented Jan 5, 2021

> Many thanks!
>
> I am not sure I fully understand your fix, though... I don't see what you do with PYTHON_LIB_PATH; could you please give a bit more detail?

I will push the code ASAP.

@lifeiteng

@vijdz
Contributor

vijdz commented Jan 18, 2021

@remicres @Pratyush1991 I found something interesting:
On my system I have /opt/otb and /opt/tensorflow.
I can import otb (which I built against libtensorflow_cc) without my TF environment variables (PATH and LD_LIBRARY_PATH).

I tried the test command:
import otbApplication ; print(otbApplication.Registry_GetAvailableApplications())
2021-01-18 20:09:44.499243: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
So even if I don't import tensorflow, it seems the TF library is loaded anyway by OTB when I access any of the OTBTF apps (but not with the other apps). So when we import both otbApplication and tensorflow, it is loaded twice in memory, hence the protobuf error.
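
One way to check this observation on Linux is to look at which shared objects are mapped into the running process. This is a minimal sketch, assuming a Linux system that exposes `/proc/self/maps`; the function names `libraries_in_maps` and `loaded_libraries` are hypothetical.

```python
import os


def libraries_in_maps(maps_text, needle):
    """Return the sorted mapped file paths whose basename contains `needle`."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, device, inode, then an optional path.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if needle in os.path.basename(path):
            found.add(path)
    return sorted(found)


def loaded_libraries(needle):
    """Linux only: list shared objects mapped into the current process."""
    with open("/proc/self/maps") as maps:
        return libraries_in_maps(maps.read(), needle)
```

Calling `loaded_libraries("tensorflow")` once after listing the OTB applications and once after `import tensorflow` (in separate interpreters) would show which TensorFlow libraries each side brings in.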

@remicres
Owner

Yes, it is something like that. I guess that the C++ TensorFlow classes used in the OTB applications trigger something that is done twice if we also import tensorflow in Python. I did not quite understand the fix that @lifeiteng made in deepmind/reverb, but it looks like it is a matter of how the TF libs are linked...

@vijdz
Contributor

vijdz commented Jan 19, 2021

The error looks the same, but I don't think the problem is. I believe their issue was that some executable in the project was linked to tensorflow when it didn't need to be. I guess they fixed it just by removing the unnecessary links.
But since you need those apps linked to TF anyway, maybe there's no solution: we just can't import both tf and the OTBTF apps in the same process...

@vijdz
Contributor

vijdz commented Jan 19, 2021

It would probably require modifying the TF core to avoid re-loading the library if it was already loaded by OTB in the same Python process...

@remicres
Owner

For now, we can't import both, but that's a bit sad. It is like you couldn't import otbApplication and gdal in the same python code 😭

@remicres
Owner

I am reading this. Do we have a way to know which version of protobuf is used by the OTB applications, and by import tensorflow in Python?
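
For the Python side only, the installed package versions can be queried as in the sketch below. It assumes Python 3.8+ (for `importlib.metadata`) and that the distributions are named `protobuf` and `tensorflow`; the helper names are hypothetical. It says nothing about the protobuf code compiled into the TensorFlow or OTB shared libraries, which may differ from the pip package.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a Python distribution, or None."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def python_side_versions():
    """Versions visible from Python; not the protobuf linked into .so files."""
    return {name: installed_version(name) for name in ("protobuf", "tensorflow")}
```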

@vijdz
Contributor

vijdz commented Jan 19, 2021

I also saw it yesterday. At first I thought that could be the problem, if OTB was already using another protobuf version (because of OpenCV, maybe?).
But since we can import both tensorflow and any other OTB app (one that isn't linked to tensorflow) without the protobuf error, I really think it is just because TF is loaded twice!

For the record, TF 2.4 is built with protobuf 3.14.

@vijdz
Contributor

vijdz commented Jan 19, 2021

Maybe tensorflow/tensorflow#22810 is relevant: they're talking about building plugins with libtensorflow_cc.
They mention static linking; it seems hard to build, and I guess that's not an option with the OTB module architecture...

@remicres
Owner

remicres commented Jan 19, 2021

> They mentioned static linking, it seems hard to build, and I guess that's not an option with the OTB module architecture...

I don't think it is a limitation, but it would require some CMake/C++ magic.

@vijdz
Contributor

vijdz commented Jan 19, 2021

Do you think it could work? There's probably no way to be sure without trying...
But the error doesn't involve protobuf.

Also, yesterday I tried something:
After I realized that libtensorflow_framework.so is already installed by the wheel, I thought the cause could be that there are two copies of the library and OTB was not compiled against the same framework library file.
So in a -dev container I tried to recompile against the file from the wheel. Nope, same error. I guess that if the files are identical, it doesn't matter where they are located.
I don't understand this linking thing very well.

But regarding the Docker build, maybe we can remove this target from BZL_TARGETS and use the one installed in site-packages/, since it seems to be built by default with the build_pip_package config.

@vijdz
Contributor

vijdz commented Jan 19, 2021

Maybe the problem is that you just can't load the TensorFlow C and C++ libraries in the same Python process?
I don't know how it works in memory when otbApplication is loaded, but libtensorflow.so is loaded with tf from Python, and OTB will try to load libtensorflow_cc.so; maybe somehow Bazel treats both libs the same way?
So the "File already exists in database" error would be the result of protobuf trying to create two instances of tensorflow using different libs, and/or just because both are named tensorflow in memory: even with a _cc suffix, the namespace is still tensorflow...

@vijdz
Contributor

vijdz commented Jan 19, 2021

The best "quick and dirty" fix for Python scripts could involve subprocess or multiprocessing.

@vijdz
Contributor

vijdz commented Jan 19, 2021

@Pratyush1991 I don't know if this will help you much, but in fact you can import both tensorflow and every other OTB application, plus PatchesSelection, PatchesExtraction, LabelImageSampleSelection and DensePolygonClassStatistics, in the same Python script without the protobuf error.
It should occur only when using the TensorFlowModel* and *ClassifierFromDeepFeatures applications. And you can't list all the apps, because that will load the TF framework.
But since you're probably going to write your image and save your model to disk anyway, it should work with a Python script where you run the TF-model-related steps in a subprocess, or you'll need two scripts. You just can't import tensorflow if you're also going to use it via otbApplication in the same process.
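
The rule of thumb above could be turned into a small guard that a script consults before creating an OTB application. This is a minimal sketch: the two name patterns are the ones given above, while `is_tf_linked` and `needs_separate_process` are hypothetical helpers, not part of otbtf. It only covers the case where tensorflow was imported first; the reverse order would need a similar check before `import tensorflow`.

```python
import sys
from fnmatch import fnmatchcase

# Application name patterns reported above as triggering the error.
TF_LINKED_PATTERNS = ("TensorFlowModel*", "*ClassifierFromDeepFeatures")


def is_tf_linked(app_name):
    """True if the OTB application name matches a TensorFlow-linked pattern."""
    return any(fnmatchcase(app_name, pattern) for pattern in TF_LINKED_PATTERNS)


def needs_separate_process(app_name, loaded_modules=None):
    """True if running `app_name` here would mix it with Python's tensorflow."""
    loaded_modules = sys.modules if loaded_modules is None else loaded_modules
    return is_tf_linked(app_name) and "tensorflow" in loaded_modules
```

When `needs_separate_process(name)` is true, the application would be launched through a subprocess instead of being created in the current interpreter.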

@remicres
Owner

Nearly same issue here

@vijdz
Contributor

vijdz commented Jan 19, 2021

Wow, such a dead SO thread...
Well, I really think this comment is a good explanation. And it doesn't look like something you can work around without patching the TF code... But since TF is really low level, it may just be a bad idea to load it twice in memory.

@remicres remicres reopened this Sep 21, 2022
@remicres
Owner

remicres commented Apr 4, 2023

This three-year-old issue will be fixed in the otbtf 4.0.0 release.

@vijdz
Contributor

vijdz commented Apr 4, 2023

So... what kind of black magic did you invoke to make this work?

@remicres
Owner

remicres commented Apr 4, 2023

TBH I don't know if it's the linking, which I did a bit differently, or the move to TensorFlow 2.12.

@remicres
Owner

remicres commented Apr 5, 2023

closed with r4.0.0

@remicres remicres closed this as completed Apr 5, 2023