Memory Leak on tf.GraphModel.predict #6937
@LucasMarianoVieira, have you tried with the latest tfjs version?
@rthadur, yes.
Hi @rthadur, we have been building an object detection module for the open-source Node-RED system, so we can run the detections on a Coral TPU USB stick. That works amazingly fast, but we now see that it is leaking memory at the …
Our setup is not quite the same as the one from @LucasMarianoVieira, since we use TfLite in Tfjs, which is still in alpha phase. We did not want to start duplicate issues, but don't hesitate to let us know if you want us to create a new one! Kind regards,
Hi, when I compare two heap dumps via the Chrome developer tools (with 'recording stack trace' enabled), it shows no usable information: it only says that the memory was allocated before the profiler was started. Since my profiler was already running before I started processing images, I assume that means the array buffer allocations are happening outside of our Node.js processes? But that is unfortunately above my pay grade... Thanks!!
I tried it locally and got constant memory usage; I haven't been able to reproduce it so far.
I tried again... I'm using Conda to make a virtual environment (but on several of the machines we use, I install Node directly, no virtual environment involved). I also just updated Node to version 18.12.1, npm to version 8.19.2, all running under Ubuntu 18.04. Same code I presented up there, now with TensorFlow.js version 4.1.0, and the memory leak is still there. It's pretty dramatic: in a matter of an hour or two, it ends up using all the computer's memory.
@ahmedsabie, do you perhaps know something we can try to find the root cause of the leak on our platform (i.e. a Raspberry Pi 4)? As mentioned above, the delta between two successive heap dumps doesn't contain useful information. @LucasMarianoVieira: I assume you have already tried it, but could an explicit call of … help?
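One cheap experiment in this direction, independent of tfjs itself, is to force a garbage collection between inference batches and see whether the footprint actually drops: that distinguishes a real leak (memory still reachable, or allocated natively) from garbage that V8 simply hasn't collected yet. `global.gc()` is only defined when Node is started with `--expose-gc`, so the sketch below guards for that:

```javascript
// Probe heap growth across a workload, forcing a GC (when available)
// before each measurement. Run with: node --expose-gc leakprobe.js
function heapAfterGc() {
  if (typeof global.gc === 'function') {
    global.gc(); // only defined when Node was started with --expose-gc
  }
  return process.memoryUsage().heapUsed;
}

function leakProbe(workFn, iterations) {
  const before = heapAfterGc();
  for (let i = 0; i < iterations; i++) workFn();
  const after = heapAfterGc();
  // If `after` keeps climbing run after run even though gc() was forced,
  // the retained memory is genuinely reachable (or allocated natively
  // outside the V8 heap, where gc() cannot help).
  return { before, after, growth: after - before };
}
```

Here `workFn` would be one detection (preprocess, predict, dispose); the helper names are just for this sketch.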
@LucasMarianoVieira, I can confirm that I also have a slow memory leak like yours.
Yesterday evening the memory usage on my Raspberry Pi 4 was between 377 MB and 399 MB: … After it had been running overnight, the memory usage has now increased: … If you only execute the decoding, is that enough to start leaking?
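For overnight measurements like the ones above, a small in-process sampler beats taking readings by hand: slow growth of a few MB per hour shows up as a trend rather than two snapshots. A sketch (the interval is arbitrary):

```javascript
// Periodically sample resident set size and heap usage so slow growth
// shows up as a series of data points that can be plotted or logged.
function startMemorySampler(intervalMs, onSample) {
  const samples = [];
  const timer = setInterval(() => {
    const mem = process.memoryUsage();
    const sample = { t: Date.now(), rss: mem.rss, heapUsed: mem.heapUsed };
    samples.push(sample);
    if (onSample) onSample(sample);
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for the sampler
  return {
    samples,
    stop() { clearInterval(timer); },
  };
}
```

In a long-running Node-RED flow this could log one line per minute; comparing `rss` growth against `heapUsed` growth also hints whether the leak is inside or outside the V8 heap.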
@bartbutenaers, oh indeed, I tried … with the same result. The memory seems to be leaking from within … When I run just …
System information
Describe the current behavior
I'm doing a simple detection task with a TensorFlow saved model (I didn't use the tensorflow_converter tool) that was trained for a task we have here at the company. I'm not using a GPU. I load the model normally and then feed the images into the model to detect some specific elements. I have tried literally everything I could find. It seems that every time I run the model.predict method (the model being loaded as a tf.GraphModel), the program leaks the memory corresponding to the tensor fed into the method.
The application we have here makes tens of thousands of detections per day, which causes the program's memory footprint to grow by several GB until it takes all the memory in the computer and crashes. I have already tried the usual methods, like disposing the tensors with tf.dispose, or using tf.tidy to contain the code handling tensors, but as far as I can tell it's really the model.predict call that is leaking the memory. Below I give a simple test code that loads a basic model from the TensorFlow Model Zoo.
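For reference, the scoped-disposal discipline that tf.tidy enforces can be illustrated with a small plain-JS stand-in. The `FakeTensor` class below is hypothetical, purely to show the scope semantics; it is not the tfjs implementation. (One real constraint worth knowing: tf.tidy does not accept async functions, so tensors produced on async paths have to be disposed by hand.)

```javascript
// Minimal illustration of the scoped-disposal pattern that tf.tidy
// implements: every tensor created inside the scope is tracked, and
// everything except the returned value is disposed when the scope ends.
const live = new Set();

class FakeTensor {                       // hypothetical stand-in for a tensor
  constructor(name) { this.name = name; live.add(this); }
  dispose() { live.delete(this); }
}

function tidy(fn) {
  const before = new Set(live);          // tensors that predate the scope
  const result = fn();
  for (const t of live) {
    // Dispose only tensors created inside the scope, keeping the result.
    if (!before.has(t) && t !== result) t.dispose();
  }
  return result;                         // the returned tensor survives
}
```

The point of the pattern: intermediates never need individual dispose calls, but anything allocated outside such a scope (or returned from it) remains the caller's responsibility, which is why a leak inside predict itself would be invisible to this mechanism.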
Describe the expected behavior
The program shouldn't leak memory. So after tens of thousands of detections, the memory footprint should be roughly the same size.
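The "roughly the same size" expectation can even be turned into an automated check: run many iterations, force a GC at the checkpoints, and fail if the post-GC heap grows past a threshold. A sketch under stated assumptions (the threshold and iteration count are arbitrary, and the check only sees the V8 heap, not native allocations):

```javascript
// Run `iterations` of `workFn` and verify the post-GC heap does not
// grow by more than `maxGrowthBytes` compared to the starting point.
// Most meaningful when run with: node --expose-gc check.js
function assertNoLeak(workFn, iterations, maxGrowthBytes) {
  const gc = typeof global.gc === 'function' ? global.gc : () => {};
  gc();
  const start = process.memoryUsage().heapUsed;
  for (let i = 0; i < iterations; i++) workFn();
  gc();
  const end = process.memoryUsage().heapUsed;
  const growth = end - start;
  if (growth > maxGrowthBytes) {
    throw new Error(`possible leak: heap grew by ${growth} bytes`);
  }
  return growth;
}
```

With `workFn` set to one full detection, a per-iteration leak the size of the input tensor would trip this check within a few hundred iterations.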
Standalone code to reproduce the issue
Just run the following code with the SSD ResNet50 V1 FPN 640x640 model linked here, using the image available here, and the memory footprint of the program will start to increase as it makes more and more detections.
Observation
I tried a workaround where I converted the model using tensorflow_converter, and it seems it doesn't leak memory when I use the model.executeAsync method. However, I can't use that with the retrained model we have here (it's the same ResNet, but trained over our own dataset, following the instructions here), because loading it causes a TypeError:
Uncaught TypeError: Cannot read properties of undefined (reading 'children')
and I really have no idea why this happens. Thanks for any help; this has been making me lose my mind for weeks already. Is there something I'm missing?