Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible TargetCustom Leak #96

Closed
jeromenelson opened this issue Sep 1, 2020 · 11 comments
Closed

Possible TargetCustom Leak #96

jeromenelson opened this issue Sep 1, 2020 · 11 comments
Labels
question Further information is requested

Comments

@jeromenelson
Copy link

Hello,

I am using VIPS on a ASP.NET Core application, in order to avoid frequent allocation of memory, I am using TargetCustom to directly write the output to a PipeWriter that is attached to the HTTP response stream. This works fine on all normal circumstance. However, when the requests to the app increases and the client starts to disconnect abruptly, there seem to be some issue while disposing TargetCustom.

NetVips: 1.2.4
NetVips.Native: v8.10

Here is the error message I am getting:
Process terminated. A callback was made on a garbage collected delegate of type 'NetVips!NetVips.Internal.VipsSourceCustom+ReadSignal::Invoke'.

This is rough idea of how my code looks:

            PipeWriter writer; // piped to the HTTP response stream
            ...
            using var target = new TargetCustom();
            target.OnWrite += (buffer, length) =>
            {
                var span = new ReadOnlySpan<byte>(buffer, 0, length);
                writer.Write(span);
                return length;
            };
            target.OnFinish += () => writer.Complete();
            _image.WriteToTarget(target

I tried to manually detach the OnWrite and OnFinish delegate, and the error still remains. Is there any work around for this?

Secondly, the same application also seem to have some other issue, occasionally it crashes with:
Segmentation fault (core dumped)

on a windows server it produces:
The program has exited with code -1073741819 (0xc0000005) 'Access violation'.

Unfortunately I cannot find any other stack trace. Since it is a web app, each request can independently resize images in parallel, I am afraid this may be related to multithreading. The method ConcurrencySet is not used, leaving VIPS to automatically choose based on the number of CPUs. The issue also seems to happen when the main memory is overwhelmed, and is not easily reproduced.

When the app is first initialized, I am statically setting these:

            NetVips.Init();
            NetVips.CacheSetMax(0);
            NetVips.CacheSetMaxFiles(0);
            NetVips.CacheSetMaxMem(0);
            NetVips.CacheSetTrace(false);
            NetVips.LeakSet(true);

Thank you so much for taking the time to look into the issues.

-Jerome

kleisauke added a commit that referenced this issue Sep 3, 2020
@kleisauke kleisauke added the triage This issue is being investigated label Sep 3, 2020
@kleisauke
Copy link
Owner

Hi @jeromenelson,

Are you able to provide a minimal, standalone code sample that demonstrates this issue? I couldn't reproduce this with commit 1d497d6 on the issue-96 branch.

You mentioned that TargetCustom might leak, but the error message you posted is from SourceCustom. Do you also use this class (or SourceStream)?

A possible reason for the A callback was made on a garbage collected delegate of type error and the associated segfault is that the SourceCustom class has been disposed too early. Make sure to dispose this class only at the end of the image computation (I actually had a similar problem in this library, see commit: 9f099f7 and issue #58).

@jerome-zenfolio
Copy link

@kleisauke, Thank you for the response. We have a true streaming style setup and I referred to that document for implementation. The input stream is s3 however the output is a HTTP response stream. I totally didn't see the exception is thrown by the SourceCustom, thanks for pointing that out. Since I started noticing the issue after introducing TargetCustom, I just assumed that is the culprit. Your answer has given me tips on how to reproduce this scenario, it very well could be a bug in my code, this is roughly how I am currently handling:

                Stream inputS3Stream = GetS3Stream();
                Stream outputResponseStream = GetHttpResponseStream();

                Image image = Image.ThumbnailStream(inputS3Stream, w, h);
                image.WriteToTarget(..);
                image.Dispose();

                inputS3Stream.Dispose();
                outputResponseStream.Dispose();

I am guessing the problem is that, the inputS3Stream is getting disposed unexpectedly for various reasons (timeouts, network issues etc that I don't have control over) before the image is completely resized and written to the output stream. The same scenario can happen for output stream as well. I will attempt to create a reproducible code to simulate this situation. In the mean time if you have any tips on how to handle this scenario let me know.

As a temporary workaround, instead of writing to HTTP response stream, I am writing to a memory stream. This surprisingly stopped the A callback was made on a garbage collected delegate issue, however I am noticing the Segmentation fault once in a while. Again, thank you for taking the time to look at this.

kleisauke added a commit that referenced this issue Sep 9, 2020
@kleisauke
Copy link
Owner

A reproducible sample will be useful to debug this further. You could try to wrap those variables in a using statement, for example:

@@ -1,9 +1,5 @@
-                Stream inputS3Stream = GetS3Stream();
-                Stream outputResponseStream = GetHttpResponseStream();
+                using Stream inputS3Stream = GetS3Stream();
+                using Stream outputResponseStream = GetHttpResponseStream();
 
-                Image image = Image.ThumbnailStream(inputS3Stream, w, h);
+                using Image image = Image.ThumbnailStream(inputS3Stream, w, h);
                 image.WriteToTarget(..);
-                image.Dispose();
-
-                inputS3Stream.Dispose();
-                outputResponseStream.Dispose();

This ensure that the objects are correctly disposed. I just added an S3 sample, perhaps this could help:
https://github.com/kleisauke/net-vips/blob/issue-96/samples/NetVips.Samples/Samples/ThumbnailS3.cs

I wonder what caused that segmentation fault, are you able to provide a GDB backtrace? To exclude things, I'll try to release the prebuilt binaries of libvips v8.10.1 on NuGet today.

@kleisauke
Copy link
Owner

NetVips.Native v8.10.1 is now available.

(This release contains a fix for libvips/libvips#1792, which might be helpful for this issue)

@jerome-zenfolio
Copy link

@kleisauke, I have tried this week with different test cases and couldn't reproduce consistently, there was one case where VIPS printed a bunch of debug info that I have attached here.

There are other info I should have mentioned in the first place: resize is not the only operation that was performed on an image, but in some cases other enhancements like sharpening, contrast and more importantly PNG overlay is applied with transparency: for the later, it creates multiple VIPS Image objects (main image object, png image object, object after applying opacity, object after creating the composite image etc) I believe all these are not disposed in the right order and it appears like some are disposed too early and some are not properly disposed at all (ex: image = image.Sharpen(..) or image -= 30, overlooked the fact that overwriting doesn't exactly dispose the overwritten image ). In normal circumstances, it worked okay but when there is a memory pressure it caused abrupt termination. I need to dig a bit more into VIPS architecture to see when is the right time to dispose since some of the above mentioned intermediate image objects seem to be accessed when writing the target. Like you pointed out the issue was with disposing - especially VIPS image objects: the solution was to dispose of all these intermediate VIPS objects after writing the target and I have not noticed any segfault or the callback issue.

Although I could not reproduce Segfault, I noticed that my app gets terminated by Kubernates because of OOM. I replaced VIPS with ImageMagick and the occurrence of this crash has reduced however the issue still happens with IM. The Segfault may have been related to K8s restricting memory for the app or it could be because of all the earlier issues that were fixed. Is there any way to set memory limits on VIPS? If you could point me to any articles that give tips on running VIPS in a high throughput scenario, it would greatly help in tuning my app. Thanks again for your time and effort, truly appreciate it! 

@jerome-zenfolio
Copy link

The issue with app crashing is nothing to do with VIPS, there seem to be similar open issues with .NET Core app running in k8s. Once the CPU count was increased and more memory was allocated to the replica, the behavior and performance of the GC became normal and the app worked fine without any reported crashes. Please feel free to close this issue, thanks!

@kleisauke
Copy link
Owner

Sorry for the delay, I should spend more time on replying to issues.

Great to hear that the crashes have been resolved now! I've no experience with Kubernetes, but can provide some performance/security tips for image processing with libvips. These tips are mainly based on my experience with images.weserv.nl, which currently processes 7 million (7×106) images per hour.

  • Avoid using ThumbnailImage, if possible. It can't do any of the shrink-on-load tricks, so it's much, much slower.
  • Consider using a different memory allocator such as jemalloc to prevent memory fragmentation in long-running, multi-threaded, glibc-based Linux processes. See Leak on Linux? lovell/sharp#955 to learn more about memory allocation and why the choice of a memory allocator is important.
  • The libvips' operation cache is not useful for image proxies (i.e. processing many different images). Consider to disable this.
  • Only enable the loading libraries you really need. For example, libvips could open microscope images, which most websites won't need.
  • Always sanity check image dimensions to protect you from decompression bombs like those described at https://www.bamsoftware.com/hacks/deflate.html.

@jerome-zenfolio
Copy link

kleisauke, Thank you for the suggestions!

  • Shrink on load is used for most of the scenarios already, and definitely performs better.
  • Tried jemalloc on my local docker instance and noticed ~300Mi better mem utilization on my mild load test, waiting to get this verified on the production hopefully with a bigger gain.
  • I am allocating a little bit of cache for the operations, hoping that it would help with scenarios where we had to watermark images, so even though different images are processed, expecting about 5% of the images to have the same overlay image. Although I have no data points to prove that this is helping - might as well turn it off.
  • I need to find out how to enable/disable libraries
  • Wow never thought of decompression bombs!

@kleisauke
Copy link
Owner

Commit c021dac ensures images are disposed early throughout the codebase. If you were using the image.NewFromImage() and/or the image.Ifthenelse() operation, I can recommend that you use the nightly version of NetVips, since (temporary) images within these operations were not disposed early.

@kleisauke kleisauke added question Further information is requested and removed triage This issue is being investigated labels Feb 24, 2021
@kleisauke
Copy link
Owner

NetVips v2.0.0 is now available with the above fix included. Thanks for reporting this!

@jerome-zenfolio
Copy link

Thank you Kleisauke & congrats on the 2.0 release!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants