Delay breaking out of the parse loop when max_tree_depth is hit #3100
When a token causes a node to be added to the DOM that pushes the depth of the DOM past `max_tree_depth` _and_ the token needs to be reprocessed, memory is leaked. Delaying the break out of the loop until after the token has been completely handled appears to fix the leak. Fixes sparklemotion#3098
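To make the failure mode concrete, here is a minimal toy model of the control flow described above. It is not the gumbo parser's code: `toy_parse`, `delay_break`, and the `pending` list are hypothetical names, and the model assumes every token needs exactly one reprocessing pass. It only illustrates why leaving the loop before a token is completely handled can strand whatever that token owns.

```ruby
# Toy model only: hypothetical names, not the real gumbo parse loop.
# A token sits in `pending` from allocation until it is completely handled.
def toy_parse(tokens, max_tree_depth:, delay_break:)
  depth = 0
  pending = []          # tokens allocated but not yet completely handled
  too_deep = false

  tokens.each do |token|
    pending << token                      # token is "allocated"
    2.times do |pass|                     # assumption: pass 0, then one reprocess
      depth += 1 if pass.zero?            # first pass adds a node to the tree
      too_deep = depth > max_tree_depth
      break if too_deep && !delay_break   # early break skips the reprocess pass
      pending.delete(token) if pass == 1  # fully handled: token is "released"
    end
    break if too_deep                     # stop parsing once the limit is hit
  end

  { too_deep: too_deep, leaked: pending }
end

early   = toy_parse(%w[html body], max_tree_depth: 1, delay_break: false)
delayed = toy_parse(%w[html body], max_tree_depth: 1, delay_break: true)
```

In this model both calls report `too_deep: true`, but only the early-break variant leaves `"body"` in `leaked`; with the delayed break the list is empty.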
@stevecheckoway If you can add a test case (like I did in 84f1706 for example), then the
OK - more explicit instructions on how to reproduce. Add this test case:

```ruby
it "libgumbo max depth exceeded" do
  html = '<html><body>'
  memwatch(__method__) do
    begin
      Nokogiri::HTML5.parse(html, max_tree_depth: 1)
    rescue ArgumentError
      # expected: the document is deeper than max_tree_depth
    end
  end
end
```

then run:
which will show you memory utilization (
or to get valgrind to dump more detailed info:
which emits:
Thanks for the help with the test! I still need to make sure that the test fails without this change. Somehow none of the Linux machines in my house are set up as dev machines currently, but I'll test in a VM a bit later today.
@stevecheckoway Please consider using the CI infrastructure: we run those memory tests (both flavors of memory leak test) in CI. So you could push the repro to this branch, see it fail in CI, then push the fix to the branch and see it pass. Let me know if I can help at all. (The output I posted above is actually from the repro, so I can test it both ways locally very easily.)
With this PR applied, I get
Without it, I get
If this looks good to you, I think we should commit this and then figure out what's causing
when running
Yes! Ship it.
Oh good idea. I'll do that in the future.
What problem is this PR intended to solve?
#3098
Have you included adequate test coverage?
No! I don't know how to test this. Help is appreciated here.
Does this change affect the behavior of either the C or the Java implementations?
Fixes a bug in the C implementation.
More investigation is needed to be sure it actually fixes the issue, however.