[bug] Document / Nodes aren't compaction friendly #2578
Comments
Ack. I've reproduced. Let's talk tomorrow.
Better patch:

diff --git a/ext/nokogiri/xml_node.c b/ext/nokogiri/xml_node.c
index 14f1a871..aa2c66cb 100644
--- a/ext/nokogiri/xml_node.c
+++ b/ext/nokogiri/xml_node.c
@@ -24,6 +24,20 @@ static void
 _xml_node_mark(xmlNodePtr node)
 {
   xmlDocPtr doc = node->doc;
+
+  // Mark this node's wrapper object. This is very silly because the mark
+  // callback is only firing because this node has been marked. Marking
+  // ourselves won't cause an infinite loop (the GC handles that), but the
+  // reason we're calling `rb_gc_mark` is to pin the wrapper object.
+  // These nodes are normally marked via an Array stored on the document,
+  // and since Ruby Arrays allow their contents to move, then the wrapper
+  // object was allowed to move too. That means the reference in `_private`
+  // could go bad (since we don't update references yet)
+  // Calling `rb_gc_mark` here will pin the wrapper object. Though really
+  // we should update this object to support compaction and update references
+  // accordingly.
+  rb_gc_mark(node->_private);
+
   if (doc->type == XML_DOCUMENT_NODE || doc->type == XML_HTML_DOCUMENT_NODE) {
     if (DOC_RUBY_OBJECT_TEST(doc)) {
       rb_gc_mark(DOC_RUBY_OBJECT(doc));

Still not ideal, but better than the patch above. I think ideally we should add compaction support. The first step toward that would be switching from
I've been able to reliably trace a segfault while using creek 2.5.3 to the 1.13.6 -> 1.13.7 update. (We're on CRuby 3.0.3 and Rails 7.0.3; reproduced on Linux and macOS.) Unfortunately, I haven't been able to reproduce it on every run, and the trace is a little different every time. Would you like me to open an issue? Otherwise, I'm just leaving this comment here in case it helps a future googler.
@bf4 Yes, please open an issue. I also commented on the upstream issue.
@bf4 a script that will reproduce this, even if intermittently, would be extremely valuable. I know nothing about Creek.
Please describe the bug
xmlNodePtr instances keep a reference to their Ruby wrapper inside the _private member. You can see the _private member being set to the wrapping Ruby instance here.
After it's been wrapped, the instance is optionally stored in an array that is kept on the associated document. You can see the array push here.
Ruby arrays are compaction friendly. When the compactor runs, elements inside the array are allowed to move because the array knows how to update its own references. However, Nokogiri's XML nodes are not compaction friendly. They don't know how to update their references. Normally this isn't a problem because objects are required to call rb_gc_mark on their references, and rb_gc_mark will pin the references. The xmlNodePtr instance references the Ruby wrapper, but the wrapper is only marked through the document's array, so it never gets pinned. Since the Ruby wrapper is allowed to move, the _private pointer can go bad when compaction runs.
Help us reproduce what you're seeing
This script will SEGV because the node objects move, and then we try to reference them.
Expected behavior
It shouldn't segv.
Possible Fixes
I'm not 100% sure how we should go about fixing this. This patch fixes the above code:
But I don't like this patch. It means we iterate the array too many times.
I think a better solution is probably to teach Node instances about compaction so that they can update themselves, but that might be a more involved patch.
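To make the two options concrete, here is a toy model in plain Ruby. It is only a sketch and not Ruby's GC or Nokogiri code: ToyHeap, ToyNode, private_slot and wrapper_seen_by_node are all made-up names, the "heap" is an Array of slots, and a "pointer" is just a slot index. It shows the stale reference when the node does nothing, and how either pinning the wrapper or updating the node's reference after compaction keeps it valid.

```ruby
# Toy model only. None of these names exist in CRuby or Nokogiri.

# Stands in for an xmlNodePtr; private_slot plays the role of `_private`.
ToyNode = Struct.new(:private_slot)

class ToyHeap
  def initialize(slots)
    @slots = slots   # Array of objects; nil marks a free slot
    @pinned = []     # slot indexes that are not allowed to move
  end

  # Roughly what `rb_gc_mark` does for a reference: the object stays put.
  def pin(index)
    @pinned << index
  end

  def [](index)
    @slots[index]
  end

  # Slide every unpinned object into the lowest free slot below it.
  # Returns a forwarding table { old_index => new_index } for moved objects.
  def compact
    forwarding = {}
    @slots.each_index do |old|
      next if @slots[old].nil? || @pinned.include?(old)
      free = @slots.index(nil)
      next if free.nil? || free > old
      @slots[free] = @slots[old]
      @slots[old] = nil
      forwarding[old] = free
    end
    forwarding
  end
end

# Runs one compaction and returns what the node finds behind its reference.
# strategy is :none (the bug), :pin (the patch), or :update (compaction support).
def wrapper_seen_by_node(strategy)
  heap = ToyHeap.new([nil, nil, :wrapper])
  node = ToyNode.new(2)   # node points at the wrapper in slot 2
  doc_array = [2]         # the document's array also references the wrapper

  heap.pin(node.private_slot) if strategy == :pin
  forwarding = heap.compact

  # The array is "compaction friendly": it always updates its own references.
  doc_array.map! { |i| forwarding.fetch(i, i) }

  # A compaction-aware node would do the same for its own reference.
  if strategy == :update
    node.private_slot = forwarding.fetch(node.private_slot, node.private_slot)
  end

  heap[node.private_slot]
end

p wrapper_seen_by_node(:none)    # => nil (stale slot; the real thing is a SEGV)
p wrapper_seen_by_node(:pin)     # => :wrapper
p wrapper_seen_by_node(:update)  # => :wrapper
```

In the toy the stale read just returns nil, whereas the real `_private` pointer ends up pointing at whatever now lives at the old address. As far as I know, the real counterparts of the :update strategy in CRuby's C API are `rb_gc_mark_movable` in the mark callback plus `rb_gc_location` in a `dcompact` callback, but how that would be wired into Nokogiri's node and document types is not something this sketch tries to answer.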