Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsubscribing from server node in UA Expert / UPC UA Client causes segmentation fault #125

Open
LeanderGlanda opened this issue Jul 8, 2024 · 8 comments
Assignees

Comments

@LeanderGlanda
Copy link
Contributor

I just found a bug, which is either caused by the open62541 server itself or this crate. As the segmentation fault happens in the crate's code, this is probably the right place.

How to reproduce the error?

  • Run the server example and connect to it with an UPC UA Client, e.g. UA Expert. (I used it)
  • Subscribe to the "the answer" node.
  • Unsubscribe from it
  • Segmentation fault

If you add multiple subscriptions to the node, then only the last removal will cause the segmentation fault.

I already spent a while debugging the issue. The segmentation fault is caused in src/Server.rs > build > destructor_c. The log message will be printed with the node id, and then the segmentation fault happens, likely by the NodeContext::consume(node_context) call.

In the attached log, you can see that before the Rust code runs, the DeleteSubscriptionsRequest gets processed. I looked in the open62541 source and found out, that the memory pointed to by the node pointer gets freed, but it seems that the pointer isn't set to null. This probably causes the Rust code to think that the pointer isn't null and then it tries to free the memory a second time.

Temporary solution: Remove the NodeContext::consume(...). This gets rid of the segmentation fault, but may cause a memory leak somewhere else.

important_part_of_the_log.txt

@sgoll sgoll self-assigned this Jul 9, 2024
@LeanderGlanda
Copy link
Contributor Author

The destructor_c() in src/server.rs->build() seems to be the problem.
When debugging, I saw the following: SIGNAL SIGSEGV: ADDRESS NOT MAPPED TO OBJECT (FAULT ADDRESS: 0X0)
To reproduce, probably just run an example with a debugger attached.

Just wanted to add this as I just saw it during debugging my own changes.

@sgoll
Copy link
Contributor

sgoll commented Jul 15, 2024

Thank you, I was able to reproduce the issue. I'll start working on a fix.

@sgoll
Copy link
Contributor

sgoll commented Jul 15, 2024

It seems that our destructor callback is called for unrelated nodes that have node_context set. I was under the impression that only explicitly constructed nodes with a custom node_context (e.g. UA_Server_addDataSourceVariableNode()) get this argument set in the destructor. I'll have to investigate further.

@sgoll
Copy link
Contributor

sgoll commented Jul 16, 2024

For reference, I couldn't figure out where the unexpected node_context was coming from. Along with many nodes that are part of the default server node tree (Types/, Views/, Objects/Server/), there are nodes that are created and destroyed when client subscriptions are established and torn down. These carry unexpected node_context but I do not know if there is a way to influence this or identify those nodes some other way.

See open62541/open62541#6598.

uklotzde pushed a commit that referenced this issue Jul 18, 2024
## Description

This PR is a workaround for #125 to prevent segfaults. See that issue
for more details and further investigations to properly solve the
underlying problem.

The changes leak memory: associated data for created nodes is not freed.
As long as the server's node tree is static, this is not a problem. But
we must be aware that repeated node creation and destruction during the
server's runtime will fill up the available heap memory.
@mati865
Copy link

mati865 commented Jul 29, 2024

I think I can help you a bit with the cause of this issue. In our case we are creating the server with function UA_Server_newWithConfig and I can see you do it the same way.
So the problem is: when you create server's configuration on the stack and call UA_Server_newWithConfig you'd expect config struct to be safe to drop at the end of the scope (that would be quite sensible, wouldn't it?). Then to your surprise a wild segfault occurs when a client opens and closes a connection (at least in our case, here OP apparently has to subscribe and unsubscribe).
The problems begin when you call UA_Server_newWithConfig but remain hidden from the sight. This function will perform full copy of the configuration (including pointers) but won't update all the pointers as it can be seen here: open62541/open62541#5778 or open62541/open62541#5675 (there are more cases). Now pair it with the fact the configuration structure contains pointers to itself and you get dangling pointers that turn into use-after-free as soon as you drop config structure from the stack.

In our case (with home-grown bindings to open62541) it was easier to notice because we are using original open62541 logger and I observed logs being present in the code but not in the output, also the crash in our case was happening in the logger. So I started with debugging the logger and found missing logs to be called with context pointing to NULL (UA_Server_newWithConfig zeroes original config after reading it) and the crashing log invocation with context pointer that surely wasn't pointing to logger context.

Once I realised this must be use after free, I allocated the config on the heap and leaked it (Box::leak) and unsurprisingly the server is no longer crashing. Some of the logs that are called with context pointing to the original config allocation are still absent but they weren't important anyway.

I suppose your issue might have similar cause, maybe it's dangling pointer from config as well? In that case leaking the config and reverting #137 shouldn't crash. If it still does that will likely be a different dangling pointer.

@sgoll
Copy link
Contributor

sgoll commented Jul 29, 2024

@mati865 Thank you for your input, it is greatly appreciated.

I gave your suggestion a try; I reactivated the call to NodeContext::consume() that was disabled in #137, and I leaked the config onto the heap using Box::new() Box::leak() before passing it to UA_Server_newWithConfig().

Unfortunately, it did not solve the issue. I had looked deeper into the issue; it turns out we're trying to free pointers that we have not put into the node_context in the first place. This was confirmed in open62541/open62541#6598 (comment).

So this is not so much a matter of dangling pointers, but trying to use data (and interpret it as a pointer to a data structure) that does not match our expectations.

We will still need to keep an eye open for the missing deep copies in UA_ServerConfig that you mentioned in your post.1

Footnotes

  1. For now, we don't use the data structures that are involved, though.

@mati865
Copy link

mati865 commented Jul 29, 2024

I leaked the config onto the heap using Box::new()

You probably meant Box::leak() as Box::new() doesn't leak the memory.

Anyway, good to know. I think you might encounter issues with dangling pointers from the config if you configure security and certificates, but I debugged it a long time ago, so I don't remember the details.

@sgoll
Copy link
Contributor

sgoll commented Jul 30, 2024

You probably meant Box::leak() as Box::new() doesn't leak the memory.

Yes, obviously. Sorry for the misleading typo 😓

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants