-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Context is preserved between two runPromise calls #63
Comments
I'm seeing this as well. I store a transaction client via set("client", client). Client is released at end of transaction but in a subsequent runPromise call this client remains assigned to that property key, and throws an exception because an attempt is made to release the same client twice. |
I think I'm seeing this in |
I'm dealing with something like this. I have a system that receives HTTP requests and also subscribes to Pubsub topics, it creates a new context for each HTTP request (with Each transaction is supposed to generate a new transactionId (UUID), unless it is sent over by headers or in the message metadata, in that case a new context is created with but with the received transactionId, so we can trace the related transactions over our microservices. The problem is we're seeing the same transactionId being repeated a lot across thousands of different unrelated requests, and we can only assume it's an issue with cls-hooked (and not the uuid generation lib). I don't know if it's related to this issue, because I'm not using the |
Just come across this issue, too. Changed to using |
Have a look at the last comments from me (DavidShard) to the first answer https://stackoverflow.com/questions/66665783/memory-leak-in-cls-hooked-or-wrong-use This is a memory leak bug in cls-hooked that can be fixed by:
The effect is that 900k await ns.runPromise() only leave 5000 contexts in memory. Before node --max-old-space-size=100 was only able to execute 100k of them before the GC killed node. Only: let context = {...this.active }; does not solve the memory leak alone. The test code (ns_simple.js):
With node 14.15.5 time node --max-old-space-size=10 ns_simple.js
With the fix:
|
I think the problem is that Map() creates a cyclic dependency between the map and the context. So releasing the context in this.exit() will never trigger the async_hook destroy to remove the context from the map. Thats what I think. Such a context object looks like:
The cyclic dependency is the map property of a context that points to this._context in the namespace. |
Let's simplify this, try to rerun destroy(asyncId) {
currentUid = async_hooks.executionAsyncId();
if (DEBUG_CLS_HOOKED) {
const triggerId = async_hooks.triggerAsyncId();
const indentStr = ' '.repeat(namespace._indent < 0 ? 0 : namespace._indent);
debug2(`${indentStr}DESTROY (${name}) currentUid:${currentUid} asyncId:${asyncId} triggerId:${triggerId} active:${util.inspect(namespace.active, {showHidden:true, depth:2, colors:true})} context:${util.inspect(namespace._contexts.get(currentUid))}`);
}
debug1(`deleteHookRan here`) // Add this line, so you can see when the destroy hook is called
namespace._contexts.delete(asyncId);
}
});
// As you can't console log inside async hook
function debug1(...args) {
process._rawDebug(`${util.format(...args)}`);
} There is a big difference when you do await version will not fail, while the one without await will fail. This is because I still don't see how this is a memory leak issue, as you're basically overwhelming the process in a way that wasn't supposed to be used. |
@SimonX200 for (let i = 0; i < 90000000; i++) {
const context = await session.run(t2);
console.log(context);
// { _ns_name: 'benchmark', id: 5 }
// { _ns_name: 'benchmark', id: 11 }
// { _ns_name: 'benchmark', id: 14 }
// const context = session.run(t2);
// console.log(context);
// { _ns_name: 'benchmark', id: 1 }
// { _ns_name: 'benchmark', id: 1 }
// { _ns_name: 'benchmark', id: 1 }
if (i % 1000 === 0) {
const now = Date.now();
console.log(` ${i + 1}, dur=${now - start}ms`);
start = now;
}
} |
@Darkripper214 That the destroy never happens when using async but never an await is fine for me. I find it interesting that Map() creates a back-link to itself to all added objects. I find it also interesting that Object.create(previous_context) creates such a deep object. And that the deeper the object becomes, the longer Object.create(...) takes. Note:
Change your loop: and it works again. So any unrelated await will allow destroy to work. |
Yeap, there will be 2 promise generated when there is async func (Can check it via The point is that it is never a memory leak issue, it was that the server was flooded before it can even |
@Darkripper214 Why is it fixed by just changing Map to plain Object and by not using Object.create()? This does not change anything else except the extensive linking of context's (from my point of view: unnecessary. The _contexts is the retainer or the context is on some arguments stack and retained by that. No one access the previous context through the current context). |
My changes did not fix that yet. I see the problem in exit(). this.active is only changed if it is equal to the context to exit. Update: The problem is the management uses asyncId as 'global uniq' while asyncId is not. Update2: I think the use of asyncId as global unique in cls-hooked is a wrong use. In nodejs I see:
This guaranties no uniqueness. It creates a simple sequence of number. I assume that the asyncId is not supposed to be used as global unique by node. @Darkripper214 We have seen that. Some deep low-level methods use httpContext.get() and they got something while running outside of any context. It was expected that undefined is returned. That's why we even put all our timers and intervals into a ns.runPromise to be able to set the httpContext values explicitly to nulls. So multiple ongoing http requests can corrupt other http request httpContext.get() This is a major issue |
It is definitely not a memory leak issue, as You might want to call it as
I will take a look on the cyclic referencing of But still, the use case for this is to enable the passing of context around when the process is running, why is an auto-increment of |
Because it repeats. When debugging the example code from the issue-starter and break in before() I see duplicate asyncId's. In the second ns.runPromise() I see the same asyncId's as in the before of the first ns.runPromise() Either the async_id_fields get reset, or something else is happening. I have not yet found the code in the node repo that could cause that. Update: |
if(namespace.active) {
namespace._contexts.set(asyncId, namespace.active); This is to save the current context to be retrieved when
currentUid = async_hooks.executionAsyncId(); \\ This will give you current execution context
//HACK to work with promises until they are fixed in node > 8.1.1
context = namespace._contexts.get(asyncId) || namespace._contexts.get(currentUid);
// Here, we either get our specific context from `asyncId` or just take the parent from `currentUid` So the erratic behavior that is being reported is actually caused by the way If you look at the When it is not cleared, then the context will be passed as a let context = Object.create(this.active ? this.active : Object.prototype); The core issue with this bug is actually with the feature of // let context = Object.create(this.active ? this.active : Object.prototype);
let context = {} \\ This should work All in all, I think this bug is rather specific, it should only occur when there are multiple @Jeff-Lewis May I know your opinion on this? |
@Darkripper214 My assumption that different running ns.run could share the same context was wrong. To test my assumption I extended the issue-starter's code to multiple ongoing ns.run. With that the context is never shared. Not even at first.
Now regarding my problem: To guarantee that our project core functions get proper values from httpContext.get() when execute outside of http request we use the await httpContext.ns.runPromise(async () => { httpContext.set(...); await actual_function(); }) We have 3 10 seconds interval timer and other minute intervals running. These functions need like 1-2 seconds until their data were returned from database and were processed.
Without the changes I posted above the heap grows by 24MBytes per day with only one http request every few minutes. After 7 hours there are already the mentioned 60k contexts and about 8MBytes. Not sure where the 60k contexts come from. Every 10s for 7h should result in 3x2520 await ns.runPromise() calls. Update: Update2: My mistake. Its not 10ms but 25s. So the 60k contexts in 7h are still not clear. Update3: The applicationinsights uses cls-hooked as well. The timers all do db operations so the http requests to the db are counted by applicationinsights Since very new createContext gets slower by the number of previous createContext calls the whole applications gets slower. This does not only affect the timers but also every ns.run by a http request. For us the increase of 24MBytes per day for nothing is a major issue because our webapp runs continuously as Azure WebApp with limited memory. We already increase the memory from 1.7G to 3G by doubling the Azure costs. We have also noticed that our webservice becomes slower and slower the longer it runs. We investigated in searching for memory leaks in our software and to optimize our code. The current solution is that the 10 seconds timers are not using any express-http-context and to use just one httpContext.set() with an object that stores the needed data. The references inside that object we reset after every http request and timer before we set the httpContext key to null itself. Fortunately the 10 seconds timers actually it didn't needed a httpContext. Here we were lucky and our application is already running much smoother and without a memory leak. The minute timers still have to use httpContext. Update Instead of 24MByte per day now 589kByte. The performance degeneration of the mentioned 2.2ms (on a 5950x at 4.9GHz (single thread), Azure WebApps are running on some old Xeon, so multiply that by 10) per ns.run is reached after 15 days. After 30 days each ns.run will need at least 11ms (on a 5950x at 4.9GHz)
Btw: it makes no difference if I call an explicit global.gc() and do sleeps between await ns.runPromise(). Its just about the number of createContext calls. And there are no overlapping ns.runPromise(). The code is already in a comment above. I understand know why WebApp Instances are usually restarted after a time. In Azure Webapps you have to activate "always on" explicitly. Update2: The 10ms timer was a 25s timer. Got lost in our project code Update3: We are using applicationInsights and that uses cls-hooked to track also http-requests we send. That are the database queries. So the number of 140k contexts is reached much earlier than 30 days. We cannot deactivate applicationInsights as it is essential for our platform monitoring. The memory loss of 24MByte in at least 15 days may not be critical. The performance drop however is absolut critical. We probably have to restart our WebApp Instances once per day |
Can you show how your team is doing the following?
I still don't get it how Nonetheless, I'm thinking that the |
@Darkripper214 The following most simple code shows the degradation perfectly.
This is the most basic use of ns.runPromise() with the most basic function (that shows the problem) being called. You can play around with global.gc() or longer/more sleeps without any change. In the end a single ns.runPromise() needs 7.8ms for doing nothing on a 5950x.
On a Xeon E5-2630v3 a single ns.runPromise() needs 21ms
Note: |
@Darkripper214 The _set is actually a _stack. Update: Deleted the stuff regarding async_hook. Everything is fine |
I still cant reproduce the result on Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2.60 GHz. Performance-wise, it shouldn't vary that much. I even remove the This is the result I got.
Yeah, I know that The implementation is sort of messy especially when there is any complex async operation (Try opening a socket or do a fetch with async_hook and you will see). The other way to fix this might be to limit the |
@Darkripper214
node -v I see the same behavior also with node v14.16.0 Update: I also did it for the Xeon test on a complete new environment with fresh installed node. Update2: Update3: Update4: |
The only thing that must be changed for fix performance and memory leak is:
I added a memory test and a performance test to the mocha tests. However I cannot push the branch to open a pull request. |
@SimonX200 I assure you that by replacing As far as I'm concern, there are two issues with the current implementation
|
@Darkripper214 I prefer minimal changes to prevent side effects. Also the performance test runs consistently 100ms (of 16850ms) or 0.6% faster with Map than with Object. Map however needs a few percent more memory. I have cloned the repo and pushed my change into that one. Have a look at the tests memory-leak.test.js and performance-leak.test.js. The memory-leak test requires a max-old-space-size of at most 25M which is set in package.json for npm test. https://github.com/SimonX200/cls-hooked Also I have tried node v15.0.1 and I still see the performance problem when running the performance tests against the original code. I use nvm to install and switch node versions. But I have tried it also with plain node 14 lts installations and a development free laptop and on a linux-vm with a fresh node 14.16. When I enable the console.log in the performance-leak.test.js and the context.js from the master then I see:
The same I see in the linux-vm. BTW: Was just running the performance test in a wsl2 debian linux on windows and there the performance test with my code completes within 2500ms (for 100000 ns.runPromise). In windows on the same PC it needs 16800ms. The Object is still consistently 100ms slower than the Map (2.6s against 2.5s). Looks like a fixed time offset. Update: When changing my t2 function from
to
or
Then there is no performance problem. I don't know why a used undefined argument has this effect but this is consistent as well. Update2: It happens always when the function called through ns.runPromise has an optional argument which is used somewhere.
Most likely something is traversing the 10000 links from the current context to the first one that were established by Object.create(this.active) Interesting that the id difference is always 4. Update3: Ok, its clear.
That this is the case because node must determine if ns.active has a method toNumber() (or so). And because the current context is linked to almost all previous context's it takes some time to traversal through all proto . Interesting: With a function
The distance between the contexts is eratic. And with my fix the context is still copied from the current active context (btw: still shallow copy as before) but without the proto link to the context object. And that is the reason why also the memory leak is fixed. Because the current context retains all previous contexts. The _contexts map however shrinks and grows normally also with the original code. So the map was not the cause of the leak. Update4: The deep linking of the current active context to most of its predecessors is also a problem for a ns.get('some-unset-key') because Object.create() does not copy the attributes but links the objects. At runtime this means that a property is searched through the whole chain. A httpContext.get('optional-value') will take 20ms on a Xeon after e.g. 100k ns.runPromise() (about 25k contexts are linked).
Fortunately we already set all our httpContext variables upfront (and reset them at the request end) and now with just one ns.set() And since ns.active is a global it is random what context is copied when there are multiple active contexts used in asynchronous processing of e.g. multiple http-requests. |
@SimonX200 You're right on the Replacing This would make the context more flat and would solve the case where deep traversal is hurting performance. But would this solve the memory issue you're facing in your project? |
@Darkripper214
Yes, it does. The memory-leak.test.js (executed with --max-old-space-size=20) shows that too. Because ns.active is random when not being inside an async function (which is protected by async_hook) also the sync ns.run() in the express-http-context middleware could increase the depths of the contexts. BTW: |
Also ns.runPromise always leaves an ns.active != null
The while loop never ends. |
For our project we have decided to switch to plain AsyncLocalStorage. It provides everything we need and should be stable. Once I verified that is it stable I will suggest to express-http-context to switch to AsyncLocalStorage as express-http-context is what currently most webserver use. |
You can take a look at this My aim is to implement context library with consistent implementation with AsyncLocalStorage. You can take a look at the test file. I'm planning to add a different flavor of cls-hooked that is more in-line with AsyncLocalStorage, with all the bindemitter stuff for AsyncLocalStorage to work with EE. |
@Darkripper214 While investigating more into AsyncLocalStorage I came across this: We too have countless async that are not really waiting. I found a simple and elegant way to keep a local storage using the V8 call stack-trace by constructing a function inside run() with the context id being coded into its name. No async_hooks and so only the performance of set()/get() is impacted and that will not impact the project performance. Very few code lines and works out-of-the-box. The only theoretical problem is when there are dangling promises still running after the outer run has finished(). |
any update on this issue, or it this repo still maintained? |
Seems to be not maintained anymore. We switched to native AsyncLocalStorage. Recently I was considering to open an issue on express-http-context to notify the community about the memory leak of web servers that are not restarted periodically and the wrong context get's of generic project functions outside of http-requests. Both is caused by this issue. |
I'm running into this issue too with nested |
It works, thank you a lot! |
Running the command in sequence preserves the context. I would expect that each call would create a clean context, like is the case with
run
Is that normal behavior?
The text was updated successfully, but these errors were encountered: