Mystery objects deallocated by GC #3616

Open
bobrokrol opened this issue Dec 19, 2024 · 12 comments

@bobrokrol

bobrokrol commented Dec 19, 2024

Possible duplicate of #1125
AIR 51.1.3.1 (but I think the SDK version does not matter; this behaviour has been around for many years)
This is the test code I'm profiling with Scout:
debug = false
advanced_telemetry = false / true (makes no difference)
-optimize=true
-swf-version=44

Testing on an Android phone.

[SWF(width="600", height="800", frameRate="120", backgroundColor="0x0")]
public class Main extends Sprite{
    public function Main() {

        stage.frameRate = 120;    
        addEventListener(Event.ENTER_FRAME, onEnterFrame);
    }
    
    private function onEnterFrame(event:Event):void {
    }
}

These screenshots show the growth of these mystery objects; it's a little more than 1 per frame.
So, for example, over 5 658 frames it deletes about 6k of them.

[screenshot: 1st_gc]

This screenshot is before GC, showing ~12k allocated "ActionScript" objects.
[screenshot: before_2nd_gc]

At the same time, Scout shows allocation / deallocation of an Event object every frame (I'm not sure if that is trustworthy),
and the numbers match.
[screenshot]

So the average frame time while the app is running is 8 ms (7-9 ms), but during this GC cycle it's about 13-15 ms, and the Scout right panel shows a decrease of about 6k objects.
They are not represented in the "Memory Allocations" panel.

[screenshot]

In a real app it looks much worse.
Here a drop from 24k to 14k of these mystery objects takes ~100 ms.
I do not create any new instances or "delete" anything (remove all references), and I waited until all my objects were deallocated according to Scout. So this screenshot shows the moment when the runtime deletes only these "mystery objects", and I'm curious why it takes 100 ms+.
Does it depend on the overall object tree located in memory? Maybe it makes sense to flatten objects where possible?

[screenshot]

So the time is not linear and possibly relates to the overall number of classes / instantiated objects.

So the questions are: what are these mystery objects, and why are they being created every frame? And maybe it makes sense to delete them as fast as possible so they do not accumulate into such large chunks?

In my case, since I do not create / remove any instances during the game loop, I would not have any GC invocations at all, but these unknown every-frame "ActionScript" objects trigger GC every ~6k frames, i.e. every 40-50 s in my case.

@bobrokrol bobrokrol added the Bug label Dec 19, 2024
@2jfw

2jfw commented Dec 20, 2024

> Possible duplicate of #1125

The mysterious objects are likely the Enter-Frame Events themselves.
Each frame, a new Event is created and dispatched, because you listen to them.

I remember there was a request or consideration to recycle this Event (and maybe other events, too), but I still cannot find the ticket

@bobrokrol
Author

bobrokrol commented Dec 20, 2024

> Possible duplicate of #1125
>
> The mysterious objects are likely the Enter-Frame Events themselves. Each frame, a new Event is created and dispatched, because you listen to them.
>
> I remember there was a request or consideration to recycle this Event (and maybe other events, too), but I still cannot find the ticket

There is just an empty enterFrame listener (take a look at the code snippet):

public function Main() {

        stage.frameRate = 120;    
        addEventListener(Event.ENTER_FRAME, onEnterFrame);
    }
    
    private function onEnterFrame(event:Event):void {
    }

Also, Scout shows that enterFrame events were recycled each frame.
So according to Scout, over 6k frames it creates 6k events, recycles 6k, and then releases an extra 6k+ mysterious objects in the next GC iteration.
It's shown in the 3rd screenshot.

> I remember there was a request or consideration to recycle this Event (and maybe other events, too), but still cannot find the ticket.

Ah yes, I have seen this before; it's pretty similar, but I couldn't find it and thought it was closed since 3 years have gone by. #1125

By the way, on the AS3 side the event object reference value does not change or repeat, so it seems there is some kind of pool utilized.
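
A minimal way to check this from AS3 (my own illustrative snippet, imports omitted like in the other code here) is just to compare the current event with the one from the previous frame:

private var lastEvent:Event;

private function onEnterFrame(event:Event):void {
    if (lastEvent != null) {
        // true would mean the runtime handed us the same Event instance again
        trace("same instance as last frame: " + (event === lastEvent));
    }
    // note: holding a reference to the previous event may itself keep it alive
    // and change what the pool / GC does with it
    lastEvent = event;
}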

@ajwfrost
Collaborator

Hi

The enterFrame event objects should be being collected via reference counting hence the allocation/deallocation each frame (and yes the memory is just from a pool so there should be minimal costs involved in this, it's not actually allocating and deallocating OS memory..)

The drop in the memory usage, and the time taken for that GC activity, mean that there is something happening that's using memory that can't be cleaned up from reference counts. You've got debug=false, otherwise I would be wondering if this is something around the profiling/stacktrace information or similar. We can see if it's straightforward to find out what's actually being cleared up during this sweep, as this doesn't sound great/efficient..

thanks

@bobrokrol
Author

bobrokrol commented Dec 20, 2024

> The enterFrame event objects should be being collected via reference counting hence the allocation/deallocation each frame (and yes the memory is just from a pool so there should be minimal costs involved in this, it's not actually allocating and deallocating OS memory..)

If it's collected or taken from a pool, it's not clear to me why Scout shows an allocation/deallocation each frame. In my imagination this is done somewhere at the top level, so these events should never be caught by the GC in MMgc and shouldn't be represented in Scout in that case?

> The drop in the memory usage, and the time taken for that GC activity, mean that there is something happening that's using memory that can't be cleaned up from reference counts. You've got debug=false, otherwise I would be wondering if this is something around the profiling/stacktrace information or similar.

No, that's the first thing I'd blame. But maybe the AIR SDK ignores some flags somewhere, so there could still be some profiling that was not initiated by me. Also, according to #1125 it is not related to having Scout connected. And I cannot reproduce this with objects created manually: if I create any instance, it is correctly represented in Scout and I can see its allocation / deallocation.

> We can see if it's straightforward to find out what's actually being cleared up during this sweep, as this doesn't sound great/efficient..

OK, that would be great, because I'm curious how gcPause() would work if objects are created every frame somewhere.
#3608

The main question is whether these objects could be the reason for the constant triggering of GC (I'm not sure which phase it is).
On an empty project it takes 15 ms on a high-end phone; on a real project it takes 100 ms. I do not create/remove any instances and all my garbage has already been collected, so Scout shows only the deallocation of these mystery objects.

I'm not sure what can take such a long time.
Is it the mark phase of mark-and-sweep, where it must iterate over the whole object tree?

Are GC phases always triggered by an instance counter / memory allocated, or is there also time-based triggering?

@ajwfrost
Collaborator

Hmmm... so I think we can see part of the problem here. Every frame, the runtime goes through to find out if there are event listeners, and it does this by pulling together a list of the listeners that then need to be called. The list implementation has an initial capacity of 4 items, so it looks like every frame, a small bit of memory is being set aside for pointers to listeners for various events (enterFrame, but also the other ones like render, frameConstructed, exitFrame etc).

The problem then is that when you leave the function, the list object goes out of scope, but the memory that it reserved for the listeners is not explicitly cleaned up, instead it's left for GC. See this code:
https://github.com/adobe/avmplus/blob/858d034a3bd3a54d9b70909386435cf4aec81d21/core/avmplusList.h#L245

For now, we just clear the data member, and allow GC to reclaim the buffer.

The objects that were stored in the list are being cleaned up properly, with reference counts going back to what they were previously (see clearRange implementation). But the list memory itself is being left, which is odd given that it's been cleared...
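
As a rough ActionScript analogue of that churn (purely illustrative - the real allocation happens in the runtime's C++ listener-list code, not in AS3), it behaves as if every frame did something like this:

private function onEnterFrame(event:Event):void {
    // stand-in for the small 4-slot listener list the runtime builds for the dispatch
    var listeners:Array = new Array(4);
    // ... the list would be filled and iterated here ...
    // when the function returns there is nothing to free explicitly;
    // the small buffer just sits there until a GC sweep reclaims it
}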

We can check whether there's anything we can do here within avmplus.. there's the possibility of course that other problems may arise from trying to change this! And equally, there may be other similar issues that mean memory doesn't run particularly cleanly..
We can try to minimise this sort of churn in other ways too perhaps, looking at the runtime-specific code and how it's dealing with these event listener lists..

thanks

@ajwfrost
Collaborator

Just seen your other questions:

> If it's collected or taken from a pool, it's not clear to me why Scout shows an allocation/deallocation each frame?

The memory comes from a pool but this is all counted as an allocation/deallocation. The Flash/AIR runtimes have this "memory management / garbage collection" component which is configured to allocate large chunks of memory from the OS every so often, and then manage that memory themselves (at least, for normal/frequently-used memory - larger allocations can still go direct to an OS call). So there are pools of memory for objects of varying sizes etc.

> how would gcPause() work if objects are created every frame somewhere?

It would mean that the memory continues to increase .... and would never hit the drop caused by the Sweep phase.

> I'm not sure what can take such a long time.
> Is it the mark phase of mark-and-sweep, where it must iterate over the whole object tree?

I'm also not completely sure why this takes quite so long, except perhaps where there are loads and loads of tiny objects that all have to have a destructor called, etc. Plus the marking is mostly incremental (and yes it marks from the roots to show what objects are reachable), it's just the final part where it has to then mark the stack and some other immediate values when it goes into a "stop-the-world" phase to finish the marking and do the collection.

> Are GC phases always triggered by an instance counter / memory allocated, or is there also time-based triggering?

It's solely memory-based, no time-based trigger.
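
If you want to watch that memory-based trigger from AS3, a small sketch (illustrative only, using flash.system.System.totalMemory; the handler name is made up and would be registered as an ENTER_FRAME listener) shows the sawtooth: usage climbs a little each frame and then drops when the sweep runs.

private var lastMemory:uint = 0;

private function onMemoryWatch(event:Event):void {
    var mem:uint = System.totalMemory;
    if (mem < lastMemory) {
        trace("sweep ran, roughly " + (lastMemory - mem) + " bytes reclaimed");
    }
    lastMemory = mem;
}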

thanks

@ajwfrost
Collaborator

Here's something that might make a difference, bizarrely... something I noticed when looking at the timings (and excluding the GC bit)

[screenshot]

There's a line item for "Handling LocalConnection traffic" .. but this is just a basic/empty SWF.
And then I remembered that the way the Adobe bootstrap works, in terms of multiple launches of an AIR app and the forwarding of invoke information to the primary instance .. it creates a localconnection to see if there is already a running application.

So adding this to the top-level <application> element in my app descriptor file:

<allowMultipleInstances>true</allowMultipleInstances>

and then I don't get the LocalConnection manager object created, no more of those time-wasting calls, and I'm pretty sure a lower use of memory. This is on desktop fyi, I don't know that the Android start-up code does this.
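
For anyone trying this, a hedged sketch of where the element sits in the descriptor (everything except allowMultipleInstances is just placeholder descriptor boilerplate, and the namespace version should match the SDK you build with):

<application xmlns="http://ns.adobe.com/air/application/51.1">
    <id>com.example.myapp</id>
    <versionNumber>1.0.0</versionNumber>
    <filename>MyApp</filename>

    <!-- skips the single-instance LocalConnection handshake at launch -->
    <allowMultipleInstances>true</allowMultipleInstances>

    <initialWindow>
        <content>MyApp.swf</content>
    </initialWindow>
</application>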

Having said that, we are still seeing memory usage increasing over time. Fairly slowly, but it's there. Normally this would be lost in the noise of an actual/useful application, but I guess it would be better if we could be more efficient!

We can add a bit of reuse for the enterFrame event, makes a small amount of difference.. we had been wondering about also trying to bring in better memory profiling tools, which would then help with analysing the bits that are left here!

thanks

@bobrokrol
Author

bobrokrol commented Dec 21, 2024

Thanks for shedding some light on how the GC works.
I always imagined that there is some kind of reference counting mechanism, so it doesn't scan the whole heap in the final stage.
So I have always tried to cache a lot of objects, so the object graph just grows, and I believe that adds extra overhead to garbage collection in the final phase.

Here is a simple test where I call System.gc() every 4th frame.
It takes 8-9 ms on Android.

Then I created an object tree with ~64k nodes that always stays in memory:
createLargeTree(3, largeTree);
and now it takes 24-46 ms.

[screenshot]

[SWF(width="600", height="800", frameRate="120", backgroundColor="0xFFFFFF")]
public class Main2 extends Sprite{


    var instCountTxt:TextField;
    var instCount:int = 0;
    public function Main2() {
        addEventListener(Event.ENTER_FRAME, onEnterFrame);
        createLargeTree(3, largeTree);

        instCountTxt = new TextField();
        instCountTxt.width = 200;
        addChild(instCountTxt)
        instCountTxt.text = "instCount: " + instCount;


    }

    // just create an object tree, where n is the depth
    // and there are about k^n nodes in total
    var largeTree:Object = new Object();
    function createLargeTree(n:int, parentObj:Object):void
    {
        // number of objects per level
        var k:int = 40;
        if (!parentObj.childs)
        {
            parentObj.childs = new Array();
        }
        if (n > 0)
        {
            var arr:Array = parentObj.childs as Array;
            while (k-- > 0)
            {
                var newObj:Object = new Object();
                instCount++;
                // just add some fields of various types
                newObj.parent = parentObj;
                newObj.prop1 = new Array();
                newObj.prop2 = "n : " + n;
                arr.push(newObj);
                createLargeTree(n-1, newObj);

            }
            return;
        } else
        {
            return;
        }

    }

    private function onEnterFrame(event:Event):void
    {
        System.gc();
    }
}

> Just seen your other questions:
>
> > If it's collected or taken from a pool, it's not clear to me why Scout shows an allocation/deallocation each frame?
>
> The memory comes from a pool but this is all counted as an allocation/deallocation. The Flash/AIR runtimes have this "memory management / garbage collection" component which is configured to allocate large chunks of memory from the OS every so often, and then manage that memory themselves (at least, for normal/frequently-used memory - larger allocations can still go direct to an OS call). So there are pools of memory for objects of varying sizes etc.

From what I can see, there is no problem right now with allocating / deallocating big chunks, but rather with deallocating tons of these "event objects" and with the final marking. So when I cache or use pools in my app, it seems I just make things worse for the GC: the object graph just grows and then the final marking takes a lot of time. Just a guess, I don't know.

> > how would gcPause() work if objects are created every frame somewhere?
>
> It would mean that the memory continues to increase .... and would never hit the drop caused by the Sweep phase.

Anyway, even if this "event objects" problem cannot be solved, having the possibility to delay garbage collection would be a great improvement;
even if it takes 500 ms+, it can then be done by design in the right place.

Also, I'm curious whether it's possible to have something like GC::Mark(n) / GC::Sweep(n), where n is just the maximum number of elements to process, possibly just a counter to break the while loops, if that makes sense - to let us call it before the final mark & sweep and help the GC with GC::FinishIncrementalMark.

> We can check whether there's anything we can do here within avmplus.. there's the possibility of course that other problems may arise from trying to change this!

Yes, it's better not to touch it if it was done that way by design, so I think just adding more control over the GC for developers would be a great improvement - and let us take the risks.

@bobrokrol
Author

> Here's something that might make a difference, bizarrely... something I noticed when looking at the timings (and excluding the GC bit)
> ...
> <allowMultipleInstances>true</allowMultipleInstances>

Unfortunately, I do not see any difference on Android with an empty app:

[SWF(width="600", height="800", frameRate="120", backgroundColor="0xFFFFFF")]
public class MainEmpty extends Sprite{

    public function MainEmpty() {
    }
}

I see ~1 kB+ of "ActionScript Objects" every frame and then a drop from ~10k to 4k.

[screenshot]

@ajwfrost
Collaborator

> I always imagined that there is some kind of reference counting mechanism, so it doesn't scan the whole heap in the final stage.

Yes, there's reference counting for objects which is used when possible, but there are conditions where that isn't able to handle the scenarios so the objects are switched over to being mark-and-sweep collected. Objects with a reference count of zero are put onto the "zero count table" and cleaned up each frame.
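
A hedged illustration of that split (purely illustrative, not taken from the runtime code): an acyclic temporary can be reclaimed through reference counting / the zero count table, whereas objects that form a reference cycle have to wait for a mark-and-sweep pass.

private function makeGarbage():void {
    // acyclic: when this function returns, the only reference disappears,
    // so reference counting alone can reclaim it
    var acyclic:Object = { payload: new Array(100) };

    // cyclic: a and b reference each other, so their counts never reach zero
    // and only mark-and-sweep can collect them
    var a:Object = {};
    var b:Object = {};
    a.other = b;
    b.other = a;
}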

> The object graph just grows and then the final marking takes a lot of time. Just a guess, I don't know.

The 'heap' marking is done gradually ("incremental" marking) so that shouldn't impact things much. And if there are a lot of (reachable) objects, then this phase takes longer to complete, but again it's a small/limited amount of time used up each frame to do all the marking.

The thing that seems to use the most time in the "finish incremental marking" stage is the marking of objects that are on the stack. The GC basically goes back through the call stack, looking for any value in stack memory that is a pointer to an object that's in the GC-controlled memory range, and marking it. This basically picks up all the local variables that are in the AS3 call stack, but if you had a reference in a local variable to a complex object or vector, then the GC will have to follow through all those references and relationships and do all the extra marking. This can't be incremental because the call stack would change when the AS3 function returns..
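
Purely illustrative (the helper names below are made up): during a collection that finishes while a handler is still on the call stack, anything reachable only from a local variable has to be found by that final stack scan, whereas a member field is reached from the heap roots during incremental marking.

private var cachedBatch:Vector.<Object> = new Vector.<Object>(); // reachable from a heap root

private function onEnterFrame(event:Event):void {
    var tempBatch:Vector.<Object> = buildBatch();  // hypothetical helper
    processBatch(tempBatch);                       // hypothetical helper
    // while this function is running, tempBatch (and everything it references)
    // can only be discovered by the stop-the-world stack scan
}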

> Also, I'm curious whether it's possible to have something like GC::Mark(n) / GC::Sweep(n), where n is just the maximum number of elements to process, possibly just a counter to break the while loops, if that makes sense

Trying to do a gradual Sweep is something we wondered about, it's deliberately done though without any pausing. I think this is because there may be related objects that need to be cleaned up at the same time, where if one was cleaned up but the processing was then allowed to resume, the second one could end up calling something that no longer existed. And I think the same challenge would happen if we tried to do only a partial marking phase, i.e. smaller amounts of marking and sweeping, just because of the nature of how the GC works.

Some interesting insights here though (it's always useful to see how other folk address these kinds of issues!)
See the "What’s new in Onirocco" section: https://medium.com/voodoo-engineering/nodejs-internals-v8-garbage-collector-a6eca82540ec
The first thing there (incremental marking) is done; the "lazy sweeping" is similar to what we're proposing with the ability for the developer to decide when the sweep should happen. But concurrent/parallel sweeping is going to be the ideal... in avmplus, the GC is very definitely single-threaded, so this would be a big change.. but being able to have all the GC running in parallel to the AS3 processing thread would just make this problem go away!

I think we may need to review their source code changes to see how they get round the problems of memory accesses in one thread whilst the other thread is changing the memory. There would be a lot of challenges with this multi-threaded approach I think!

thanks

@bobrokrol
Author

bobrokrol commented Dec 23, 2024

> The thing that seems to use the most time in the "finish incremental marking" stage is the marking of objects that are on the stack. The GC basically goes back through the call stack, looking for any value in stack memory that is a pointer to an object that's in the GC-controlled memory range, and marking it. This basically picks up all the local variables that are in the AS3 call stack, but if you had a reference in a local variable to a complex object or vector, then the GC will have to follow through all those references and relationships and do all the extra marking. This can't be incremental because the call stack would change when the AS3 function returns..

It's not clear to me why it would mark such a complex object twice. This complex object already has a reference somewhere and should have been marked anyway.
But I understand that with a large number of functions on the stack, it could be an issue in itself.

Anyway, most of the effort is spent deleting these "event objects". It might have been OK with 24 fps a long time ago, but at 120 fps it's an allocation of about 4.5 MB / minute just with an empty project, and as a result the GC is triggered constantly. Yes, there is possibly also overhead for scanning the stack as you said, but that takes a reasonable amount of time in my case. The MB may not matter, but it seems tens of objects or hundreds of pointers are allocated every frame.

> Trying to do a gradual Sweep is something we wondered about, it's deliberately done though without any pausing. I think this is because there may be related objects that need to be cleaned up at the same time, where if one was cleaned up but the processing was then allowed to resume, the second one could end up calling something that no longer existed. And I think the same challenge would happen if we tried to do only a partial marking phase, i.e. smaller amounts of marking and sweeping, just because of the nature of how the GC works.
>
> Some interesting insights here though (it's always useful to see how other folk address these kinds of issues!) See the "What’s new in Onirocco" section: https://medium.com/voodoo-engineering/nodejs-internals-v8-garbage-collector-a6eca82540ec The first thing there (incremental marking) is done; the "lazy sweeping" is similar to what we're proposing with the ability for the developer to decide when the sweep should happen. But concurrent/parallel sweeping is going to be the ideal... in avmplus, the GC is very definitely single-threaded, so this would be a big change.. but being able to have all the GC running in parallel to the AS3 processing thread would just make this problem go away!
>
> I think we may need to review their source code changes to see how they get round the problems of memory accesses in one thread whilst the other thread is changing the memory. There would be a lot of challenges with this multi-threaded approach I think!
>
> thanks

Well, it sounds interesting, but I'm not sure humanity will still exist by the time this approach is implemented and becomes stable.

So in general, as I understand it, this every-frame allocation is by design and there is no easy way to fix it?

Is there any chance of getting, in the near future, just a pause / delay capability for the GC, or maybe just a way to modify the parameters in GCPolicyManager?
Thank you!

@ajwfrost
Collaborator

> It's not clear to me why it would mark such a complex object twice. This complex object already has a reference somewhere and should have been marked anyway.

True, if the object had another reference that could be reached from a 'root' then it would already be marked and then it wouldn't need to be processed again. But any object that's only referenced via a local variable would not have been marked and needs to be, in order to avoid being collected. In your sample code above where you create a large tree, I think this would be okay because the root of the tree is referenced within the class object.

One of our tasks would be to help people get some better metrics though about what's taking time here .. plus we had looked at delaying the "sweep" part until the application code exited whatever event handler it was in, i.e. there is then no ActionScript stack which should make that part a bit quicker. Sadly this seemed to make the performance a lot worse .. but this may just be because of how these test cases are always very artificial; it's on the list to investigate properly...

> this every-frame allocation is by design and there is no easy way to fix it

We've actually just been playing with this a little:

  1. For all these events that are sent per-frame where a list object had been created - we are first testing to see if there are any listeners, before creating the list object to hold them
  2. For the enterFrame event, we are caching the event (one per stage) and then manually setting/updating the target property before dispatching it to each listener.

So those should definitely reduce the churn here. We'll put this into 51.2 which will come out (at least as a pre-release version) in January..

thanks
