Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FlxPoint pooling harms performance? #1189

Closed
JoeCreates opened this issue Jun 22, 2014 · 23 comments
Closed

FlxPoint pooling harms performance? #1189

JoeCreates opened this issue Jun 22, 2014 · 23 comments

Comments

@JoeCreates
Copy link
Member

I thought most GCs will reuse memory when possible, so the new keyword isn't nearly so terrible as in say, regular C++. With a little (crude) testing, it looks like the pooling of FlxPoints tends to actually harm performance, rather than to help it. The performance impact is evident on flash, but even more so on C++.

Have a play with this simple test on different platforms: http://hastebin.com/yanekehuko.scala

It is quite difficult to push it to the point where it makes much of a difference, but when it eventually does make a difference, it doesn't seem to be in favor of the pooling.

@MSGhero
Copy link
Member

MSGhero commented Jun 22, 2014

A better and more consistent way to test is to put the var point:Point outside of the loop, create the initial pool point outside of the loop, and separate the if/else into 2 separate code blocks to test them separately.

It's possible that pool.push/pop() and obj.destroy() are slowing down the pool implementation. The way I learned pools was that you had a counter managing valid array elements. return pool[--count] for get():T and pool[count++] = t for put(t:T). Prealloc would add to count. destroy() probably isn't avoidable.

@JoeCreates
Copy link
Member Author

@MSGhero I have moved the if and the declaration, and it makes no noticeable difference on any platform.

I do think the implementation of pooling is potentially poor, because if there are any dynamic arrays growing and shrinking, that straight-up undoes any of the work for preventing memory reallocation.

I think, until there is a demonstrable problem to fix, we should not have the pooling.

@gamedevsam
Copy link
Contributor

I agree that pooling is not very efficient atm due to heavy array manipulation. We shoulet investigate using a linked list as the base data structure of FlxPool before the decision to eliminate it.

@JoeCreates
Copy link
Member Author

@gamedevsam

@MSGhero's description of pooling would certainly be a lot better than we have now. Figuring out when and how to change the pool size could be tricky, though?

If we fix the pooling we should get rid of the "helper" points. Those _point and flashPoint properties are a nightmare for code readability, and even more a nightmare when you don't know what else might be trying to use them concurrently.

@MSGhero
Copy link
Member

MSGhero commented Jun 23, 2014

var count:Int = 0;

public function get():T {
   if (count == 0) return Type.createInstance(_class, []);
   return _pool[--count];
}

public function put(obj:T):Void {
   if (obj != null && _pool.indexOf(obj) < 0) {
      obj.destroy();
      _pool[count++] = obj;
   }
}

public function putUnsafe(obj:T):Void {
   if (obj != null) {
      obj.destroy();
      _pool[count++] = obj;
   }
}

public function preAllocate(numObjects:Int):Void {
   while (numObjects-- > 0) _pool[count++] = Type.createInstance(_class, []);
}

public function clear():Array<T> {
   count = 0;
   var oldPool = _pool;
   _pool = [];
   return oldPool;
}

Linked list works fine, but it's more code and some Node<T> internal class.

@JoeCreates
Copy link
Member Author

@MSGhero Can you get that to make any meaningful performance difference? I can't seem to.

EDIT: Making some of the functions inline helped on flash. There is a benefit after about 60k FlxPoints. C++ seems to see no benefit.

EDIT 2: C++ sees a benefit when the framerate is less than about 10 (Requiring around 1.2 million FlxPoints)

@gamedevsam
Copy link
Contributor

@JoeCreates, you should try using putUnsafe instead of put in your case, it might provide a meaningful perf difference. I believe we use putUnsafe internally.

@Gama11
Copy link
Member

Gama11 commented Jun 23, 2014

FlxPoint#put() uses putUnsafe() already.

@JoeCreates
Copy link
Member Author

I wonder what the performance difference would be if put() were used instead of putUnsafe(). I imagine that could be fairly significant . . . tries it

Wow . . . Yeah. put instead of putUnsafe makes pooling exceedingly worse than no pooling. With 50k points I can get 60fps without pooling (it's about the limit at which it starts to drop) but with pooling it drops to half on flash.

On C++ without pooling I get 30fps with 800k points. With put() pooling it drops below 10fps.

@Gama11
Copy link
Member

Gama11 commented Jun 23, 2014

Yep, put() uses indexOf(), which means you're looping over the entire point pool each time. It actually had a big impact on state switches.

@JoeCreates
Copy link
Member Author

And the pool size is actually only 5 in my test case. With bigger pools it will get worse much quicker. I think its safe to say that regardless of the eventual solution, the safety check in put() should go. (Although inPool could be used instead?)

@Gama11
Copy link
Member

Gama11 commented Jun 23, 2014

I disagree, it's actually pretty important the same object doesn't end up in the pool twice. putUnsafe() leaves room for optimizations if necessary.

@gamedevsam
Copy link
Contributor

At this point the only thing left to try is using a Linked List for the pooling implementation, I recommend we sit on this issue until we can try that approach (I'll look into building it, but I'll need some time, very busy at work atm).

@JoeCreates
Copy link
Member Author

@gamedevsam using a linkedlist will have zero effect on my test case, which is kind on the pooling because the pool size never needs to increase.

@JoeCreates
Copy link
Member Author

@Gama11 What is the point in pooling with this safety if it has way worse performance than than the completely safe non-use of pooling?

@gamedevsam
Copy link
Contributor

Pooling provides significant benefit on platforms with aggressive GCs (mainly HTML5). We should strive to make pooling as fast as possible, it's worth trying out the Linked List implementation. It might not have an impact on your test case, but it might, we don't know until we try.

@MSGhero
Copy link
Member

MSGhero commented Jun 23, 2014

It's also worth noting that FlxPoints don't have a whole lot going on, 2 floats and 2 booleans. Maybe try with something more substantial, like FlxSprite. If it takes hundreds of thousands of FlxPoints to make a performance difference, we should worry about other classes' test cases.

@MSGhero
Copy link
Member

MSGhero commented Jun 23, 2014

I mean as far as making the FlxPool class better, not referring to FlxPoint's internal pool. It might be that making a FlxPoint is faster than the function call and array access of using a pool.

Linked list might be worth looking at since all you're doing is adding or removing from the head. You would either have a Node class or a nextPoint property in FlxPoint. Creating a new Node might end up being worse for performance (unless you use a FlxPool of Nodes :P).

@azrafe7
Copy link
Contributor

azrafe7 commented Jun 23, 2014

I've fiddled with this a bit, here's how it played out (using @JoeCreates' code to benchmark it):

At first I found out that inlining get() and putUnsafe() in the pool class gave a little boost.

Then I tried a slightly different implementation of the pool that avoids calling pop() and push() and takes track of length internally. It gave another little speed up.

Afterwards I noticed that maybe some more could be squeezed by removing the overhead of calling the setters in FlxPoint. But that would have interfered with FlxCallbackPoint that relies on them.
So I came up with this workaround (that.also allowed to inline some of the methods). Must admit that it's really ugly, I know! 😁 (some better solution should be investigated to achieve the same effect). But the results were quite impressive (unless I'm missing something really obvious):

cpp worked @~60 FPS when pooling, and at half that framerate in non-pooling mode (with 500'000 points o.O).
flash shows similar behaviour... with ~80000 points, hehe.

Please test it as I'm pretty sure I've introduced some bugs in there that may break usage I've not foreseen. Especially the non-duplicates-in-pool policy.

In the process I found this small typo over at

_setYCallback = setXYCallback;
, where the rhs value should be setYCallback.

@gamedevsam
Copy link
Contributor

Conclusions:

  • Pooling benefits all targets when recycling is fast (minimal function calls, no properties access, no dynamics, etc).
  • Use of properties in FlxPoints causes significant overhead.

These are interesting findings, we need to discuss what's the best way to get around this overhead.

@MSGhero
Copy link
Member

MSGhero commented Jul 3, 2014

Was the linked list idea for the FlxPool class in general or just for FlxPoints? Because I could set up a pull for FlxPool if it's the latter.

@gamedevsam
Copy link
Contributor

In general.

@MSGhero
Copy link
Member

MSGhero commented Jul 4, 2014

https://github.com/MSGhero/FlxPoolTests

These tests use Timer.measure() which is more accurate than measuring the framerate. My test results

Tested on Windows-Release, Flash-Debug (can't copy paste from release), Neko-Release

"Control __" is just creating a new FlxPoint.

"Array Method __" is the current FlxPool of using push and pop.

"Array Index __" is the code I posted earlier.

"Top-Level List __" is using the List<T> class which is a linked list. There are both add and push methods so I added them both.

"PutU" is putUnsafe which doesn't check indexOf.

Array indexing is faster on the tested platforms in every test except for "Put+Get" which uses put and not putUnsafe (array methods faster). Not sure why, I didn't test the puts separate from the gets so that might be why. Will check that later.

The length of the pools is just 1 because bigger numbers timed out flash. But array indexing was faster with length=100 on Neko-release, with creating a new point obviously being fastest since it doesn't have to indexOf anything.

Notes:

  • LinkedLists don't have indexOf so I used Lambda.has in that implementation.
  • The time it took to create n points in the controls decreased each test on windows testing.
  • List<T> kinda sucks here, dunno how it works internally. If someone has a normal/better LinkedList I'll take that.
  • I threw this together fairly quickly and might have made a mistake.

EDIT:
I just noticed put() in the array indexing doesn't actually do anything at the moment bc indexOf() will always return >-1 due to the way the pool is implemented. It would need to be changed to compare indexOf() to where count currently is.

Changed it, and it's now consistently faster than the current implementation in every test.

Gama11 added a commit that referenced this issue Aug 3, 2014
Best performance FlxPool, closes #1189
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants