Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRM-19798: Memory leak in API3 EntityTag get operations #10844

Closed
wants to merge 1 commit into from

Conversation

spelcaster
Copy link

@spelcaster spelcaster commented Aug 10, 2017

Overview

The query results were kept in a global array and never freed, causing memory leak for multiple API calls in the same script.

Before

Bob Silvern script being executed without freed the resources

After

Bob Silvern script being executed with resources freed

Comments

  • I did a slight modification in the script so I could define which API call should be used in command line, instead of using it hard coded;
  • It's possible to note that there's still a small memory leak, this one is not related to query results but could possibly be related to global objects being kept through API calls.

- The methods DB_common::execute and DB_common::query creates an instance
  of DB_result passing the result from the generalized method simpleQuery.
  The DB_result is added to a global array called $_DB_DATAOBJECT and it
  was never freed, causing a huge memory leak when using subsequent api
  calls.
@civicrm-builder
Copy link

Can one of the admins verify this patch?

@seamuslee001
Copy link
Contributor

Jenkins ok to test

@seamuslee001
Copy link
Contributor

@totten @colemanw do either of you want to review this, the screenshots suggest this is useful also ping @eileenmcnaughton

@eileenmcnaughton
Copy link
Contributor

eileenmcnaughton commented Aug 10, 2017

This is good work & I did have a sneaking suspicion that memory leak had come back over time. Note that the change does affect a lot of pathways. I think @totten should take a look & we should also see if we can get some people to production test this one before we merge - maybe @systopia @xurizaemon @jaapjansma @seamuslee001 might be prepared to. (I'll see if I can too)

@spelcaster
Copy link
Author

Well, my first concern is a possible race condition to write in the $_DB_DATAOBJECT, since there's no control over the rights to access the variable, then I've assumed that there is no competition to access it.

The second problem is exactly the amount of pathways this change affects, since I'm totally new to CiviCRM, someone with deeper knowledge can help to figure out possible caveats.

P.S.: I probably earned a few levels with debugging PHP code in this issue >.<.

@MegaphoneJon
Copy link
Contributor

I wonder if this supersedes the fix I did for CRM-20754.

@totten
Copy link
Member

totten commented Aug 10, 2017

@spelcaster - yup, this is great investigative work.

Trying to put this into context, maybe we can imagine this as a dialectic:

  1. "Hey, this system is user-configurable. Let's store lots of metadata and configuration in the database."
  2. "Oh no! We're querying the database too much! That hurts performance!"
  3. "Let's use a cache. Put an end to redundant queries!"
  4. "It's too hard to retrofit caches for every little thing..."
  5. "OK, let's cache all database queries! Who cares if that sounds like a gigantic memory leak? The PHP process will end after a few milliseconds. And if you have some exceptions, then just cleanup manually."

Now fast-forward a few years...

  1. "Hey! I've got a script that runs longer than a few milliseconds, and it suffers a memory-leak!"
  2. "Hey! I've got another script that runs longer than a few milliseconds, and it also suffers a memory-leak!"

Purists might respond in a couple ways:

  • If we think like a C developer, then we'd say: "Everyone who calls any DAO (sub)class should be mindful of the memory usage and consider calling free. If there's a memory leak, then some developer didn't understand the importance of writing API/BAO/utility code which cleans up after itself."
  • If we think like the PHP dev who offered #5, then we'd say: "Applications/scripts which run for a long time are the exception -- those scripts should periodically call free. If there's a memory leak, then some developer didn't understand the importance of having the application manage the cache layer."

Neither of those are very satisfying -- in PHP, everyone expects cleanup to be automatic. Any exception requires active communication/education/enforcement.

The patch in 10844 seems more pragmatic -- both scripts and APIs/BAOs/utilities can remain blissfully unaware of the cache layer. Education problem solved! What's the flipside? The correctness of the approach is still pretty subjective/use-case-dependent, eg

  • On one hand, there'd be cases where it's too quick to clear a useful cache (e.g. because the app-logic is composed of many small API calls).
  • On the other hand, there'd be cases where it's not aggressive enough (e.g. because one API call touches a whole boatload of data).
  • But then again, an "API call" is perhaps the closest thing we have to a "unit of work". And CRM-19798 and CRM-20754 are concrete situations that people have now.

Anyway, that's some "thinking out loud" to understand the situation -- but I haven't actually formed an opinion. Hopefully some of this helps put it in perspective.

@totten
Copy link
Member

totten commented Aug 10, 2017

TBH, I'm not sure the original idea behind DAO-cache is valid today. We've had many years of subsequent development in which folks introduced narrower caches all over the place (for things like option-values / pseudo-constants / settings-metadata -- using static, CRM_Utils_Cache, Civi::$statics, and Symfony container). It's hard for me to imagine major areas where the DAO-cache would make a significant dent.

That's speculation -- I wish we had better SOP for benchmarking. One thing we can do... check the performance of the test suite. I tried running CRM_AllTests with a patch to kill all DAO caching:

diff --git a/DB/DataObject.php b/DB/DataObject.php
index 0d74ece..e4b3243 100644
--- a/DB/DataObject.php
+++ b/DB/DataObject.php
@@ -4269,6 +4269,12 @@ class DB_DataObject extends DB_DataObject_Overload


     }
+
+    function __destruct()
+    {
+      if (getenv('AUTOFREE')) { $this->free(); }
+    }
+
      /**
      * Free global arrays associated with this object.
      *

With caching enabled (AUTOFREE not set), CRM_AllTests took 19m34.724s. With caching disabled (AUTOFREE=1), CRM_AllTests took 19m27.233s. That seems like a negligble difference.

@spelcaster
Copy link
Author

@totten The DB_DataObject uses the cache internally (see DB_DataObject::getDatabaseResult), if the cache will be removed, then the class will need a simple refactoring.

But if the cache will be kept, because someone still need it, then calling the method free from the destructor based on the autofree flag is the way to go.

@spelcaster
Copy link
Author

spelcaster commented Aug 11, 2017

I've been thinking about this and the big problem so far is that there is no
cache reuse, it always do the query again and save the new result in the
cache, instead of use one that may already exist.

So, I have a proposal to improve the query cache:

// md5("select * from civicrm_membership") == fbc91bb5b1ece54e0b7a3d05a8572a0c
DB\Cache::queries[<query md5>] = [
    'result' => DB/DB_result,
    'table' => [
        <list of tables used in the query>
    ],
    'ttl' => <time to expire the cache>,
    'counter' => <the number of times the cache can be used>
];

DB\Cache::tables[<table>] = [
    <query md5> => <query md5> // just for fast access
]

Here are some of the business rules:

  • If the query md5 is present in the cached queries, then the DB_result is returned;
  • Some sql statements should remove the cached queries based on the table it's
    being run;
  • If the ttl has been reached, then que cached query should be removed and null
    will be returned instead;
  • If the counter drop to 0, then que cached query should be removed and null
    will be returned instead;
  • Every table used in a query will have the query md5 registered in
    DB\Cache::tables;
  • The user can easily configure if he wants to use the cache system;

This is the big problem, what if the result is destroyed while someone is
using it? So to mitigate this problem, the use of the cache will need to be
controlled, for example, each cached query will have a use counter, every time
the cache is required the counter will be incremented, when the cache is not
being used anymore the requester should inform that the cache is not being used
and the counter will be decremented, so the cached query can only be removed
if this counter is equal to 0, and if the requester somehow does not inform that he's
not using the cached query, we're screwed.

update

If DB_result::result was an array of associative arrays, we would not even need to
control the cache use, because we can copy this array, the same cannot be done
with the result resource, for example, we cannot clone a mysqli_result.

@totten
Copy link
Member

totten commented Aug 11, 2017

May be interesting to compare how other PHP DB abstractions handle this:

@spelcaster
Copy link
Author

spelcaster commented Aug 12, 2017

@totten, I've done a slight change to test the results with a correct usage of a cache system, and the difference in execution speed is tremendous, I'm attaching the picture and the patches.

cache-system

packages patch
civicrm core patch

@eileenmcnaughton
Copy link
Contributor

My suspicion is the DAO caching doesn't work very well anyway - I have a feeling the CRM_Core_DAO::getFieldValue had it's own caching added sometime around 4.0 & the DBResult caching pre-dates that

@michaelmcandrew
Copy link
Contributor

@spelcaster
Copy link
Author

Hi @michaelmcandrew, until now I didn't even know that this has been accepted, however, as I said In one of my comments above, the problem here is that the query results saved in the cache weren't being used for most part of the code.

However, my guess is that you seem to have found a branch that uses this cache somehow, but the query result has been freed.

@michaelmcandrew
Copy link
Contributor

@spelcaster - I don't think it has been accepted - at least this branch is not merged. I was working on a site that had manually applied the patch.

@colemanw
Copy link
Member

Sounds like there is more work to do on this. Let's reopen it when there is a fix that addresses the bugs raised by @michaelmcandrew and the suggestions made by @totten.

@eileenmcnaughton
Copy link
Contributor

@spelcaster @MegaphoneJon @totten @michaelmcandrew I did some digging on this & came to the conclusion that the caching is NOT conferring any performance benefits. Similar to Tim, I tried various configs against Jenkins and the various runs did not perform statistically differently. Even the memory use was not notably different on jenkins (#11616 ran the test that demonstrated rapidly inflating memory usage locally) - but locally I got possible speed improvement & clear memory use improvement by

Before
Memory in use before & after 100 iterations of EntityTag.get

Start :'49,864,896'
End :'53,503,264'

After
Memory in use before & after 100 iterations of EntityTag.get

Start :'49,851,680'
End:'49,852,168'

This improvement came from switching from a method that does not free the DAO to one that does in the function isMultilingual - #11614

I also tried by the DAO deconstruct method (#11615) . Both had the same impact here but I found a LOT of other places that were consuming excessive memory using the $dao->fetch() method or the executeQuery method.

Regarding any theoretical benefit of the caching in the PEAR class - the ONLY 2 place that call getDatabaseResult are in functions that explictly free the DAO before returning (singleValueQuery) and fetchRow- and hence would not be affected as they have already freed it before deconstruct freeing happens. The executeQuery & fetch() methods are the main methods where the results are preserved and which hog memory. However, when the DAO object is deconstructed those results are still preserved but no longer accessible. They are the main places for memory issues.

Further notes

  • I think relying on mysql query caching is safer than some weird DB query caching. We sensibly cache many query results ourselves
  • I think the DAO retention of results is not conferring any benefits to us & it makes more sense to unravel the harm it does than try to fix it
  • separately we may or may not look at query caching further
  • possibly worth asking @lcdservices to test the patch (I prefer CRM-19798 fix memory leak #11615) as he uses some extra DB configs I believe

@eileenmcnaughton
Copy link
Contributor

SUMMARY - I believe #11615 should be merged & #11614, #10844 & #11616 closed

@michaelmcandrew
Copy link
Contributor

michaelmcandrew commented Feb 1, 2018

FWIW, I agree that this should not be merged. From what I recall (and I didn't do a thorough investigation), this patch prevented me from working multiple objects that inherited from CRM_Core_DAO at the same time, which seems like a fairly common thing to want to do.

@eileenmcnaughton
Copy link
Contributor

@michaelmcandrew - could you test #11615 ? I actually removed a bunch of \CRM_Core_DAO::freeResult(); since I think there could be hidden cases of the problem you described & the change in that patch makes it redundant

@michaelmcandrew
Copy link
Contributor

@eileenmcnaughton - i was debugging the issue was on a client's test infrastructure that was spun up for the purpose. It wouldn't be easy to get it up and running again and patched with this, unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants