Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REF] AllCoreTables - Simplify array lookups & transformations #29368

Closed
wants to merge 3 commits into from

Conversation

colemanw
Copy link
Member

@colemanw colemanw commented Feb 11, 2024

Overview

Simplifies code, saves memory, deprecates an unused function.
Preliminary cleanup for dev/core#4999.

Before

Caches the same data 3 times in 3 different arrays.

After

Consolidates it to a single array, because calling array_column() really isn't hard.

Comments

This will make further changes easier if we decide to cache this data in a different way, now there's only one thing to store & retrieve.

Also deprecates an unused function.

Previously it was caching the same data in 3 arrays.
This consolidates it to a single array, because calling array_column() it's that hard.
Copy link

civibot bot commented Feb 11, 2024

🤖 Thank you for contributing to CiviCRM! ❤️ We will need to test and review this PR. 👷

Introduction for new contributors...
  • If this is your first PR, an admin will greenlight automated testing with the command ok to test or add to whitelist.
  • A series of tests will automatically run. You can see the results at the bottom of this page (if there are any problems, it will include a link to see what went wrong).
  • A demo site will be built where anyone can try out a version of CiviCRM that includes your changes.
  • If this process needs to be repeated, an admin will issue the command test this please to rerun tests and build a new demo site.
  • Before this PR can be merged, it needs to be reviewed. Please keep in mind that reviewers are volunteers, and their response time can vary from a few hours to a few weeks depending on their availability and their knowledge of this particular part of CiviCRM.
  • A great way to speed up this process is to "trade reviews" with someone - find an open PR that you feel able to review, and leave a comment like "I'm reviewing this now, could you please review mine?" (include a link to yours). You don't have to wait for a response to get started (and you don't have to stop at one!) the more you review, the faster this process goes for everyone 😄
  • To ensure that you are credited properly in the final release notes, please add yourself to contributor-key.yml
  • For more information about contributing, see CONTRIBUTING.md.
Quick links for reviewers...

➡️ Online demo of this PR 🔗

@civibot civibot bot added the master label Feb 11, 2024
@artfulrobot
Copy link
Contributor

This cuts some code out, removes a couple of array-indexes but I don't think the memory/efficiency change will be noticeable. I mean it's ~40times slower now but we probably won't notice.

So if this is a step towards something good, then let's merge.

@colemanw
Copy link
Member Author

@artfulrobot mean it's ~40times slower now but we probably won't notice.

Indeed, if we had to run one of the functions in this class 100,000 times per page request we would have some performance problems (but probably for other reasons, like, why are we calling the same function 100,000 times??? 😆). But I'm not quite sure which function you're benchmarking here, I don't think I've directly changed any return $foo[$bar] into a foreach($foo)... if($foo['bar'] === $bar) return $foo['bar'] with this PR? I think I've indirectly changed return $foo[$bar] into return array_column($foo, 'baz', 'bar')[$bar] but that should be faster than a foreach() loop.

@artfulrobot
Copy link
Contributor

artfulrobot commented Feb 12, 2024

@colemanw sure, array_column is even slower (100x or more) - because every access you're re-indexing the entire array, just to then access a single value. vs a small memory saving (dunno how to calculate that, but suspect ~20kB).

But I'm not quite sure which function you're benchmarking here

I had looked at this code? https://github.com/civicrm/civicrm-core/pull/29368/files#diff-d5169860458a5b642ed284a44d5e0bf1de17d8c319a824185c60fdf29ffb60ebR375-R380

@colemanw
Copy link
Member Author

sure, array_column is even slower (100x or more)

LOL! Good to know!
But the specific example you're pointing to is actually the one change that will be faster because I switched from CRM_Utils_Array::findAll() which loops the entire array to a foreach() which breaks as soon as the item is found. So that's one small win :)

@colemanw
Copy link
Member Author

colemanw commented Feb 12, 2024

@artfulrobot based on your benchmarking I'm tempted to rework this even more so that we sacrifice a little memory (but probably not more than before this PR) and cache an array with 3 indexes:

$cache = [
  'entities' => [
    'Contact' => ['table' => 'civicrm_contact', 'class' => 'CRM_Contact_DAO_Contact'],
    ...
  ],
  'classes' => [
    'CRM_Contact_DAO_Contact' => 'Contact',
    ...
  ],
  'tables' => [
    'civicrm_contact' => 'Contact',
    ...
  ],
];

Before this PR maybe 75% functions in this class used an index, after this PR maybe 50%? The above structure would allow 100% of them to use an index.

@colemanw
Copy link
Member Author

Ok @artfulrobot you inspired me. #29377 saves a bit of memory and speeds things up. Best of both!

@totten
Copy link
Member

totten commented Feb 12, 2024

@colemanw @artfulrobot

FWIW, I got the results back from comparing 3 branches with multiple runs of the phpunit-crm and phpunit-api3 suites. This was a moderately-controller environment (workstation; no other jobs test-jobs or user-activity; but there could be unattended system-processes).

The branch merge-dao-stuff is the same as master plus 4 PR's pending as of yesterday; among them, this was the one most likely to have some kind of effect.

BRANCH crm(1) crm(2) crm(3) api3(1) api3(2) api3(3)
5.70 2973.1 2979.5 2998.1 2468.6 2472.2 2466.7
master 2592.4 2592.5 2593.8 2072.7 2062.4 2146.6
merge-dao-stuff 2604.8 2601.4 2607.4 2083.7 2085.0 2080.5

These are runtimes measured in seconds. (Range of 2060-2979 s aka 34-49 min.) Results were generally pretty consistent for a suite+branch (with one major outlier on master+api3(3) -- smells like a background-process). Anyway, my interpretation is that merge-dao-stuff added 10-15sec (~0.5%) to the overall runtime of each suite.

(Obviously, test-benchmarks are only one kind of benchmark. It's just the convenient one...)

(Aside: I'm not sure why 5.70 was so much slower than master. I guess something along the way was deleted or optimized. 🤷)

@colemanw
Copy link
Member Author

@artfulrobot
Copy link
Contributor

Wow @colemanw good work !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants