-
-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CRM-21260: CiviMail compose UI very slow to initialize, CRM-21316: Refactor getRecipients fn #11142
Conversation
ed4c6f8
to
0033ace
Compare
CRM/Mailing/BAO/Mailing.php
Outdated
* @param bool $storeRecipients | ||
* @param bool $dedupeEmail | ||
* @param null $mode | ||
* This function is created after simplifying the code of self::getRecipients(..) and these are the changes made |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would only put permanent documentation in the PR, and put descriptions of changes / improvements that are relevant while reviewing into comments on the PR. That way, the repo only has comments that continue to be relevant. So, for example, the PR should contain:
This function provides the recipients of a CiviMail mailing.
It provides a simplified implementation of an earlier function, now removed, self::getRecipients(..).
NB: The Smart Group contact cache is refreshed immediately by this call for all included and excluded smart groups in the mailing.
In comments on the PR, put in 1 - 5b.
Now, a question: do we really want this code to override the smart cache timeout on groups and force their refresh? Wouldn't it be better to rely on the cache than to force it to be rebuilt? We could provide an additional button for those rare cases where someone is concerned about forcing a rebuild, and willing to accept the possibly long rebuild time. Recall that this can take 5 - 10 minutes for some large complex hierarchical smart groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we are not forcing to rebuilt group contact cache because CRM_Contact_BAO_GroupContactCache::load($groupDAO)
respects the cache and refresh time, and on basis of which it will decide to rebuild contact cache for smart group identified by $groupDAO
which is how it's been done in other places.
CRM/Mailing/BAO/Mailing.php
Outdated
$job = CRM_Mailing_BAO_MailingJob::getTableName(); | ||
$mg = CRM_Mailing_DAO_MailingGroup::getTableName(); | ||
$eq = CRM_Mailing_Event_DAO_Queue::getTableName(); | ||
// there is no need to proceed further as no mailing group is selected to include recipients, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change 'as' to 'if', or put this comment inside the if
// there is no need to proceed further if no mailing group is selected to include recipients,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
CRM/Mailing/BAO/Mailing.php
Outdated
|
||
$groupDAO = CRM_Core_DAO::executeQuery($sql); | ||
while ($groupDAO->fetch()) { | ||
if ($groupDAO->cache_date == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you are not rebuilding the cache for all groups, just for ones where the cache is empty/expired? If so, please clarify the DocBlock for the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes if cache is empty as per earlier code in getRecipients(...)
, I haven't changed this logic. Will mention it in doc block
CRM/Mailing/BAO/Mailing.php
Outdated
} | ||
|
||
// Create a temp table for contact exclusion. | ||
$excludeTempTablename = "excluded_recipients_mailing_temp"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cause errors if two people are sending at the same time. Could you make unique name using a UUID or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will use "excluded_recipients_mailing_temp" . substr(sha1(rand()), 0, 4)
CRM/Mailing/BAO/Mailing.php
Outdated
|
||
// Create a temp table for contact exclusion. | ||
$excludeTempTablename = "excluded_recipients_mailing_temp"; | ||
$includedTempTablename = "included_recipients_mailing_temp"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make unique eg by using UUID
CRM/Mailing/BAO/Mailing.php
Outdated
); | ||
if (!empty($recipientsGroup['Exclude'])) { | ||
$sql = sprintf(" | ||
INSERT INTO %s (contact_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you estimate how much memory these two temp tables might take based on number of groups and size of db (eg for 600k records, with a group involving 10 groups and total of 1.8M members), then post into dev channel for feedback on if this is problem? There are currently around 20 places in codebase where temp tables using ENGINE=HEAP are created.
CRM/Mailing/BAO/Mailing.php
Outdated
INSERT INTO %s (contact_id) | ||
SELECT DISTINCT gc.contact_id | ||
FROM civicrm_group_contact gc | ||
WHERE gc.status = 'Added' AND gc.id IN (%s)", $excludeTempTablename, implode(',', $recipientsGroup['Exclude'])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the names of Groups sanitized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the query-builder can handle INSERT...SELECT...
with validation/escaping/interpolation:
CRM_Utils_SQL_Select::from('civicrm_group_contact')
->select('DISTINCT gc.contact_id')
->where('gc.status = "Added" AND gc.id IN (#groups)')
->param('#groups', $recipientsGroup['Exclude'])
->insertInto($excludeTempTablename, array('contact_id'))
->execute();
CRM/Mailing/BAO/Mailing.php
Outdated
$storeRecipients = FALSE, | ||
$dedupeEmail = FALSE, | ||
$mode = NULL) { | ||
public static function getMailRecipients($mailingObj) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you put into function signature DocBlock an explanation of what $mailingObj is, eg the current mailing object (as opposed to a former excluded mailing object)?
CRM/Mailing/BAO/Mailing.php
Outdated
FROM civicrm_mailing_event_queue meq | ||
INNER JOIN civicrm_mailing_job mj ON meq.job_id = mj.id | ||
INNER JOIN civicrm_mailing_group mg ON mj.mailing_id = mg.entity_id AND mg.entity_table = 'civicrm_mailing' | ||
WHERE mg.mailing_id = %d AND mg.group_type = 'Exclude'", $excludeTempTablename, $mailingID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably confused, but isn't $mailingID the ID for the current mailing, so the comment on L211 doesn't make sense to me. If $mailingID is the current mailing, then it seems to imply that the Mailing Event Queue has been built, which could be used to retrieve the recipients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right, why refactoring, I overlooked this query and simply included it from here without any thought. I will remove this query as it is no longer necessary here because we are building recipients new without any job or event queue.
$mailing_id = NULL, | ||
$storeRecipients = FALSE, | ||
$dedupeEmail = FALSE, | ||
$mode = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is sending via SMS currently using this function? Wouldn't the performance improvements be equally useful for SMS recipients?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now to keep it simple I have created this new fn, only to fetch mail recipients, and for civiSMS it will use the old getRecipients() fn. Also this new fn is only called via civiMail screen, so once I am done with the refactoring will figure out a way to extend support for phone recipients.
CRM/Mailing/BAO/Mailing.php
Outdated
// Get all the group contacts we want to include. | ||
$mailingGroup->query( | ||
"CREATE TEMPORARY TABLE $includedTempTablename | ||
(email_id int, contact_id int primary key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to follow the convention of always putting primary key first to avoid confusing people, as you rely on contact_id being primary below when using a REPLACE query:
(contact_id int primary key, email_id int)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ok .. will swap.
CRM/Mailing/BAO/Mailing.php
Outdated
$groupBy = $groupJoin = ''; | ||
$orderBy = "i.contact_id, i.email_id"; | ||
if ($mailingObj->dedupe_email) { | ||
$orderBy = "MIN(i.contact_id), MIN(i.email_id)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using DISTINCT in the SELECT clause, a synonym for DISTINCTROW, is more efficient than doing this order by / group by as a method of eliminating duplicates. You'll still need the group by, but you can remove the effort to sort/order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old cruft code :( will change
d1a9119
to
7375f51
Compare
Jenkins test this please |
1 similar comment
Jenkins test this please |
@monishdeb I have tested this patch on the site that I mentioned in CRM-21260. I used Chrome, using Inspector -> Network to check the time elapsed until the UI finishes loading, including the WYSIWYG: in my testing this is always the last thing to load, immediately after the recipient count changes from "(Estimating)" to "No recipients". Previous testing for CRM-21260 showed that the number of message templates makes a big difference. So I tested with & without the patch with different numbers of message templates enabled (all vs just the reserved ones). 3 trials in each condition. ResultsTesting 4.7.26RC-201710090252Message templates: 31: Message templates: 1212: Testing 4.7.26RC-201710090252 + PR 11142Message templates: 31: Message templates: 1212: Results SummaryThe patch reduced the average initialization time by 2-3 seconds and the number of AJAX requests by 12. |
@monishdeb See #11267 for @nbrettell's work on loading message templates on demand. Getting excellent improvements in load time here. |
test this please |
@davejenx @monishdeb @mattwire @seamuslee001 (cc @totten since could affect mosaico) A lot of time seems to have gone into testing & discussing this & my take is that it is possibly mergeable on test - pass (which needs to be confirmed) & that it has possibly stalled due to lack of clarity about status |
@monishdeb can you rebase this please |
…mplify getRecipients fn
@mlutfy we have implemented the patch for NYSS and are planning to roll to production this week. |
@lcdservices Has this been running OK in production? |
@JKingsnorth yes, we've been running in a high volume multi-site environment for a couple weeks without issue |
Great, is there any further testing or review required to merge this? |
Merging based on the comments/reviews/testing. Thanks everyone! |
Thank you everyone for your feedback and effort in getting this PR merged :) |
@monishdeb Thanks for all your work! |
well done all |
if (count($includeSmartGroupIDs)) { | ||
CRM_Utils_SQL_Select::from($contact) | ||
->select("$contact.id as contact_id, $email.id as email_id") | ||
->join($email, " INNER JOIN $email ON $email.contact_id = $contact.id ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @monishdeb I've just spotted something - I think these changes will break trying to include a smart group in an SMS type mailing? There is no if statement, so this will insert email IDs into the 'phone ID' column in the temporary table?
I haven't proved this definitively yet, working on setting up my localhost to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @JKingsnorth for notifying this break. I am going to a create a separate PR for this fix.
Unfortunately I haven't been able to test this on dmaster, because every time I try to create a new SMS mailing I hit a DB error. I'm not sure whether that's related to this PR or not yet. However, from reading that part of the code, I think that this PR will introduce a problem in smart groups being included in SMS mailings. |
@@ -1788,7 +1609,7 @@ public static function create(&$params, $ids = array()) { | |||
|
|||
// Create parent job if not yet created. | |||
// Condition on the existence of a scheduled date. | |||
if (!empty($params['scheduled_date']) && $params['scheduled_date'] != 'null' && empty($params['_skip_evil_bao_auto_schedule_'])) { | |||
if (!empty($params['scheduled_date']) && $params['scheduled_date'] != 'null') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have opened #11653 to e-instate this check - I assume is was removed on the assumption that it was not being used but it's actually accessible via the api. We should merge that to 4.7.31
In `CRM_Mailing_BAO_Mailing::create` and `Mailing.create` API, there is a (*ahem*) special behavior where setting the `scheduled_date` will immediately trigger scheduling. One shouldn't submit `scheduled_date` for a preview. This was not symptomatic before civicrm#11142/v4.7.31 because all preview operations were wrapped in a transaction and rolled back. But now previews are allowed to have side-effects, so we need some other means to prevent. This copies the workaround from `crmMailingMgr.save()` and applies it to `crmMailingMgr.preview()` (etal).
In `CRM_Mailing_BAO_Mailing::create` and `Mailing.create` API, there is a (*ahem*) special behavior where setting the `scheduled_date` will immediately trigger scheduling. One shouldn't submit `scheduled_date` for a preview. This was not symptomatic before civicrm#11142/v4.7.31 because all preview operations were wrapped in a transaction and rolled back. But now previews are allowed to have side-effects, so we need some other means to prevent. This copies the workaround from `crmMailingMgr.save()` and applies it to `crmMailingMgr.preview()` (etal).
Overview
This PR is intended to reduce slowness of CiviMail screen during initialization and drafting, by addressing various issues:
Before
Mailing.create
API to fetch recipients in an inefficient way, which are:a) Create unnecessary mailing Job later used by
getRecipients(...)
to fetch recipientsb) Cause rollback on DB changes made after creating dummy mailing, it's jobs and mailing_recipients records. See here
c)
getRecipients(...)
is an expensive function as in underlying code it executes some queries needlessly and also dependent on mailing job (only used in temp table name) which is unnecessary.REST API call for
Mailing.getlist({"params":{"id":{"IN":[]}},"extra":["is_hidden"]})
Response:
{"is_error":1,"error_message":"invalid criteria for IN"}
After
getMailRecipients(...)
is created after simplifying the code of self::getRecipients(..) and these are the changes made:a. Takes saved mailing object as the parameter later used mostly to build filters.
b. This function ONLY fetch and store mailing recipients, doesn't handle SMS mode
c. Use fewer JOINs in underlying queries.
d. Simplifies code to fetch contacts from selected mailing smart group(s).
e. Doesn't execute queries needlessly, e.g.
i) if SG (smart groups) are selected then ONLY queries responsible to fetch contacts from SG are executed
ii) If no 'Include' groups are selected then simply return as this also means no groups are selected to fetch recipients.
f. Refreshes contact cache of all smart groups, at once.
g. Kept both functions for now to reduce any unintended regressions
mailing.create
API using a special parameter{get_recipients: TRUE} and removed
force_rollback
and removed dummy mailing, job createTechnical Details
Marked with [WIP] as optimisation needed and reduce more AJAX calls to improve performance.