Add / make fit for purpose email.getlist api call #16993

Merged
merged 1 commit into from
Apr 8, 2020

eileenmcnaughton
Contributor

@eileenmcnaughton eileenmcnaughton commented Apr 6, 2020

Overview

Brings the Email.getlist api to functional parity with the old ad hoc CRM_Contact_Page_AJAX::getContactEmail so we can switch calls over and deprecate / remove it.

Before

The Email.getlist api is not really usable for entity reference fields due to the gaps listed below.

After

Parity with the legacy function

Technical Details

The function CRM_Contact_Page_AJAX::getContactEmail is one of our earlier ajax attempts, and this approach has been largely
replaced with entity reference fields. In order to switch over we need to bring the Email.getlist api to parity, which means

  1. searching on sort_name first; if that returns fewer than 10 results, also searching on email
  2. appropriate respect for includeWildCardInName (this should already be in the generic getlist)
  3. filtering out on_hold, is_deceased, do_not_email
  4. acl support (should already be part of the api).

The trickiest of these to support is the first, because we need to avoid using a non-performant OR.
My current solution is the idea of a fallback field to search if the search results are fewer than the limit.
In most cases this won't require a second query, but when it does it should be fairly quick.
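The fallback idea can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not the PR's PHP code: the `EMAILS` rows and the `make_row`, `is_mailable`, `search` and `getlist` helpers are all hypothetical stand-ins for the joined email/contact query. It shows the primary search on sort_name, the second search on email only when the first comes up short, and the on_hold / is_deceased / do_not_email filters.

```python
from typing import Dict, List


def make_row(id: int, sort_name: str, email: str, **flags: int) -> Dict:
    """Build a hypothetical joined email/contact record; flags default to 0."""
    row = {"id": id, "sort_name": sort_name, "email": email,
           "on_hold": 0, "is_deceased": 0, "do_not_email": 0}
    row.update(flags)
    return row


# Hypothetical sample data, for illustration only.
EMAILS: List[Dict] = [
    make_row(1, "Poole, Ann", "ann@example.org"),
    make_row(2, "Adams, Bo", "postmaster@example.org"),
    make_row(3, "Potter, Cy", "cy@example.org", on_hold=1),
    make_row(4, "Zane, Di", "polly@example.org", do_not_email=1),
    make_row(5, "Powell, Ed", "powell@example.org"),
]


def is_mailable(row: Dict) -> bool:
    """Filter out on_hold, is_deceased and do_not_email records."""
    return not (row["on_hold"] or row["is_deceased"] or row["do_not_email"])


def search(field: str, term: str) -> List[Dict]:
    """Prefix-match one field and order by that same field."""
    term = term.lower()
    matches = [r for r in EMAILS
               if is_mailable(r) and r[field].lower().startswith(term)]
    return sorted(matches, key=lambda r: r[field].lower())


def getlist(term: str, limit: int = 10) -> List[Dict]:
    """Search sort_name first; fall back to email only if under the limit."""
    results = search("sort_name", term)[:limit]
    if len(results) < limit:
        seen = {r["id"] for r in results}
        extra = [r for r in search("email", term) if r["id"] not in seen]
        results += extra[: limit - len(results)]
    return results
```

With this sample data, `getlist("po")` returns the two sort_name matches followed by the one extra email match, and the second search is skipped entirely whenever the first one fills the limit.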

Comments

A notable gap is the ability to filter on groups & tags, which don't easily join onto email.get in apiv3. I have left this out of scope for now.

Note this can be tested on the bcc field on #16936 (not the submit part as I haven't done that side yet). The current PR in that cleanup chain is #16954

@civibot

civibot bot commented Apr 6, 2020

(Standard links)

@civibot civibot bot added the master label Apr 6, 2020
@eileenmcnaughton eileenmcnaughton force-pushed the emailget branch 3 times, most recently from 3128c94 to dace9ba Compare April 6, 2020 04:04
@colemanw
Member

colemanw commented Apr 6, 2020

Cool. I like the idea of a "fallback" field; like a poor-man's UNION.

@eileenmcnaughton
Contributor Author

@colemanw I thought about having it as an array or similar but it felt like complexity that might never be needed & could be added later

$request['params'][$defaults['search_field_fallback']] = $request['params'][$defaults['search_field']];
unset($request['params'][$defaults['search_field']]);
$request['params']['options']['limit'] -= $result['count'];
$result2 = civicrm_api3($entity, 'get', $request['params']);
Member

Maybe we should add something to the params here that excludes the original results.
NOT IN(array_keys($results)) would work partially, but it would only exclude the current page.
What about a "NOT LIKE" clause to exclude any results that would have been returned by searching on the original field?
The reason I bring this up is because we are dealing with a paging situation so duplicate results can mess up the pager.

Contributor Author

@colemanw - that makes sense. In my testing, adding that had no negative performance impact. However, my testing did demonstrate that the way it was doing the ordering WAS having a performance impact, and that for this to perform well the sort_field needs to be the same as the filter field. I'm on the fence about doing a post-sort in php - as of now it's

  1. sort_name matches, ordered by sort_name, labeled by sort_name padded with
  2. email matches, ordered by email, labeled by sort_name

Member

@colemanw colemanw Apr 7, 2020

@eileenmcnaughton this is looking really good. My one hesitation is that the 'NOT IN' approach will possibly return redundant results because of the pager. Ex:

  • Type a search query
  • It returns 10 results.
  • Scroll to the end and another 10 results appear (page 2)
  • Scroll down again and 5 results are fetched by the main $request. It then fires the fallback request, using 'NOT IN' to exclude the 5 current results.
  • Since it did not exclude results from pages 1 & 2 (only the five contacts from page 3 were excluded) you'll get duplicates on page 3+.
  • On page 4 you can get even more duplicates: the main request will return 0 results, so nothing will be excluded.

If it's not a big performance hit, the 'NOT LIKE' method could fix this.
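The paging problem described above can be reproduced with a small simulation. This is a hedged Python sketch with hypothetical data and helper names (`ROWS`, `primary_ids`, `fallback_ids`, `page`), using a page size of 2 rather than 10, and it always reads the fallback from its first row, which is a simplification. It compares excluding only the current page's ids ('NOT IN') with excluding everything the primary field would have matched ('NOT LIKE').

```python
from typing import Callable, List, Tuple

# Hypothetical rows: (email id, contact sort_name, email address).
ROWS: List[Tuple[int, str, str]] = [
    (1, "Gable, A", "gable@example.org"),     # matches on name and email
    (2, "Garcia, B", "bgarcia@example.org"),  # matches on name only
    (3, "Gates, C", "gates@example.org"),     # matches on name and email
    (4, "Smith, D", "gary@example.org"),      # matches on email only
    (5, "Jones, E", "gail@example.org"),      # matches on email only
]


def starts(value: str, term: str) -> bool:
    return value.lower().startswith(term.lower())


def primary_ids(term: str) -> List[int]:
    """Ids matching on sort_name, ordered by sort_name."""
    rows = sorted((r for r in ROWS if starts(r[1], term)),
                  key=lambda r: r[1].lower())
    return [r[0] for r in rows]


def fallback_ids(term: str, keep: Callable[[Tuple[int, str, str]], bool]) -> List[int]:
    """Ids matching on email, ordered by email, minus excluded rows."""
    rows = sorted((r for r in ROWS if starts(r[2], term) and keep(r)),
                  key=lambda r: r[2].lower())
    return [r[0] for r in rows]


def page(term: str, number: int, limit: int, strategy: str) -> List[int]:
    """Return one page of ids, topping up from the fallback when short."""
    offset = (number - 1) * limit
    ids = primary_ids(term)[offset:offset + limit]
    missing = limit - len(ids)
    if missing > 0:
        if strategy == "not_in":
            current = set(ids)  # only the current page is excluded
            keep = lambda r: r[0] not in current
        else:
            # "not_like": exclude anything the primary field matches
            keep = lambda r: not starts(r[1], term)
        ids += fallback_ids(term, keep)[:missing]
    return ids


page1 = page("ga", 1, 2, "not_in")
page2_not_in = page("ga", 2, 2, "not_in")
page2_not_like = page("ga", 2, 2, "not_like")
```

Here `page2_not_in` repeats id 1, which was already shown on page 1, while `page2_not_like` only adds an email-only match.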

Contributor Author

Well it was OK in the performance tests below so trying it

@eileenmcnaughton
Contributor Author

I did some speed testing and we actually have to be a bit careful here - adding the filter above is OK, but what is NOT OK is sorting by a field other than the filter field. When you do that it uses the sort field as the index and then does an unindexed filter.

I'm pasting the results below. Note that the time for most queries was only between 10% and 50% of the original time when I tested without the is_deleted_sort_name index but the pattern of which were quicker was the same.

SELECT SQL_NO_CACHE a.id as id, contact_id_to_civicrm_contact.display_name as contact_id.display_name, contact_id_to_civicrm_contact.sort_name as contact_id.sort_name, a.email as email
FROM civicrm_email a
INNER JOIN civicrm_contact contact_id_to_civicrm_contact ON a.contact_id = contact_id_to_civicrm_contact.id
WHERE (a.on_hold = "0") AND (contact_id_to_civicrm_contact.is_deleted = "0")
AND (contact_id_to_civicrm_contact.is_deceased = "0") AND (contact_id_to_civicrm_contact.do_not_email = "0") AND (a.email LIKE "po%")
ORDER BY contact_id_to_civicrm_contact.display_name
LIMIT 10
OFFSET 0;

#10 rows in set (33.70 sec)

With sort_name NOT LIKE 'bob%'

SELECT SQL_NO_CACHE a.id as id, contact_id_to_civicrm_contact.display_name as contact_id.display_name, contact_id_to_civicrm_contact.sort_name as contact_id.sort_name, a.email as email
FROM civicrm_email a
INNER JOIN civicrm_contact contact_id_to_civicrm_contact ON a.contact_id = contact_id_to_civicrm_contact.id
WHERE (a.on_hold = "0")
AND (contact_id_to_civicrm_contact.is_deleted = "0")
AND (contact_id_to_civicrm_contact.is_deceased = "0")
AND (contact_id_to_civicrm_contact.do_not_email = "0")
AND (a.email LIKE "ga%")
AND (contact_id_to_civicrm_contact.sort_name NOT LIKE 'ga%')
ORDER BY contact_id_to_civicrm_contact.display_name
LIMIT 10
OFFSET 0;
#10 rows in set (36.37 sec)

With email.id NOT IN (20,40,60,100,120)

SELECT SQL_NO_CACHE a.id as id, contact_id_to_civicrm_contact.display_name as contact_id.display_name, contact_id_to_civicrm_contact.sort_name as contact_id.sort_name, a.email as email
FROM civicrm_email a
INNER JOIN civicrm_contact contact_id_to_civicrm_contact ON a.contact_id = contact_id_to_civicrm_contact.id
WHERE (a.on_hold = "0")
AND (contact_id_to_civicrm_contact.is_deleted = "0")
AND (contact_id_to_civicrm_contact.is_deceased = "0")
AND (contact_id_to_civicrm_contact.do_not_email = "0")
AND (a.email LIKE "ba%")
AND a.id NOT IN (20,40,60,100,120)
ORDER BY contact_id_to_civicrm_contact.display_name
LIMIT 10
OFFSET 0;
#10 rows in set (27.11 sec)

#basic - order by email
SELECT SQL_NO_CACHE a.id as id, contact_id_to_civicrm_contact.display_name as contact_id.display_name, contact_id_to_civicrm_contact.sort_name as contact_id.sort_name, a.email as email
FROM civicrm_email a
INNER JOIN civicrm_contact contact_id_to_civicrm_contact ON a.contact_id = contact_id_to_civicrm_contact.id
WHERE (a.on_hold = "0") AND (contact_id_to_civicrm_contact.is_deleted = "0")
AND (contact_id_to_civicrm_contact.is_deceased = "0") AND (contact_id_to_civicrm_contact.do_not_email = "0") AND (a.email LIKE "co%")
ORDER BY a.email
LIMIT 10
OFFSET 0;

10 rows in set (0.01 sec)

With sort_name NOT LIKE 'go%' - order by email

SELECT SQL_NO_CACHE a.id as id, contact_id_to_civicrm_contact.display_name as contact_id.display_name, contact_id_to_civicrm_contact.sort_name as contact_id.sort_name, a.email as email
FROM civicrm_email a
INNER JOIN civicrm_contact contact_id_to_civicrm_contact ON a.contact_id = contact_id_to_civicrm_contact.id
WHERE (a.on_hold = "0")
AND (contact_id_to_civicrm_contact.is_deleted = "0")
AND (contact_id_to_civicrm_contact.is_deceased = "0")
AND (contact_id_to_civicrm_contact.do_not_email = "0")
AND (a.email LIKE "go%")
AND (contact_id_to_civicrm_contact.sort_name NOT LIKE 'go%')
ORDER BY a.email
LIMIT 10
OFFSET 0;
#10 rows in set (0.01 sec)

With email.id NOT IN (20,40,60,100,120) order by email

SELECT SQL_NO_CACHE a.id as id, contact_id_to_civicrm_contact.display_name as contact_id.display_name, contact_id_to_civicrm_contact.sort_name as contact_id.sort_name, a.email as email
FROM civicrm_email a
INNER JOIN civicrm_contact contact_id_to_civicrm_contact ON a.contact_id = contact_id_to_civicrm_contact.id
WHERE (a.on_hold = "0")
AND (contact_id_to_civicrm_contact.is_deleted = "0")
AND (contact_id_to_civicrm_contact.is_deceased = "0")
AND (contact_id_to_civicrm_contact.do_not_email = "0")
AND (a.email LIKE "po%")
AND a.id NOT IN (20,40,60,100,120)
ORDER BY a.email
LIMIT 10
OFFSET 0;

#10 rows in set (0.01 sec)

#first query - filter sort name, order display name
SELECT SQL_NO_CACHE a.id as id, contact_id_to_civicrm_contact.display_name as contact_id.display_name, contact_id_to_civicrm_contact.sort_name as contact_id.sort_name, a.email as email
FROM civicrm_email a
INNER JOIN civicrm_contact contact_id_to_civicrm_contact ON a.contact_id = contact_id_to_civicrm_contact.id
WHERE (a.on_hold = "0") AND (contact_id_to_civicrm_contact.is_deleted = "0")
AND (contact_id_to_civicrm_contact.is_deceased = "0") AND (contact_id_to_civicrm_contact.do_not_email = "0") AND (sort_name LIKE "do%")
ORDER BY contact_id_to_civicrm_contact.display_name
LIMIT 10
OFFSET 0;

10 rows in set (15.02 sec)

#first query - filter sort name, order sort name
SELECT SQL_NO_CACHE a.id as id, contact_id_to_civicrm_contact.display_name as contact_id.display_name, contact_id_to_civicrm_contact.sort_name as contact_id.sort_name, a.email as email
FROM civicrm_email a
INNER JOIN civicrm_contact contact_id_to_civicrm_contact ON a.contact_id = contact_id_to_civicrm_contact.id
WHERE (a.on_hold = "0") AND (contact_id_to_civicrm_contact.is_deleted = "0")
AND (contact_id_to_civicrm_contact.is_deceased = "0") AND (contact_id_to_civicrm_contact.do_not_email = "0") AND (sort_name LIKE "so%")
ORDER BY contact_id_to_civicrm_contact.sort_name
LIMIT 10
OFFSET 0;
#10 rows in set (0.02 sec)

#first query - filter sort name, order sort name, correct casting
SELECT SQL_NO_CACHE a.id as id, contact_id_to_civicrm_contact.display_name as contact_id.display_name, contact_id_to_civicrm_contact.sort_name as contact_id.sort_name, a.email as email
FROM civicrm_email a
INNER JOIN civicrm_contact contact_id_to_civicrm_contact ON a.contact_id = contact_id_to_civicrm_contact.id
WHERE (a.on_hold = 0) AND (contact_id_to_civicrm_contact.is_deleted = 0)
AND (contact_id_to_civicrm_contact.is_deceased = 0) AND (contact_id_to_civicrm_contact.do_not_email = 0) AND (sort_name LIKE "ho%")
ORDER BY contact_id_to_civicrm_contact.sort_name
LIMIT 10
OFFSET 0;
#10 rows in set (0.02 sec)

Note that the .02 sec was halved by dropping the index: ALTER TABLE civicrm_contact DROP INDEX index_is_deleted_sort_name;
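The general pattern in these timings (ordering by the filter field is cheap, ordering by another field is not) can be illustrated with a query plan. The sketch below is an analogy only: it uses SQLite through Python's sqlite3 module rather than MySQL, a hypothetical single `contact` table with a hypothetical `idx_contact_sort_name` index, and a range condition in place of LIKE. SQLite's planner is not MySQL's, so it does not reproduce the exact plan or the timings above; it only shows that one index can serve both the filter and the sort when they use the same column.

```python
import sqlite3
from typing import List


def plan(conn: sqlite3.Connection, order_by: str) -> List[str]:
    """Return the detail column of EXPLAIN QUERY PLAN for one ordering."""
    sql = (
        "EXPLAIN QUERY PLAN "
        "SELECT id, display_name, sort_name FROM contact "
        "WHERE sort_name >= 'do' AND sort_name < 'dp' "
        f"ORDER BY {order_by} LIMIT 10"
    )
    # The human-readable plan text is the fourth column of each row.
    return [row[3] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact ("
    "id INTEGER PRIMARY KEY, display_name TEXT, sort_name TEXT)"
)
# Hypothetical index on the filter column only.
conn.execute("CREATE INDEX idx_contact_sort_name ON contact (sort_name)")

plan_same_field = plan(conn, "sort_name")
plan_other_field = plan(conn, "display_name")

# A temporary b-tree in the plan means a separate sort step is needed.
extra_sort_same = any("TEMP B-TREE" in d for d in plan_same_field)
extra_sort_other = any("TEMP B-TREE" in d for d in plan_other_field)
```

Ordering by sort_name lets SQLite read rows in index order, whereas ordering by display_name adds a separate sort step on top of the filtered rows.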

@colemanw
Member

colemanw commented Apr 7, 2020

Ok I think I follow. So the upshot of your testing is that the default order_by should be set to something reasonably performant, yes?

@eileenmcnaughton
Contributor Author

@colemanw yep - I updated the PR to work in this scenario (i.e. to have a usable sort field on both queries)

@mfb just pinging you in on this - the goal is to switch the cc & bcc fields on send-an-email to use Email based entity references, without reverting to bad performance

@colemanw
Member

colemanw commented Apr 8, 2020

retest this please

@eileenmcnaughton
Contributor Author

@colemanw it passed!

@colemanw
Member

colemanw commented Apr 8, 2020

This introduces a new, as-yet-unused feature, with good test coverage, so pretty safe to merge. Code & tests look good.

@colemanw colemanw merged commit 23310ac into civicrm:master Apr 8, 2020
@colemanw colemanw deleted the emailget branch April 8, 2020 23:29