Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(dev/core#217) PrevNext - Implement Redis. Decouple Query::getCachedContacts(). #12665

Merged
merged 4 commits into from
Nov 28, 2018

Conversation

totten
Copy link
Member

@totten totten commented Aug 14, 2018

Overview

NOTE: To avoid merge-conflicts, this PR extends/depends on #12664 -- which modifies similar sections of code. This is the final excerpt from #12377. Some commits have been rearranged/squashed to simplify the appearance, though the end result is the same.

This patch implements the first non-SQL storage mechanism for storing prev-next cache used by contact-search screens. It also includes a change in the CRM_Core_PrevNextCache_Interface and some small cleanups (as separate commits).

Before

  • The function CRM_Contact_BAO_Query::getCachedContacts() is hard-coded to only fetch CID's from the SQL-based cache.
  • The only implementation of CRM_Core_PrevNextCache_Interface is Sql.

After

  • The function CRM_Contact_BAO_Query::getCachedContacts() is more flexible -- accepting any CID's provided by CRM_Core_PrevNextCache_Interface::fetch().
  • There are two implementations of CRM_Core_PrevNextCache_Interface -- Sql and Redis.
  • There is an option ("Administer => Misc => PrevNext Cache") to choose the backend. By default, it auto-detects the best available backend.

(There's more discussion of changes in the commit messages.)

Technical Details

I had initially expected to store the entire result-set in Redis as one cache key, e.g.

$cache->set("{$prefix}/prevnext/{$cacheKey}", implode(',', $allMatchingContactIds);

$allMatchingContactIds = explode(',', $cache->get("{$prefix}/prevnext/{$cacheKey}"));

With a large result set (think "1 million contacts"), this would produce a large cache record -- and any page-view would have to fetch the entire list. But then I found that Redis and php-redis expose several data structures -- such hashes, lists, sets, and sorted-sets. The sorted-set API lets us track a list of integers, preserve the ordering, and fetch results on a paginated basis (ZRANGE / zRange()).

It seemed like the most performant approach would be to split the data into three pieces:

  • Store the main list of matching CID's as a sorted-set ({$prefix}/prevnext/{$cacheKey}/all)
  • Store the list of selected CID's as a sorted-set ({$prefix}/prevnext/{$cacheKey}/sel) with the same ordering (scores/weights) as the main list
  • Store the auxiliary data about each CID to a hash ({$prefix}/prevnext/{$cacheKey}/data)

If you open redis-cli while using the search interface, you can inspect the content of these keys, e.g.

#### Browse available cache keys
127.0.0.1:6380> keys */prevnext/*
1) "dmaster/prevnext/civicrm search 5221f7065db967c2256b4f2725912694_5264/all"
2) "dmaster/prevnext/civicrm search 5221f7065db967c2256b4f2725912694_5264/sel"
3) "dmaster/prevnext/civicrm search 5221f7065db967c2256b4f2725912694_5264/data"

#### Get list of all matching contacts from "all"
127.0.0.1:6380> ZRANGE "dmaster/prevnext/civicrm search 5221f7065db967c2256b4f2725912694_5264/all" 0 -1
  1) "51"
  2) "147"
  3) "123"
   ...

#### Get list of manually selected contacts from "sel"
127.0.0.1:6380> ZRANGE "dmaster/prevnext/civicrm search 5221f7065db967c2256b4f2725912694_5264/sel" 0 -1
1) "149"
2) "16"
3) "25"

#### Get auxiliary data from "data"
127.0.0.1:6380> hgetall "dmaster/prevnext/civicrm search 5221f7065db967c2256b4f2725912694_5264/data"
  1) "51"
  2) "Adams family"
  3) "147"
  4) "Adams, Andrew"
   ...

Comments

During development of the original #12377, I used this worksheet to organize fairly deep testing of the prev-next functionality. I haven't re-tested after the various cherry-picks/merges, but it could still be a good guide for thorough r-run testing.

@civibot
Copy link

civibot bot commented Aug 14, 2018

(Standard links)

@totten totten force-pushed the master-prevnext-redis branch from cb88479 to 120a87e Compare August 16, 2018 23:30
@eileenmcnaughton
Copy link
Contributor

@totten so if you can rebase this I can review but I think it needs to be further split into

  1. the rest of the cleanup
  2. activate Redis caching for PrevNext

The reason being that we will need to performance test the second one which might take a bit longer to get answers on

@eileenmcnaughton
Copy link
Contributor

@totten ^^

@eileenmcnaughton
Copy link
Contributor

@totten can you rebase this?

eileenmcnaughton pushed a commit to eileenmcnaughton/civicrm-core that referenced this pull request Sep 11, 2018
(dev/core#217) Query::getCachedContacts - Use swappable fetch() instead of SQL JOIN

The general context of this code is roughly as follows:

* We've already filled up the prevnext cache with a bunch of contact-IDs.
* The user wants to view a page of 50 contacts.
* We want to lookup full information about 50 specific contacts for this page.

It does makes sense to use `CRM_Contact_BAO_Query` for looking up the "full information"
about contacts. However, the function `Query::getCachedContacts()` is hard-coded to
read from the SQL-based prevnext cache.

Before
------

* In `getCachedContacts()`, it grabbed the full SQL for `CRM_Contact_BAO_Query`
  and munged the query to:
    * Add an extra JOIN on `civicrm_prevnext_cache` (with a constraint on `cacheKey`)
    * Respect pagination (LIMIT/OFFSET)
    * Order results based on their position in the prevnext cache

After
-----

* In `CRM_Core_PrevNextCache_Interface`, the `fetch()` function provides one page-worth
  of contact IDs (in order). The `fetch()` function is tested by `E2E_Core_PrevNextTest`.
* In `getCachedContacts()`, it doesn't know anything about `civicrm_prevnext_cache`
  or `cacheKey` or pagination.  Instead, it just accepts CIDs for one page-worth of
  contacts.  It returns contacts in the same order that was given.

(dev/core#217) Implement Redis driver for PrevNext handling

(dev/core#217) PrevNext - Cleanup parameter name in Sql::markSelection

The new name is prettier and matches the names in `CRM_Core_PrevNextCache_{Interface,Redis}`.

(dev/core#217) PrevNext - Add settings for admin to choose backend

The auto-detection is a good default policy.  However, this is new
functionality.  If some bug gets through the review/RC cycles, then this
option provides an escape path.
@totten totten force-pushed the master-prevnext-redis branch from 120a87e to eb1e5ce Compare September 11, 2018 23:09
@totten
Copy link
Member Author

totten commented Sep 11, 2018

@eileenmcnaughton It's rebased now, so this of commits is now shorter.

eileenmcnaughton pushed a commit to eileenmcnaughton/civicrm-core that referenced this pull request Sep 17, 2018
(dev/core#217) Query::getCachedContacts - Use swappable fetch() instead of SQL JOIN

The general context of this code is roughly as follows:

* We've already filled up the prevnext cache with a bunch of contact-IDs.
* The user wants to view a page of 50 contacts.
* We want to lookup full information about 50 specific contacts for this page.

It does makes sense to use `CRM_Contact_BAO_Query` for looking up the "full information"
about contacts. However, the function `Query::getCachedContacts()` is hard-coded to
read from the SQL-based prevnext cache.

Before
------

* In `getCachedContacts()`, it grabbed the full SQL for `CRM_Contact_BAO_Query`
  and munged the query to:
    * Add an extra JOIN on `civicrm_prevnext_cache` (with a constraint on `cacheKey`)
    * Respect pagination (LIMIT/OFFSET)
    * Order results based on their position in the prevnext cache

After
-----

* In `CRM_Core_PrevNextCache_Interface`, the `fetch()` function provides one page-worth
  of contact IDs (in order). The `fetch()` function is tested by `E2E_Core_PrevNextTest`.
* In `getCachedContacts()`, it doesn't know anything about `civicrm_prevnext_cache`
  or `cacheKey` or pagination.  Instead, it just accepts CIDs for one page-worth of
  contacts.  It returns contacts in the same order that was given.

(dev/core#217) Implement Redis driver for PrevNext handling

(dev/core#217) PrevNext - Cleanup parameter name in Sql::markSelection

The new name is prettier and matches the names in `CRM_Core_PrevNextCache_{Interface,Redis}`.

(dev/core#217) PrevNext - Add settings for admin to choose backend

The auto-detection is a good default policy.  However, this is new
functionality.  If some bug gets through the review/RC cycles, then this
option provides an escape path.
@eileenmcnaughton
Copy link
Contributor

eileenmcnaughton commented Sep 18, 2018

@totten I have found a bug in this - when no results are found an sql error is generated in the getCachedContacts query - this fixes it but a better fix may be possible as I think (but haven't verified) that an early return would make sense in the event of no cids at this point

Bad query:

SELECT contact_a.id as contact_id, contact_a.contact_type as `contact_type`, contact_a.contact_sub_type as `contact_sub_type`, contact_a.sort_name as `sort_name`, contact_a.display_name as `display_name`, contact_a.do_not_email as `do_not_email`, contact_a.do_not_phone as `do_not_phone`, contact_a.do_not_mail as `do_not_mail`, contact_a.do_not_sms as `do_not_sms`, contact_a.do_not_trade as `do_not_trade`, contact_a.is_opt_out as `is_opt_out`, contact_a.legal_identifier as `legal_identifier`, 
contact_a.external_identifier as `external_identifier`, contact_a.nick_name as `nick_name`, contact_a.legal_name as `legal_name`, contact_a.image_URL as `image_URL`, contact_a.preferred_communication_method as `preferred_communication_method`, contact_a.preferred_language as `preferred_language`, contact_a.preferred_mail_format as `preferred_mail_format`, contact_a.first_name as `first_name`, contact_a.middle_name as `middle_name`, contact_a.last_name as `last_name`, contact_a.prefix_id as `prefix_id`, 
contact_a.suffix_id as `suffix_id`, contact_a.formal_title as `formal_title`, contact_a.communication_style_id as `communication_style_id`, contact_a.job_title as `job_title`, contact_a.gender_id as `gender_id`, contact_a.birth_date as `birth_date`, contact_a.is_deceased as `is_deceased`, contact_a.deceased_date as `deceased_date`, contact_a.household_name as `household_name`, IF ( contact_a.contact_type = 'Individual', NULL, contact_a.organization_name ) as organization_name, contact_a.sic_code as `sic_code`, contact_a.is_deleted as 
`contact_is_deleted`, IF ( contact_a.contact_type = 'Individual', contact_a.organization_name, NULL ) as current_employer, civicrm_address.id as address_id, civicrm_address.street_address as `street_address`, civicrm_address.supplemental_address_1 as `supplemental_address_1`, civicrm_address.supplemental_address_2 as `supplemental_address_2`, civicrm_address.supplemental_address_3 as `supplemental_address_3`, civicrm_address.city as `city`, civicrm_address.postal_code_suffix as `postal_code_suffix`, civicrm_address.postal_code as `postal_code`, civicrm_address.geo_code_1 as `geo_code_1`, civicrm_address.geo_code_2 as `geo_code_2`, civicrm_address.state_province_id as state_province_id, civicrm_address.country_id as country_id, civicrm_phone.id as phone_id, civicrm_phone.phone_type_id as phone_type_id, civicrm_phone.phone as `phone`, civicrm_email.id as email_id, civicrm_email.email as `email`, civicrm_email.on_hold as `on_hold`, civicrm_im.id as im_id, civicrm_im.provider_id as provider_id, civicrm_im.name as `im`, civicrm_worldregion.id as worldregion_id, civicrm_worldregion.name as `world_region`, 
(CASE
END
) AS _wgt 
 FROM civicrm_contact contact_a 
LEFT JOIN civicrm_address ON ( contact_a.id = civicrm_address.contact_id AND civicrm_address.is_primary = 1 ) LEFT JOIN civicrm_email ON (contact_a.id = civicrm_email.contact_id AND civicrm_email.is_primary = 1) LEFT JOIN civicrm_phone ON (contact_a.id = civicrm_phone.contact_id AND civicrm_phone.is_primary = 1)  
LEFT JOIN civicrm_im ON (contact_a.id = civicrm_im.contact_id AND civicrm_im.is_primary = 1) 
 LEFT JOIN civicrm_country ON civicrm_address.country_id = civicrm_country.id  
LEFT JOIN civicrm_worldregion ON civicrm_country.region_id = civicrm_worldregion.id  
WHERE  (  (  (  ( contact_a.sort_name LIKE 'kjkjhkhjkhjkjhjk%' ) )  )  )  AND (contact_a.is_deleted = 0) AND contact_a.id IN ()  
ORDER BY _wgt

@eileenmcnaughton
Copy link
Contributor

test this please

@eileenmcnaughton
Copy link
Contributor

eileenmcnaughton commented Nov 4, 2018

@totten we've had this in production now for over a month without problems. I just tested the Redis implementation on staging. I exported the first name of all contacts with surnames starting with 'mc' - this gave me a set of contacts of a little over 180,000 which TBH I expected to be a bit slow / painful.

I am surprised to report that both with and without Redis on I would describe the experience as 'snappy'.

I also tried exporting contributions for that same number of contacts. It was anything but snappy and without altering indexes I couldn't get it to complete without a timeout (obviously 180k contacts represents more than 180k donations so the search is getting quite big now). I tried an index change and DID get it to complete - this is what I tried

ALTER TABLE civicrm_contact DROP KEY `index_is_deleted_sort_name`,
ADD KEY `index_sort_name_is_deleted_id` (`sort_name`,`is_deleted`,`id`);

The upshot of the testing being that this points to some new opportunities for improving speed but I think this patch is good to merge once the issue in my comment of Sep 19 is addressed

…ad of SQL JOIN

The general context of this code is roughly as follows:

* We've already filled up the prevnext cache with a bunch of contact-IDs.
* The user wants to view a page of 50 contacts.
* We want to lookup full information about 50 specific contacts for this page.

It does makes sense to use `CRM_Contact_BAO_Query` for looking up the "full information"
about contacts. However, the function `Query::getCachedContacts()` is hard-coded to
read from the SQL-based prevnext cache.

Before
------

* In `getCachedContacts()`, it grabbed the full SQL for `CRM_Contact_BAO_Query`
  and munged the query to:
    * Add an extra JOIN on `civicrm_prevnext_cache` (with a constraint on `cacheKey`)
    * Respect pagination (LIMIT/OFFSET)
    * Order results based on their position in the prevnext cache

After
-----

* In `CRM_Core_PrevNextCache_Interface`, the `fetch()` function provides one page-worth
  of contact IDs (in order). The `fetch()` function is tested by `E2E_Core_PrevNextTest`.
* In `getCachedContacts()`, it doesn't know anything about `civicrm_prevnext_cache`
  or `cacheKey` or pagination.  Instead, it just accepts CIDs for one page-worth of
  contacts.  It returns contacts in the same order that was given.
The new name is prettier and matches the names in `CRM_Core_PrevNextCache_{Interface,Redis}`.
The auto-detection is a good default policy.  However, this is new
functionality.  If some bug gets through the review/RC cycles, then this
option provides an escape path.
@totten totten force-pushed the master-prevnext-redis branch from eb1e5ce to e28bf65 Compare November 27, 2018 23:17
@totten
Copy link
Member Author

totten commented Nov 27, 2018

@eileenmcnaughton I've rebased and included https://gerrit.wikimedia.org/r/#/c/wikimedia/fundraising/crm/civicrm/+/461243/1/CRM/Contact/Selector.php as part of 2ca46d4#diff-143585e32dadc0e4857410eaad9206b4R585 (albeit with a ternary ? : notation).

I vaguely recall having some more discussion about whether to place the emptiness-check inside of or outside of getCachedContacts(); however, given that there is only one caller for getCachedContacts(), it doesn't seem like a big deal, so we'll just follow along with your patch.

Aside: re-reading this after some time, it's tempting to soften the transition policy -- though the fact that you've been using it in prod for a few months is a valid counter-point. If you're inclined toward 👍 on the more conservative policy, I can open it as a separate PR - but it's not a blocker here.

@totten
Copy link
Member Author

totten commented Nov 28, 2018

jenkins, test this please

@eileenmcnaughton
Copy link
Contributor

@totten so in fact we have been running in production still on msql caching - but I'm testing with Redis on staging. I'm just running some performance benchmarks at the moment - maybe we can get some Redis users to test the rc & we can add the conservative approach if need be? - I think @herbdool uses Redis

My understanding is that MemCache & APCCache are unaffected as they don't support the Redis change

@totten
Copy link
Member Author

totten commented Nov 28, 2018

Aah, that's fine by me. It'd be great to get feedback from @herbdool or another Redis user. I've put the other commit (for more conservative transition policy) into its own PR (so that we can do it - or not - async).

You are correct that this only affects Redis -- there is no PrevNext driver for memcache or APC.

@eileenmcnaughton
Copy link
Contributor

eileenmcnaughton commented Nov 28, 2018

Breaking - I have some performance testing results

There are from hitting
civicrm/contact/search?_qf_Basic_display=true&qfKey=9c23cd81fab6d00c4172aa2f49613464_3875&force=1&crmPID=3

100 times with concurrency of 1 & of 5 (no noticeable difference) and each done on

  1. array caching - blue
  2. redis caching + sql prevnext caching (orangey)
  3. redis caching + redis prevenext caching (green)

Note that I altered the page between each run. 3 runs per variant and I eliminated the slowest 20 % which have no obvious pattern but stretch the y axis making it hard to read

timeseriessearch80

@eileenmcnaughton
Copy link
Contributor

test this please

@eileenmcnaughton
Copy link
Contributor

I think fails are unrelated

@eileenmcnaughton eileenmcnaughton merged commit cb5d518 into civicrm:master Nov 28, 2018
@totten
Copy link
Member Author

totten commented Nov 28, 2018

For the record -- that's a really neat graph! 👍

Trying to simplify/analyze a bit, it seems like (with the data-set/use-case/environment under test):

  • The ArrayCache+SQL/blue-dots hover around 900ms (long-tail going higher, and dense block a little lower).
  • The Redis+SQL/orange-dots hover around 850ms (long-tail going higher, and dense block a little lower).
    • That's roughly 50ms (6%) improvement over ArrayCache+SQL.
  • The Redis+Redis/green-dots hover around 790ms (long-tail going higher, and dense block a little lower).
    • That's roughly 110ms (12%) improvement over ArrayCache+SQL.
    • That's roughly 60ms (7%) improvement over Redis+SQL.

@totten totten deleted the master-prevnext-redis branch November 28, 2018 21:04
@eileenmcnaughton
Copy link
Contributor

eileenmcnaughton commented Nov 28, 2018

@totten yeah that sounds about right - I did some separate analysis on Redis on contact edit and I got about 12% performance lift there.

It DOES matter how your site is configured / provisioned. Locally I got a 38% improvement with Redis but it was actually worse once I added in concurrency. Obviously the people who set up Redis on our server did a better job than I did :-)

From a user experience POV - it takes longer locally (no Redis) to load the edit screen on a single contact than on our Redis enabled server. Obviously there are a LOT of differences between the 2 set ups - latency & DB size work against the remote whereas the server is much better provisioned & has Redis enabled. However, the difference is significant from a usablity POV - ie. the difference between 'snappy' and 'I got bored & flipped to a different tab'

@eileenmcnaughton
Copy link
Contributor

eileenmcnaughton commented Nov 28, 2018

@totten also locally I would note that while it is notably faster locally I have had to turn it off - just because there are things that cv flush misses with Redis turned on that make it a bit more painful to develop against when tinkering with things that might be cached (I think cached settings is one but I am not 100% sure what I was doing when I turned it off)

totten added a commit to totten/civicrm-core that referenced this pull request Nov 28, 2018
The setting `prevNextBackend` was introduced in PR civicrm#12665.  The PR was
originally written for an earlier version (circa 5.6) and eventually
merged in a later version (circa 5.9). The metadata should match
the version-number of the actual release.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants