Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev/core/544 Add report support for filter on multiple contact subtypes #13158

Merged
merged 1 commit into from
Mar 28, 2019

Conversation

elisseck
Copy link
Contributor

Overview

Issue is described at https://lab.civicrm.org/dev/core/issues/544

Before

Reports do not support filtering on multiple contact subtypes. Examples:

In the Contact Summary report, filtering on contact subtype 'is one of' 'Student' will bring up contacts with subtype 'Student', but not contacts with multiple subtypes including 'Student'.

Similarly, filtering on contact subtype 'is one of' 'Student', 'Volunteer' will bring up contacts with subtype 'Student', but not with multiple subtypes including 'Student', and not with the exact subtypes 'Student', 'Volunteer'.

After

Filtering on contact subtype 'is one of' with contacts that have multiple subtypes, or using multiple filters works as expected e.g. 'Student', 'Volunteer' will bring up contacts with subtypes 'Student', 'Volunteer', and 'StudentVolunteer' etc.

Technical Details

This is kind of a bandaid PR for this functionality rather than changing how it's stored. I've added whereSubtypeClause() to the report form and called it if we've got the contact_subtype field to get the desired results.

@civibot
Copy link

civibot bot commented Nov 26, 2018

(Standard links)

@civibot civibot bot added the master label Nov 26, 2018
for ($i = 0; $i < $subtypeFilters; $i++) {
$clause .= "{$field['dbAlias']} LIKE '%$value[$i]%'";
if ($i !== ($subtypeFilters - 1)) {
$clause .= " OR ";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eileenmcnaughton will the OR here cause performance issues possibly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give it a whirl - but my first instinct here is that we are defining the field metadata incorrectly in some way or this wouldn't be needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eileenmcnaughton I agree as commented in the issue I was surprised to see how this was stored. From my point of view using this function is kind of like "this is the result we're shooting for in terms of output, and if we can get there in another way without causing too much ruckus elsewhere... even better"

@seamuslee001 I didn't notice delay or anything but I also didn't do any real stress test and my dev environment has access to lots of memory, so that's a good point.

@eileenmcnaughton
Copy link
Contributor

OK - I understand the issue now - but it makes me wonder - we also have custom fields that get stored this way - I wonder if this applies to them too....

@elisseck
Copy link
Contributor Author

@eileenmcnaughton good question! I looked into this by creating a custom field that accepts multiple values (multi-select). Looks like instead of using IN (1, 2) or LIKE %1% OR LIKE %2% the report is using REGEXP '([[:cntrl:]]|^)1([[:cntrl:]]|$)|([[:cntrl:]]|^)2([[:cntrl:]]|$)' to get around the same 'single line with control characters' storage concept.

IANAE but I my instinct is that this is also slow if not slower unless you need to match some higher number of option values (each one adding an OR ... LIKE). I think i'll make some time to test that to see if this PR should use that strategy instead... but ultimately if we are concerned with performance I think we'd normalize these types of multi-option value fields. Honestly that's got my vote if it's possible but it's a much more daunting project.

@eileenmcnaughton
Copy link
Contributor

@elisseck looks like the regex was added by @jitendrapurohit in conjunction with CRM-18803 - I would be interested in performance stats on it

@eileenmcnaughton
Copy link
Contributor

' normalize these types of multi-option value field' - definitely too hard!

@elisseck
Copy link
Contributor Author

@eileenmcnaughton understood about normalization I thought as much :).

I don't have access to fancy SQL performance tools at the moment... but I tried the queries with a custom field in the following manner on a local dev server:

  • Created a custom multi-select field with 25 options

  • Filled database with ~10000 contacts with quasi-random selections for the custom field

  • Ran the entire summary report query in mysql as it gets generated in the developer tab for the REGEXP queries, and replacing the REGEXP with what would be generated in the LIKE %foo% OR... type of query:

    1. LIKE...OR searching for 'is one of' a single value

like_or_1_value

  1. LIKE...OR searching for 'is one of' any of the 25 values

like_or_25_values

  1. REGEXP searching for 'is one of' a single value

regexp_1_value

  1. REGEXP searching for 'is one of' any of the 25 values

regexp_25_values

Seems to me for most use-cases they're going to be largely the same in terms of time... but in terms of what the optimizer thinks of as query cost or at scale with lots of values avoiding regexp seems to be the way to go.

@eileenmcnaughton
Copy link
Contributor

I couldn't merge this without a test but I decided to write one for you in this case (which also gives a little cover to the grant detail write @agh1 did too.

I'll merge this (which does fix the issue) & push that up separately

@eileenmcnaughton eileenmcnaughton merged commit 558bcc4 into civicrm:master Mar 28, 2019
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Mar 28, 2019
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Mar 28, 2019
eileenmcnaughton added a commit that referenced this pull request Mar 28, 2019
@eileenmcnaughton
Copy link
Contributor

related regression https://lab.civicrm.org/dev/report/issues/15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants