Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev/core#1191 MySQL 61-table join limit - patch SelectQuery to work with many custom groups #18968

Closed

Conversation

AsylumSeekersCentre
Copy link
Contributor

Overview

Relevant issue
Older stack exchange question

When many custom groups relate to a given Entity, API calls start failing with a "DB Error" caused by the MySQL 61-table join limit. This patch splits large queries into smaller queries and collates the results.

This pull request is probably not suitable for merging, though we are considering applying it to our production site. I want to present the code for discussion and review, but it's a significant change which may have implications for parts of the system which I'm not aware of.

Before

The API failures when exceeding the join limit could produce various results. Sometimes the user would see a yellow banner with the message "DB Error: Unknown Error". Sometimes (e.g. in a modal window in a Manage Case screen when saving an Activity) the result would be an ever-spinning Civi logo, combined with an error in the Drupal log.

After

The system continues to operate normally, even when there are more than 60 Custom Groups applied to Activities.

Technical Details

If it sees that it's going to exceed the join limit, it will split the query into smaller queries. It does this by classifying the joins as "mandatory", "dependant", or "optional". Mandatory joins are the ones which appear in the latter clauses and so form part of the criteria.

Each smaller query contains all of the mandatory joins and any which depend upon them. The remaining joins are distributed between as many queries as are required. The results of the queries are collated into a single data set and returned.

This process is not valid if there is a GROUP BY clause. However, I have not found a way to generate such a query with APIv3.

It's worth noting that it's not just Custom Groups which contribute to the number of joins. Each Custom Field which refers to a Contact adds another join. For example, we have a custom group with two Contact references - a "Donor" and a "Courier". This custom group contributes three joins toward the limit.

Comments

When I ran the full suite of API tests, one extra test failed which didn't fail when the patch was not applied:

not ok 1413 - Failure: api_v3_LoggingTest::testEnableDisableLogging

Also, I tried preparing this as an extension which would override the changed files, but I could not get it to operate in that form - it appeared to continue using the core Civi files. An extension may be a better format for this feature if I can get it working.

@civicrm-builder
Copy link

Can one of the admins verify this patch?

@civibot
Copy link

civibot bot commented Nov 13, 2020

(Standard links)

@civibot civibot bot added the master label Nov 13, 2020
@eileenmcnaughton
Copy link
Contributor

Add to whitelist

@seamuslee001
Copy link
Contributor

@AsylumSeekersCentre looks like there are some style issues in your code are you able to fix https://test.civicrm.org/job/CiviCRM-Core-PR/37893/checkstyleResult/new/ ? (click on Details to see more information there)

@AsylumSeekersCentre
Copy link
Contributor Author

Done!

For my future reference, is there a way to run these checks offline before creating the pull request?

@seamuslee001
Copy link
Contributor

@AsylumSeekersCentre yes there is if you have installed buildkit there is a tool called civilint which is what is used and if you execute that on your changed but not yet checked in code it will find those errors

@eileenmcnaughton
Copy link
Contributor

test this please

@totten
Copy link
Member

totten commented May 29, 2021

Rebased and updated the PHPUnit test for compatibility with current PHPUnit arrangement.

@jmcclelland
Copy link
Contributor

We hit this problem a few months ago with one Powerbase group and we resolved it by cleaning up a lot of old, unsed or poorly designed custom data sets.

CiviCRM is pretty un-opininated about how you design your custom data - so I am curious about how many CiviCRM installs both hit this limit and need this many custom data sets? In my experience, most groups that come close to approaching this limit are engaged in poor data design practices. If this is the case, it would be helpful to get a friendly message warning you that you have too many joins, which can be resolved by reducing the number of custom data sets rather then getting the ugly back trace.

I'm probably wrong - and maybe there are good use cases, but wanted to at least ask the question since this PR introduces considerable complexity to already considerably complex code.

@colemanw
Copy link
Member

colemanw commented Jun 9, 2021

I second the concern about adding complexity to an already complex system. If we are serious about supporting 61+ custom groups, then this PR is only the beginning (and definitely not the end) of the work needed.

What this PR does:

  • Fixes APIv3 query builder fetch custom data in batches

What it doesn't affect:

  • APIv3 entities which do not use the query builder (Contact, Participant, Contribution)
  • APIv4 queries
  • CiviReport queries
  • Basic/Advanced/Custom Search queries
  • Any other hand-coded query in the system

I note that APIv4 is the basis of all new code in CiviCRM (e.g. SearchKit) and it has a much more complicated query builder which handles a wider range of SQL expressions (group by, having, functions, joins, etc). It may not be easy (or possible) to port this PR to APIv4.

@nganivet
Copy link
Contributor

nganivet commented Jun 9, 2021

I also second above comments: it seems we are introducing a lot of complexity for a very narrow use case, and one that could most probably easily be resolved by simply re-organizing these fields in a smaller number of custom groups.

@AsylumSeekersCentre
Copy link
Contributor Author

AsylumSeekersCentre commented Jun 10, 2021

I agree with the above, especially this being the beginning and not the end of the work required to enable larger numbers of custom groups. We did not proceed with this change, on the basis that reversing the decision without breaking functionality would be very difficult. As colemanw notes, there are areas it does not address. We would almost certainly have encountered problems.

On the point of re-organising the data to fit into fewer custom groups, this simply buys us time, and already does so at the cost of taking down permission barriers which should exist. I would like to elaborate a bit about our use of custom fields, and maybe someone can tell me if there is a sensible way to continue extending our system:

Most of our custom groups apply to a single Activity type. Pathology fields are added to Pathology activities, Material Donation fields are added to Material Donation activities. They are completely unrelated, never appear on the same Activities as each other, and some of them contain sensitive information whose visibility is restricted by ACLs. We are periodically called upon to begin tracking new data which seems most suitable for a new Activity type with a new unique set of Custom Fields, completely unrelated to all previous custom groups, some of which may contain sensitive data needing ACL control.

Collapsing custom groups together which belong on different Activity types is not good, because then we have irrelevant fields displayed to the user and they have to scan through a long list for the few which contain data, and creating webforms requires extra work to remove all the irrelevant fields.

Collapsing custom groups together which contain duplicate fields but have different ACL requirements is not good, because (for example) then the Employment team can read custom fields from the Health team, which they should not be able to see. We have already been given permission to do this for some legacy custom fields which were used for data migrated from our old database. We now have four fieldsets condensed into one super fieldset with sparsely populated fields and without the granular ACL control it should have. This has bought us some time, but it doesn't solve the problem.

It's possible that eventually I will begin writing extensions which create new Entities to allow us to continue extending the system as required, but then we don't have the considerable advantages of custom fields (e.g. easy inclusion in reports and webforms).

Are we missing something? I would like to add that I didn't design our system, and had little experience with civicrm prior to migrating a very different database onto it. If there is a better solution which we have not seen, it wouldn't be a shock.

It's also worth noting that the number of custom fieldsets required to exceed the limit is significantly below 61. I seem to recall from when I was writing this code that the number of "spare" joins in some code pathways is about 47, and each contact reference adds another join so some of our custom groups add two or three joins rather than one.

Having said all of the above, I do agree that the code in this pull request is definitely not suitable for merging. After reading some of the discussions about this issue my impression was of significant interest in having it solved (albeit from a minority of users). I wanted to make this visible for discussion rather than sitting on it forever.

Definitely it would help to add clear information to the user interface about how many custom groups or contact reference fields can be added. We discovered the existence of this limit when we exceeded it in production for the first time, a few months after launching the system with a lot of data already migrated into custom fields.

@eileenmcnaughton
Copy link
Contributor

@colemanw - to your points - I would note that this probably mostly affects api v3 because if you don't set any return properties in apiv3 or export then it will return all custom fields - this is pretty common practice (we might discourage it but I suspect that more than 50% of api calls in core and extensions don't limit the return values).

So my guess is that v3 api & export would be the only places where it's a problem in practice.

Having said that @AsylumSeekersCentre says they didn't proceed with the change so it might be this should be closed & returned to gitlab (& the next time someone hits it we can discuss again)

@AsylumSeekersCentre
Copy link
Contributor Author

@eileenmcnaughton my experience in testing was consistent with what you say, that other parts of the system (not going through this code pathway) didn't seem to have the problem of exceeding the join limit. However, we were not confident to proceed because we really don't know enough about the system as a whole.

@totten
Copy link
Member

totten commented Jun 10, 2021

(@AsylumSeekersCentre) It's also worth noting that the number of custom fieldsets required to exceed the limit is significantly below 61. I seem to recall from when I was writing this code that the number of "spare" joins in some code pathways is about 47, and each contact reference adds another join so some of our custom groups add two or three joins rather than one.

This is a good point. Here's another way to frame the issue: Every query has a JOIN budget of 61. That's fairly high. Why does use-case $X go over budget?

Even if a system has 200 CustomGroups, it seems like one should be able to do most sensible tasks without needing 61 JOINs... because you don't need 200 CustomGroups all in the same moment.

(@eileenmcnaughton) I would note that this probably mostly affects api v3 because if you don't set any return properties in apiv3 or export then it will return all custom fields - this is pretty common practice (we might discourage it but I suspect that more than 50% of api calls in core and extensions don't limit the return values).

Yeah, the key phrase here seems to be "return all custom fields". There could be 1 or 2 legit cases for that ("export the entire database" or "fulfill a GDPR subject access request"), and I imagine a handful of such cases would have straight-forward patches. The bigger issue is that "return all custom fields" is a nice short-cut for the developer-experience. (It takes extra work to specifically identify the fields you want...)

Other techniques that might be used to cope with that:

  • If a request says "return all custom fields" and goes over budget, then arbitrarily cut down the list of CustomGroups in the results (array_slice($allGroups, 0, MAX_GROUPS)) and display a warning.
  • If the system is in some kind of strict/debug/developer mode, and if there is any query that says "return all custom fields", then emit a warning.

Those techniques would mean that inhouse code and adhoc code can still work fine in the normal case, but it would help draw attention to any core/common screens which make overly-broad queries.

@nganivet
Copy link
Contributor

@AsylumSeekersCentre I understand your use case, and this is one that might not easily be addressed by CiviCRM, but by an unstructured / loosely structured database as the very structure of your data is so rich and constantly changing. What you are trying to address is a limitation of MySQL (61-table joins), not of CiviCRM in itself. We just started to unravel the complexity of addressing this in CiviCRM (and then Drupal/webforms), you might have an easier path looking at non-relational databases to address your use case. This is my honest opinion, others might disagree (and please speak up if so).

@eileenmcnaughton
Copy link
Contributor

eileenmcnaughton commented Jun 10, 2021

@totten do you think it would be possible to alter the query via hooks to achieve the UNION approach above? I guess that would save us from long-term committing to extra complexity in core,

I suspect this will throw up a LOT of fails #20569

@colemanw
Copy link
Member

Now that I understand the use-case better, I think the real issue is the API indiscriminately joining every possible custom field when that really isn't needed. Unfortunately this is the default behavior in APIv3 (but happily not in v4), and a lot of api calls carelessly leave off any return value from their params.

A good approach to fixing it might be to have @AsylumSeekersCentre post backtraces whenever they encounter a crash and we can patch the offending api params to be more specific about the fields to return.

@totten totten mentioned this pull request Jun 12, 2021
@totten
Copy link
Member

totten commented Jun 12, 2021

@eileenmcnaughton That was a brilliant experiment in 20569.

I'm not sure what "the UNION approach above" refers to. I'd need more description to understand.

My personal intuition is that if one actually needed to return an API call with such a high number of CustomGroups, then it makes sense to split that out as entirely distinct queries (and then merge results in PHP).

As a general matter, big +1 for @colemanw's point about sharing/discussing backtraces as a way to figure out why there are too many joins.

@AsylumSeekersCentre
Copy link
Contributor Author

AsylumSeekersCentre commented Jun 13, 2021

Yes, I'm definitely interested in helping to identify API calls which overflow the limit. I'll do this on a test system with extra groups added to trigger the problem.

I'll be doing this between other tasks, so my capacity may not be very great, but I'll contribute as much as I can.

@AsylumSeekersCentre
Copy link
Contributor Author

AsylumSeekersCentre commented Jun 15, 2021

What would be the best way to share these backtraces, and how much information should accompany them?

For example, this one was triggered by editing and then saving a File Note on a Case (on a system with too many custom groups):
https://gist.github.com/AsylumSeekersCentre/12fedb5b245cf8c6dc5d6254c20329bd

Creating a new File Note and saving it doesn't trigger the error, only editing one which already exists.

The CiviCRM version there is 5.38.0 and the CMS us Drupal 7. Would it be better to do these tests on the master branch, or the latest stable release?

I can post the failing query in a separate gist if it's wanted, but it's quite long. The error message takes the format:
$Fatal Error Details = Array ( [callback] => Array ( [0] => CRM_Core_Error [1] => exceptionHandler ) [code] => -1 [message] => DB Error: unknown error [mode] => 16 [debug_info] => SELECT <snip full query> [nativecode=1116 ** Too many tables; MariaDB can only use 61 tables in a join]

@totten
Copy link
Member

totten commented Jun 15, 2021

Thanks for the gist @AsylumSeekersCentre.

It seems to me that this bit is where it says "give me all custom fields":

#24 /var/www/html/example.com/sites/all/modules/civicrm/api/api.php(132): Civi\API\Kernel->runSafe("Activity", "getsingle", (Array:2)) 
#25 /var/www/html/example.com/sites/all/modules/civicrm/CRM/Case/Form/Activity.php(419): civicrm_api3("Activity", "getsingle", (Array:2)) 
#26 /var/www/html/example.com/sites/all/modules/civicrm/CRM/Core/Form.php(526): CRM_Case_Form_Activity->postProcess() 

If I'm reading correctly, this is a use-case that does want to load an open-ended set of custom fields. But (in my untested opinion) it should be acceptable to load this via N-ary separate SELECT statements. It might be nice to have a helper or flag to support that request.

(Hmm... the dark-arts of API-chaining... it might be possible to mix a top-level query to CustomGroup.get with chained subqueries for CustomValue.get? But still, a high-level helper/flag seems better to my gut. Filtering on civicrm_custom_group.extends can be quirky, and a helper/flag can encapsulate those quirks.)

@eileenmcnaughton
Copy link
Contributor

So -is that code copying the custom fields from one activity to another via some functions that require quite a bit of formatting? If so then I think it might be easier to start from what it's trying to do & figure out the right approach rather than 'help' it to do what it's doing

This seems to come back to our desire for a 'clone' action - which I think is something @colemanw has discussed before & is also likely at the heart of our open regression

I've been trying to clean up some of the places where the tests don't set 'return' because there are so many in the tests that it's hard to close in on the ones in 'real live code' - - current pr #20618

eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jun 16, 2021
It turns out that the area most causey of civicrm#18968
is activity get calls with no 'return' - which should be less that the contact.get
ones so trying to deprecate no-return
@eileenmcnaughton
Copy link
Contributor

I note the above points to Activity.get as being the one causing the issues so I am trying the test-finder approach on it #18968

@eileenmcnaughton eileenmcnaughton changed the title MySQL 61-table join limit - patch SelectQuery to work with many custom groups dev/core#1191 MySQL 61-table join limit - patch SelectQuery to work with many custom groups Jun 16, 2021
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jun 16, 2021
It turns out that the area most causey of civicrm#18968
is activity get calls with no 'return' - which should be less that the contact.get
ones so trying to deprecate no-return
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jun 16, 2021
It turns out that the area most causey of civicrm#18968
is activity get calls with no 'return' - which should be less that the contact.get
ones so trying to deprecate no-return
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jun 16, 2021
It turns out that the area most causey of civicrm#18968
is activity get calls with no 'return' - which should be less that the contact.get
ones so trying to deprecate no-return
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jun 17, 2021
It turns out that the area most causey of civicrm#18968
is activity get calls with no 'return' - which should be less that the contact.get
ones so trying to deprecate no-return
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jun 17, 2021
It turns out that the area most causey of civicrm#18968
is activity get calls with no 'return' - which should be less that the contact.get
ones so trying to deprecate no-return
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jun 17, 2021
It turns out that the area most causey of civicrm#18968
is activity get calls with no 'return' - which should be less that the contact.get
ones so trying to deprecate no-return
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jun 17, 2021
It turns out that the area most causey of civicrm#18968
is activity get calls with no 'return' - which should be less that the contact.get
ones so trying to deprecate no-return
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jun 17, 2021
It turns out that the area most causey of civicrm#18968
is activity get calls with no 'return' - which should be less that the contact.get
ones so trying to deprecate no-return
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jun 17, 2021
It turns out that the area most causey of civicrm#18968
is activity get calls with no 'return' - which should be less that the contact.get
ones so trying to deprecate no-return
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Jun 17, 2021
It turns out that the area most causey of civicrm#18968
is activity get calls with no 'return' - which should be less that the contact.get
ones so trying to deprecate no-return
@eileenmcnaughton
Copy link
Contributor

Through the tests I've found another place with an unconstrained 'return' - which is

$activityParams['id'] = civicrm_api3('Activity', 'Get', [

& which would mean the many-activities + membership would be very breaky

@eileenmcnaughton
Copy link
Contributor

Just re-visiting this - I'm actually going to close it & let it be tracked in gitlab since I guess this is stalled and our process is to close things and return them to gitlab if it turns out that conceptual agreement has not been reached. I'm a bit gutted tbh but I think that's realistically where this PR is at

@AsylumSeekersCentre
Copy link
Contributor Author

Yes, I agree. It's unfortunate but the scope of the work from here is beyond me at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants