-
-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dev/core#1191 MySQL 61-table join limit - patch SelectQuery to work with many custom groups #18968
Conversation
Can one of the admins verify this patch? |
(Standard links)
|
Add to whitelist |
@AsylumSeekersCentre looks like there are some style issues in your code are you able to fix https://test.civicrm.org/job/CiviCRM-Core-PR/37893/checkstyleResult/new/ ? (click on Details to see more information there) |
Done! For my future reference, is there a way to run these checks offline before creating the pull request? |
@AsylumSeekersCentre yes there is if you have installed buildkit there is a tool called |
test this please |
…y so that it works with arbitrary numbers of custom fields
Rebased and updated the PHPUnit test for compatibility with current PHPUnit arrangement. |
We hit this problem a few months ago with one Powerbase group and we resolved it by cleaning up a lot of old, unsed or poorly designed custom data sets. CiviCRM is pretty un-opininated about how you design your custom data - so I am curious about how many CiviCRM installs both hit this limit and need this many custom data sets? In my experience, most groups that come close to approaching this limit are engaged in poor data design practices. If this is the case, it would be helpful to get a friendly message warning you that you have too many joins, which can be resolved by reducing the number of custom data sets rather then getting the ugly back trace. I'm probably wrong - and maybe there are good use cases, but wanted to at least ask the question since this PR introduces considerable complexity to already considerably complex code. |
I second the concern about adding complexity to an already complex system. If we are serious about supporting 61+ custom groups, then this PR is only the beginning (and definitely not the end) of the work needed. What this PR does:
What it doesn't affect:
I note that APIv4 is the basis of all new code in CiviCRM (e.g. SearchKit) and it has a much more complicated query builder which handles a wider range of SQL expressions (group by, having, functions, joins, etc). It may not be easy (or possible) to port this PR to APIv4. |
I also second above comments: it seems we are introducing a lot of complexity for a very narrow use case, and one that could most probably easily be resolved by simply re-organizing these fields in a smaller number of custom groups. |
I agree with the above, especially this being the beginning and not the end of the work required to enable larger numbers of custom groups. We did not proceed with this change, on the basis that reversing the decision without breaking functionality would be very difficult. As colemanw notes, there are areas it does not address. We would almost certainly have encountered problems. On the point of re-organising the data to fit into fewer custom groups, this simply buys us time, and already does so at the cost of taking down permission barriers which should exist. I would like to elaborate a bit about our use of custom fields, and maybe someone can tell me if there is a sensible way to continue extending our system: Most of our custom groups apply to a single Activity type. Pathology fields are added to Pathology activities, Material Donation fields are added to Material Donation activities. They are completely unrelated, never appear on the same Activities as each other, and some of them contain sensitive information whose visibility is restricted by ACLs. We are periodically called upon to begin tracking new data which seems most suitable for a new Activity type with a new unique set of Custom Fields, completely unrelated to all previous custom groups, some of which may contain sensitive data needing ACL control. Collapsing custom groups together which belong on different Activity types is not good, because then we have irrelevant fields displayed to the user and they have to scan through a long list for the few which contain data, and creating webforms requires extra work to remove all the irrelevant fields. Collapsing custom groups together which contain duplicate fields but have different ACL requirements is not good, because (for example) then the Employment team can read custom fields from the Health team, which they should not be able to see. We have already been given permission to do this for some legacy custom fields which were used for data migrated from our old database. We now have four fieldsets condensed into one super fieldset with sparsely populated fields and without the granular ACL control it should have. This has bought us some time, but it doesn't solve the problem. It's possible that eventually I will begin writing extensions which create new Entities to allow us to continue extending the system as required, but then we don't have the considerable advantages of custom fields (e.g. easy inclusion in reports and webforms). Are we missing something? I would like to add that I didn't design our system, and had little experience with civicrm prior to migrating a very different database onto it. If there is a better solution which we have not seen, it wouldn't be a shock. It's also worth noting that the number of custom fieldsets required to exceed the limit is significantly below 61. I seem to recall from when I was writing this code that the number of "spare" joins in some code pathways is about 47, and each contact reference adds another join so some of our custom groups add two or three joins rather than one. Having said all of the above, I do agree that the code in this pull request is definitely not suitable for merging. After reading some of the discussions about this issue my impression was of significant interest in having it solved (albeit from a minority of users). I wanted to make this visible for discussion rather than sitting on it forever. Definitely it would help to add clear information to the user interface about how many custom groups or contact reference fields can be added. We discovered the existence of this limit when we exceeded it in production for the first time, a few months after launching the system with a lot of data already migrated into custom fields. |
@colemanw - to your points - I would note that this probably mostly affects api v3 because if you don't set any return properties in apiv3 or export then it will return all custom fields - this is pretty common practice (we might discourage it but I suspect that more than 50% of api calls in core and extensions don't limit the return values). So my guess is that v3 api & export would be the only places where it's a problem in practice. Having said that @AsylumSeekersCentre says they didn't proceed with the change so it might be this should be closed & returned to gitlab (& the next time someone hits it we can discuss again) |
@eileenmcnaughton my experience in testing was consistent with what you say, that other parts of the system (not going through this code pathway) didn't seem to have the problem of exceeding the join limit. However, we were not confident to proceed because we really don't know enough about the system as a whole. |
This is a good point. Here's another way to frame the issue: Every query has a JOIN budget of 61. That's fairly high. Why does use-case Even if a system has 200
Yeah, the key phrase here seems to be "return all custom fields". There could be 1 or 2 legit cases for that ("export the entire database" or "fulfill a GDPR subject access request"), and I imagine a handful of such cases would have straight-forward patches. The bigger issue is that "return all custom fields" is a nice short-cut for the developer-experience. (It takes extra work to specifically identify the fields you want...) Other techniques that might be used to cope with that:
Those techniques would mean that inhouse code and adhoc code can still work fine in the normal case, but it would help draw attention to any core/common screens which make overly-broad queries. |
@AsylumSeekersCentre I understand your use case, and this is one that might not easily be addressed by CiviCRM, but by an unstructured / loosely structured database as the very structure of your data is so rich and constantly changing. What you are trying to address is a limitation of MySQL (61-table joins), not of CiviCRM in itself. We just started to unravel the complexity of addressing this in CiviCRM (and then Drupal/webforms), you might have an easier path looking at non-relational databases to address your use case. This is my honest opinion, others might disagree (and please speak up if so). |
Now that I understand the use-case better, I think the real issue is the API indiscriminately joining every possible custom field when that really isn't needed. Unfortunately this is the default behavior in APIv3 (but happily not in v4), and a lot of api calls carelessly leave off any A good approach to fixing it might be to have @AsylumSeekersCentre post backtraces whenever they encounter a crash and we can patch the offending api params to be more specific about the fields to return. |
@eileenmcnaughton That was a brilliant experiment in 20569. I'm not sure what "the UNION approach above" refers to. I'd need more description to understand. My personal intuition is that if one actually needed to return an API call with such a high number of CustomGroups, then it makes sense to split that out as entirely distinct queries (and then merge results in PHP). As a general matter, big +1 for @colemanw's point about sharing/discussing backtraces as a way to figure out why there are too many joins. |
Yes, I'm definitely interested in helping to identify API calls which overflow the limit. I'll do this on a test system with extra groups added to trigger the problem. I'll be doing this between other tasks, so my capacity may not be very great, but I'll contribute as much as I can. |
What would be the best way to share these backtraces, and how much information should accompany them? For example, this one was triggered by editing and then saving a File Note on a Case (on a system with too many custom groups): Creating a new File Note and saving it doesn't trigger the error, only editing one which already exists. The CiviCRM version there is 5.38.0 and the CMS us Drupal 7. Would it be better to do these tests on the master branch, or the latest stable release? I can post the failing query in a separate gist if it's wanted, but it's quite long. The error message takes the format: |
Thanks for the gist @AsylumSeekersCentre. It seems to me that this bit is where it says "give me all custom fields":
If I'm reading correctly, this is a use-case that does want to load an open-ended set of custom fields. But (in my untested opinion) it should be acceptable to load this via N-ary separate (Hmm... the dark-arts of API-chaining... it might be possible to mix a top-level query to |
So -is that code copying the custom fields from one activity to another via some functions that require quite a bit of formatting? If so then I think it might be easier to start from what it's trying to do & figure out the right approach rather than 'help' it to do what it's doing This seems to come back to our desire for a 'clone' action - which I think is something @colemanw has discussed before & is also likely at the heart of our open regression I've been trying to clean up some of the places where the tests don't set 'return' because there are so many in the tests that it's hard to close in on the ones in 'real live code' - - current pr #20618 |
It turns out that the area most causey of civicrm#18968 is activity get calls with no 'return' - which should be less that the contact.get ones so trying to deprecate no-return
I note the above points to Activity.get as being the one causing the issues so I am trying the test-finder approach on it #18968 |
It turns out that the area most causey of civicrm#18968 is activity get calls with no 'return' - which should be less that the contact.get ones so trying to deprecate no-return
It turns out that the area most causey of civicrm#18968 is activity get calls with no 'return' - which should be less that the contact.get ones so trying to deprecate no-return
It turns out that the area most causey of civicrm#18968 is activity get calls with no 'return' - which should be less that the contact.get ones so trying to deprecate no-return
It turns out that the area most causey of civicrm#18968 is activity get calls with no 'return' - which should be less that the contact.get ones so trying to deprecate no-return
It turns out that the area most causey of civicrm#18968 is activity get calls with no 'return' - which should be less that the contact.get ones so trying to deprecate no-return
It turns out that the area most causey of civicrm#18968 is activity get calls with no 'return' - which should be less that the contact.get ones so trying to deprecate no-return
It turns out that the area most causey of civicrm#18968 is activity get calls with no 'return' - which should be less that the contact.get ones so trying to deprecate no-return
It turns out that the area most causey of civicrm#18968 is activity get calls with no 'return' - which should be less that the contact.get ones so trying to deprecate no-return
It turns out that the area most causey of civicrm#18968 is activity get calls with no 'return' - which should be less that the contact.get ones so trying to deprecate no-return
It turns out that the area most causey of civicrm#18968 is activity get calls with no 'return' - which should be less that the contact.get ones so trying to deprecate no-return
Through the tests I've found another place with an unconstrained 'return' - which is civicrm-core/CRM/Member/BAO/Membership.php Line 172 in 61194d4
& which would mean the many-activities + membership would be very breaky |
Just re-visiting this - I'm actually going to close it & let it be tracked in gitlab since I guess this is stalled and our process is to close things and return them to gitlab if it turns out that conceptual agreement has not been reached. I'm a bit gutted tbh but I think that's realistically where this PR is at |
Yes, I agree. It's unfortunate but the scope of the work from here is beyond me at the moment. |
Overview
Relevant issue
Older stack exchange question
When many custom groups relate to a given Entity, API calls start failing with a "DB Error" caused by the MySQL 61-table join limit. This patch splits large queries into smaller queries and collates the results.
This pull request is probably not suitable for merging, though we are considering applying it to our production site. I want to present the code for discussion and review, but it's a significant change which may have implications for parts of the system which I'm not aware of.
Before
The API failures when exceeding the join limit could produce various results. Sometimes the user would see a yellow banner with the message "DB Error: Unknown Error". Sometimes (e.g. in a modal window in a Manage Case screen when saving an Activity) the result would be an ever-spinning Civi logo, combined with an error in the Drupal log.
After
The system continues to operate normally, even when there are more than 60 Custom Groups applied to Activities.
Technical Details
If it sees that it's going to exceed the join limit, it will split the query into smaller queries. It does this by classifying the joins as "mandatory", "dependant", or "optional". Mandatory joins are the ones which appear in the latter clauses and so form part of the criteria.
Each smaller query contains all of the mandatory joins and any which depend upon them. The remaining joins are distributed between as many queries as are required. The results of the queries are collated into a single data set and returned.
This process is not valid if there is a GROUP BY clause. However, I have not found a way to generate such a query with APIv3.
It's worth noting that it's not just Custom Groups which contribute to the number of joins. Each Custom Field which refers to a Contact adds another join. For example, we have a custom group with two Contact references - a "Donor" and a "Courier". This custom group contributes three joins toward the limit.
Comments
When I ran the full suite of API tests, one extra test failed which didn't fail when the patch was not applied:
Also, I tried preparing this as an extension which would override the changed files, but I could not get it to operate in that form - it appeared to continue using the core Civi files. An extension may be a better format for this feature if I can get it working.