Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importer - Increase size of queue batches #24151

Merged
merged 3 commits into from
Aug 4, 2022
Merged

Conversation

totten
Copy link
Member

@totten totten commented Aug 4, 2022

Overview

Addresses a simple, recent, and significant change in performance.

See #23669 (comment)

@civibot
Copy link

civibot bot commented Aug 4, 2022

(Standard links)

@eileenmcnaughton
Copy link
Contributor

this is fine - I probably would have waited on @andyburnsco testing - but if you want to change in 5.52 that's OK. I have been digging into a performance issue that impacts imports but it was unrelated to this

@totten
Copy link
Member Author

totten commented Aug 4, 2022

Yeah, it would be nice to hear back how that went. But he did an emoji on my '100' suggestion...

Maybe we go for the safer 50 on both #24151 and 24152 then bump it up further when we hear back.

(Assuming #24152 also passes in a few minutes, I think we could just edit this one to 50 and go straight to merge.)

@totten totten merged commit 0620525 into civicrm:5.52 Aug 4, 2022
@totten totten deleted the 5.52-batchsize branch August 4, 2022 08:47
@andyburnsco
Copy link
Contributor

andyburnsco commented Aug 4, 2022

this is fine - I probably would have waited on @andyburnsco testing - but if you want to change in 5.52 that's OK. I have been digging into a performance issue that impacts imports but it was unrelated to this

I tried 250 and 500 batchSize, the performance did not noticeably change, which is different than my experience with API CSV Importer (it made a difference there). 1 to 1.5s per contact import with several custom fields.

50 or 100 hardcoded is reasonable given normal max execution parameters until this can be a setting on the site and/or per import job.

However, I did hit concerning errors and one that broke the db and civi requiring db restore, but could not find a debug and backtrace for it, will test again today. It was a vague DB Error: constraint violation. I was able to trigger the error importing 1 record by including county, removing county worked. However, the behavior is not consistent, county has imported. Imports seem to go sideways after awhile.

On an import that broke the db, it imported 1830 records consecutively (status column = IMPORTED) and then ERROR for the rest with vague DB Error: unknown error. Not seeing any bad data that would have caused it.

@eileenmcnaughton what issues or PR's are the performance related ones? We may be able to pitch in some funding. Background processing, multiple jobs / threads are all priorities for us.

@eileenmcnaughton
Copy link
Contributor

@andyburnsco the biggest part of the performance issue in my testing turned out to be a cache miss - #24156 - we use a Redis cache and that seems to be slower than I might have expected - so I'm not sure if you will get the same improvement with that fix & some related fixes that we did (I'm also going to check in with our sysadmins on the speed of the Redis cache which seems to be a confounding factor).

@eileenmcnaughton
Copy link
Contributor

Please create a gitlab with whatever you can find on the DB Error: unknown error

@andyburnsco
Copy link
Contributor

Seems like this merge did not take. Just noting a small batch queue creates a delay between queues (3-4s in my tests) looking at the detailed logging table.
image

It was 5 this PR #24151 made it 100 now it is 5 again... https://github.com/civicrm/civicrm-core/blob/master/CRM/Import/Parser.php#L673

Noting detailed logging off doesn't seem to have that much of an effect, which is surprising.

@andyburnsco
Copy link
Contributor

50 is probably what it should be and is safest for all installs: #24152

eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Oct 20, 2022
Overview
----------------------------------------
We increased batch size to 50 in 5.52 but appear to have accidentally
re-set it to 5. This was presumably due to code being merged that
patched the code with the older value & hadn't picked up the new

Before
----------------------------------------
Each batch is 5 records

After
----------------------------------------
Each batch is 50 records

Technical Details
----------------------------------------
There seems to be consensus this should be at least 50.
100 is also on the table but I went with Andy's comment
civicrm#24151 (comment)

The issue Andy points out about the 3 second delay between batches seems excessive
and worth digging into. There used to be a 30 second delay in the UI just to
let people really luxuriate in watching our batch screen - not sure
if that is still there?
@eileenmcnaughton
Copy link
Contributor

See #24772

eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Oct 20, 2022
Overview
----------------------------------------
We increased batch size to 50 in 5.52 but appear to have accidentally
re-set it to 5. This was presumably due to code being merged that
patched the code with the older value & hadn't picked up the new

Before
----------------------------------------
Each batch is 5 records

After
----------------------------------------
Each batch is 50 records

Technical Details
----------------------------------------
There seems to be consensus this should be at least 50.
100 is also on the table but I went with Andy's comment
civicrm#24151 (comment)

The issue Andy points out about the 3 second delay between batches seems excessive
and worth digging into. There used to be a 30 second delay in the UI just to
let people really luxuriate in watching our batch screen - not sure
if that is still there?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants