-
-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement cursor population in batches, only when batchSize is passed #9366
Implement cursor population in batches, only when batchSize is passed #9366
Conversation
… set on the query
I'm willing to hear comments on this feature, i noted the current cursor population queries all populate fields per document which is very slow when multiple populate fields are requested and the main query total size is large. I turned on logging for the Mongo driver while testing the new feature and effectively the amount of queries lowers drastically when specifying batchSize with this approach. Populate queries are requested at the same rate as getMore commands. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this PR is great! Made a couple comments.
Could we put this in the 5.11 release? This change might be a bit too big for a patch release.
lib/cursor/QueryCursor.js
Outdated
return callback(null, null); | ||
} else { | ||
// Request as many docs as batchSize, to populate them also in batch | ||
return ctx.cursor.next(_onNext.bind({ ctx, callback, batchDocs: [] })); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer we not use bind()
, especially with a custom object as this
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that was a quick approach to make ESLint happy (you can see the previous commit). Take in mind callback
needs to be preserved through the internal cursor iteration. There probably could be a cleaner way to succeed in this. Any suggestions?
I also did not wanted to expose onNext
, onPopulate
and nextDoc
in the QueryCursor API, as they are purely internal methods. That forces me to pass somehow the ctx
object in any way. And thus i ended with the bind()
approach.
lib/cursor/QueryCursor.js
Outdated
return _this.callback(err); | ||
} | ||
|
||
_this.ctx._batchDocs = docs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to have both .batchDocs
and ._batchDocs
? Model.populate(docs)
modifies docs
in place so they point to the same object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this.batchDocs
belongs to the custom object initially bound to _onNext
, while this.ctx._batchDocs
belongs to the QueryCursor instance. So this assignation is basically passing the Array from one object to another.
Sorry i did not researched that deep to know that Model.populate(docs)
modified the Array in place, so maybe i could even reset ctx._batchDocs
just before line 312 and always keep one Array. Let me see if that works as expected.
No problem with adding this to the 5.11 release. When is it expected to be released? |
Done with the single My first version ( 3ec17eb ) did not use |
We're planning on releasing 5.11 in October or November 2020. Does that work for you or do you need this feature out sooner? |
Great! I can manage to use my fork as temporary dependency for my projects right now until the 5.11 release. The current expectative is to not use the fork dependency as a long term solution. |
Summary
This adds a new feature that allows to optimize QueryCursor usage when populate is passed to the query. It requires a batchSize to be defined on the query, otherwise it will work as usual.
Motivation
Actually population on cursors iterates one doc at a time, running N population queries per doc which is not the best scenario for large datasets. By using a custom batchSize, the Mongo driver is already getting a batch of documents into memory, which can be safely be passed into Mongoose QueryCursor without triggering a new getMore command to the server. Then those documents can be populated together, reducing the amount of extra queries to the DB.
Examples