Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable batch size for JIRA._fetch_pages() and dependant methods #1394

Merged
merged 10 commits into from
Jun 28, 2022

Conversation

rynkk
Copy link
Contributor

@rynkk rynkk commented Jun 3, 2022

Add batch_size argument to JIRA._fetch_pages and dependant methods.
This allows for significant performance improvements when trying to retrieve a lot of items using maxResults=False.

fixes #1393

The program was tested solely for our own use cases, which might differ from yours.


Jannik Meinecke (jannik.meinecke@mercedes-benz.com) on behalf of MBition GmbH.
Provider Information

Licensed under BSD-2-Clause license

@rynkk
Copy link
Contributor Author

rynkk commented Jun 3, 2022

Required change in _fetch_pages was a lot less than anticipated. I improved the method's tests, while I was at it.


Jannik Meinecke (jannik.meinecke@mercedes-benz.com) on behalf of MBition GmbH.
Provider Information

jira/client.py Show resolved Hide resolved
jira/client.py Outdated Show resolved Hide resolved
@studioj
Copy link
Collaborator

studioj commented Jun 4, 2022

I'm not sure what this entails ... it seems just like offering a different way of doing the same thing 🤔
Could you update the examples maybe to explain the usecase a little bit?
I dont really like using maxResults and batchsize to set the maxResults in the json, but that's personal of course ...

@studioj
Copy link
Collaborator

studioj commented Jun 5, 2022

Maybe setting a batch size for the whole jira class seems more logical instead of adding it for quite some methods? =>
Would there be a disadvantage for having a large batch size on shorter queries?
I see your improvement from 41s to 6s is pretty significant so its definitly worth continuing this train of thought.
I'm just trying to prevent tons of parameters being added to methods... It's often seen as a code smell if your parameter list is getting too large ...

If you would be in dire need for a speedup quickly, probably a quick fix would be to increase the default batch size to 100 🤔?
although I like it to be configurable

@adehad @ssbarnea some architectural guidance is welcome :-)

Copy link
Contributor

@adehad adehad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe setting a batch size for the whole jira class seems more logical instead of adding it for quite some methods? => Would there be a disadvantage for having a large batch size on shorter queries? I see your improvement from 41s to 6s is pretty significant so its definitly worth continuing this train of thought. I'm just trying to prevent tons of parameters being added to methods... It's often seen as a code smell if your parameter list is getting too large ...

If you would be in dire need for a speedup quickly, probably a quick fix would be to increase the default batch size to 100 🤔? although I like it to be configurable

@adehad @ssbarnea some architectural guidance is welcome :-)

@rynkk, thanks for this. Especially as this is one of the more convoluted areas in the code.
As suggested by @studioj I believe this batch size needs to be set at a class instance level. i.e. in here something like DEFAULT_BATCH_SIZE

DEFAULT_OPTIONS = {

I completely understand where you are coming from with wanting to set this per function. It provides the most flexibility, especially as you will likely have more issues than dashboards and therefore would want to set it differently for different resources. Also scales differently depending on how fast the server is you are working with.

A solution I would prefer if you strongly would like to set this per endpoint might look like so:

# within the DEFAULT_OPTIONS

# A dict of (Resource Type, Batch Size) 
DEFAULT_BATCH_SIZE: Dict[Resource, int] = {
    Issue : 500,
    Resource: 100, # or 50 or whatever we want as a default now
}


...

# inside the JIRA.__init__

self.options["default_batch_size"] = DEFAULT_BATCH_SIZE
self.options["default_batch_size"].update(default_batch_sizes)  # or whatever the argument name is
 



...
# now inside fetch pages

        if maxResults:
            page_params["maxResults"] = maxResults
        else: # will need to set batch size to get all resources
            page_params["maxResults"] = self.options["default_batch_size"].get(item_type, self.options["default_batch_size"].get(Resource))
            _LOG.verbose(f"Batch size for {item_type.__name__} set to {page_params["maxResults"]}")

There might be problems with this approach too, but I feel it may be more ergonomic for advanced users such as yourself without confusing/bogging down more casual users

@rynkk
Copy link
Contributor Author

rynkk commented Jun 7, 2022

@adehad, @studioj, I agree, not having to specify the batch_size for every request is less error-prone and more user-friendly, while the dict-config allows the required configurability.

I implemented the requested changes with some addition:
It is possible to set a Resource's mapped batch-size to None, which means, that no batching on our side is done, but instead we let the JIRA-backend handle it.

I also adjusted the tests. Seems like the test for _fetch_pages is starting to look a bit bloated. Fine for now?


Jannik Meinecke (jannik.meinecke@mercedes-benz.com) on behalf of MBition GmbH.
Provider Information

Copy link
Contributor

@adehad adehad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!

fetch_pages() and the way we handle async needs its own refactoring at some point so don't worry about it for now.

tests/tests.py Outdated Show resolved Hide resolved
jira/client.py Show resolved Hide resolved
jira/client.py Outdated Show resolved Hide resolved
jira/client.py Show resolved Hide resolved
tests/tests.py Outdated Show resolved Hide resolved
jira/client.py Outdated Show resolved Hide resolved
@adehad
Copy link
Contributor

adehad commented Jun 7, 2022

@rynkk you should also rebase/merge the latest main if you haven't already, as we've fixed some build problems

@rynkk
Copy link
Contributor Author

rynkk commented Jun 7, 2022

Changes implemented, some modifications of my own, as well. Tests pass, docs do not. Some issue with not being able to map ResourceType. Not sure what the cause is.


Jannik Meinecke (jannik.meinecke@mercedes-benz.com) on behalf of MBition GmbH.
Provider Information

___
Jannik Meinecke <jannik.meinecke@mercedes-benz.com> on behalf of MBition GmbH.
https://github.com/mercedes-benz/foss/blob/master/PROVIDER_INFORMATION.md
Copy link
Contributor

@adehad adehad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff, almost ready, let's see if we can fix the build

jira/client.py Outdated Show resolved Hide resolved
jira/client.py Outdated Show resolved Hide resolved
jira/client.py Outdated Show resolved Hide resolved
tests/tests.py Outdated Show resolved Hide resolved
jira/client.py Outdated Show resolved Hide resolved
jira/client.py Outdated Show resolved Hide resolved
---
<sub>Jannik Meinecke (<jannik.meinecke@mercedes-benz.com>) on behalf of MBition GmbH.
[Provider Information](https://github.com/mercedes-benz/foss/blob/master/PROVIDER_INFORMATION.md)</sub>
Copy link
Contributor

@adehad adehad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 1 change and then I'm happy, @studioj any further thoughts?

jira/client.py Show resolved Hide resolved
jira/client.py Outdated Show resolved Hide resolved
tests/tests.py Show resolved Hide resolved
---
<sub>Jannik Meinecke (<jannik.meinecke@mercedes-benz.com>) on behalf of MBition GmbH.
[Provider Information](https://github.com/mercedes-benz/foss/blob/master/PROVIDER_INFORMATION.md)</sub>
@rynkk
Copy link
Contributor Author

rynkk commented Jun 9, 2022

Issue-default removed, tests & docs adjusted :-)


Jannik Meinecke (jannik.meinecke@mercedes-benz.com) on behalf of MBition GmbH.
Provider Information

adehad
adehad previously approved these changes Jun 9, 2022
Copy link
Contributor

@adehad adehad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, looks good to me, will wait a bit to see if @studioj has any comments.

Thanks for pushing through with this.

jira/client.py Outdated
Comment on lines 479 to 480
``{Issue: 500, Resources: None}`` will make :py:meth:`search_issues` query Issues in batches of 500, while
every other item type's batch-size will be controlled by the backend. (Default: None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really excellent documentation here

studioj
studioj previously approved these changes Jun 9, 2022
Copy link
Collaborator

@studioj studioj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems pretty thourough although I'm counting on @adehad for the details :P

@@ -716,6 +730,8 @@ def _fetch_pages(
page_params["startAt"] = startAt
if maxResults:
page_params["maxResults"] = maxResults
elif batch_size := self._get_batch_size(item_type):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the first walrus operator we're using?
somebody has to go first :)
I must admit I'm not used to it yet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an honor (for some). Just have to be very careful about braces when using it..

@rynkk rynkk dismissed stale reviews from studioj and adehad via e997654 June 10, 2022 07:53
___
Jannik Meinecke (<jannik.meinecke@mercedes-benz.com>) on behalf of MBition GmbH.
[Provider Information](https://github.com/mercedes-benz/foss/blob/master/PROVIDER_INFORMATION.md)
@rynkk
Copy link
Contributor Author

rynkk commented Jun 10, 2022

Found typos in the default_batch_size-docstring at the last glance


Jannik Meinecke (jannik.meinecke@mercedes-benz.com) on behalf of MBition GmbH.
Provider Information

@rynkk
Copy link
Contributor Author

rynkk commented Jun 21, 2022

Hi,
do you need anything else to get this PR done or are you just a bit busy right now?


Jannik Meinecke (jannik.meinecke@mercedes-benz.com) on behalf of MBition GmbH.
Provider Information

@adehad
Copy link
Contributor

adehad commented Jun 26, 2022

Sorry @rynkk haven't had much free time, am having a bit more now so will try and push this through

@adehad adehad changed the title Add batch_size argument to JIRA._fetch_pages and dependant methods Configurable batch size for JIRA._fetch_pages() and dependant methods Jun 28, 2022
@adehad adehad merged commit 60b503a into pycontribs:main Jun 28, 2022
@adehad
Copy link
Contributor

adehad commented Jun 28, 2022

Thanks again for pushing through with this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"search_issues" - performance improvements
3 participants