Paginate the activity list #6798
@avdata99 @amercader My suggestion is to merge this one first
For performance, it would be much better to paginate activities by timestamp instead of by offset. The activity table will be very large on busy sites, and a web crawler could inadvertently cause a DoS when paging with offsets.
ckan/views/dataset.py
Outdated
# TODO: remove
g.pkg_dict = pkg_dict
g.pkg = pkg
all_activities = activity_model.package_activity_list(
This looks like a really expensive way of calculating the total number of activities. Maybe we can create a new package_activity_count function that accepts the same params but just returns the count needed, computed in an efficient way.
I added package_activity_count here, @amercader.
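To illustrate the point of the review comment above, here is a minimal sketch of why a dedicated count is cheaper than listing every activity and measuring the result. It uses an in-memory SQLite table rather than CKAN's model; the table layout and the two helper names are assumptions made for the example, not the code added in this PR.

```python
import sqlite3

# Hypothetical stand-in for the activity table (not CKAN's real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity ("
    "id INTEGER PRIMARY KEY, object_id TEXT, timestamp REAL)"
)
conn.executemany(
    "INSERT INTO activity (object_id, timestamp) VALUES (?, ?)",
    [("pkg-1", float(i)) for i in range(1000)]
    + [("pkg-2", float(i)) for i in range(10)],
)


def count_by_loading_rows(conn: sqlite3.Connection, object_id: str) -> int:
    """Expensive: fetch every matching row, then count them in Python."""
    rows = conn.execute(
        "SELECT * FROM activity WHERE object_id = ?", (object_id,)
    ).fetchall()
    return len(rows)


def count_in_database(conn: sqlite3.Connection, object_id: str) -> int:
    """Cheaper: let the database count and return a single number."""
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM activity WHERE object_id = ?", (object_id,)
    ).fetchone()
    return total
```

Both helpers return the same number, but only the first one transfers and materialises every row.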
@wardi This is how activity lists are paginated now: Lines 136 to 140 in 636135f
Essentially we pass What you are suggesting is paginating with something like
Exactly. A We have some indexes that include timestamp:
"idx_activity_object_id" btree (object_id, "timestamp")
"idx_activity_user_id" btree (user_id, "timestamp")
so as long as we're only looking at the activities for a specific user or object we should be covered.
Paginating by timestamp also has the benefit of giving persistent URLs and of not showing duplicate items while moving through pages when new activities are added at the same time.
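The duplicate-item problem mentioned above can be reproduced with a small, self-contained sketch. The list of integer timestamps and both helper functions are hypothetical and only simulate the two strategies; they are not CKAN code.

```python
from typing import List, Optional

# Hypothetical in-memory "activity table": timestamps only, newest first.
activities: List[int] = [50, 40, 30, 20, 10]


def page_by_offset(rows: List[int], offset: int, limit: int) -> List[int]:
    """Offset pagination: skip `offset` rows, then take `limit` rows."""
    return rows[offset:offset + limit]


def page_before(
    rows: List[int], before: Optional[int], limit: int
) -> List[int]:
    """Timestamp pagination: take `limit` rows strictly older than `before`."""
    older = [ts for ts in rows if before is None or ts < before]
    return older[:limit]


# Page 1 is identical with both strategies.
offset_page_1 = page_by_offset(activities, 0, 2)
cursor_page_1 = page_before(activities, None, 2)

# A new activity arrives while the reader is still on page 1.
activities.insert(0, 60)

# Offset pagination: every row shifted down by one, so 40 appears again.
offset_page_2 = page_by_offset(activities, 2, 2)

# Timestamp pagination: continue from the last row already seen (40).
cursor_page_2 = page_before(activities, cursor_page_1[-1], 2)
```

With offsets, the second page repeats an item from the first; with a timestamp cursor, the second page starts exactly where the first one ended, and the same URL keeps pointing at the same items.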
@tino097 @amercader, I agree to merge this one first.
So we could approach this in two ways: Option A
or Option B
I don't have any strong opinions either way. @smotornyuk, you probably have a better idea of the effort involved and are best placed to decide.
I prefer option A. For me, it will be as simple as picking a few extra changes into a separate plugin (I have plenty of macros, so it will be a trivial task). And I don't want @avdata99 to repeat his work :)
After merging this PR, if we go with #6790 next, it will be even easier for @avdata99 to replicate the changes on this PR to all activity streams.
BTW, starting next week I'll stop my volunteer activities (as the situation is more or less stabilized and I'm not really useful anymore). This means I'll have more time for CKAN and will get back to the TechTeam meetings :)
PS @avdata99, as for your question: it should be a separate PR. I think that it will be simpler to replicate your changes to other activities after #6790. But if you don't want to wait or don't want to work on the updated codebase, go ahead and I'll just adopt your changes into my PR afterward.
Thanks @smotornyuk, I agree to go for this PR first, then move this feature to this PR and finally create independent PRs for other activity streams.
2242740
to
0080efb
@smotornyuk @amercader
@avdata99 looking good! I added a few more comments to polish the feature a bit
ckan/views/dataset.py
Outdated
) -> Union[Response, str]:  # noqa

    """Render this package's public activity stream page.
    """
    after = h.get_request_param(u'after')
Don't introduce new u prefixes please 🙏
Changed here.
ckan/views/dataset.py
Outdated
if after and before:
    raise ValidationError(
        {'after': ['Cannot be used together with `before`']}
    )
These checks should be done via schema validators, not directly in the blueprints (otherwise they are not applied when using the API)
A new validator was created here.
Now we allow after and before together.
ckan/logic/action/get.py
Outdated
@@ -2533,14 +2533,17 @@ def package_activity_list(

:param id: the id or name of the package
:type id: string
:param offset: where to start getting activity items from
@wardi, was your idea to remove support for offset pagination entirely, or to keep it at the API level and just not use it from the UI?
Do we know how many users of the activity API we have?
My guess is we would want to keep it for compatibility, at least for a while.
The offset param was restored. It works well in conjunction with after and before (and is not used for pagination).
Also, some tests were added.
ckan/logic/schema.py
Outdated
schema['before'] = [ignore_missing]
schema['after'] = [ignore_missing]
These should be validated as timestamps and, as mentioned before, if you want to check that both are not present, do it here.
Done here
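As a rough illustration of what validating before and after as timestamps could look like, here is a self-contained sketch. The Invalid class and the timestamp_validator function are stand-ins written for this example; they are assumptions, not the validator added in this PR and not CKAN's own classes.

```python
from datetime import datetime, timezone
from typing import Any


class Invalid(Exception):
    """Stand-in for a validation error (hypothetical, not CKAN's class)."""


def timestamp_validator(value: Any) -> datetime:
    """Convert a Unix timestamp (number or numeric string) to a datetime.

    Raises Invalid when the value cannot be read as a timestamp.
    """
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        raise Invalid("Must be a float timestamp")
    try:
        return datetime.fromtimestamp(seconds, tz=timezone.utc)
    except (OverflowError, OSError, ValueError):
        # Covers infinities, NaN and values outside the platform's range.
        raise Invalid("Timestamp is out of range")
```

Placing a check like this in the schema means the same rule applies whether the value arrives through the web UI or through the API, which is the concern raised earlier in the review.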
8ababa5
to
a317327
Latest changes:
ckan/model/activity.py
Outdated
        q = q.filter(model.Activity.timestamp > timestamp)
    else:
        q = q.filter(model.Activity.timestamp < timestamp)
    return q
This kind of 1-line function shouldn't exist because it makes calling code more difficult to understand. Instead please use the line of code itself.
updated here
ckan/model/activity.py
Outdated
# revert sort queries for "only after" queries
revese_order = before and not after
results = _activities_limit(q, limit, offset, revese_order).all()
Instead of calling _activities_limit, please use order_by, offset and limit here. We should aim to remove functions like _activities_limit because they make the code harder to understand and optimize.
updated here
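The review suggestions above (inline the timestamp filters, then apply ordering, limit and offset directly) can be sketched against an in-memory SQLite table. The activity_page function, its parameters and the table layout are assumptions for illustration; the choice to sort ascending for "only after" queries and reverse the result afterwards is one plausible reading of the diff comment, not a copy of the merged code.

```python
import sqlite3
from typing import List, Optional

# Hypothetical activity table with an index shaped like idx_activity_object_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (object_id TEXT, timestamp REAL)")
conn.execute(
    "CREATE INDEX idx_activity_object_id ON activity (object_id, timestamp)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?)",
    [("pkg-1", float(ts)) for ts in range(1, 11)],
)


def activity_page(
    conn: sqlite3.Connection,
    object_id: str,
    limit: int,
    before: Optional[float] = None,
    after: Optional[float] = None,
    offset: int = 0,
) -> List[float]:
    """Return one page of activity timestamps, newest first."""
    sql = "SELECT timestamp FROM activity WHERE object_id = ?"
    params: list = [object_id]
    # The filters are written inline rather than hidden in a helper.
    if after is not None:
        sql += " AND timestamp > ?"
        params.append(after)
    if before is not None:
        sql += " AND timestamp < ?"
        params.append(before)
    # With only `after`, sort ascending so LIMIT keeps the rows nearest
    # to the cursor; the page is flipped back to newest-first below.
    only_after = after is not None and before is None
    sql += " ORDER BY timestamp " + ("ASC" if only_after else "DESC")
    sql += " LIMIT ? OFFSET ?"
    params.extend([limit, offset])
    rows = [row[0] for row in conn.execute(sql, params)]
    if only_after:
        rows.reverse()
    return rows
```

Without the ascending sort, an "only after" request would return the newest rows overall instead of the rows that immediately follow the cursor, so moving back one page would skip items.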
@avdata99 thank you for your work on this. I suggested a few other small changes above to reduce the number of functions that hide really simple operations and to make the code easier to understand and improve in the future.
@avdata99 thank you, those changes look much better!
This is good to go, but let's wait on #6790 (comment)
@avdata99 thanks for your work on this. @smotornyuk will port these changes to #6790 and once that is merged you can replicate the changes to the user and group streams
Fixes #6108
Proposed fixes:
This PR:
activity-stream.js. The *_activity_list_html actions were removed at Activity stream html #4627 and this file should be removed.
If this is correct, we can implement the same changes for other activity streams.
Features: