Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paginate the activity list #6798

Merged
merged 27 commits into from
May 17, 2022

Conversation

avdata99
Copy link
Member

@avdata99 avdata99 commented Apr 6, 2022

Fixes #6108

pkg_activity_list

Proposed fixes:

This PR:

  • Completely change the JS module activity-stream.js. The *_activity_list_html actions were removed at Activity stream html #4627 and this file should be removed
  • Remove template with the Load more button (no longer in use)
  • Allow pagination over activity lists adding the next and previous page buttons
  • Allow filter activities by type

If this is correct we can implement same changes for other activity streams

Features:

  • includes tests covering changes
  • includes updated documentation
  • includes user-visible changes
  • includes API changes
  • includes bugfix for possible backport

@tino097
Copy link
Member

tino097 commented Apr 25, 2022

@avdata99 @amercader
This is good to go, but we need to check if we shoudl merge this first or #6790

My suggestion is to merge this one first

@wardi
Copy link
Contributor

wardi commented Apr 26, 2022

for performance it would be much better to use timestamps to paginate activities instead of offsets. The activity table will be very large on busy sites and a web crawler could inadvertently create a DoS when using offset.

# TODO: remove
g.pkg_dict = pkg_dict
g.pkg = pkg
all_activities = activity_model.package_activity_list(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a really expensive way of calculating the total number of activities. Maybe we can create a new package_activity_count function that accepts the same params but just returns the count needed, computed in an efficient way

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the package_activity_count here @amercader

@amercader
Copy link
Member

@wardi This is how activity lists are paginated now:

ckan/ckan/model/activity.py

Lines 136 to 140 in 636135f

q = q.order_by(desc(model.Activity.timestamp)) # type: ignore
if offset:
q = q.offset(offset)
if limit:
q = q.limit(limit)

Essentially we pass ?offset=10&limit=20

What you are suggesting is paginating with something like ?since=<timestamp>&limit=10?
Would we need some extra index in the activity table?

@wardi
Copy link
Contributor

wardi commented Apr 26, 2022

Exactly. A since or before timestamp parameter is something the db can return instantly based on an index.

We have some indexes that include timestamp:

    "idx_activity_object_id" btree (object_id, "timestamp")
    "idx_activity_user_id" btree (user_id, "timestamp")

so as long as we're only looking at the activities for a specific user or object we should be covered.

@wardi
Copy link
Contributor

wardi commented Apr 26, 2022

Offset by timestamp also has the benefit of having persistent urls and not having duplicate items show up while moving through pages when new activities are added at the same time

@avdata99
Copy link
Member Author

@avdata99 @amercader This is good to go, but we need to check if we shoudl merge this first or #6790

My suggestion is to merge this one first

@tino097 @amercader, I agree to merge this one first.
But we need to consider other activity streams (e.g. user, groups and others).
Do we want to apply this to all of them before #6790?

@avdata99 avdata99 marked this pull request as draft April 26, 2022 18:46
@amercader
Copy link
Member

So we could approach this in two ways:

Option A

  1. Implement timestamp based pagination on this PR (for datasets)
  2. Replicate the changes on this PR to all activity streams (user, group/org, dashboard) on a separate PR
  3. Port the changes to the new activities plugin PR (Extract acitivities into a separate plugin #6790)

or Option B

  1. Merge the new activities plugin PR (Extract acitivities into a separate plugin #6790)
  2. Port the changes of this PR to the activities plugin
  3. Implement timestamp based pagination on this PR (for datasets) on the activities plugin
  4. Replicate the changes on this PR to all activity streams (user, group/org, dashboard) on the activities plugin

I don't have any strong opinions either way, @smotornyuk you probably have a better idea of the effort involved and are best placed to decide

@smotornyuk
Copy link
Member

smotornyuk commented Apr 29, 2022

I prefer option A. For me, it will be as simple as picking a few extra changes into a separate plugin(i have plenty of macros, so it will be a trivial task). And I don't want @avdata99 to repeat his work:)

After merging this PR, if we go with #6790 next, it will be even easier for @avdata99 to replicate the changes on this PR to all activity streams.

BTW, starting the next week I'll stop my volunteer activities(as the situation is more or less stabilized and I'm not really useful anymore). This means I'll have more time for CKAN and will get back to the TechTeam meetings:)

PS @avdata99, as for your question: It should be a separate PR. I think that it will be simpler to replicate your changes to other activities after #6790. But if you don't want to wait/don't want to work on the updated codebase, go ahead and I'll just adopt your changes into my PR afterward.

@avdata99
Copy link
Member Author

avdata99 commented May 2, 2022

Thanks @smotornyuk, I agree to go for this PR first, then move this feature to this PR and finally create independent PRs for other activity streams.

@avdata99 avdata99 force-pushed the 6108_improve_pkg_actvity_list branch from 2242740 to 0080efb Compare May 2, 2022 10:39
@avdata99
Copy link
Member Author

avdata99 commented May 3, 2022

@smotornyuk @amercader
I finished a first version of pagination by timestamp (including some basic tests).
Some notes:

  • If we expect duplicate timestamps (I think we don't), we'll need to change this approach to one that includes timestamp+id approach
  • For the previous page I'm still using offset, is there a better way to do it?

@avdata99 avdata99 marked this pull request as ready for review May 3, 2022 08:00
Copy link
Member

@amercader amercader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@avdata99 looking good! I added a few more comments to polish the feature a bit

) -> Union[Response, str]: # noqa

"""Render this package's public activity stream page.
"""
after = h.get_request_param(u'after')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't introduce new u prefixes please 🙏

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed here.

Comment on lines 1112 to 1116
if after and before:
raise ValidationError(
{'after': ['Cannot be used together with `before']}
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks should be done via schema validators, not directly in the blueprints (otherwise they are not applied when using the API)

Copy link
Member Author

@avdata99 avdata99 May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new validator was created here.
Now we allow after and before together.

ckan/views/dataset.py Outdated Show resolved Hide resolved
@@ -2533,14 +2533,17 @@ def package_activity_list(

:param id: the id or name of the package
:type id: string
:param offset: where to start getting activity items from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wardi was your idea to remove support for offset pagination entirely or keep it at the API level and just not use it from the UI?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know how many users of the activity API we have?

My guess is we would want to keep it for compatibility, at least for a while.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The offset para was restored. Works well in conjunction with after and before (and not used for pagination).
Also, some test were added

ckan/views/dataset.py Outdated Show resolved Hide resolved
Comment on lines 659 to 660
schema['before'] = [ignore_missing]
schema['after'] = [ignore_missing]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be validated as timestamps, and as mentioned before if you want to check that both are not present do it here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done here

ckan/tests/controllers/test_package.py Show resolved Hide resolved
ckan/templates/package/activity.html Outdated Show resolved Hide resolved
ckan/templates/package/activity.html Show resolved Hide resolved
ckan/templates/package/activity.html Outdated Show resolved Hide resolved
@avdata99 avdata99 marked this pull request as draft May 5, 2022 08:50
@avdata99 avdata99 force-pushed the 6108_improve_pkg_actvity_list branch from 8ababa5 to a317327 Compare May 5, 2022 09:52
@avdata99
Copy link
Member Author

avdata99 commented May 5, 2022

Latest changes:

  • New validator added
  • Tests added
  • after and before works together
  • offset can be used again (also combined with after and before)

@avdata99 avdata99 marked this pull request as ready for review May 5, 2022 10:49
@avdata99 avdata99 requested review from wardi and amercader May 5, 2022 11:13
q = q.filter(model.Activity.timestamp > timestamp)
else:
q = q.filter(model.Activity.timestamp < timestamp)
return q
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of 1-line function shouldn't exist because it makes calling code more difficult to understand. Instead please use the line of code itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated here


# revert sort queries for "only after" queries
revese_order = before and not after
results = _activities_limit(q, limit, offset, revese_order).all()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of calling _activities_limit please use order_by, offset and limit here. We should aim to remove functions like _activities_limit because they make the code harder to understand and optimize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated here

@wardi
Copy link
Contributor

wardi commented May 6, 2022

@avdata99 thank you for your work on this. I suggested a few other small changes above to reduce the number of functions that hide really simple operations and to make the code easier to understand and improve in the future.

@avdata99
Copy link
Member Author

avdata99 commented May 6, 2022

@wardi thanks for your review
I updated the activity queries here.

@avdata99 avdata99 requested a review from wardi May 6, 2022 14:25
@wardi
Copy link
Contributor

wardi commented May 6, 2022

@avdata99 thank you those changes look much better!

@amercader
Copy link
Member

This is good to go, but let's wait on #6790 (comment)

@amercader amercader merged commit b3a9f5e into ckan:master May 17, 2022
@amercader
Copy link
Member

@avdata99 thanks for your work on this. @smotornyuk will port this changes to #6790 and once that is merged you can replicate the changes to the user and group streams

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix activity pagination in web UI
5 participants