Audiobook time tracking #1288

RishiDiwanTT · 2023-07-21T10:21:37Z

Description

New tables IdentifierPlaytimeEntries and IdentifierPlaytimes have been added.
The time tracking API has been added as POST /playtimes/<type>/<identifier>/.
A new environment variable, SIMPLIFIED_REPORTING_EMAIL, has been added to meet the temporary requirement of emailing reports via a cron job. This will require a deployment change.
Playtime aggregations run every 12 hours.
Playtime reporting occurs on the 2nd of every month.

The API spec cannot be added for now, since HTTP 207 is not supported by flask_pydantic_spec. Bug ticket here.

Motivation and Context

We need to track the total amount of time an audiobook is playing on a user’s device. The apps should send this information to a remote server once a minute.
JIRA
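Because the apps report once a minute, a client has to turn raw playback into per-minute totals before sending them. The sketch below shows one way to do that. The helper name `bucket_by_minute`, the `(start, end)` interval input, the assumption that timestamps are already UTC, and the 60-second cap are all illustrative assumptions, not the apps' actual implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def bucket_by_minute(intervals: List[Tuple[datetime, datetime]]) -> Dict[datetime, int]:
    """Split (start, end) playback intervals into seconds played per wall-clock minute.

    Hypothetical client-side helper; timestamps are assumed to be UTC.
    """
    buckets: Dict[datetime, int] = defaultdict(int)
    for start, end in intervals:
        current = start
        while current < end:
            minute = current.replace(second=0, microsecond=0)
            # Stop this chunk at the next minute boundary or at the interval end.
            chunk_end = min(end, minute + timedelta(minutes=1))
            buckets[minute] += int((chunk_end - current).total_seconds())
            current = chunk_end
    # Overlapping intervals could exceed a minute, so cap each bucket at 60 seconds.
    return {minute: min(seconds, 60) for minute, seconds in buckets.items()}
```

For example, playback from 10:00:30 to 10:02:15 yields 30, 60, and 15 seconds for the 10:00, 10:01, and 10:02 buckets.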

How Has This Been Tested?

Manually ran the APIs and summation jobs.
Emailed the CSV report through a local SMTP server.

Checklist

I have updated the documentation accordingly.
All new and existing tests passed.

codecov · 2023-07-21T10:27:23Z

Codecov Report

Patch coverage: 99.07% and project coverage change: +0.03% 🎉

Comparison is base (d12325e) 89.82% compared to head (ba1fa75) 89.85%.
Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1288      +/-   ##
==========================================
+ Coverage   89.82%   89.85%   +0.03%     
==========================================
  Files         208      210       +2     
  Lines       28549    28655     +106     
  Branches     6545     6556      +11     
==========================================
+ Hits        25644    25749     +105     
  Misses       1893     1893              
- Partials     1012     1013       +1

Files Changed	Coverage Δ
core/jobs/playtime_entries.py	`98.03% <98.03%> (ø)`
core/model/constants.py	`100.00% <100.00%> (ø)`
core/query/playtime_entries.py	`100.00% <100.00%> (ø)`
core/util/datetime_helpers.py	`100.00% <100.00%> (ø)`
core/util/email.py	`83.33% <100.00%> (ø)`


tdilauro

I've taken a first pass through this one, but have NOT really looked at the tests yet. More detail below, but here are a few high-level thoughts:

We need to be able to report at the collection and library level for the CSV report, so we need that information in the database tables.
Since the apps will use the presence of an entry-level link with the time-tracking rel to decide whether to perform and report time tracking, we want that link to be present only for books that need tracking. So we need some collection-level (or library/collection-level) configuration to indicate whether time tracking should be used.
So that we can more easily tune them over time, it would be useful to be able to set (or override) the following with script options:
- How old a time entry timestamp should be before it is processed into the summary table.
- How old a processed==True entry should be (i.e., how long we want to actively avoid duplicates) before it is removed from the entries table.
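A minimal sketch of how those two tunable cutoffs could interact, using an in-memory stand-in for the entries table: unprocessed entries older than `process_after` are folded into per-(identifier, collection, library) totals and marked processed, and processed entries older than `reap_after` are dropped. The `PlaytimeEntry` class, its field names, the function name, and both default durations are hypothetical; in the PR these are database tables and script options.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass
class PlaytimeEntry:  # hypothetical in-memory stand-in for an entries row
    identifier: str
    collection_id: int
    library_id: int
    timestamp: datetime  # UTC minute the playtime was recorded for
    seconds_played: int
    processed: bool = False


SummaryKey = Tuple[str, int, int]  # (identifier, collection_id, library_id)


def summarize_and_reap(
    entries: List[PlaytimeEntry],
    summaries: Dict[SummaryKey, int],
    now: datetime,
    process_after: timedelta = timedelta(hours=1),  # hypothetical default
    reap_after: timedelta = timedelta(days=30),  # hypothetical default
) -> List[PlaytimeEntry]:
    """Fold old-enough entries into summaries, then drop stale processed entries."""
    process_cutoff = now - process_after
    for entry in entries:
        if not entry.processed and entry.timestamp < process_cutoff:
            key = (entry.identifier, entry.collection_id, entry.library_id)
            summaries[key] = summaries.get(key, 0) + entry.seconds_played
            entry.processed = True

    # Processed entries are kept until reap_after so that late duplicates can
    # still be recognized; anything older is removed.
    reap_cutoff = now - reap_after
    return [e for e in entries if not e.processed or e.timestamp >= reap_cutoff]
```

Keeping the two durations as separate parameters mirrors the request above: one controls how long clients have to finish reporting a minute, the other controls the duplicate-detection window.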

api/controller.py

api/model/time_tracking.py

api/opds.py

core/jobs/playtime_entries.py

core/model/time_tracking.py


jonathangreen · 2023-07-26T14:05:59Z

Just throwing a note on here to remind us that since #1281 went in, and also had a DB migration, that we will need to fix the DB migration here before merging.


core/model/time_tracking.py

core/jobs/playtime_entries.py

docker/services/cron/cron.d/circulation

api/model/time_tracking.py

api/opds.py

tdilauro

I resolved some conversations and added a few more comments.

core/jobs/playtime_entries.py

core/util/datetime_helpers.py

core/query/playtime_entries.py

tests/api/test_controller_playtime_entries.py

tdilauro · 2023-07-31T23:34:31Z

tests/core/jobs/test_playtime_entries.py

-                    Configuration.REPORTING_EMAIL_ENVIRONMENT_VARIABLE: "reporting@test.email"
-                },
-            ),
+        # Horrible unbracketted syntax for python 3.8


So glad this is fixed in more recent Python versions! Absolutely hideous and confusing.
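For reference, parenthesized multi-line `with` statements are only officially supported from Python 3.10. On 3.8, one readable alternative to a long backslash-continued `with` is `contextlib.ExitStack`. The sketch below assumes the reporting-email setting resolves to the `SIMPLIFIED_REPORTING_EMAIL` variable from the PR description; patching `time.time` is there only to show a second context manager, and `read_patched_values` is a hypothetical helper.

```python
import os
import time
from contextlib import ExitStack
from unittest import mock


def read_patched_values(email: str):
    """Enter several patches without a parenthesized `with` (Python 3.8-compatible)."""
    with ExitStack() as stack:
        # Assumed variable name, taken from the PR description.
        stack.enter_context(
            mock.patch.dict(os.environ, {"SIMPLIFIED_REPORTING_EMAIL": email})
        )
        # A second, unrelated patch, just to show stacking.
        stack.enter_context(mock.patch("time.time", return_value=0.0))
        return os.environ["SIMPLIFIED_REPORTING_EMAIL"], time.time()
```

Both patches are undone when the `ExitStack` block exits, in reverse order of entry.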


tdilauro · 2023-08-02T01:46:02Z

core/query/playtime_entries.py

+                if (
+                    today - entry.during_minute.date()
+                ).days > cls.OLDEST_ACCEPTABLE_ENTRY_DAYS:
+                    # This will count as a success, since we don't want to repeat the entry


This is reported as a failure, but one that means the client should discard the corresponding entry.

Changed this to count as a failure
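A rough sketch of that per-entry classification, assuming each entry is keyed by an id and carries its `during_minute` timestamp. The 120-day threshold, the 201 status for accepted entries, the message strings, and the summary shape are assumptions; only the rule that stale entries get 410 Gone and count as failures comes from this thread. The per-entry results are the kind of payload the HTTP 207 Multi-Status response mentioned in the description would carry.

```python
from dataclasses import dataclass
from datetime import date, datetime
from http import HTTPStatus
from typing import Any, Dict, List

# Hypothetical threshold; the PR's actual OLDEST_ACCEPTABLE_ENTRY_DAYS value is not shown here.
OLDEST_ACCEPTABLE_ENTRY_DAYS = 120


@dataclass
class EntryResult:
    id: str
    status: int
    message: str


def classify_entries(entries: Dict[str, datetime], today: date) -> Dict[str, Any]:
    """Return per-entry statuses plus a summary that counts 410s as failures."""
    results: List[EntryResult] = []
    for entry_id, during_minute in entries.items():
        if (today - during_minute.date()).days > OLDEST_ACCEPTABLE_ENTRY_DAYS:
            # 410 Gone: a failure, but one the client should discard, not retry.
            results.append(EntryResult(entry_id, HTTPStatus.GONE, "Entry too old"))
        else:
            results.append(EntryResult(entry_id, HTTPStatus.CREATED, "Accepted"))
    successes = sum(1 for r in results if r.status < 400)
    return {
        "summary": {
            "total": len(results),
            "successes": successes,
            "failures": len(results) - successes,
        },
        "responses": results,
    }
```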

api/opds.py

api/routes.py

RishiDiwanTT · 2023-08-03T04:37:07Z

I did originally use the collection id, but found that every other collection-based route on the CM (and there are a lot of them) uses the collection name, so I used the name too, for consistency. I'm not privy to why that's the standard, though.

…

On Thu, Aug 3, 2023, 10:03 Tim DiLauro wrote:

> In api/opds.py:
>
>     href=self.url_for(
>         "track_playtime_events",
>         identifier_type=identifier.type,
>         identifier=identifier.identifier,
>         library_short_name=self.library.short_name,
>         _external=True,
>
> The route we're getting the URL for here is "/playtimes/<collection_name>/<identifier_type>/<path:identifier>", so the href should be something like:
>
>     href=self.url_for(
>         "track_playtime_events",
>         collection_name=active_license_pool.collection.name,
>         identifier_type=identifier.type,
>         identifier=identifier.identifier,
>         _external=True,
>     ),
>
> But I think it might be better, for a couple of reasons, if we used the id of the collection rather than the name:
>
> - A collection name can have spaces, which add to the messiness of the URLs (though this is also the case with identifiers, especially the id type).
> - A collection name can be changed at any time (for example, when we notice a typo or extra spaces in the name).
>
> In api/routes.py, on the new route "/playtimes/<collection_name>/<identifier_type>/<path:identifier>", methods=["POST"]:
>
> It might be better to use collection_id, rather than collection_name, for the first component of the path, since it is less likely to change.

tdilauro · 2023-08-03T04:45:49Z

> I did originally use the collection id, but found that every other collection-based route on the CM (and there are a lot of them) uses the collection name, so I used the name too, for consistency. I'm not privy to why that's the standard, though.

I'd still suggest using the id here, since these URLs might be cached for a long time on the client apps and a name change might cause all of them to break. That said, the client apps should update these links in their registries whenever they fetch the loans feed, so maybe that's good enough.
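To illustrate the concern about names in URLs, the sketch below percent-encodes each path segment the way a client-facing URL needs to. `playtime_path` is a hypothetical helper, not the CM's `url_for` route. It only shows that a name with spaces (or a trailing space) produces a noisier URL that changes whenever the collection is renamed, while a numeric id stays stable.

```python
from urllib.parse import quote


def playtime_path(collection_segment: str, identifier_type: str, identifier: str) -> str:
    """Build a /playtimes/... path with each segment percent-encoded (hypothetical helper)."""
    return "/playtimes/{}/{}/{}".format(
        quote(collection_segment, safe=""),
        quote(identifier_type, safe=""),
        # The identifier is a <path:...> segment, so slashes are left as-is.
        quote(identifier, safe="/"),
    )
```

With a name, `playtime_path("My Audiobooks ", "ISBN", "9781234567890")` gives `/playtimes/My%20Audiobooks%20/ISBN/9781234567890`; with an id it would be `/playtimes/42/ISBN/9781234567890`, which survives a rename.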

tdilauro

I think this is looking pretty good. I’m going to do some more testing once I wake up. And, of course, we’ll need to resolve the collection name vs. id issue.

One more fairly minor comment below.

api/routes.py

tdilauro · 2023-08-03T16:58:47Z

api/model/time_tracking.py

+            logging.getLogger("TimeTracking").error(
+                f"An incorrect timezone was received for a playtime ({value.tzname()})."
+            )
+            raise ValueError("Timezone MUST be UTC always")


Raising ValueError here causes the entire request to fail, even if only one timeEntry of many has an error. Ideally, we'd return this as a 400 response for just this entry.

I think it's unlikely that just one entry among many would have this problem, so I think we can address this later.
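If the per-entry approach is picked up later, one option is to validate timezones per entry and turn failures into individual 400 results instead of raising `ValueError` for the whole request. A minimal sketch with hypothetical helper names; note that it accepts any zero-offset timezone, which may be looser than the PR's own check.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional, Tuple


def utc_error(value: datetime) -> Optional[str]:
    """Return an error message if `value` is not a UTC-aware datetime, else None."""
    offset = value.utcoffset()
    if offset is None:
        return "Timestamp has no timezone; it MUST be UTC"
    if offset != timedelta(0):
        return f"Timestamp timezone {value.tzname()} is not UTC"
    return None


def validate_entries(
    entries: List[Tuple[str, datetime]]
) -> Tuple[List[Tuple[str, datetime]], List[Dict[str, Any]]]:
    """Split entries into valid ones and per-entry 400 results, rather than failing the request."""
    valid: List[Tuple[str, datetime]] = []
    rejected: List[Dict[str, Any]] = []
    for entry_id, during_minute in entries:
        error = utc_error(during_minute)
        if error:
            rejected.append({"id": entry_id, "status": 400, "message": error})
        else:
            # Normalize any zero-offset tzinfo to timezone.utc.
            valid.append((entry_id, during_minute.astimezone(timezone.utc)))
    return valid, rejected
```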

tdilauro · 2023-08-03T17:00:02Z

api/model/time_tracking.py

+class PlaytimeEntriesPost(CustomBaseModel):
+    time_entries: List[PlaytimeTimeEntry] = Field(description="A List of time entries")


The book_id and library_id from the spec are missing here.

I pushed a commit to fix this.
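For readers without the diff, here is a stdlib `dataclass` stand-in for the corrected request model (the real one is a pydantic `CustomBaseModel`). Only `book_id`, `library_id`, `time_entries`, and `during_minute` appear in this thread; the `id` and `seconds_played` fields, the string types, and the snake_case JSON keys are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class PlaytimeTimeEntry:
    id: str  # hypothetical field: client-side id echoed back in per-entry results
    during_minute: datetime  # UTC minute the listening happened in
    seconds_played: int  # hypothetical field name


@dataclass
class PlaytimeEntriesPost:
    book_id: str
    library_id: str
    time_entries: List[PlaytimeTimeEntry] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PlaytimeEntriesPost":
        # datetime.fromisoformat on Python 3.8 does not accept a trailing "Z",
        # so this sketch expects an explicit "+00:00" offset.
        entries = [
            PlaytimeTimeEntry(
                id=e["id"],
                during_minute=datetime.fromisoformat(e["during_minute"]),
                seconds_played=int(e["seconds_played"]),
            )
            for e in data.get("time_entries", [])
        ]
        return cls(book_id=data["book_id"], library_id=data["library_id"], time_entries=entries)
```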

tdilauro

This looks good to go! 🎈🎉

It'll be good to start testing against this with the apps!

tdilauro · 2023-08-03T17:55:54Z

@RishiDiwanTT I'm going to go ahead and merge this, so that we can get it deployed out for testing. Thanks for all your work on this!

RishiDiwanTT added the DB migration and feature labels Jul 21, 2023

RishiDiwanTT requested a review from a team July 21, 2023 10:21

RishiDiwanTT self-assigned this Jul 21, 2023

RishiDiwanTT changed the title ~~Feature/audiobook time tracking~~ Audiobook time tracking Jul 21, 2023

tdilauro reviewed Jul 24, 2023


RishiDiwanTT added 8 commits July 28, 2023 12:34

Time tracking models and model tests

b5e58f3

Time tracking route with api models added

3d43dca

The api takes bulk playtime entries to insert into the DB

Playtime summation script and tests

e1a00a5

The cron job is slated to run every 12 hours, shifted by 8 to avoid clutter

Playtime reporting script added with a configurable email recipient

22d74c2

To run once a month and send a quarterly report

Changed API route

22bcfe5

Added time tracking links to feed entries

d735181

Mypy fixes

4b0cfcd

Python 3.8 syntax fix

edf3756

RishiDiwanTT force-pushed the feature/audiobook-time-tracking branch from 7fbb202 to edf3756 July 28, 2023 07:05

RishiDiwanTT added 6 commits July 28, 2023 12:37

Alembic ordering fix

0b51149

Added collection and library information to the playtime entries and summaries

0705167

Reporting summation groups on collection and library as well now

23c5e51

Playtimes API validations

1a1c9a0

Modularized playtime entries

e081f78

Fixed UTC date issue

4a35980

tdilauro reviewed Jul 31, 2023


RishiDiwanTT added 5 commits July 31, 2023 12:00

PR updates

fd0e97a

Added the 401 gone status for very old entries

30b0661

Playtime entries reaping cut off time

c5c942d

Time tracking rels only for specific collections

87ad147

Mypyp fixes

6827875

tdilauro reviewed Jul 31, 2023

api/opds.py (outdated, resolved)

tdilauro reviewed Jul 31, 2023


RishiDiwanTT added 2 commits August 1, 2023 11:35

Fixed 401 Gone to 410 Gone

6a93da8

Only a loans feed with active loans for a work will have time tracking rels

49170e5

tdilauro reviewed Aug 2, 2023


410 is now counted as a failure

6e727b6

tdilauro reviewed Aug 3, 2023

api/opds.py (outdated, resolved)

api/routes.py (outdated, resolved)

RishiDiwanTT added 3 commits August 3, 2023 11:19

Fixed time tracking links

049696b

Mypy fix

3a0a689

API spec for the route

c9e8a21

tdilauro reviewed Aug 3, 2023

api/routes.py (resolved)

Switched from collection.name to colletion.id for the playtime route

dc8d552

tdilauro reviewed Aug 3, 2023


Add missing API fields.

ba1fa75

tdilauro approved these changes Aug 3, 2023


tdilauro merged commit b88847e into main Aug 3, 2023

tdilauro deleted the feature/audiobook-time-tracking branch August 3, 2023 17:56

jonathangreen mentioned this pull request Aug 7, 2023

Remove SharedCollectionAPI and IntegrationClient 🔥 (PP-224) #1298

Merged

