Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial, stable dates with automatic precision field #3059

Merged
merged 22 commits into from
Nov 20, 2023

Conversation

dato
Copy link
Contributor

@dato dato commented Oct 23, 2023

Fixes: #743
Fixes: #3028

cover letter (pt. 1)

as proof-of-concepts go, this is what I'd like to remark of this PR: it delays actual data migration.

Why is it important? (in my opinion)

Eyeing #3028, it's clear we can fix it moving forward (stop storing data that triggers the bug), but the issue remains at large, in existing data.

If we eagerly migrate publication dates from DateTimeField to DateTime, say by merely calling dt.date() on them, we'd lose an opportunity to fix the bug retrospectively. Which is what I'd like to do.

By leaving the data alone for a while, we can not only decide how to migrate things like 2023-09-30T21:00:00-03, but to actually run some experiments to include partial date support in the migration.

Anyway, onto more interesting matters...

cover letter (pt. 2)

The new PartialDateField class does what it says, and does so by bringing an associated precision field "on its own" (i.e. it defines it automatically in contribute_to_class() method). The trick to this is using a custom descriptor (I'm sure there are bugs there).

I named the underlying data type SealedDate because, well, it seemed to me it covers both the "partial" (#743) and "stable" (#3028) aspects of this work. But I'm not a native speaker.

Also, I really wanted this type to inherit either from datetime (now) or date (in the future), and be as close to them as possible; that's why the specializations are done exclusively from inheritance (I wanted no additional attributes on these classes, no custom __init__, etc.). This is just my preference, and it's not without its drawbacks.

dato added 10 commits October 24, 2023 04:32
Django uses `str(date)` for backends other than PostgreSQL, so do not
break "YYYY-MM-DD" formatting, just in case.
Note that Django forms _already_ have suppport for partial date data; we
just need to extend it when converting to Python (using SealedDate instead
of returning an error).
In particular, SealedDate's class methods always return an instance
of the class they're invoked through (i.e., `SealedDate.from_date_parts`
intentionally never returns `MonthSeal` or `YearSeal`).

To propertly annotate this, a type variable is needed (or the much
simpler `Self` in Python 3.11).
The three "ignore" directives are:

  - avoid unreadable boilerplate from inherited `Field` methods; and:
  - typeddjango/django-stubs#285 (comment)
@mouse-reeve
Copy link
Member

First off, this seems to work perfectly, and I'm really excited to have partial dates on the way! However, I've been trying to review this for a while but somewhere along the way I have completely lost the script and now I have absolutely no clue what is going on.

One thing is, I'm not understanding what "sealed" means. I get that the date can be partial (yay!), and I think that it's stable in that it doesn't have time zone bugs? But I'm less sure about the second part. Would it make sense to just call it "partial" since that's what's functionally different about it than a regular out of the box django date field?

I noticed there are some comments in the code about things that need fixing before merge, but my brain has totally turned to mush on this issue and I'm not sure what needs to happen to make this ready to merge.

@dato
Copy link
Contributor Author

dato commented Nov 8, 2023

tl;dr: (0) thank for you reviewing! 1) will fix the pending "fixme" items; (2) we can rename anything, I provide context below nevertheless; (3) any other questions about the implementation, let me know!


First off, this seems to work perfectly, and I'm really excited to have partial dates on the way! However, I've been trying to review this for a while but somewhere along the way I have completely lost the script and now I have absolutely no clue what is going on.

hey thanks for testing the feature, glad to hear it's functional. apart from the naming issues (below), the use of descriptors makes the solution more complex and more difficult to follow, we can go over any details after figuring out the naming of things. (n.b.: I've never done such low-level Django plumbing before, just read a bunch of blog post and a lot of django code.)

One thing is, I'm not understanding what "sealed" means. I get that the date can be partial (yay!), and I think that it's stable in that it doesn't have time zone bugs? But I'm less sure about the second part.

It refers to "partial (#743) + stable (#3028)", yes, the stable bit meaning: should not be "converted" across timezones (think an author's birth date: it does occur on a certain timezone and moment in time, but the date itself is fixed: it is taken verbatim across the world).

Having to clarify this is an artifact of the columns being still datetime. When/if we migrate them to date down the road, this concern no longer holds, because date brings the stable-literal semantics on its own.

Would it make sense to just call it "partial" since that's what's functionally different about it than a regular out of the box django date field?

Please note I did use "partial" for the more visible bookwyrm.models.fields.PartialDateField (which is what the rest of bookwyrm.models.* modules will import and use). That name makes sense to me there, since it's about functionality.

I also used naturalday_partial for the filter name templates will use. (Thinking of it, I should definitely rename templates/sealed_dates.py (where that filter lives) to partial_dates.py, or even better templates/date_ext.py. Would you be okay with the latter? Anything but "sealed" seems good here.)

I chose not to use "partial" for the core classes in utils.sealed_date (but I will rename them too if you prefer) because they're full dates (inheriting from datetime) and so are never partial except through partial_isoformat() (never through str(), e.g.) Compare the implementation of LooseDate in #2961, which is properly partial (day and month can be None).

So for me this name is a "lower-level" nuance that shouldn't bubble up to the code base except in a couple, very specific places. But I'm not a native speaker and if "sealed" is more confusing than clarifying, let's rename.

I noticed there are some comments in the code about things that need fixing before merge, but my brain has totally turned to mush on this issue and I'm not sure what needs to happen to make this ready to merge.

Ah yes, I wasn't sure how to address incoming dates through ActivityPub if they didn't have a timezone. Now it's clear (take verbatim, no second-guessing). Will push a fix to the PR.

@mouse-reeve
Copy link
Member

Thank you! I think "sealed" is confusing, I would assume it means "closed" or "hidden" like a sealed envelope. Using partial where appropriate is my preference. If I'm understanding you, the utils part is still intended for use as a partial date, but under the hood is uses full dates? My inclination would be to use "partial" there too, since that's still what it's for (and maybe "part" instead of "seal" in contexts like "MonthSeal"). If you feel strongly that it's misleading to describe that part of the code as "Partial," we can figure out a term -- let me know :)

@dato
Copy link
Contributor Author

dato commented Nov 9, 2023

Thank you! I think "sealed" is confusing, I would assume it means "closed" or "hidden" like a sealed envelope. Using partial where appropriate is my preference.

Fair enough, done!

If I'm understanding you, the utils part is still intended for use as a partial date, but under the hood is uses full dates? My inclination would be to use "partial" there too, since that's still what it's for (and maybe "part" instead of "seal" in contexts like "MonthSeal").

Done too.

If you feel strongly that it's misleading to describe that part of the code as "Partial," we can figure out a term -- let me know :)

No worries. The code being correct is what matters most. :)

Thanks!

@dato dato force-pushed the stable_dates_v2 branch 2 times, most recently from 5477030 to c344e2a Compare November 14, 2023 23:19
@mouse-reeve
Copy link
Member

Is this ready to merge? It continues to work beautifully :)

@dato
Copy link
Contributor Author

dato commented Nov 20, 2023

Yes!, no more commits to add from me :)

Any comments about the code welcome

@mouse-reeve mouse-reeve merged commit 3d9f339 into bookwyrm-social:main Nov 20, 2023
@dato dato deleted the stable_dates_v2 branch December 11, 2023 23:39
dato added a commit to dato/bookwyrm that referenced this pull request Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dates wrong in ActivityPub Stream Allow dates to be only year
2 participants