Added DB tables for storing and interacting with posts #313

allouis · 2025-02-12T05:05:14Z

closes https://linear.app/ghost/issue/AP-714

We want to move away from storing raw ActivityPub objects in a KV store and instead start storing the data in the format that best suits our use cases.

This means moving to a new schema where we can store our concept of a post and use joins to look up information such as who has liked or reposted a post, as well as which posts belong in a user's feed.

This schema is the result of some long-term performance testing as well as audits of our UI and the data it needs in order to render.

allouis · 2025-02-12T05:06:52Z

migrate/migrations/000007_add-posts-table.up.sql

+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+
+    type TINYINT UNSIGNED NOT NULL CHECK (type IN (0,1,2,3,4)),


I'm using TINYINT instead of BIT - they would both take up a byte of space, and I've read that BIT is difficult to work with

I'm not sure about using the CHECK here? Maybe we leave this at the application level instead? Allowing 5 types was to leave enough room for growth

Yeah, I don't think BIT would be suitable for this if we plan on having more than 2 types.

I'm not sure about the CHECK, we're probably going to end up doing this at the application level anyway, right? But I suppose it does guarantee integrity. I guess it's just a matter of making it clear that allowing a new post type means updating the application-level checks as well as the schema (which should just be dropping the constraint and re-applying it?)
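
A minimal sketch of what "dropping the constraint and re-applying" could look like in a later migration. It assumes MySQL 8.0.16+ (where CHECK constraints are enforced), that the table is named `posts`, and that the unnamed inline check received the auto-generated name `posts_chk_1`. The new type value `5` and the constraint name `posts_type_check` are hypothetical:

```sql
-- Confirm the actual name of the existing CHECK constraint first;
-- `posts_chk_1` below is an assumption about the auto-generated name.
SELECT CONSTRAINT_NAME
FROM information_schema.TABLE_CONSTRAINTS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'posts'
  AND CONSTRAINT_TYPE = 'CHECK';

-- Hypothetical follow-up: allow one more post type (5) and give the
-- constraint an explicit name so future migrations can reference it.
ALTER TABLE posts
    DROP CHECK posts_chk_1,
    ADD CONSTRAINT posts_type_check CHECK (type IN (0, 1, 2, 3, 4, 5));
```

Naming the constraint explicitly would avoid relying on the generated name in future migrations; the application-level check would still need the same update.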

allouis · 2025-02-12T05:08:26Z

migrate/migrations/000007_add-posts-table.up.sql

+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
+
+    type TINYINT UNSIGNED NOT NULL CHECK (type IN (0,1,2,3,4)),
+    audience TINYINT UNSIGNED NOT NULL CHECK (audience IN (0,1,2)),


This is to store the audience of the post - currently thinking public, private & followers-only

How would this affect the current feed queries?

At the moment it wouldn't at all - we'd query everything - but eventually we could filter based on audience
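
For illustration only, a sketch of how a feed query might filter on audience later. It assumes the feed rows live in a table named `feeds`, that `0` maps to public, and that `user_id` and `post_id` columns exist; none of these are confirmed by the diff shown here:

```sql
-- Hypothetical: fetch the 20 newest public posts in one user's feed.
SELECT post_id
FROM feeds
WHERE user_id = 1      -- assumed column, not shown in this diff
  AND audience = 0     -- assumed mapping: 0 = public
ORDER BY id DESC
LIMIT 20;
```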

allouis · 2025-02-12T05:10:52Z

migrate/migrations/000007_add-posts-table.up.sql

+    reading_time_minutes INT UNSIGNED DEFAULT 0 NOT NULL,
+
+    ap_id VARCHAR(1024) NOT NULL,
+    ap_id_hash BINARY(32) GENERATED ALWAYS AS (UNHEX(SHA2(ap_id, 256))) STORED UNIQUE,


We're using VARCHAR(1024) to store all our URLs, which is very big, but we chose that on purpose because we've already run into long URLs (> 512 IIRC)

We want to index the ap_id for lookups, but VARCHAR(1024) is a large column to index, so the idea here is to store a hash and index that instead.

The GENERATED ALWAYS... stuff was suggested by ChatGPT - but we can do that at the application level instead if desired.

Neat idea 👍
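
A sketch of how a lookup could use the hashed column, assuming the table is named `posts` and has an `id` column (not shown in this hunk); the URL is a placeholder:

```sql
-- Hypothetical lookup by ActivityPub ID.
SET @ap_id = 'https://example.com/post/1';

SELECT id
FROM posts
WHERE ap_id_hash = UNHEX(SHA2(@ap_id, 256))  -- matches the UNIQUE index on the 32-byte hash
  AND ap_id = @ap_id;                        -- optional guard against a hash collision
```

SHA2 hashes the bytes of the string, so the value being compared should be in the same character set as the `ap_id` column, otherwise non-ASCII IDs could hash differently.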

allouis · 2025-02-12T05:11:38Z

migrate/migrations/000007_add-posts-table.up.sql

+    title VARCHAR(256) NULL,
+    excerpt VARCHAR(500) NULL,
+    content TEXT NULL,
+    url VARCHAR(1024) NOT NULL,


I didn't put a UNIQUE here for two reasons: 1. to avoid adding a large index and 2. I'm not 100% sure that URLs will be unique - ap_id should be, but it might be possible for different posts/post_types to refer to the same URL?

Why would we have 2 different posts/post_types referring to the same URL? By definition, if they are different they should have different URLs, right?

The url property isn't an identifier (https://www.w3.org/TR/activitystreams-vocabulary/#dfn-url)

So I think it's possible that you could have both an Article and Note with the same url representation

I might be missing something here, but how can 1 URL resolve to 2 different things? (without additional context)

allouis · 2025-02-12T05:11:57Z

migrate/migrations/000007_add-posts-table.up.sql

+    audience TINYINT UNSIGNED NOT NULL CHECK (audience IN (0,1,2)),
+
+    author_id INT UNSIGNED NOT NULL,
+    title VARCHAR(256) NULL,


Is this long enough?

allouis · 2025-02-12T05:12:13Z

migrate/migrations/000007_add-posts-table.up.sql

+
+    author_id INT UNSIGNED NOT NULL,
+    title VARCHAR(256) NULL,
+    excerpt VARCHAR(500) NULL,


This exceeds Ghost's custom_excerpt field so I think we're good

Yep, I think Djordje mentioned that on the frontend it's expected to be no more than 400 chars

allouis · 2025-02-12T05:12:45Z

migrate/migrations/000010_add-feeds-table.up.sql

+    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+
+    post_type TINYINT UNSIGNED,


Needed for efficient queries when filtering feeds on post_type (feed vs inbox)

Should we enforce the same constraints as the posts table for these columns?

allouis · 2025-02-12T05:13:21Z

migrate/migrations/000010_add-feeds-table.up.sql

+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+
+    post_type TINYINT UNSIGNED,
+    audience TINYINT UNSIGNED,


Similar idea to the above - we might want to be able to filter feeds on whether or not they're private/public
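
On the question of enforcing the same constraints as the posts table, a minimal sketch of what that could look like. It assumes the table is named `feeds` and MySQL 8.0.16+; the constraint names are hypothetical, and the allowed values simply mirror the CHECKs on the posts columns:

```sql
-- Hypothetical: mirror the posts-table CHECKs on the denormalised feed columns.
ALTER TABLE feeds
    ADD CONSTRAINT feeds_post_type_check CHECK (post_type IN (0, 1, 2, 3, 4)),
    ADD CONSTRAINT feeds_audience_check CHECK (audience IN (0, 1, 2));
```

Both columns are nullable in the diff above, and a CHECK does not reject NULL, so these constraints would only guard non-NULL values unless NOT NULL is added as well.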

allouis had a problem deploying to build February 12, 2025 05:05 — with GitHub Actions Failure

allouis temporarily deployed to build February 12, 2025 05:14 — with GitHub Actions Inactive

allouis requested a review from mike182uk February 12, 2025 05:16

fixup! Added DB tables for storing and interating with posts

3260ff8

allouis temporarily deployed to build February 12, 2025 10:21 — with GitHub Actions Inactive

fixup! Added DB tables for storing and interating with posts

dc50d1d

allouis temporarily deployed to build February 12, 2025 10:30 — with GitHub Actions Inactive

allouis merged commit 484eb5d into main Feb 12, 2025
2 checks passed

allouis deleted the posts-tables branch February 12, 2025 10:45
