Prepare for a large influx of new users to the Bookwyrm network #2874

Open
10 tasks
WesleyAC opened this issue Jun 16, 2023 · 4 comments

Comments

@WesleyAC
Member

There's a pattern of large tech companies shooting themselves in the feet (Twitter and Reddit most recently, but many others before that), and users flocking to alternatives that are not well-prepared to handle the load. As the tech industry implodes due to rising interest rates and executives melting their brains into sludge via LLMs, we should ensure that we are well-prepared for an influx of users when Goodreads inevitably starts falling apart. That means optimizing codepaths that will be hit hard during an influx of new users (imports in particular are likely to be fairly brutal), creating a plan for how to manage the load (both server load and support/moderation/etc load), and providing guidance to people currently running bookwyrm instances and people interested in starting new instances.

It's hard to get numbers for the size of the active Goodreads userbase (the total userbase is ~100M, and they claim to have ~45M monthly active visitors, but presumably many of those don't have an account). Looking at numbers from the Twitter → Mastodon and Reddit → Lemmy/Kbin migrations, I think a reasonable target is to be able to absorb ~500k users into the network (hopefully across many instances, not just bookwyrm.social) over the course of about a week.

I should also mention that I would prefer to grow much more organically and gradually than this — I think a giant influx of users all at once is almost always worse than those same users coming over gradually — but looking at what is happening to Lemmy and Kbin right now, it's clear that we aren't necessarily going to have a choice in the matter, so we should prepare while we can.

Some specific things that we should look into:

Code / Technical

  • Audit the import code for performance problems, and figure out the approximate rate of imports that a single instance will be able to handle. I suspect that this will be a major problem, as a couple of large imports on bookwyrm.social can already back up the queues for a little while. In particular, I am pretty sure that imports generate a bunch of work on the non-import queues, which should be fixed if it is the case. Imports being slow during an influx of users is probably okay, but the entire site being slow would be bad.
  • Figure out how to scale the import process. The way we have bookwyrm.social set up doesn't allow us to easily scale up the number of Celery workers (we need to take some downtime rebooting the VM every time we want to scale up, which is alright, but it would be better if we didn't have to), and it's unclear to me whether the IPC costs of moving workers to a remote machine would be worth the ability to scale that it buys us. I think at the very least, having a runbook for how to scale in this way and investigating hybrid approaches (keeping most of the queues on the same machine, but import queues on a different machine, for instance) would be beneficial.
  • We should build easier switches to turn off non-essential tasks (recommended user recalculation, link preview generation, etc), ideally in the admin interface. If there are other things that should be in this list, let me know.
  • Do more performance tuning work on the whole. There's still a lot of inefficiency floating around that I haven't fully characterized yet, but I think some diligent work could offer at least an order of magnitude improvement in the number of active users we'd be able to support on given hardware. The performance tag has a lot of these written down, but there are more in my head as well.
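
As one possible shape for the "easier switches" item above, here is a minimal sketch of a per-task kill switch. Everything in it is an assumption made for illustration: the switch names, the `TaskSwitches` class, the `non_essential` decorator, and the in-memory storage (a real version would presumably read a setting editable in the admin interface). It is not how bookwyrm currently handles this.

```python
import functools
from dataclasses import dataclass, field
from typing import Set

# Hypothetical switch names for the non-essential work listed above.
SUGGESTED_USERS = "suggested_users"
LINK_PREVIEWS = "link_previews"


@dataclass
class TaskSwitches:
    """In-memory stand-in for settings an admin could toggle."""

    disabled: Set[str] = field(default_factory=set)

    def disable(self, name: str) -> None:
        self.disabled.add(name)

    def enable(self, name: str) -> None:
        self.disabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name not in self.disabled


switches = TaskSwitches()


def non_essential(name: str):
    """Skip the wrapped function entirely while its switch is off."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not switches.is_enabled(name):
                return None  # the work is dropped, not queued for later
            return func(*args, **kwargs)

        return wrapper

    return decorator


@non_essential(LINK_PREVIEWS)
def generate_link_preview(url: str) -> str:
    # Placeholder body standing in for the real preview generation.
    return f"preview for {url}"
```

Calling `switches.disable(LINK_PREVIEWS)` makes `generate_link_preview` return `None` without doing any work, and `switches.enable(LINK_PREVIEWS)` restores it. Checking the switch inside the task, rather than where the task is enqueued, also covers items that are already sitting in a queue when the switch is flipped.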

More instances

  • Provide performance tuning instructions in docs / improve the default config. I've done a lot of tweaking on bookwyrm.social that people shouldn't have to reinvent themselves.
  • Encourage more people to set up instances. Right now we only have 14 open-registration instances, and many of those are language-specific or otherwise niche (a good thing, to be clear, but likely not of huge help in a large migration of Goodreads users). My guess is that if a Goodreads migration were to start right now, bookwyrm.social, bookrastinating.com, books.theunseen.city, and ramblingreaders.org would see the vast majority of the load. If anyone wants help setting up an instance, please do reach out to me; I'm happy to provide advice and even walk you through the technical aspects of the setup.
  • Create some space for coordination between instance admins (maybe this is Matrix, which uh I should probably figure out how to be in)

Onboarding

  • Add a setting to expose the invite system to users. Right now, instances can only be open or closed registration, and closed registration requires admins to dish out the invites. It would be good to have a mode that allows users to make invite links, since this lets overwhelmed admins slow down the rate of new users without closing them off completely (thus shifting the load to other instances).
  • Consider changing joinbookwyrm.com to distribute load more evenly/randomly. For instance, clicking "join" could show a user a random instance (that has opted into being shown in this way), with a link/button to see the full list of instances, rather than dumping them on something that shows bookwyrm.social (and other large instances) first by default. Additionally, we could look at the browser language setting to recommend an instance in the user's preferred language.
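
The joinbookwyrm.com idea above could look roughly like the sketch below: pick at random among opted-in instances, preferring ones that match the browser's `Accept-Language` header. The `Instance` fields, the function names, and the simplified header parsing are all assumptions for illustration; the actual site may store and rank instances quite differently.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Instance:
    domain: str
    language: str   # primary language tag, e.g. "en"
    opted_in: bool  # admin agreed to be suggested at random


def preferred_languages(accept_language: str) -> List[str]:
    """Parse an Accept-Language header into primary tags, best first."""
    weighted = []
    for part in accept_language.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        weighted.append((quality, tag.strip().split("-")[0].lower()))
    # sorted() is stable, so equal weights keep their header order.
    ordered = sorted(weighted, key=lambda pair: pair[0], reverse=True)
    languages: List[str] = []
    for _, lang in ordered:
        if lang != "*" and lang not in languages:
            languages.append(lang)
    return languages


def suggest_instance(
    instances: Sequence[Instance], accept_language: str, rng=random
) -> Optional[Instance]:
    """Random opted-in instance, preferring the user's languages."""
    candidates = [inst for inst in instances if inst.opted_in]
    for lang in preferred_languages(accept_language):
        matching = [inst for inst in candidates if inst.language == lang]
        if matching:
            return rng.choice(matching)
    # No language match: fall back to any opted-in instance.
    return rng.choice(candidates) if candidates else None
```

A uniform random choice spreads signups evenly per instance, not per unit of capacity; weighting the choice by how much headroom each admin reports would be a natural extension, but that is beyond what the issue proposes.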

Process

  • Make a runbook for what to do when an influx of new users starts, both at a project-wide level (comms-wise) and for individual instances.

I hope to turn these into individual issues for easier tracking, but I figure it's useful to start a thread about the possibility of a large influx of the kind that Mastodon saw in the past and Lemmy/Kbin are seeing right now. If you have more ideas for things that would help, please do chime in. While I hope for our sake that we grow slowly and see a gradual shift away from Goodreads rather than a sudden implosion, I think it's worth putting some effort into preparing for the worst.

@hughrun
Contributor

hughrun commented Jun 17, 2023

This all looks pretty thorough @WesleyAC and you're totally right that we need to think about this before it happens.

I'd add that it doesn't even have to be a full-blown migration from a book-focused application - the Bookwyrm world was overwhelmed for a few weeks shortly after the last Twitter migration to Mastodon because people were excited to hear about other fediverse apps once they understood what the fediverse is.

I could be misreading the code, but it seems to me that the biggest bottleneck is when connector_manager.first_search_result can't find anything in the local database and starts searching remotely, which brings in huge latency problems. At some point we probably need to make a trade-off (or enable system admins to make their own trade-off) in terms of both the user onboarding experience and the existing user experience. For example, maybe admins could temporarily force import jobs to skip any item that is not already in the local database, with some functionality to allow the remainder to be imported later when the server is more able to handle it. This would give new users something to see in their shelves and readthroughs, and be less likely to tank the server.
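
To make the "skip now, import the remainder later" idea concrete, here is a minimal sketch. It deliberately does not use bookwyrm's real import models or `connector_manager`; `run_import`, `ImportOutcome`, the `local_only` flag, and the ISBN-keyed dictionary are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class ImportOutcome:
    imported: List[str] = field(default_factory=list)   # resolved right away
    deferred: List[str] = field(default_factory=list)   # retry when load drops
    not_found: List[str] = field(default_factory=list)  # remote search failed


def run_import(
    isbns: Iterable[str],
    local_index: Dict[str, str],
    remote_search: Callable[[str], Optional[str]],
    local_only: bool,
) -> ImportOutcome:
    """Resolve items locally first; under load, defer instead of going remote."""
    outcome = ImportOutcome()
    for isbn in isbns:
        if isbn in local_index:
            outcome.imported.append(isbn)
        elif local_only:
            # Admin switch is on: never touch the slow remote connectors.
            outcome.deferred.append(isbn)
        elif remote_search(isbn) is not None:
            outcome.imported.append(isbn)
        else:
            outcome.not_found.append(isbn)
    return outcome
```

During a spike an admin would run imports with `local_only=True`, and once the queues have drained, feed each import's `deferred` list back through `run_import` with `local_only=False`.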

A different/additional approach I'm mulling over is whether some kind of shared index would help at all - either something like a relay but specifically for Edition objects (i.e. that otherwise have no reason to be in your local database because they aren't linked to any user activity), or a simpler table of basic Edition data (again via a relay-like tool to distribute the data across Bookwyrm instances) that simply points to a known Bookwyrm instance where the local server can query a remote_id immediately for the full object instead of cycling through a bunch of connectors. The latter may not really be any more useful, I'm just thinking maybe it's better for smaller instances in terms of disk space, but maybe it doesn't make enough difference?

@WesleyAC
Member Author

joinbookwyrm.com (somewhat regrettably) getting on the front page of Hacker News today provides a good study of what an influx of new users looks like: bookwyrm.social saw ~500 new users (instead of the ~30-50 we'd see on a normal day), including ~28,000 imports from ~80 users.

The biggest problem seems to be that imports generate items on non-import queues, slowing down more user-facing queue items like generating timelines. This could be pretty easily fixed by making another queue for import-triggered tasks (I still need to figure out exactly what those are, though, it's not entirely clear from tracing).

In general, I think switching from the current queue "priority" system to a more functionality-based system would be preferable. Currently, everything in the "high-priority" queue falls behind by the same amount of wall-clock time, which makes everything in it equally slow. It might be better to let some particular tasks fall behind while other (less common or less compute-intensive) tasks keep operating in real time.

It might also be good to have multiple import queues depending on the number of items in the import, so that users who are just importing 50 books or so don't get stuck behind users importing thousands. I think having three import queues, one for <100 items, one for <1000, and one for >1000 would be good. It would probably be nicer if Celery had a way of doing task priority, since splitting import queues is essentially providing more processing power to imports than other things, which is potentially somewhat undesirable — I may think about that a bit more.
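
A minimal sketch of the size-based split described above. The queue names are hypothetical, and since the thresholds as written leave an import of exactly 1000 items unassigned, this sketch assumes it belongs in the middle queue.

```python
def import_queue_for(item_count: int) -> str:
    """Pick an import queue so small imports never wait behind huge ones."""
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    if item_count < 100:
        return "import_small"
    if item_count <= 1000:
        return "import_medium"
    return "import_large"
```

With Celery, the result could be passed at call time, for example `some_import_task.apply_async(args=[job_id], queue=import_queue_for(n))`, where `some_import_task` and `job_id` are placeholders. Workers would also need to be started listening on the new queues.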

@mouse-reeve
Member

Thank you! Celery does have priority-based queueing: https://docs.celeryq.dev/en/latest/userguide/routing.html#redis-message-priorities -- I didn't understand the documentation correctly when I initially implemented our current queue system (and I'm not totally sure how this would work at the moment).
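
For reference, the linked section of the Celery routing guide describes a settings fragment along these lines for the Redis broker. The two priority constants are hypothetical names added here for illustration, and how this would fit into bookwyrm's existing queue setup is untested.

```python
# Settings fragment based on the "Redis message priorities" section of the
# Celery routing guide. With the Redis transport, each queue is emulated as
# one Redis list per priority step.
broker_transport_options = {
    "priority_steps": list(range(10)),    # ten levels, 0 through 9
    "sep": ":",                           # separator in the per-priority list names
    "queue_order_strategy": "priority",   # consume lower-numbered lists first
}

# Hypothetical constants: on the Redis broker, 0 is the HIGHEST priority.
PRIORITY_USER_FACING = 0
PRIORITY_IMPORT = 9

# A task would then be sent with, for example:
#   some_task.apply_async(args=[...], priority=PRIORITY_IMPORT)
# where some_task is a placeholder for any Celery task.
```

Because the Redis transport only emulates priorities, a worker that has already prefetched low-priority messages will still run them before it picks up newer high-priority ones, so prefetch settings probably matter here too.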

Also worth noting that bookwyrm.social is handling this far better than in previous times, and that's in large part because of the fixes you've added!

@mouse-reeve
Member

mouse-reeve commented Jul 20, 2023

Off the top of my head, I believe imports trigger add_status_task when a review is added, and create_edition_task and load_more_data when adding new-to-the-database books. They can trigger broadcast tasks (only to other bookwyrm instances) if the user has followers (often they don't because people often start imports first thing). Conversely, imports on other instances can create inbox.activity_task and add_status_tasks on instances they are connected to.

Edit: also add_book_statuses_task!
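
Since the tasks listed above also run during normal use, routing them purely by task name would not separate import-triggered work from user-facing work. One option is to choose the queue at call time, as in the sketch below. The `import_triggered` queue name, the `from_import` flag, and the `high_priority` default are assumptions for illustration, and the task set is only the local-instance list from this comment, which was given from memory.

```python
# Tasks that, per the comment above, imports can trigger on the local instance.
IMPORT_TRIGGERED_TASKS = frozenset(
    {
        "add_status_task",
        "create_edition_task",
        "load_more_data",
        "add_book_statuses_task",
    }
)


def queue_for(task_name: str, default_queue: str, from_import: bool) -> str:
    """Send import-triggered work to its own queue; leave the rest alone."""
    if from_import and task_name in IMPORT_TRIGGERED_TASKS:
        return "import_triggered"  # hypothetical dedicated queue
    return default_queue
```

The caller would need to know whether it is running as part of an import, which means threading a flag through the code paths that enqueue these tasks. Tasks created on other instances (such as the inbox ones mentioned above) arrive over federation and would need a different signal.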

3 participants