Konstantin Gredeskoul edited this page Aug 12, 2017 · 15 revisions

SimpleFeed

Background

Building a personalized activity feed tends to be challenging because of the diversity of event types it often includes, the personalization requirement, and the need to scale to very large numbers of concurrent users. Common implementations therefore tend to focus on either:

  • optimizing the read time performance by pre-computing the feed for each user ahead of time
  • OR optimizing the various ranking algorithms by computing the feed at read time, with complex forms of caching addressing the performance requirements.

The first type of feed is much simpler to implement on a large scale (up to a point), and it scales well if the data is stored in lightweight in-memory storage such as Redis. This is exactly the approach this library takes.

For more information about various types of feed, and the typical architectures that power them — please read:

Library Goals

The SimpleFeed Ruby library aims to accomplish the following design goals:

  • To define a minimalistic API for a typical event-based simple feed, without tying it to any concrete provider implementation
  • To make it easy to implement and plug in a new type of provider, e.g. one using Couchbase or MongoDB
  • To provide a scalable default provider implementation using Redis, which can support millions of users via data sharding by user
  • To support multiple simple feeds within the same application that are used for different purposes, e.g. a simple feed of my followers versus a simple feed of my own actions
  • To support per-user read and unread counts that can be shown in a user's "new messages" widgets

High-level look at the API

For a real-world implementation, you are expected to have a high-availability Redis instance available in your application environment. This Redis instance is used to store each individual user's activity feed. SimpleFeed stores events in a Redis Sorted Set, one per user. To "render" or "display" a user's feed, a developer simply instantiates the user's Activity object by calling #activity on the Feed instance, and then queries it via the #paginate API method.
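To make the storage model concrete, here is a minimal in-memory sketch of a per-user sorted collection with pagination. It is only a stand-in for the Redis data structure described above and is not the SimpleFeed API: `ToyFeed`, its `store` and `paginate` signatures, and the `per_page` option are hypothetical names chosen for this illustration.

```ruby
# Toy in-memory stand-in for one Redis Sorted Set per user (NOT the SimpleFeed API).
class ToyFeed
  Event = Struct.new(:value, :score)

  def initialize(per_page: 2)
    @per_page = per_page
    @streams  = Hash.new { |hash, user_id| hash[user_id] = [] }
  end

  # Adds a value with a score; storing an existing value again only re-scores it,
  # which mirrors how a sorted set treats duplicate members.
  def store(user_id, value:, score:)
    stream = @streams[user_id]
    stream.reject! { |event| event.value == value }
    stream << Event.new(value, score)
    stream.sort_by! { |event| -event.score } # highest score (newest) first
    self
  end

  # Returns one page of events for a user, newest first; pages start at 1.
  def paginate(user_id, page: 1)
    @streams[user_id].each_slice(@per_page).to_a[page - 1] || []
  end
end

feed = ToyFeed.new(per_page: 2)
feed.store(42, value: 'event:1', score: 100.0)
feed.store(42, value: 'event:2', score: 200.0)
feed.store(42, value: 'event:3', score: 300.0)

first_page  = feed.paginate(42, page: 1).map(&:value)
second_page = feed.paginate(42, page: 2).map(&:value)
```

With three events and two per page, the first page holds the two newest values and the second page holds the oldest one.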

Example. Implementing a Twitter™ Clone

To demonstrate the features of this library with a well-understood example, consider the Use Case of story posting on a Social Network similar to Twitter™. Events are pushed to the followers' activity streams as soon as the authors publish them. Modeling the relationship between an author and their followers is left to the developer using this library. You are expected to decide what "user" means in your Use Case, and to pass that user's ID as the key to a unique activity stream.

Non-User Activity Streams

As a side effect of this approach, you are free to create additional activity streams by creating fake users, such as a "global" stream of events from ALL users, or geo-centered streams (you can use the zip code of the author as a "user id", and then "subscribe" users based on proximity). For simplicity's sake, we'll use the term "user" to denote a unique instance of an activity stream that gathers events filtered by some identifying characteristic.
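As a rough illustration of this idea, the sketch below computes the list of stream keys a single event could be written to. The `stream_keys` helper, the `user:` and `zip:` prefixes, and the `global` key are hypothetical names picked for this example; they are not part of SimpleFeed.

```ruby
# Hypothetical helper: lists every stream key that should receive an event.
# The prefixes keep real user ids and "fake user" ids from colliding.
def stream_keys(follower_ids:, author_zip: nil)
  keys = follower_ids.map { |id| "user:#{id}" }
  keys << 'global'                           # one stream with events from all users
  keys << "zip:#{author_zip}" if author_zip  # geo-centered stream keyed by zip code
  keys
end

keys = stream_keys(follower_ids: [7, 9], author_zip: '94107')
```

Each key would then be treated as just another "user" when writing to or reading from the feed.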

API

Whenever a user on your site performs an interesting action, one that the product must display in other users' activity feeds, a developer using this library must first get a list of the users interested in the event, create a multi-user instance of Activity, and then call #store on this instance. The amount of time required to store the activity for all users depends on whether or not your backend is sharded; for example, you can split Redis into hundreds of shards by using a sharding proxy such as twemproxy. But you may not need to do this until you are serving activity feeds for millions of unique users, because the storage requirements are rather small (assuming you keep small values in SimpleFeed).

If you are using a background-processing framework such as Sidekiq, you will want to publish these events (i.e. call #store) in a background job. This is because posting to thousands of users takes some time and is much slower than reading an individual user's feed (the operation this library is specifically optimized for).
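One way to keep each background job short is to split a large follower list into fixed-size batches and enqueue one job per batch. The sketch below only builds the job payloads; batching is an assumption of this example, not something the library requires. `build_jobs` and the payload shape are hypothetical, and the step that hands each payload to your job framework is left out.

```ruby
# Hypothetical helper: turns one large fan-out into several small job payloads.
def build_jobs(follower_ids, value:, batch_size: 1000)
  follower_ids.each_slice(batch_size).map do |batch|
    { user_ids: batch, value: value } # one payload per background job
  end
end

jobs        = build_jobs((1..2500).to_a, value: 'event:7', batch_size: 1000)
batch_sizes = jobs.map { |job| job[:user_ids].size }
```

For 2,500 followers and a batch size of 1,000, this yields three payloads covering 1,000, 1,000, and 500 users.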

Taylor Swift posting on Twitter, and SimpleFeed API

One could say that, if you were to implement Twitter with SimpleFeed, you would be using the Multi-User API for writing to the feed, whenever users post events. Conversely, in order to render each user's activity in real-time (and in constant time) you will be using the Single-User API, as you will be reading from the feed.

Let's say we are on Twitter, and we follow Taylor Swift. Let's also say she sends a tweet. We expect our feeds to be updated with her post. As a developer implementing this backend, you must first fetch all of her follower IDs, and then create a multi-user Activity instance, passing the array of IDs as an argument. You will then call #store on the instance to publish this event, serialized as a string value.
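The write path described above (look up the follower IDs, then store one value in every follower's stream) can be sketched in plain Ruby. This is a toy model rather than the library's multi-user API: the `FOLLOWERS` lookup table, the `publish` helper, and the plain Hash used for storage are all assumptions made for illustration.

```ruby
# Hypothetical follower graph: author id => follower ids (the library leaves this to you).
FOLLOWERS = { 'taylor' => [1, 2, 3] }.freeze

# Toy multi-user write: appends a [score, value] pair to every listed user's stream
# and returns the number of streams written to.
def publish(streams, user_ids, value:, at: Time.now)
  user_ids.each { |user_id| (streams[user_id] ||= []) << [at.to_f, value] }
  user_ids.size
end

streams   = {}
delivered = publish(streams, FOLLOWERS.fetch('taylor'), value: 'tweet:1001')
```

Note that the cost of the write grows with the number of followers, while each follower's later read touches only their own stream.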

And yet, the Twitter example is just one particular Use Case. You can just as easily use this software to implement 1-1 or many-1 event publishing paradigms.

How to Represent Events in SimpleFeed

From the library's perspective, the only thing you can store as a Value is a string, which is always associated with a double-precision floating-point Score when it's stored. The most common use of the Score field is a timestamp (it's also the default if you do not pass a score yourself). But there is no such thing as the most common Value, because how you store your events is up to you, the developer, to decide.

Having said that, here are some considerations:

  • the less you store, the more a single-instance Redis backend can scale up without having to be sharded
  • the less you store, the faster network interactions with the feed will be, and the less likely the feed is to become a point of contention
  • the less you store in SimpleFeed, the more you will probably have to fetch in addition to what SimpleFeed returns in order to show the user their activity

With that in mind, imagine that we are building an application where most objects in the system have a database identifier field, including, specifically, the table of events. We created this polymorphic EVENTS table to record all interesting events that happened, associated only with their producer and not with the consumers (i.e. followers). This is easy and relatively cheap (up to a point) to implement. It ensures you have an authoritative list of events in your system, so you can always regenerate user activity in SimpleFeed should you need to.

Now, all you have to store in the Value is the database ID of the event. You can even store it as a base64-encoded number, for additional compactness.
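One possible reading of "base64-encoded number" is to encode the integer's raw bytes rather than its decimal digits. The sketch below does that with core `pack`/`unpack1` calls and a URL-safe alphabet without padding; `encode_id` and `decode_id` are hypothetical helper names, and the exact encoding is an assumption of this example rather than anything the library prescribes.

```ruby
# Encodes a non-negative Integer id as unpadded URL-safe base64 of its big-endian bytes.
def encode_id(id)
  hex = id.to_s(16)
  hex = "0#{hex}" if hex.length.odd? # pad to a whole number of bytes
  [[hex].pack('H*')].pack('m0').tr('+/', '-_').delete('=')
end

# Reverses encode_id and returns the original Integer.
def decode_id(token)
  padded = token.tr('-_', '+/') + '=' * ((4 - token.length % 4) % 4)
  padded.unpack1('m0').unpack1('H*').to_i(16)
end

token = encode_id(123_456_789_012) # 7 characters instead of 12 decimal digits
id    = decode_id(token)           # back to 123456789012
```

The saving only becomes noticeable for large IDs; for small ones the decimal string is already just as short.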

Then, to render a given user's feed, you'd first call #paginate to get a page's worth of event IDs, and then fetch the actual data from the database using a primary-key lookup against an array of IDs. This scales very well and is easily cached, because all lookups are by primary key. Caching libraries like cache-object might offer a drop-in write-through caching solution.
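The read path above can be sketched as follows, assuming the feed values are event IDs stored as strings. The `EVENTS` hash stands in for the database table, and `find_events` for a single lookup by a list of primary keys; both, along with `hydrate`, are hypothetical names for this example. Because such a lookup does not necessarily return rows in the requested order, the sketch re-sorts them to match the feed.

```ruby
# Hypothetical EVENTS table, keyed by primary key.
EVENTS = {
  1 => { id: 1, type: 'follow', actor: 'alice' },
  2 => { id: 2, type: 'like',   actor: 'bob' },
  3 => { id: 3, type: 'post',   actor: 'carol' }
}.freeze

# Stand-in for one lookup by a list of primary keys; row order is not guaranteed.
def find_events(ids)
  EVENTS.values_at(*ids).compact.shuffle
end

# Turns a page of feed values (event ids as strings) into rows in feed order,
# skipping ids whose rows no longer exist.
def hydrate(page_of_values)
  ids  = page_of_values.map(&:to_i)
  rows = find_events(ids).to_h { |row| [row[:id], row] }
  ids.filter_map { |id| rows[id] }
end

page = hydrate(%w[3 1 99]) # rows for events 3 and 1, in that order
```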