diff --git a/docs/anatomy.rst b/docs/anatomy.rst
deleted file mode 100644
index 61882054d..000000000
--- a/docs/anatomy.rst
+++ /dev/null
@@ -1,71 +0,0 @@
-*******
-Anatomy
-*******
-
-We take a high-level look at the main functionalities in a caching system and how they are performed by various roles.
-
-Life of A Request
-=================
-
-A cache request is rather simple- a request is made for some identifiable data, and a result is returned regarding the state of that request.
-
-Request Remote Data
--------------------
-
-Local Proxy
-^^^^^^^^^^^
-
-A request starts with the client, which has to serialize it according to some agreed-upon protocol to prepare for network transmission. If the proxy is co-located with the client, the request is then examined by the proxy, which picks out the useful bits for routing (e.g. the key) and comes up with one or more destinations. The encoded request is then sent over the network and (hopefully) reaches one or more servers. Each server, upon receiving data from the network, de-serializes the request, processes it, and sends back a reply along the same route it came.
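-
-To make the serialize step concrete, here is a minimal sketch assuming the memcached ASCII protocol as the agreed-upon format; the function name and buffer handling are hypothetical, not taken from any particular client:
-
-.. code-block:: c
-
-   #include <stdio.h>
-
-   /* Hypothetical encoder: turn a get request for `key` into its
-    * memcached-ASCII wire form, e.g. "get foo\r\n". Returns the
-    * number of bytes written, or -1 if the buffer is too small. */
-   static int
-   encode_get(char *buf, size_t size, const char *key)
-   {
-       int n = snprintf(buf, size, "get %s\r\n", key);
-
-       return (n > 0 && (size_t)n < size) ? n : -1;
-   }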
-
-
-Remote Proxy
-^^^^^^^^^^^^
-
-If proxies are run as independent processes on separate machines, additional work needs to be done before routing or other tasks: proxies have to both receive and send data over the network, performing both the serialization required of clients and the de-serialization required of servers. Proxies can also translate between different protocols, or intercept replies and alter them.
-
-
-Server-side Proxy
-^^^^^^^^^^^^^^^^^
-
-If proxies are colocated with servers, they mostly act as remote proxies, except when the routing destination is their own process, in which case they act as a server.
-
-
-Failure Scenarios
-^^^^^^^^^^^^^^^^^
-
-The failures of clients are out of the scope of cache design (although failing to complete a write may introduce data inconsistency or corruption down the line, so that should be a concern for users). A failure in the proxy, such as failing to forward a request, can be effectively mitigated by a combination of client retries and rerouting to a different proxy. Individual requests may still fail after exhausting the common options, but these failures should be viewed as independent, rare events without further ramifications, and can be safely retried at the client level.
-
-The failures of servers are much more serious. A server going away means the topology is temporarily wrong. Not only is further retry without a topology update unlikely to help, but any future request destined to this particular shard will probably also fail until the topology is somehow "fixed". Unlike with proxy failures, the best strategy here is not to retry (even though rerouting might be possible in some systems for certain types of requests), but to wait until the topology is updated. It is very easy for a large cohort of clients to DDoS a single server through aggressive retry logic that a stateless system with many nodes could easily survive. However, topology updates are not part of the "normal" request path, and thus don't belong on the fast path; they happen much more slowly than requests arrive, and a single failure can affect many requests in a row.
-
-When the various roles make logical mistakes instead of flat-out falling off the map, the situation gets more complicated. For example, an incorrect routing decision at the proxy may lead to data inconsistency, which is both subtler and more difficult to deal with than servers going offline. Many such mistakes effectively corrupt the topology, so a caching system needs to detect these issues. Unfortunately, unlike in many databases, such logic is often missing in caching, trading correctness for performance. Because of such risks, users should never keep their data in cache indefinitely. A much safer practice is to always let data expire after a certain period of time, to bound the amount of inconsistency that may persist.
-
-Local Cache
------------
-
-When the cache is local, the proxy role often degenerates to a statically configured target for the cache. The communication protocol is also switched to something cheaper and simpler than the ones used for RPC. The client and server roles remain largely unchanged.
-
-In-process Cache
-----------------
-
-When the cache is in-process, the boundary between client and server also melts away. There often isn't any explicit communication protocol at all, as the most efficient thing to do is to pass memory references. Similarly, serialization is short-circuited by the in-memory format. An in-process cache becomes a packaged version of one or more data structures.
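-
-A minimal sketch of this idea, with hypothetical names and without the eviction, expiration, or collision handling a real cache would need:
-
-.. code-block:: c
-
-   #include <string.h>
-
-   #define NBUCKET 1024
-
-   struct entry {
-       const char *key;
-       void       *value;
-   };
-
-   static struct entry table[NBUCKET]; /* the "cache" is just a data structure */
-
-   /* a hit hands back a memory reference; no serialization, no copy */
-   static void *
-   cache_get(const char *key)
-   {
-       size_t h = 0;
-
-       for (const char *p = key; *p != '\0'; p++) {
-           h = h * 31 + (unsigned char)*p;
-       }
-       struct entry *e = &table[h % NBUCKET];
-
-       return (e->key != NULL && strcmp(e->key, key) == 0) ? e->value : NULL;
-   }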
-
-Importance
-----------
-
-The data plane is defined by the functionalities needed during the life of a request, and nothing more.
-
-
-Life of A Server
-================
-
-Servers *are* the topology. With a correct set of servers, there are numerous ways to reliably come up with a configuration that would work. Proxies and clients also need to be aware of the topology, but being stateless w.r.t. cache data makes them far simpler to manage.
-
-What happens when the server set changes? When a server first joins an existing topology, it needs to make its presence known, by signaling the manager or by making the information discoverable. The manager re-evaluates the topology with the new set, and distributes the new topology, or the update, to all proxies. The catch here is that while the update may be computed in a centralized or consensus-based way by the manager, the distribution is probably asynchronous in nature, and thus proxies can fall out of sync with each other. Most existing systems simply ignore this scenario and deem the "gap" small enough to be overall safe. When a server goes offline, it can be graceful- where the server tells the manager- or sudden- where there is nobody to announce the departure. In the latter case, the manager needs failure detection to catch the quiet quitters. Either way, the manager then proceeds with the topology change and its announcement.
-
-Managing server membership and status is undoubtedly the core of the management/control plane, while the distribution of such information is another important, but often overlooked, aspect.
-
-
-Manager
-=======
-
-There isn't a life cycle expected for the manager. As a role, it is supposed to be there for all the decisions that need to be made about system topology and policy, and to monitor the whole system at a high level. When the manager goes away, the topology is temporarily frozen and the system becomes vulnerable to future topological changes. A breakdown in the manager often requires human attention and fixes that are not built into the system design. As much as software developers would like to automate everything, the manager role stands as the last frontier between the human operator and the system itself.
diff --git a/docs/dataplane.rst b/docs/dataplane.rst
deleted file mode 100644
index bedeeae63..000000000
--- a/docs/dataplane.rst
+++ /dev/null
@@ -1,17 +0,0 @@
-**********
-Data Plane
-**********
-
-The data plane is conceptually the simplest, but it has the most stringent performance requirements. As such, it often favors simple, deterministic algorithms and data structures, and an efficient, deterministic runtime.
-
-All caching scenarios except for the in-process cache require inter-process communication between the client and server (by way of proxies as needed), and the successful processing of requests in these scenarios is not guaranteed, even when the other processes are local to the machine. Depending on the setup, the system may resort to different media and protocols to carry out the communication. If requests are sent to different machines on the network, TCP connections or UDP will most likely be used. When the destination is local, more efficient communication can be achieved over other media such as Unix domain sockets, pipes, or shared memory. At the application level, requests need to be packed and unpacked for the communication as well, using protocols defined by specific solutions. Memcached has both an ASCII and a binary protocol, which have much syntactic overlap but are lexically distinct. Redis uses RESP (REdis Serialization Protocol) between client and server, a plaintext protocol incompatible with either of the memcached protocols. Twitter has been using Thrift along part of the communication path of Nighthawk.
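-
-To make the incompatibility concrete, here is the same logical request (fetch the value of key ``foo``) in both plaintext protocols, written as C string literals:
-
-.. code-block:: c
-
-   /* memcached ASCII protocol: a one-line command */
-   const char *mc_get   = "get foo\r\n";
-
-   /* RESP: an array of two bulk strings, each length-prefixed */
-   const char *resp_get = "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n";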
-
-Overall, choosing a communication protocol that is right for the scenario, and having an efficient, easy-to-understand application protocol, is rather crucial to data plane performance, since communication dominates clock cycles and resources in the simple use cases [#]_.
-
-The programming model around inter-process communication is a subject of great complexity and headache. It is generally agreed that a synchronous model that provides concurrency via kernel threads does not perform or scale well [citation needed], unless support for user-level threads is provided [citation needed]. Asynchronous programming allows us to scale better by multiplexing many communication channels in each running thread, avoiding idle waiting, and keeping thread-related overhead under control. However, the burden of keeping state now falls on application developers. The apparatus of asynchronous communication- event libraries and asynchronous IO syscalls- is not most programmers' friend, and development can be slow and buggy. Many choose to use an abstraction that hides the implementation details, such as `Finagle/Netty `__.
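-
-As a rough illustration of the asynchronous model, here is a hedged sketch of a single-threaded event loop using Linux ``epoll``; the ``conn`` structure and function names are hypothetical, not ccommon's actual event API:
-
-.. code-block:: c
-
-   #include <sys/epoll.h>
-   #include <unistd.h>
-
-   #define MAX_EVENT 1024
-
-   struct conn {
-       int fd;
-       /* per-connection state (partial buffers, parser position)
-        * lives here instead of on a blocked thread's stack */
-   };
-
-   static void
-   event_loop(int epfd)
-   {
-       struct epoll_event events[MAX_EVENT];
-
-       for (;;) {
-           /* one thread multiplexes many connections */
-           int n = epoll_wait(epfd, events, MAX_EVENT, -1);
-
-           for (int i = 0; i < n; i++) {
-               struct conn *c = events[i].data.ptr;
-
-               if (events[i].events & (EPOLLERR | EPOLLHUP)) {
-                   close(c->fd); /* tear down the connection on error */
-                   continue;
-               }
-               if (events[i].events & EPOLLIN) {
-                   /* read what is available, advance the parser state */
-               }
-               if (events[i].events & EPOLLOUT) {
-                   /* flush buffered response bytes */
-               }
-           }
-       }
-   }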
-
-It is quite obvious that dealing with data inside a process's own address space is significantly easier than dealing with remote data. It thus makes sense to draw a line there- application logic handles computing based on data already in memory, while a library takes care of everything else.
-
-Cache Common *is* the library that "takes care of everything else".
-
-.. [#] `Twemcache Performance Analysis `_
diff --git a/docs/development.rst b/docs/development.rst
deleted file mode 100644
index b3ce2eaee..000000000
--- a/docs/development.rst
+++ /dev/null
@@ -1,36 +0,0 @@
-****************
-ccommon Overview
-****************
-
-Development
-===========
-
-Building a Cache Common library has a lot to do with the historical context that gave rise to the idea. The current open-source caching field is both sparse and crowded- most of what people use today traces back to either Memcached or Redis, or is forked from them. On the other hand, many of the customizations and improvements individuals come up with don't make their way back into the trunk versions very easily, indicating an architectural problem with the original codebases.
-
-Fundamentally, there hasn't been enough modularity in either project, or in the many derived from them, to encourage sharing and reuse. During our own multiple attempts to create Twemperf, Twemproxy, Fatcache and Slimcache, regardless of whether we were writing a server, a proxy or a client, we had to resort to copy-and-paste, and littered our changes across the landscape.
-
-It certainly *feels* like there is a lot in common among these projects, so we formalized that commonality by abstracting it into modules and putting them in a single library: ccommon, short for Cache Common.
-
-We went through the existing code bases that have implemented some or all of the core functionalities and have been tested in production for years, synthesized them, and made changes whenever necessary to make the APIs generic enough for all the use cases we know of. Inheriting code makes it faster and more reliable to build the library; holding the APIs against known implementations allows the core library to be generic and flexible enough for our needs.
-
-Given that multi-threaded programs are much harder to get right than their single-threaded counterparts, and sometimes incur non-trivial synchronization overhead, the priority is to get single-threading right first, with the possibility of investing in multi-threading in the future on a module-by-module basis.
-
-
-Goals
-=====
-
-#. Modularized functionality with consistent, intuitive interface.
-
-#. Use abstraction to allow compile-time choice from multiple implementations of the same interface, but avoid excessive, multi-layered abstraction unless absolutely necessary.
-
-#. Production quality code that has built-in configuration, monitoring and logging support.
-
-
-#. Work well on platforms emerging in the next 3-5 years.
-
-
-Modules
-=======
-
-:doc:`modules/cc_ring_array`
-
diff --git a/docs/history.rst b/docs/history.rst
deleted file mode 100644
index d9d65e2b4..000000000
--- a/docs/history.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-Genesis
-=======
-The Twitter Cache team (which was part of Runtime Systems, and before that, Infrastructure) started working on a fork of Memcached 1.4.4 in 2010. In 2011, with the launch of Haplo, it also took over the maintenance and improvement of Redis.
-
-Over time, we have made significant changes to the code bases we inherited, and created and open-sourced Twemcache, Twemproxy and Fatcache. The proliferation of projects that are all related to managing and serving data out of memory hints at a lack of common infrastructure. And indeed, the projects we have mentioned have a lot in common, especially when you examine the core mechanisms that drive the runtime and the low-level utilities.
-
-This is why we decided to work on a project called Cache Common, or ccommon in short. Instead of stretching our development/maintenance effort thin over all these individual code bases, it makes sense to build a library that captures the commonality of these projects. We also think the commonality may very well extend beyond in-memory caching, and can provide a toolbox for writing other high-throughput, low-latency services.
diff --git a/docs/index.rst b/docs/index.rst
index fcc857b5e..52624416c 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -10,11 +10,7 @@ Subscribe to `cache-user `_ for announcement, send feedb
:glob:
:maxdepth: 2
- history
overview
- anatomy
- dataplane
- development
modules/*
diff --git a/docs/overview.rst b/docs/overview.rst
index b532ddc4a..a029e3bd0 100644
--- a/docs/overview.rst
+++ b/docs/overview.rst
@@ -1,77 +1,36 @@
-**************
-Cache Overview
-**************
+********
+Overview
+********
-What is Caching
-===============
+Background
+==========
-What is the essence of caching?
+Building a Cache Common library has a lot to do with the historical context that gave rise to the idea. The current open-source caching field is both sparse and crowded- most of what people use today traces back to either Memcached or Redis, or is forked from them. On the other hand, many of the customizations and improvements individuals come up with don't make their way back into the trunk versions very easily, indicating an architectural problem with the original codebases.
-There are many different definitions given for their more or less narrowly defined contexts. Caching is ubiquitous- from CPU to CDN, from hardware controllers to wide-area networks. Caching also varies greatly in its medium and location, and as a result, its speed- CPU cache built from SRAM clocks a mere couple of nanoseconds per operation, while the nearest CDN server for a website across the globe can take seconds before sending a high-resolution image. But there is one invariant- people use cache as a way to get data faster and more cheaply.
+Fundamentally, there hasn't been enough modularity in either project, or in the many derived from them, to encourage sharing and reuse. During our own multiple attempts to create Twemperf, Twemproxy, Fatcache and Slimcache, regardless of whether we were writing a server, a proxy or a client, we had to resort to copy-and-paste, and littered our changes across the landscape.
-The lifeline of caching is **performance**, the one property that justifies its existence. People routinely tolerate slightly incorrect or stale data so that *some* version of the data can be served to them quickly. Caching is also the answer to a lot of scalability problems- although often labeled as an optimization, there are plenty of services that would stop working altogether if caching were suddenly removed.
+It certainly *feels* like there is a lot in common among these projects, so we formalized that commonality by abstracting it into modules and putting them in a single library: ccommon, short for Cache Common.
+We went through the existing code bases that have implemented some or all of the core functionalities and have been tested in production for years, synthesized them, and made changes whenever necessary to make the APIs generic enough for all the use cases we know of. Inheriting code makes it faster and more reliable to build the library; holding the APIs against known implementations allows the core library to be generic and flexible enough for our needs.
-Caching in Datacenters
-======================
+Given that multi-threaded programs are much harder to get right than their single-threaded counterparts, and sometimes incur non-trivial synchronization overhead, the priority is to get single-threading right first, with the possibility of investing in multi-threading in the future on a module-by-module basis.
-Caching in datacenters is the primary concern of this project (and we'll refer to it simply as 'caching'). How do we make caching fast among a large number of servers interconnected in a predefined networking topology?
+Goals
+=====
-Infrastructure
---------------
+#. Modularized functionality with consistent, intuitive interface.
+
+#. Use abstraction to allow compile-time choice from multiple implementations of the same interface, but avoid excessive, multi-layered abstraction unless absolutely necessary.
-Caching in a datacenter has to abide by the rules of that particular environment- the physics of the networking fabric and how fast it can transmit data, and the software that regulates the sending, forwarding and receiving of data. If it takes at least 100μs for any data from server A to reach server B and vice versa, it will take at least 200μs to request a chunk of data and receive it. If the Linux kernel networking stack takes 15μs to process a TCP packet, more time will be added to fulfill each request over TCP. Fast caching means getting the best environment available- a faster networking infrastructure, or the choice of a faster medium (local memory over remote memory, memory over disk), can have a huge impact.
+#. Production quality code that has built-in configuration, monitoring and logging support.
-Most datacenters are still based on Ethernet. Network bandwidth ranges from 1Gbps to 40Gbps, with 10Gbps increasingly becoming the common option. End-to-end latencies between servers are often on the order of 100μs. SSDs have a seek time at about the same level, with a bandwidth somewhere between 100MB/s and 1GB/s, also comparable. Spinning disks, on the other hand, have a seek time that's 1-2 orders of magnitude higher and are thus much slower for random reads/writes. Main memory (DRAM) bandwidth is on the order of 10GB/s with an access latency of about 100ns, and a fast-increasing capacity per unit cost. The following figure captures the relative "closeness" of different data locations.
-.. image:: _static/img/data_access_speed.jpeg
+#. Work well on platforms emerging in the next 3-5 years.
-The commonly available infrastructure implies a few things: first, local memory access is significantly faster than remote memory access, and also offers much higher throughput; second, SSD and LAN performance are comparable in both latency and throughput, depending on specific products/setups, indicating that the choice between the two is not trivial either. However, getting data remotely means the system will have better scalability w.r.t. both data size and bandwidth through sharding, which may explain the dominance of distributed caches. Finally, it is worth mentioning some game-changing technologies: Infiniband lowers the E2E latency by two orders of magnitude, and often demands completely re-architecting the systems built on top of it. Emerging media such as nonvolatile memory further blur the boundary between storage media, and will require architects to rethink their storage hierarchy.
+Modules
+=======
-Design
-------
-
-How do we approach caching from a design perspective?
-
-Assumptions
-^^^^^^^^^^^
-
-The reality of infrastructure today means a few design decisions are common:
-
-#. "hot" data usually reside in main memory and sometimes in SSD, but if such data also comprise mostly of small objects (by standards of SSD page size) without locality among them, then they almost always reside in memory, because SSD is not efficient for tiny reads/writes. Larger data can be more efficiently stored with SSD while still keeping up with ethernet.
-#. given the data size and scalability requirements, cache is managed as a stateful distributed system, and sharding/routing is required. Given the popularity of key-value stores and NoSQL databases, caching often takes the form of a distributed, in-memory key-value store, and sharding applies to the key only (a minimal sketch follows this list). In many cases, cache even looks and behaves like a database.
-#. local caches, ones that can be visited by inter- or intra-process communication, are used when network bandwidth and/or latency becomes a bottleneck, especially when they create unevenness or "hotspots" among the data shards.
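-
-Below is a minimal sketch of key-only sharding (decision #2 above); the FNV-1a hash and modulo placement are illustrative choices, and production systems often prefer consistent hashing (e.g. ketama) to limit remapping when the server set changes:
-
-.. code-block:: c
-
-   #include <stdint.h>
-   #include <string.h>
-
-   /* 32-bit FNV-1a, a simple and common string hash */
-   static uint32_t
-   fnv1a_32(const char *key, size_t klen)
-   {
-       uint32_t h = 2166136261u;
-
-       for (size_t i = 0; i < klen; i++) {
-           h ^= (uint8_t)key[i];
-           h *= 16777619u;
-       }
-
-       return h;
-   }
-
-   /* route a request by hashing only its key */
-   static uint32_t
-   shard_for_key(const char *key, uint32_t nserver)
-   {
-       return fnv1a_32(key, strlen(key)) % nserver;
-   }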
-
-
-On top of these decisions, efficient caching must continue to manage the infrastructure real estate well by doing as little non-essential work as possible, and having as little interference as possible.
-
-
-Layered Functionality
-^^^^^^^^^^^^^^^^^^^^^
-
-It is helpful to learn from existing distributed systems that are stateful and performance-oriented, one of them being the networks themselves. Also having to handle state while trying to maximize throughput and minimize latency, networking technologies in recent decades have adhered rather strictly to the divide between the control plane and the data plane (aka forwarding plane). In short, the control plane is the part of the logic in each node that deals with the state of the distributed system- how each node should interact with other nodes properly; it also hands useful, state-related information over to the data plane so the latter can perform logic such as routing correctly. The data plane, on the other hand, is where each individual request or data exchange is handled; this is where performance is perceived by the end user. So it is not surprising that the data and control planes are responsible for carrying out work on the "fast path" and "slow path" respectively, and a trip through the control plane is meant to be slower. The networking community has demonstrated that with a clear divide between parts of the system optimized for different goals, packet processing can be made fast while the state of the system stays well-managed.
-
-The claim here is that we should apply the same analogy to high-performance caching systems. The layered networking model is an effective way of minimizing work and interference on performance-critical paths. By explicitly calling out which functionalities and components are performance-critical and which are not, and segregating them as much as possible, we can better define the work that *has* to happen for every single request, while pushing other functionalities elsewhere. Furthermore, we can take measures so that the "fast path" operations take priority, and are not interrupted or interfered with unnecessarily by those happening on the "slow path". Both kinds of mistakes are especially likely, or even tempting, when functions, threads, and processes *can* run indiscriminately on the same set of system resources and seem perfectly blendable under the name of code sharing and reuse, unlike on many dedicated routers where even the hardware is highly specialized for the particular plane it serves.
-
-Software-defined networking brings distributed storage systems and networking systems even closer together in the datacenter. Traditionally, control plane decisions are often made independently by each node, since large-scale communication is unpredictable or plainly impossible. However, datacenters represent a special kind of environment where homogeneity and centralized control can be achieved much more easily. Reaching consensus in a large distributed system is expensive and slow [citation needed], so most scalable solutions delegate the decision-making to a central location, represented by one or a relatively small number of nodes. Orchestration increasingly applies to both network topology and sharding; a lot of control plane functionality is increasingly drained and centralized, and the control plane inside each node greatly thinned. This emphasizes the importance of an often-neglected term, the "management plane" [#]_, which is the centralized brain of distributed systems, and serves as the interface where human operators come to interact with an otherwise highly abstracted and automated system [#]_.
-
-Organizing functionalities into layers is more than a frivolous exercise. It provides a powerful mental model to focus and differentiate. For example, once we establish a boundary between the data and control planes, it becomes more natural to make different language choices for different parts- we may want a highly expressive, and potentially verifiable, language/implementation for the control plane, while leaning toward languages closer to bare-metal hardware for the data plane. The management plane, due to the need to interact with operators, may call for yet another language that is declarative in nature. We thus match each plane with languages that enhance the most desirable properties of that particular layer. Similar considerations can be found throughout the design process, where such a division can be liberating.
-
-Anatomy
-^^^^^^^
-
-There are four roles in a caching system: server, client, proxy, and manager. Servers collectively hold all the data, decide the data retention policy, apply updates, and serve other requests related to data. Clients initiate the data requests. Proxies route and dispatch data requests, either by sending them to a server, or by sending them to another proxy. The manager determines the topology and routing policies that proxies follow, and may also monitor the health of servers and other roles if necessary.
-
-We call these entities roles instead of parts or nodes because they are logical. While these roles often have their separate modular representations, they don't have to be "physically" (i.e. machine-wise) separated. The proxy can run alongside the client, or the server, or by itself. All three entities may reside on the same machine, the proxy may degenerate and disappear when routing is static and simple, etc. However, the functionalities these roles provide are universal in any caching system. For example, finagle-memcached as a library serves as a combination of the client role and the proxy role. Many memcached users with such a client also skip an explicit manager, assume the server topology is mostly fixed, and require human intervention when a server goes offline, thus effectively turning the system operator into the manager. When a cache is in-process, neither proxy nor manager is necessary, since routing is trivial and the availability of the cache is guaranteed as long as the process is alive.
-
-One of the simplest computing models in a distributed system is the client-server model, and that's how caching started. Here, we call out a more complex, four-role model based mostly on two facts. First, caching systems are stateful since they hold a large amount of data; this means having a single functional view of the system topology is crucial to routing requests correctly and consistently (i.e. clients won't diverge in their world-view). Reaching a consensus among a large number of nodes would be difficult and expensive, if not impossible, and even the task of monitoring the topology is unnecessarily complicated for individual nodes. This justifies the role of the manager. Second, caching is rather prevalent in modern Web architecture and other types of data-intensive applications. With the increased popularity of microservices, many components in a single system will have their own needs for caching, which often can be served using the same technology stack but individual configurations. While functionality such as routing is fundamental to the service, it can involve a fair amount of computation and is often subject to change. Hence it quickly becomes a logistical nightmare to coordinate with dozens of different clients, which in turn means dozens of customers/teams, to apply any nontrivial update. This practical concern drives owners of the caching technology to minimize the interface visible to and managed by their customers- in other words, a thin client that doesn't know or worry about the state of the whole system. This preference thus justifies the proxy as its own role, so routing and other features can be provided outside of the customers' direct control.
-
-The different functionality layers and roles are discussed in more detail in their own sections.
-
-
-.. [#] `Remembering The Management Plane `_
-
-.. [#] `The Control Plane, Data Plane and Forwarding Plane in Networks `_
+:doc:`modules/cc_ring_array`