From fa8f2c41dde0faa4708f6d312690c529b123636b Mon Sep 17 00:00:00 2001
From: Christine Yen
Date: Sun, 5 Mar 2017 19:10:27 -0800
Subject: [PATCH] Clean up doc.go, copy some of it into README

---
 README.md | 16 +++++++++++++++-
 doc.go    | 10 ++++------
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 728576c..6d1e984 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,21 @@
 # dynsampler-go
 
-Dynsampler is a golang library for doing dynamic sampling of traffic before sending it on to Honeycomb (or another analtics system)
+Dynsampler is a golang library for doing dynamic sampling of traffic before sending it on to [Honeycomb](https://honeycomb.io) (or another analytics system).
+It contains several sampling algorithms to help you select a representative set of events instead of a full stream.
+
+A "sample rate" of 100 means that for every 100 requests, we capture a single event and indicate that it represents 100 similar requests.
 
 For full documentation, look at the [dynsampler godoc](https://godoc.org/github.com/honeycombio/dynsampler-go).
 
+## Sampling Techniques
+
+This package is intended to help sample a stream of tracking events, where events are typically created in response to a stream of traffic (for the purposes of logging or debugging). In general, sampling is used to reduce the total volume of events necessary to represent the stream of traffic in a meaningful way.
+
+There are a variety of available techniques for reducing a high-volume stream of incoming events to a lower-volume, more manageable stream of events.
+Depending on the shape of your traffic, one may serve better than another, or you may need to write a new one! Please consider contributing it back to this package if you do.
+* If your system has a completely homogeneous stream of requests: use `Static` sampling to apply a constant sample rate.
+* If your system has a steady stream of requests and a well-known, low-cardinality partition key (e.g. HTTP status): use `Static` sampling and override sample rates on a per-key basis (e.g. if you want to sample `HTTP 200/OK` events at a different rate from `HTTP 503/Server Error`).
+* If your logging system has a strict cap on the rate it can receive events, use `TotalThroughput`, which will calculate sample rates based on keeping *the entire system's* representative event throughput right around (or under) a particular cap.
+* If your system has a rough cap on the rate it can receive events and your partitioned keyspace is fairly steady, use `PerKeyThroughput`, which will calculate sample rates based on keeping the event throughput roughly constant *per key/partition* (e.g. per user id).
+* The best choice for a system with a large key space and a large disparity between the highest volume and lowest volume keys is `AvgSampleRateWithMin` - it will increase the sample rate of higher volume traffic proportionally to the logarithm of the specific key's volume. If total traffic falls below a configured minimum, it stops sampling to avoid any sampling when the traffic is too low to warrant it.
 
diff --git a/doc.go b/doc.go
index 062d17d..bc57fb8 100644
--- a/doc.go
+++ b/doc.go
@@ -5,21 +5,19 @@ This package is intended to help sample a stream of tracking events, where event
 For the purposes of these examples, the "traffic" will be a set of HTTP requests being handled by a server, and "event" will be a blob of metadata about a given HTTP request that might be useful to keep track of later.
 
 A "sample rate" of 100 means that for every 100 requests, we capture a single event and indicate that it represents 100 similar requests.
 
-Use
-
 Use the `Sampler` interface in your code. Each different sampling algorithm implements the Sampler interface.
 
 The following guidelines can help you choose a sampler. Depending on the shape of your traffic, one may serve better than another, or you may need to write a new one! Please consider contributing it back to this package if you do.
 
-* If your system has a completely homogeneous stream of requests: use `Static` with only the default set.
+* If your system has a completely homogeneous stream of requests: use `Static` to apply a constant sample rate.
 
-* If your system has a steady stream of requests and a well-known low cardinality partition key (e.g. http status): use `Static` and override sample rates on a per-key basis (e.g. if you know want to sample `HTTP 200/OK` events ata different rate from `HTTP 503/Server Error`).
+* If your system has a steady stream of requests and a well-known, low-cardinality partition key (e.g. HTTP status): use `Static` and override sample rates on a per-key basis (e.g. if you want to sample `HTTP 200/OK` events at a different rate from `HTTP 503/Server Error`).
 
-* If your logging system has a strict cap on the rate it can receive events: use `TotalThroughput`, which will calculate sample rates based on keeping *the entire system's* representative event throughput right around (or under) particular cap.
+* If your logging system has a strict cap on the rate it can receive events, use `TotalThroughput`, which will calculate sample rates based on keeping *the entire system's* representative event throughput right around (or under) a particular cap.
 
 * If your system has a rough cap on the rate it can receive events and your partitioned keyspace is fairly steady, use `PerKeyThroughput`, which will calculate sample rates based on keeping the event throughput roughly constant *per key/partition* (e.g. per user id)
 
-* The best choice for a system with a large key space and a large disparity between the highest volume and lowest volume keys is AvgSampleRateWithMin - it will increase the sample rate of higher volume traffic proportionally to the logarithm of the specific key's volume. If total traffic falls below a configured minimum, it stops sampling to avoid any sampling when the traffic is too low to warrant it.
+* The best choice for a system with a large key space and a large disparity between the highest volume and lowest volume keys is `AvgSampleRateWithMin` - it will increase the sample rate of higher volume traffic proportionally to the logarithm of the specific key's volume. If total traffic falls below a configured minimum, it stops sampling to avoid any sampling when the traffic is too low to warrant it.
 */
 package dynsampler