From 633c6c16e87aaf70e7a0f92f820b34bc86e8f215 Mon Sep 17 00:00:00 2001
From: Edd Robinson
Date: Wed, 7 Sep 2016 16:54:33 +0100
Subject: [PATCH 1/3] Add cardinality estimation proposal

---
 docs/tsm/TSI_CARDINALITY_PROPOSAL.md | 254 +++++++++++++++++++++++++++
 1 file changed, 254 insertions(+)
 create mode 100644 docs/tsm/TSI_CARDINALITY_PROPOSAL.md

diff --git a/docs/tsm/TSI_CARDINALITY_PROPOSAL.md b/docs/tsm/TSI_CARDINALITY_PROPOSAL.md
new file mode 100644
index 00000000000..b2b537f6564
--- /dev/null
+++ b/docs/tsm/TSI_CARDINALITY_PROPOSAL.md
@@ -0,0 +1,254 @@
+# TSM Series Cardinality Estimation Proposal
+
+This document describes a proposal for providing future cardinality readings of series and measurement data in a database.
+
+The existing approach of counting distinct series and measurement data will not scale to billions of series in a database, and would in any case be complicated by the implementation of the new TSI proposal (#7174).
+
+The rest of the proposal provides more background and a proposed approach for providing scalable cardinality readings.
+
+## Background
+
+Currently the TSM engine maintains in-memory inverted indices, which store measurement and series data for each database.
+Each database's index is shared amongst the shards for that database.
+As discussed in more detail in #7151, this presents a number of problems for databases with large numbers of series.
+A proposal to solve these performance problems by implementing a new persistent, memory-mapped inverted index has been made in #7174.
+The in-memory shared indices would be replaced with these memory-mapped time-series indices.
+
+In the current shared in-memory index, we calculate the series and measurement cardinality for a database by counting the number of distinct series and measurements; this only involves a single index, since all series and measurement data is held within the same shared index across shards.
+However, with the newly proposed approach, each shard would have a separate index, potentially containing different series and measurement values for a given database.
+
+Therefore, we need to come up with a solution that provides cardinality readings for a database where:
+
+ - We have billions of series values in the database; and
+ - Series and measurement values are distributed across multiple indices.
+
+The solution must not:
+
+ - Take significant time to calculate the series/measurement cardinality.
+ - Have significant storage requirements.
+ - Be computationally intensive to run, since it will need to run often.
+
+
+##### Exact Counting
+
+The number of distinct series or measurements could be counted exactly by scanning all series/measurements in all indices relating to a database.
+This could be achieved using a merge-type approach, where sorted streams of series/measurements are scanned in parallel and a counter is updated as distinct values are discovered.
+It would not require significant memory, because only the largest value seen so far and the running count would need to be stored.
+
+However, this solution is completely unscalable, because as soon as a series/measurement has been added to or removed from the database, the entire calculation would need to be done again.
+Further, if indices are located on different hosts, all values would need to be sent across the network to calculate the count.
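+
+As a rough, illustrative Go sketch of this merge-based exact count (not proposed code; `streams` stands in for the sorted series or measurement streams read from each index), every value from every index has to be visited, and the whole scan has to be repeated whenever the data changes:
+
+```
+package main
+
+import "fmt"
+
+// countDistinct merges already-sorted streams of series keys and counts
+// the distinct values across all of them. It must visit every value in
+// every stream.
+func countDistinct(streams ...[]string) uint64 {
+	cursors := make([]int, len(streams)) // current position in each stream
+	var count uint64
+	var last string
+	first := true
+
+	for {
+		// Find the smallest value currently at the head of any stream.
+		smallest, found := "", false
+		for i, s := range streams {
+			if cursors[i] < len(s) && (!found || s[cursors[i]] < smallest) {
+				smallest, found = s[cursors[i]], true
+			}
+		}
+		if !found {
+			return count // all streams exhausted
+		}
+		// Count it once, then advance every stream sitting on that value.
+		if first || smallest != last {
+			count++
+			last, first = smallest, false
+		}
+		for i, s := range streams {
+			if cursors[i] < len(s) && s[cursors[i]] == smallest {
+				cursors[i]++
+			}
+		}
+	}
+}
+
+func main() {
+	a := []string{"cpu,host=a", "cpu,host=b", "mem,host=a"}
+	b := []string{"cpu,host=b", "mem,host=a", "mem,host=c"}
+	fmt.Println(countDistinct(a, b)) // 4
+}
+```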
+
+
+##### Probabilistic Cardinality Estimation
+
+Cardinality estimation involves determining the number of distinct elements in a stream, which in practice may be either a single stream or a combined stream from multiple sources.
+
+If it can be assumed that the values in a stream are uniformly distributed (for example, because they have been hashed with a good hash function), then some assumptions can be made about the cardinality of the set of values by maintaining a simple summary ([sketch][3]) of the values seen so far.
+This is the underpinning of how probabilistic approaches to cardinality estimation work.
+
+The only information this approach needs is the current value in the stream and some summary data on previously seen values. Therefore these approaches:
+
+ - Are usually parallelisable;
+ - Can be pre-computed and serialised if a complete stream is available, or updated on the fly for a continuous stream;
+ - Can handle multiple streams;
+ - Result in small storage footprints, because they store only summarised information about the values seen, rather than the values themselves.
+
+### The HyperLogLog++ algorithm
+
+The current industry standard for probabilistic cardinality estimation is the [HyperLogLog++ algorithm][1] (an extension to the original [HyperLogLog algorithm][2]).
+
+It has a relative accuracy (estimation error) of:
+
+```
+1.02 / sqrt(M)
+```
+
+where `M` is related to the size of the sketch used to store frequencies of certain properties of input values.
+In practice, `M` is typically in the range `1536–24576`, and so the accuracy is typically in the range `2.6%–0.65%`.
+A lower error rate demands more memory to accommodate a bigger sketch, but in practice the space requirements are trivial (using the above ranges would result in disk requirements of `~256B - ~4KB` for the sketch).
+
+To build a sketch, each value is hashed and the binary encoding of the hashed value is examined to determine which counter in the sketch should be updated.
+The computational requirements for building a sketch are linear in the size of the input, since every value needs to be examined once; however, this process can be carried out in parallel, because the operations necessary to build a sketch are commutative.
+
+Sketches only need to be built once for each index; they can then be persisted and consulted when a cardinality estimation needs to be made.
+Multiple sketches can be combined if necessary, to estimate the cardinality of series/measurements across multiple indices.
+Further, combining sketches is an associative operation: it doesn't matter in which order they're combined.
+Different-sized sketches can be combined by folding the larger one into the smaller one.
+Combining sketches to generate estimations is also _lossless_: that is, the accuracy of the estimate is no different when combining multiple sketches than when using a single sketch built from all the data.
+
+
+## Implementation
+
+This section gives a broad overview of how we might implement cardinality estimation.
+
+On system startup we would need to consult each TSI index and extract the sketches contained in each one, storing them in the associated Shard types in the database.
+Since the sketches are small, this should be quick to do.
+We would also need to scan all WAL files and merge WAL sketches into the in-memory sketches we pulled from the index.
+The end result would be two in-memory sketches for each database: one for series data and one for measurement data.
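+
+To make these operations concrete, the following is a deliberately simplified, plain HyperLogLog sketch in Go (not the proposed HLL++ implementation: no sparse encoding or bias correction, and FNV hashing instead of MurmurHash3) showing the add, merge and estimate operations the in-memory and TSI sketches need to support:
+
+```
+package main
+
+import (
+	"fmt"
+	"hash/fnv"
+	"math"
+	"math/bits"
+)
+
+// hllSketch is a minimal, plain HyperLogLog sketch used only to illustrate
+// the operations discussed above.
+type hllSketch struct {
+	p         uint8   // precision: number of bits used to pick a register
+	registers []uint8 // m = 2^p registers
+}
+
+func newSketch(p uint8) *hllSketch {
+	return &hllSketch{p: p, registers: make([]uint8, 1<<p)}
+}
+
+// add hashes a value and updates at most one register.
+func (s *hllSketch) add(value string) {
+	h := fnv.New64a()
+	h.Write([]byte(value))
+	x := h.Sum64()
+
+	idx := x >> (64 - s.p)                           // first p bits pick the register
+	rank := uint8(bits.LeadingZeros64(x<<s.p|1)) + 1 // position of first 1-bit in the rest
+	if rank > s.registers[idx] {
+		s.registers[idx] = rank
+	}
+}
+
+// merge folds another sketch of the same precision into s by taking the
+// register-wise maximum; the operation is commutative and lossless.
+func (s *hllSketch) merge(other *hllSketch) {
+	for i, r := range other.registers {
+		if r > s.registers[i] {
+			s.registers[i] = r
+		}
+	}
+}
+
+// estimate returns the raw HyperLogLog cardinality estimate.
+func (s *hllSketch) estimate() float64 {
+	m := float64(len(s.registers))
+	sum := 0.0
+	for _, r := range s.registers {
+		sum += 1 / math.Pow(2, float64(r))
+	}
+	alpha := 0.7213 / (1 + 1.079/m)
+	return alpha * m * m / sum
+}
+
+func main() {
+	shardA, shardB := newSketch(14), newSketch(14)
+	for i := 0; i < 100000; i++ {
+		shardA.add(fmt.Sprintf("cpu,host=server-%d", i))
+	}
+	for i := 50000; i < 150000; i++ {
+		shardB.add(fmt.Sprintf("cpu,host=server-%d", i))
+	}
+	shardA.merge(shardB) // e.g. combining sketches from two indices
+	fmt.Printf("estimated cardinality: %.0f (actual 150000)\n", shardA.estimate())
+}
+```
+
+A real implementation would use HLL++ and MurmurHash3 as described elsewhere in this proposal, but the register update and max-merge shown here are the same idea.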
+ +TSI sketches would only be updated during a full compaction, meaning they may over-estimate cardinalities if there are tombstoned series or measurements in the index. +We would need to know how many series and measurements have been tombstoned, but not yet removed from the index, so that we can adjust down our cardinality estimates. +We would do this by referring to a new tombstone count we could add to the TSI index. + +WAL files may also contain tombstoned series and measurements that have not been fast-compacted into the TSI file, so we would also scan them and count the tombstoned series/measurements, adding them to the counts we got from the associated TSI files for that database. + +##### Building an index sketch + +When a new index is created as the result of a full compaction, a new sketch will be generated for both the series and measurements within the index. +This will involve scanning and hashing with a 64-bit hash function, every series and measurement value within the index. +A suitable hash function for this purpose would be [MurmurHash3][4], which is optimised to generate 128-bit hash values on 64-bit architectures (we can use the first 64 bits for the series/measurement hash). +Each hashed value will then contribute to the HLL++ sketch for either the series or measurement sketch. + +Some prototyping and testing may be needed, but I would suggest a starting point for the precision of the HLL++ sketch to be `14`, which will require a sketch size of around `16KB`. + +Once the sketch has been generated it can be stored in the index and also in memory within the `Shard`. + +##### Getting cardinality estimates + +Cardinality estimates can be provided in constant time by consulting a series or measurement sketch. +If series or measurements are distributed across multiple indices (likely), then those sketches can be combined in linear time, with respect to the size of the sketch, which typically will be small (`~4096`). +In summary, cardinality estimations will be fast, and will not be affected by the number of series or measurements in a database. + +When calculating estimates, we will need to consider the number of tombstoned series/measurements for the database, and subtract them from the cardinality estimate. + +##### Adding new series and measurements + +Since TSI indices are immutable, we should not modify an index's sketch when a new series is added to the database. +Instead we will update the in-memory sketches, and for durability we will then also update sketches we will keep in the WAL files. + +##### Removing series and measurements + +The HyperLogLog family of algorithms unfortunately do not support removing values from the sketch, since the sketch itself is lossy and does not contain enough information to support deletions. +However, we can improve the accuracy of estimations by keeping track of series and measurements that are deleted from the database. + +In the case of series removal, a tombstone entry is added to the WAL. +When this occurs we will increment a series tombstone counter for that database. +As discussed above, this count will be subtracted from any cardinality estimates for the database. + +In the case of measurement removal, we will maintain a similar counter for removed measurements. +However, when a measurement is removed, all series for that measurement are removed, so we need to get a count of the number of series belonging to the measurement, in order to increase the removed series counter too. 
+We can do this by consulting the measurement hash index in the TSI +and looking up the `len(series)` value, which tells us how many series belong to the measurement. + +**Note** technically we would need to scan all the series for the measurement and check if they have been tombstoned or not in the index, or are marked as tombstoned in the WAL file. +The reason being, that any series tombstoned in the index or WAL will also have been accounted for and if we don't consider them we will double-count them, and reduce the accuracy of the cardinality estimate. +Possibly we could avoid this and have a less accurate estimation. +Thoughts would be welcome on the above... + + +##### Fast compactions + +During a fast compaction, WAL data is merged into the associated TSI index. +We can combine the WAL and TSI sketches during this process, persisting the updated sketches to the TSI file. +We would clear the in-memory tombstone counts, and reload them from the new tombstone counts in the TSI indexes. + +##### Full compactions + +During a full compaction, tombstoned series and measurements are removed from the TSI indices. +At this point a new sketches will need to be calculated for the index, which should replace any existing ones. +These new sketches would replace the current in-memory sketches. + +#### TSI file format + +We could extend the TSI file format to include a sketches block for series and measurements, which could be for example at the beginning of the index. +We would also need to add tombstone counters for series and measurements, which I have not added to this diagram, but would need to be considered in #7174. + +``` +╔═══════Inverted Index═══════╗ +║ ┌────────────────────────┐ ║ +║ │ │ ║ +║ │ Sketches │ ║ +║ │ │ ║ +║ └────────────────────────┘ ║ +║ ┌────────────────────────┐ ║ +║ │ │ ║ +║ │ Series Dictionary │ ║ +║ │ │ ║ +║ └────────────────────────┘ ║ +║ ┌────────Tag Set─────────┐ ║ +║ │┌──────────────────────┐│ ║ +║ ││ Tag Values Block ││ ║ +║ │├──────────────────────┤│ ║ +║ ││ Tag Values Block ││ ║ +║ │├──────────────────────┤│ ║ +║ ││ Tag Keys Block ││ ║ +║ │└──────────────────────┘│ ║ +║ └────────────────────────┘ ║ +║ ┌────────Tag Set─────────┐ ║ +║ │┌──────────────────────┐│ ║ +║ ││ Tag Values Block ││ ║ +║ │├──────────────────────┤│ ║ +║ ││ Tag Values Block ││ ║ +║ │├──────────────────────┤│ ║ +║ ││ Tag Values Block ││ ║ +║ │├──────────────────────┤│ ║ +║ ││ Tag Keys Block ││ ║ +║ │└──────────────────────┘│ ║ +║ └────────────────────────┘ ║ +║ ┌──────────────────────┐ ║ +║ │ Measurements Block │ ║ +║ └──────────────────────┘ ║ +╚════════════════════════════╝ +``` + +The sketch block itself would contain a value indicating the number of sketches in the block. +Each sketch block would then contain the size of the sketch data, and the sketch itself. +The sketch may be a binary blob, perhaps generated by Go's `gob` package, or it may be a custom binary format of data comprising the HLL algorithm state and sketch. +The exact format will be decided during implementation. 
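+
+As a sketch of one possible encoding (illustrative only; the helper and the 8-byte big-endian framing are assumptions, and the byte slices stand in for marshalled HLL++ state), the block could be written as a count followed by length-prefixed sketch blobs:
+
+```
+package main
+
+import (
+	"bytes"
+	"encoding/binary"
+	"fmt"
+	"io"
+)
+
+// writeSketchBlock writes a sketches block: the number of sketches,
+// then each sketch as a length-prefixed binary blob.
+func writeSketchBlock(w io.Writer, sketches ...[]byte) error {
+	var buf [8]byte
+	binary.BigEndian.PutUint64(buf[:], uint64(len(sketches)))
+	if _, err := w.Write(buf[:]); err != nil {
+		return err
+	}
+	for _, data := range sketches {
+		binary.BigEndian.PutUint64(buf[:], uint64(len(data)))
+		if _, err := w.Write(buf[:]); err != nil {
+			return err
+		}
+		if _, err := w.Write(data); err != nil {
+			return err
+		}
+	}
+	return nil
+}
+
+func main() {
+	var block bytes.Buffer
+	// The byte slices stand in for marshalled sketch state.
+	seriesSketch := []byte("series-hll-state")
+	measurementsSketch := []byte("measurements-hll-state")
+	if err := writeSketchBlock(&block, seriesSketch, measurementsSketch); err != nil {
+		panic(err)
+	}
+	fmt.Println("sketches block size:", block.Len(), "bytes")
+}
+```
+
+The block itself might then be laid out as follows: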
+ +``` +╔═══════════Sketches═══════════╗ +║ ┌──────────────────────────┐ ║ +║ │ Sketch Count │ ║ +║ └──────────────────────────┘ ║ +║ ┌────── Series Sketch ─────┐ ║ +║ │┌────────────────────────┐│ ║ +║ ││ len(Sketch) ││ ║ +║ │└────────────────────────┘│ ║ +║ │┌────────────────────────┐│ ║ +║ ││ Sketch ││ ║ +║ │└────────────────────────┘│ ║ +║ └──────────────────────────┘ ║ +║ ┌────Measurements Sketch───┐ ║ +║ │┌────────────────────────┐│ ║ +║ ││ len(Sketch) ││ ║ +║ │└────────────────────────┘│ ║ +║ │┌────────────────────────┐│ ║ +║ ││ Sketch ││ ║ +║ │└────────────────────────┘│ ║ +║ └──────────────────────────┘ ║ +╚══════════════════════════════╝ +``` + +### WAL file format + +Each WAL file would also have a Sketches block, which is the same format as the block in the TSI index. +These sketches however only contain cardinality estimations for whatever is in the WAL file. + +``` +╔═════════════WAL══════════════╗ +║ ┌──────────────────────────┐ ║ +║ │ │ ║ +║ │ Sketches │ ║ +║ │ │ ║ +║ └──────────────────────────┘ ║ +║ ┌──────────Entry───────────┐ ║ +║ │ ┌──────────────────────┐ │ ║ +║ │ │ Flag │ │ ║ +║ │ ├──────────────────────┤ │ ║ +║ │ │ len(Series) │ │ ║ +║ │ ├──────────────────────┤ │ ║ +║ │ │ Series │ │ ║ +║ │ ├──────────────────────┤ │ ║ +║ │ │ Checksum │ │ ║ +║ │ └──────────────────────┘ │ ║ +║ └──────────────────────────┘ ║ +║ ... ║ +╚══════════════════════════════╝ +``` + + + [1]: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf + [2]: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf + [3]: https://en.wikipedia.org/wiki/Streaming_algorithm + [4]: https://sites.google.com/site/murmurhash/ From 0a3e8797d81318797f494a3016918bd02e2bea77 Mon Sep 17 00:00:00 2001 From: Edd Robinson Date: Mon, 12 Sep 2016 14:30:28 +0100 Subject: [PATCH 2/3] Address comments in proposal --- docs/tsm/TSI_CARDINALITY_PROPOSAL.md | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/docs/tsm/TSI_CARDINALITY_PROPOSAL.md b/docs/tsm/TSI_CARDINALITY_PROPOSAL.md index b2b537f6564..00120b5d698 100644 --- a/docs/tsm/TSI_CARDINALITY_PROPOSAL.md +++ b/docs/tsm/TSI_CARDINALITY_PROPOSAL.md @@ -83,25 +83,26 @@ This section gives a broad overview of how we might implement cardinality estima On system startup we would need to consult each TSI index and extract the sketches contained in each one, storing them in the associated Shard types in the database. Since the sketches are small, this should be quick to do. -We would also need to scan all WAL files and merge WAL sketches into the in-memory sketches we pulled from the index. +We would also need to scan all WAL files and for each WAL file, update the associated index sketch with the series and measurement data. The end result would be two in-memory sketches for each database: one for series data and one for measurement data. TSI sketches would only be updated during a full compaction, meaning they may over-estimate cardinalities if there are tombstoned series or measurements in the index. We would need to know how many series and measurements have been tombstoned, but not yet removed from the index, so that we can adjust down our cardinality estimates. We would do this by referring to a new tombstone count we could add to the TSI index. 
-WAL files may also contain tombstoned series and measurements that have not been fast-compacted into the TSI file, so we would also scan them and count the tombstoned series/measurements, adding them to the counts we got from the associated TSI files for that database. +WAL files may also contain tombstoned series and measurements that have not been fast-compacted into the TSI file. +We would simply skip over them when scanning the WAL file on startup such that they wouldn't contribute to the cardinality estimate. ##### Building an index sketch When a new index is created as the result of a full compaction, a new sketch will be generated for both the series and measurements within the index. This will involve scanning and hashing with a 64-bit hash function, every series and measurement value within the index. A suitable hash function for this purpose would be [MurmurHash3][4], which is optimised to generate 128-bit hash values on 64-bit architectures (we can use the first 64 bits for the series/measurement hash). -Each hashed value will then contribute to the HLL++ sketch for either the series or measurement sketch. +Each hashed value will then contribute to the HLL++ sketch for either series or measurements. Some prototyping and testing may be needed, but I would suggest a starting point for the precision of the HLL++ sketch to be `14`, which will require a sketch size of around `16KB`. -Once the sketch has been generated it can be stored in the index and also in memory within the `Shard`. +Once the sketch has been generated, it can be stored in the index and also in memory within the `Shard`. ##### Getting cardinality estimates @@ -114,7 +115,7 @@ When calculating estimates, we will need to consider the number of tombstoned se ##### Adding new series and measurements Since TSI indices are immutable, we should not modify an index's sketch when a new series is added to the database. -Instead we will update the in-memory sketches, and for durability we will then also update sketches we will keep in the WAL files. +Instead, we will update the in-memory sketches with the series and measurement data we perist to the associated WAL file. ##### Removing series and measurements @@ -131,8 +132,11 @@ We can do this by consulting the measurement hash index in the TSI and looking up the `len(series)` value, which tells us how many series belong to the measurement. **Note** technically we would need to scan all the series for the measurement and check if they have been tombstoned or not in the index, or are marked as tombstoned in the WAL file. -The reason being, that any series tombstoned in the index or WAL will also have been accounted for and if we don't consider them we will double-count them, and reduce the accuracy of the cardinality estimate. +The reason being that any series tombstoned in the index or WAL will also have been accounted for in the sketch, and if we don't consider them we will double-count them, and reduce the accuracy of the cardinality estimate. Possibly we could avoid this and have a less accurate estimation. + +**Note** if there are multiple TSI files for a database, then there will be multiple `len(series)` values too, and some of those series may be duplicated across TSI files, meaning that summing all of the `len(series)` values will be inaccurate. + Thoughts would be welcome on the above... 
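+
+To make the adjustment concrete, here is a small, self-contained Go illustration of the counter updates being discussed (the `measurementSeries` map and `tombstoned` set are stand-ins for the TSI measurement hash index and its tombstone information, not proposed API):
+
+```
+package main
+
+import "fmt"
+
+func main() {
+	// Stand-ins for the TSI measurement hash index: the series keys per
+	// measurement, and the series already tombstoned in the index or WAL.
+	measurementSeries := map[string][]string{
+		"cpu": {"cpu,host=a", "cpu,host=b", "cpu,host=c"},
+	}
+	tombstoned := map[string]bool{"cpu,host=b": true}
+
+	var removedMeasurements, removedSeries uint64
+
+	// Removing the "cpu" measurement: bump the measurement counter, then
+	// bump the series counter once per series that is not already
+	// tombstoned, so that those series are not double-counted.
+	removedMeasurements++
+	for _, key := range measurementSeries["cpu"] {
+		if tombstoned[key] {
+			continue
+		}
+		removedSeries++
+	}
+
+	fmt.Println(removedMeasurements, removedSeries) // 1 2
+}
+```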
 
 
@@ -157,12 +161,12 @@ We would also need to add tombstone counters for series and measurements, which
 ╔═══════Inverted Index═══════╗
 ║ ┌────────────────────────┐ ║
 ║ │                        │ ║
-║ │        Sketches        │ ║
+║ │   Series Dictionary    │ ║
 ║ │                        │ ║
 ║ └────────────────────────┘ ║
 ║ ┌────────────────────────┐ ║
 ║ │                        │ ║
-║ │   Series Dictionary    │ ║
+║ │        Sketches        │ ║
 ║ │                        │ ║
 ║ └────────────────────────┘ ║
 ║ ┌────────Tag Set─────────┐ ║
@@ -198,9 +202,7 @@ The exact format will be decided during implementation.
 
 ```
 ╔═══════════Sketches═══════════╗
-║ ┌──────────────────────────┐ ║
-║ │       Sketch Count       │ ║
-║ └──────────────────────────┘ ║
+║                              ║
 ║ ┌────── Series Sketch ─────┐ ║
 ║ │┌────────────────────────┐│ ║
 ║ ││      len(Sketch)       ││ ║

From 13a11e863ba3e85b13b86e50780a673abd3ad253 Mon Sep 17 00:00:00 2001
From: Edd Robinson
Date: Tue, 13 Sep 2016 15:37:15 +0100
Subject: [PATCH 3/3] Update how tombstone sketches work

---
 docs/tsm/TSI_CARDINALITY_PROPOSAL.md | 144 ++++++++++++---------------
 1 file changed, 65 insertions(+), 79 deletions(-)

diff --git a/docs/tsm/TSI_CARDINALITY_PROPOSAL.md b/docs/tsm/TSI_CARDINALITY_PROPOSAL.md
index 00120b5d698..bda4b93e587 100644
--- a/docs/tsm/TSI_CARDINALITY_PROPOSAL.md
+++ b/docs/tsm/TSI_CARDINALITY_PROPOSAL.md
@@ -83,79 +83,75 @@ This section gives a broad overview of how we might implement cardinality estima
 
 On system startup we would need to consult each TSI index and extract the sketches contained in each one, storing them in the associated Shard types in the database.
 Since the sketches are small, this should be quick to do.
-We would also need to scan all WAL files and for each WAL file, update the associated index sketch with the series and measurement data.
-The end result would be two in-memory sketches for each database: one for series data and one for measurement data.
+Each TSI file contains the following sketches:
 
-TSI sketches would only be updated during a full compaction, meaning they may over-estimate cardinalities if there are tombstoned series or measurements in the index.
-We would need to know how many series and measurements have been tombstoned, but not yet removed from the index, so that we can adjust down our cardinality estimates.
-We would do this by referring to a new tombstone count we could add to the TSI index.
+ - Series sketch
+ - Measurement sketch
+ - Tombstoned series sketch
+ - Tombstoned measurements sketch
 
-WAL files may also contain tombstoned series and measurements that have not been fast-compacted into the TSI file.
-We would simply skip over them when scanning the WAL file on startup such that they wouldn't contribute to the cardinality estimate.
+We would also need to scan all WAL files and, for each WAL file, update the sketches associated with the index.
+Any series or measurements in the WAL files would be added to the series and measurement sketches, and any tombstoned series or measurement entries would be added to the tombstoned sketches.
+The end result would be four in-memory sketches for each shard.
+Persistent sketches in the TSI files would only be updated during compactions.
 
 ##### Building an index sketch
 
-When a new index is created as the result of a full compaction, a new sketch will be generated for both the series and measurements within the index.
+When a new TSI file is created as the result of a full compaction, new sketches will be generated for both the series and measurements within the index.
 This will involve scanning and hashing with a 64-bit hash function, every series and measurement value within the index.
 A suitable hash function for this purpose would be [MurmurHash3][4], which is optimised to generate 128-bit hash values on 64-bit architectures (we can use the first 64 bits for the series/measurement hash).
-Each hashed value will then contribute to the HLL++ sketch for either series or measurements.
+Each hashed value will then contribute to the appropriate sketch.
+Finally, the tombstone sketches will be reinitialised as empty sketches, since there will no longer be any tombstone entries in the index.
 
 Some prototyping and testing may be needed, but I would suggest a starting point for the precision of the HLL++ sketch to be `14`, which will require a sketch size of around `16KB`.
 
-Once the sketch has been generated, it can be stored in the index and also in memory within the `Shard`.
 
 ##### Getting cardinality estimates
 
-Cardinality estimates can be provided in constant time by consulting a series or measurement sketch.
+Cardinality estimates can be provided in constant time by consulting the in-memory sketches.
 If series or measurements are distributed across multiple indices (likely), then those sketches can be combined in linear time, with respect to the size of the sketch, which typically will be small (`~4096`).
 In summary, cardinality estimations will be fast, and will not be affected by the number of series or measurements in a database.
 
-When calculating estimates, we will need to consider the number of tombstoned series/measurements for the database, and subtract them from the cardinality estimate.
+Calculating cardinality estimates will be a three-step process, for example in the case of series:
+
+ 1. Combine all series sketches and generate a `series_estimate`;
+ 2. Combine all tombstone series sketches and generate a `tombstone_series_estimate`;
+ 3. Return `series_estimate - tombstone_series_estimate`.
+
+If appropriate, we could maintain the combined result sketches in memory, to avoid combining them across shards for each cardinality estimate.
 
 ##### Adding new series and measurements
 
 Since TSI indices are immutable, we should not modify an index's sketch when a new series is added to the database.
-Instead, we will update the in-memory sketches with the series and measurement data we perist to the associated WAL file.
+Instead, we will update the in-memory sketches in the associated shards with the series and measurement data that gets persisted to the associated WAL file.
 
 ##### Removing series and measurements
 
 The HyperLogLog family of algorithms unfortunately do not support removing values from the sketch, since the sketch itself is lossy and does not contain enough information to support deletions.
-However, we can improve the accuracy of estimations by keeping track of series and measurements that are deleted from the database.
+However, we will improve the accuracy of estimations by maintaining estimates of the number of series and measurements deleted from a database.
 
 In the case of series removal, a tombstone entry is added to the WAL.
-When this occurs we will increment a series tombstone counter for that database.
-As discussed above, this count will be subtracted from any cardinality estimates for the database.
-
-In the case of measurement removal, we will maintain a similar counter for removed measurements.
-However, when a measurement is removed, all series for that measurement are removed, so we need to get a count of the number of series belonging to the measurement, in order to increase the removed series counter too.
-We can do this by consulting the measurement hash index in the TSI -and looking up the `len(series)` value, which tells us how many series belong to the measurement. - -**Note** technically we would need to scan all the series for the measurement and check if they have been tombstoned or not in the index, or are marked as tombstoned in the WAL file. -The reason being that any series tombstoned in the index or WAL will also have been accounted for in the sketch, and if we don't consider them we will double-count them, and reduce the accuracy of the cardinality estimate. -Possibly we could avoid this and have a less accurate estimation. - -**Note** if there are multiple TSI files for a database, then there will be multiple `len(series)` values too, and some of those series may be duplicated across TSI files, meaning that summing all of the `len(series)` values will be inaccurate. - -Thoughts would be welcome on the above... +When this occurs we will add the tombstoned series to the tombstone series sketch for that database. +In the case of measurement removal, we will maintain a similar tombstone sketch. +Further, because removal of a measurement involves the removal of all series under that measurement, we will also need to add all those series to the tombstone series sketches. ##### Fast compactions During a fast compaction, WAL data is merged into the associated TSI index. -We can combine the WAL and TSI sketches during this process, persisting the updated sketches to the TSI file. -We would clear the in-memory tombstone counts, and reload them from the new tombstone counts in the TSI indexes. +We can persist the in-memory series and measurement sketches to the TSI file since they will already represent the combined cardinality estimate for the TSI file once WAL data is merged in. +We will also persist the in-memory tombstone sketches to the TSI file because once the compaction completes we will be removing the associated WAL files. + ##### Full compactions During a full compaction, tombstoned series and measurements are removed from the TSI indices. -At this point a new sketches will need to be calculated for the index, which should replace any existing ones. +At this point all four sketches will need to be regenerated, though the tombstone sketches only need to be re-initialised. These new sketches would replace the current in-memory sketches. #### TSI file format -We could extend the TSI file format to include a sketches block for series and measurements, which could be for example at the beginning of the index. -We would also need to add tombstone counters for series and measurements, which I have not added to this diagram, but would need to be considered in #7174. +We could extend the TSI file format to include a sketches block for series, measurement and tombstone data, which could be located after the Series Dictionary. ``` ╔═══════Inverted Index═══════╗ @@ -201,53 +197,43 @@ The sketch may be a binary blob, perhaps generated by Go's `gob` package, or it The exact format will be decided during implementation. 
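+
+Returning to the three-step estimation procedure described above, the following self-contained Go illustration shows the combine-and-subtract flow across shards; the map-based `countingSketch` is only a stand-in for the HLL++ sketches that would actually be read from these blocks:
+
+```
+package main
+
+import "fmt"
+
+// countingSketch is an exact, map-based stand-in for an HLL++ sketch,
+// used only to show the combine/subtract flow.
+type countingSketch map[string]struct{}
+
+func (s countingSketch) Add(v string) { s[v] = struct{}{} }
+
+// Merge folds another sketch into s.
+func (s countingSketch) Merge(o countingSketch) {
+	for v := range o {
+		s[v] = struct{}{}
+	}
+}
+
+func (s countingSketch) Count() uint64 { return uint64(len(s)) }
+
+// seriesCardinality combines the per-shard series sketches, combines the
+// per-shard tombstone sketches, and returns the difference.
+func seriesCardinality(series, tombstones []countingSketch) uint64 {
+	combined, combinedTombstones := countingSketch{}, countingSketch{}
+	for _, s := range series {
+		combined.Merge(s)
+	}
+	for _, s := range tombstones {
+		combinedTombstones.Merge(s)
+	}
+	return combined.Count() - combinedTombstones.Count()
+}
+
+func main() {
+	shard1, shard2 := countingSketch{}, countingSketch{}
+	shard1.Add("cpu,host=a")
+	shard1.Add("cpu,host=b")
+	shard2.Add("cpu,host=b")
+	shard2.Add("cpu,host=c")
+
+	tomb1, tomb2 := countingSketch{}, countingSketch{}
+	tomb1.Add("cpu,host=a") // series tombstoned in shard 1
+
+	fmt.Println(seriesCardinality([]countingSketch{shard1, shard2}, []countingSketch{tomb1, tomb2})) // 2
+}
+```
+
+The extended sketches block would then hold all four sketches, as shown in the updated layout below: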
``` -╔═══════════Sketches═══════════╗ -║ ║ -║ ┌────── Series Sketch ─────┐ ║ -║ │┌────────────────────────┐│ ║ -║ ││ len(Sketch) ││ ║ -║ │└────────────────────────┘│ ║ -║ │┌────────────────────────┐│ ║ -║ ││ Sketch ││ ║ -║ │└────────────────────────┘│ ║ -║ └──────────────────────────┘ ║ -║ ┌────Measurements Sketch───┐ ║ -║ │┌────────────────────────┐│ ║ -║ ││ len(Sketch) ││ ║ -║ │└────────────────────────┘│ ║ -║ │┌────────────────────────┐│ ║ -║ ││ Sketch ││ ║ -║ │└────────────────────────┘│ ║ -║ └──────────────────────────┘ ║ -╚══════════════════════════════╝ +╔══════════════Sketches═══════════════╗ +║ ║ +║┌───────── Series Sketch ───────────┐║ +║│┌─────────────────────────────────┐│║ +║││ len(Sketch) ││║ +║│└─────────────────────────────────┘│║ +║│┌─────────────────────────────────┐│║ +║││ Sketch ││║ +║│└─────────────────────────────────┘│║ +║└───────────────────────────────────┘║ +║┌──────Tombstoned Series Sketch ────┐║ +║│┌─────────────────────────────────┐│║ +║││ len(Sketch) ││║ +║│└─────────────────────────────────┘│║ +║│┌─────────────────────────────────┐│║ +║││ Sketch ││║ +║│└─────────────────────────────────┘│║ +║└───────────────────────────────────┘║ +║┌─────── Measurements Sketch ───────┐║ +║│┌─────────────────────────────────┐│║ +║││ len(Sketch) ││║ +║│└─────────────────────────────────┘│║ +║│┌─────────────────────────────────┐│║ +║││ Sketch ││║ +║│└─────────────────────────────────┘│║ +║└───────────────────────────────────┘║ +║┌──Tombstoned Measurements Sketch ──┐║ +║│┌─────────────────────────────────┐│║ +║││ len(Sketch) ││║ +║│└─────────────────────────────────┘│║ +║│┌─────────────────────────────────┐│║ +║││ Sketch ││║ +║│└─────────────────────────────────┘│║ +║└───────────────────────────────────┘║ +╚═════════════════════════════════════╝ ``` -### WAL file format - -Each WAL file would also have a Sketches block, which is the same format as the block in the TSI index. -These sketches however only contain cardinality estimations for whatever is in the WAL file. - -``` -╔═════════════WAL══════════════╗ -║ ┌──────────────────────────┐ ║ -║ │ │ ║ -║ │ Sketches │ ║ -║ │ │ ║ -║ └──────────────────────────┘ ║ -║ ┌──────────Entry───────────┐ ║ -║ │ ┌──────────────────────┐ │ ║ -║ │ │ Flag │ │ ║ -║ │ ├──────────────────────┤ │ ║ -║ │ │ len(Series) │ │ ║ -║ │ ├──────────────────────┤ │ ║ -║ │ │ Series │ │ ║ -║ │ ├──────────────────────┤ │ ║ -║ │ │ Checksum │ │ ║ -║ │ └──────────────────────┘ │ ║ -║ └──────────────────────────┘ ║ -║ ... ║ -╚══════════════════════════════╝ -``` [1]: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf