Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support recycling log files #224

Merged
merged 54 commits into from
Jul 20, 2022
Merged

Conversation

LykxSassinator
Copy link
Contributor

@LykxSassinator LykxSassinator commented Jun 15, 2022

As the title shows, this commit makes the relevant implementation for supporting [Recycle Log Files], for closing #214. This commit mainly contains:

  • Design and implement [Recycle Log Files] for recycling expired log files.
  • Supplement necessary UTs on [Recycle Log Files].

As the title shows, this commit do the following things:
>* Implement `[Recycle Log Files]` structure for recycling expired log files.
>* Supplement necessary UTs on `[Recycle Log Files]`.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Here, we modify the testing cases in `bench_falloc` to simulate several different calling of `fallocate`,
to generate the comparision result of performance among these callings.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
…alloc.rs`

As the title shows, the bug of condition judgement is fixed.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
This commit mainly focused on:
1. Supplement extra necessary UTs for `RecycleFileCollection` in `pipe.rs`;
2. Tidy the codes on `RecycleFileCollection`;
3. Correct the misleading guides in `README.md`;

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator
Copy link
Contributor Author

/cc @tabokie , thanks for reviewing it.

@codecov
Copy link

codecov bot commented Jun 15, 2022

Codecov Report

Merging #224 (9d98e7b) into master (764ac71) will increase coverage by 0.35%.
The diff coverage is 99.27%.

@@            Coverage Diff             @@
##           master     #224      +/-   ##
==========================================
+ Coverage   96.71%   97.06%   +0.35%     
==========================================
  Files          30       30              
  Lines        9577    10447     +870     
==========================================
+ Hits         9262    10140     +878     
+ Misses        315      307       -8     
Impacted Files Coverage Δ
src/lib.rs 100.00% <ø> (ø)
src/filter.rs 82.63% <80.00%> (-0.17%) ⬇️
src/log_batch.rs 97.65% <98.49%> (+1.04%) ⬆️
src/file_pipe_log/pipe.rs 99.09% <98.82%> (-0.06%) ⬇️
src/config.rs 97.50% <100.00%> (+3.01%) ⬆️
src/engine.rs 96.89% <100.00%> (+0.59%) ⬆️
src/env/default.rs 92.93% <100.00%> (+0.66%) ⬆️
src/env/mod.rs 100.00% <100.00%> (ø)
src/env/obfuscated.rs 95.83% <100.00%> (+0.44%) ⬆️
src/file_pipe_log/format.rs 95.45% <100.00%> (+0.14%) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 764ac71...9d98e7b. Read the comment docs.

@tabokie tabokie self-requested a review June 15, 2022 03:35
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
…leCollection`

This commit contains:
>* Simplify the configuration on the capcaity of `RecycleFileCollection`, which would be
automatically updated by the formula -- `purge_threshold` / `targe_file_size` - `active file count`.
>* Tidy the codes and annotations in `pipe.rs`.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
@tabokie tabokie changed the title [Enhancement]Support recycling log files from expired logs. Support recycling log files Jun 16, 2022
…leCollection`

This commit contains:
>* Simplify the configuration on the capcaity of `RecycleFileCollection`, which would be
automatically updated by the formula -- `purge_threshold` / `targe_file_size` - `active file count`.
>* Tidy the codes and annotations in `pipe.rs`.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
As the title shown, this commit is mainly responsible for simplifying
the implementation of `FileCollection`, capable for recycling obsolete
or stale log files for the future reusing.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
…..&LogFileContext)`

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator
Copy link
Contributor Author

/cc @tabokie , please take comments on it for reviewing. Thanks.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator
Copy link
Contributor Author

LykxSassinator commented Jun 22, 2022

1. Preparations for recycling

For making comparisons on Write performance between master and #224, which introducing the recycling log files strategy, I've made a minor modification in stress bench tool.
Considering that #224 needs purged, that is, obsolete files, to generate significant impact on the Write performance, I've introduce extra purge processing before doing Write bench with reusing reuse-data flag.
That is, if the reuse-data was true, the Engine should do purge and mark several files expired for recycling before Write bench.
And the code could be reviewed as:

diff --git a/stress/src/main.rs b/stress/src/main.rs
index 78f3f76..4712ab2 100644
--- a/stress/src/main.rs
+++ b/stress/src/main.rs
@@ -604,6 +604,20 @@ fn main() {
     let wb = Arc::new(WrittenBytesHook::new());
     let engine = Arc::new(Engine::open_with_listeners(config, vec![wb.clone()]).unwrap());
+    if opts.reuse_data {
+        println!("======= Before `bench`, try to purge files for recycling stale files as many as possible. =======");
+        match engine.purge_expired_files() {
+            Ok(regions) => {
+                for region in regions.into_iter() {
+                    let first = engine.first_index(region).unwrap_or(0);
+                    let last = engine.last_index(region).unwrap_or(0);
+                    let compact_to =
+                        last - ((last - first + 1) as f64 * args.force_compact_factor) as u64 + 1;
+                    engine.compact_to(region, compact_to);
+                }
+            }
+            Err(e) => println!("purge error {:?}", e),
+        }
+        println!("======= Before `bench`, all expired files are purged. =======");
+    }

Here, as u can see, --reuse-data is set for preparing recycled files as many as possible for accelerating the later bench testing, and that is why Recycle Log File can make sense.
Then, I did the comparison between master and #224.

2. Conclusions based on benchmark results

With the comparison, #224 has brought an significant and positive impact on Write, especially when there exists enough free / stale files for recycling. In a brief conclusion, the performance comparison between master and #224 shows that:

    1. Recycle log files will bring a significant improvement on QPS, about 5% to 10% in normal cases.
    1. Recycle log files will bring up to almost 70% improvement on QPS, when it was set with write-thread == 4 and opened with --reuse-data.

3. Reference

There are several assistant materials for proving that this feature makes sense.

3.1 Code paths that this feature makes sense

How can we prove this feature truely make sense with the above modification on --reuse-data ? Here, I provide a simple and effective way to get the evidance -- logging the kernel code path by the standard stdout. For instance, you can take a log in the processing of Pipe::purge_to to prove that this feature truely recycled several files for reusing in the later Write bench.
In my local branch, I've got the following outputs with println! on the kernel path:

======= Before `bench`, try to purge files for recycling stale files as many as possible. =======
Rewrite LogQueue::Append for purging expired files.
compact queue: LogQueue::Append with rewrite_watermark: 253, compact_watermark: 206.
[PurgeTo]Do purge_to on queue: 0, expect_seq: 281, min_seq: 267.
[After purging]First purged file_seq: 187, truely deleted file count: 0, recycled file count: 80, remained file count on disk: 95
======= Before `bench`, all expired files are purged. =======

3.2 Benchmarks

3.2.1 Env and settings

Brief introduction of the testing env and related settings:

  • OS: CentOS - Linux version 3.10.0-862.14.4.el7.x86_64
  • Disk: Intel SSD P4610 2000+MB/s
  • Settings for stress tool:
    • purge_threshold: use the default - 10GB.
    • purge_interval: use the default - 10s.
  • Settings for RaftEngine:
    • log_batch_size: 4KB one LogBatch;
    • file_size: 128GB with default;
    • size_per_sync: 4MB.

And with default settings in stress tool, we can get a 14GB size full of log files on the disk, which reaches the threshold for triggering purge.

3.2.2 Results

The benchmark result could be reviewed with the following result:

  • Benchmarks on master with --reuse-data and without --reuse-data respectively shows that the reuse-data has no side effects on Write.
  • Benchmarks on #224 without --reuse-data shows that #224 would not introduce side effect on Write.
  • Benchmarks on #224 with --reuse-data and without --reuse-data respectively shows that the recycling log files strategy truely introduces positive effects on Write.

And visualized comparison results can be reviewed with the following charts:

  • Using normal mode, that is, without preparing expired files for recycling.
    image
  • Using --reuse-data to mock "There already exists several expired files". And in my settings, the count of expired files could reach 80, where there are 95 log files existing on the disk.
    image

Here, more detailed benchmark results on --reuse-data can be reviewed as the followings:

# Command
# cargo run --release --package stress -- --path benchmark --write-threads ${thread_count} --regions ${regions} --reuse-data

* [Branch] master
- Thread: 1
[write]
Throughput(QPS) = 7440.78
Latency(μs) min = 69, avg = 116.52, p50 = 109, p90 = 148, p95 = 163, p99 = 200, p99.9 = 925, max = 9159
Fairness = 100.0%
Write Bandwidth = 77.9MiB/s
- Thread: 2
[write]
Throughput(QPS) = 10166.82
Latency(μs) min = 74, avg = 176.53, p50 = 166, p90 = 213, p95 = 231, p99 = 293, p99.9 = 1299, max = 179711
Fairness = 99.7%
Write Bandwidth = 106.5MiB/s
- Thread: 4
[write]
Throughput(QPS) = 17164.32
Latency(μs) min = 84, avg = 213.93, p50 = 200, p90 = 250, p95 = 270, p99 = 354, p99.9 = 1763, max = 158207
Fairness = 99.9%
Write Bandwidth = 179.8MiB/s
- Thread: 8
[write]
Throughput(QPS) = 25636.96
Latency(μs) min = 75, avg = 292.34, p50 = 267, p90 = 349, p95 = 386, p99 = 619, p99.9 = 2375, max = 130431
Fairness = 99.9%
Write Bandwidth = 268.5MiB/s

*[Branch] recycle_logs
- Thread: 1
[write]
Throughput(QPS) = 7435.96
Latency(μs) min = 69, avg = 116.26, p50 = 109, p90 = 148, p95 = 163, p99 = 210, p99.9 = 890, max = 9927
Fairness = 100.0%
Write Bandwidth = 77.9MiB/s
- Thread: 2
[write]
Throughput(QPS) = 11903.69
Latency(μs) min = 38, avg = 148.60, p50 = 153, p90 = 203, p95 = 222, p99 = 295, p99.9 = 1090, max = 141311
Fairness = 99.9%
Write Bandwidth = 124.7MiB/s
- Thread: 4
[write]
Throughput(QPS) = 28634.28
Latency(μs) min = 42, avg = 121.28, p50 = 101, p90 = 191, p95 = 214, p99 = 266, p99.9 = 939, max = 103743
Fairness = 99.8%
Write Bandwidth = 299.9MiB/s
- Thread: 8
[write]
Throughput(QPS) = 39408.23
Latency(μs) min = 38, avg = 184.22, p50 = 149, p90 = 277, p95 = 310, p99 = 393, p99.9 = 1992, max = 244991
Fairness = 99.5%
Write Bandwidth = 412.8MiB/s

In this commit, there changes have been introduced:
>* Remove unnecessary annotations.
>* Remove inappropriate method `rename` of trait `FileSystem`.
>* Simplify the design of FileCollection with capability of recycling log files.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator
Copy link
Contributor Author

/cc @tabokie please take a final review, thanks

As the title shows, this commit make the strategy of computing `purge` more readable. And,
it also includes:
* Supplement extra necessary annotations.
* Refine the implementation of `get_signature` to make its strategy clear and reable.
* Remove unncessary annotations.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Copy link
Contributor Author

@LykxSassinator LykxSassinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ut on fetch_active_file has been supplemented in this commit.
/cc @tabokie thanks.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
…y stale V1 files.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
…to`.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Copy link
Member

@tabokie tabokie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found a few bugs along the way, it really worries my whether you have added sufficient tests for them. (E.g. the bug in purge_to is still not covered). Moving forward, if there's a review comment that points out a specific issue that isn't covered, please first add a test to reproduce it, then fix the issue.

…w funcs.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
@LykxSassinator
Copy link
Contributor Author

Thanks for your suggestions. Bugs in rename and purge_to have been checked and fixed. Also, relevant uts have been supplemented in this commit.

Copy link
Member

@tabokie tabokie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there.

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Copy link
Member

@tabokie tabokie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tabokie tabokie merged commit ee024c7 into tikv:master Jul 20, 2022
@LykxSassinator LykxSassinator mentioned this pull request Aug 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants