Add fuzzer for async data cache #10244

zacw7 · 2024-06-18T06:38:13Z

Introduce a basic fuzzer for the async data cache. Each iteration involves:

1. Creating a set of data files of varying sizes.
2. Setting up the async data cache with an SSD using a specified configuration.
3. Performing parallel random reads from these data files.

In the initial setup, most of the parameters are defined as gflags and we'll decide later which parameters should be fuzzed during the tests.
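
As a rough illustration of the first step, here is a minimal sketch of picking per-file sizes. It assumes sizes are drawn uniformly between a minimum and a maximum; `pickFileSizes` and its parameters are hypothetical names for this example and are not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper, not part of Velox: picks one size per data file,
// uniformly distributed in [minBytes, maxBytes]. A fixed seed makes the
// sequence of sizes repeatable for a given standard library.
std::vector<uint64_t> pickFileSizes(
    int32_t numFiles,
    uint64_t minBytes,
    uint64_t maxBytes,
    uint64_t seed) {
  assert(numFiles >= 0 && minBytes <= maxBytes);
  std::mt19937_64 rng(seed);
  std::uniform_int_distribution<uint64_t> dist(minBytes, maxBytes);
  std::vector<uint64_t> sizes;
  sizes.reserve(numFiles);
  for (int32_t i = 0; i < numFiles; ++i) {
    sizes.push_back(dist(rng));
  }
  return sizes;
}
```

A call such as `pickFileSizes(8, 32ULL << 20, 64ULL << 20, seed)` would yield eight sizes between 32 MiB and 64 MiB.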

netlify · 2024-06-18T06:38:28Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`edb7c76`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/667cc5d84ecd530008b13dbd

facebook-github-bot · 2024-06-18T06:53:44Z

@zacw7 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-06-18T07:15:09Z

This pull request was exported from Phabricator. Differential Revision: D58715904

Summary: Introduce a basic fuzzer for the async data cache. Each iteration involves: 1. Creating a set of data files of varying sizes. 2. Setting up the async data cache with an SSD using a specified configuration. 3. Performing parallel random reads from these data files. In the initial setup, most of the parameters are defined as gflags and we'll decide later which parameters should be fuzzed during the tests. Pull Request resolved: facebookincubator#10244 Differential Revision: D58715904 Pulled By: zacw7

facebook-github-bot · 2024-06-18T18:53:11Z

This pull request was exported from Phabricator. Differential Revision: D58715904

facebook-github-bot · 2024-06-18T22:59:48Z

This pull request was exported from Phabricator. Differential Revision: D58715904

kewang1024

Can you add a doc for running the async data cache fuzzer? An example to follow:

velox/velox/docs/develop/testing/join-fuzzer.rst

Line 4 in c6d7390

kewang1024 · 2024-06-18T23:25:09Z

velox/exec/fuzzer/AsyncDataCacheFuzzer.cpp

+    return executor_.get();
+  }
+
+  void initializeDataFiles();


Can you add comments for those functions to explain what is done in the different initializations? Same for the rest of the init functions.

I would prefer the function name to be as succinct as possible, so how about initDataFiles?

This seems to be the naming convention in all cache-related tests. Examples:

velox/velox/common/caching/tests/SsdFileTest.cpp

Line 61 in 11bdeb8

void initializeCache(

velox/velox/common/caching/tests/SsdFileTest.cpp

Line 72 in 11bdeb8

initializeSsdFile(

velox/velox/common/caching/tests/AsyncDataCacheTest.cpp

Line 67 in 11bdeb8

static void initializeContents(int64_t sequence, memory::Allocation& alloc) {

I would say let's keep it consistent here.

kewang1024 · 2024-06-19T00:13:44Z

velox/exec/fuzzer/AsyncDataCacheFuzzer.cpp

+DEFINE_int32(
+    max_num_reads,
+    100,
+    "Max number of reads to be performed per thread.");
+
+DEFINE_int32(num_threads, 16, "Number of threads to read.");
+
+DEFINE_int32(num_files, 8, "Number of data files to be created.");
+
+DEFINE_uint64(
+    offset_interval_bytes,
+    8 << 20,
+    "The offset bytes to be aligned at for cache reads.");
+
+DEFINE_uint64(
+    min_file_bytes,
+    32 << 20,
+    "Minimum file size in bytes of the data files to be created.");
+
+DEFINE_uint64(
+    max_file_bytes,
+    64 << 20,
+    "Maximum file size in bytes of the data files to be created.");
+
+DEFINE_int32(num_files_in_group, 3, "Number of files to be grouped together.");
+
+DEFINE_int64(memory_cache_bytes, 16 << 20, "Memory cache size in bytes.");
+
+DEFINE_uint64(ssd_cache_bytes, 128 << 20, "Ssd cache size in bytes.");
+
+DEFINE_int32(num_shards, 4, "Number of shards of SSD cache.");
+
+DEFINE_uint64(
+    ssd_checkpoint_interval_bytes,
+    64 << 20,


I don't think we should expose those as parameters of the fuzzer. It should be possible to randomize them; for now, if we want to keep it simple, we can make them constants.

Thanks for pointing that out. I've discussed with @xiaoxmeng and we'll decide later which parameters should be 1) randomized by fuzzer; 2) kept as configurable parameters; 3) fixed as constants.

So I don't have a strong preference on how we should define them for now in this initial PR. Let's see if @xiaoxmeng has a different opinion.

We can randomize this later if it is zero.
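
One possible reading of this last suggestion is to treat a flag value of zero as "not set" and let the fuzzer pick a value in that case. The sketch below shows the idea only; `valueOrRandom` and the range bounds are assumptions made for this example, not code from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper, not part of Velox: a non-zero flag value is used as
// given, while zero means "let the fuzzer pick a value in [lo, hi]".
uint64_t valueOrRandom(
    uint64_t flagValue,
    uint64_t lo,
    uint64_t hi,
    std::mt19937_64& rng) {
  assert(lo <= hi);
  if (flagValue != 0) {
    return flagValue;
  }
  return std::uniform_int_distribution<uint64_t>(lo, hi)(rng);
}
```

With this shape a parameter stays configurable from the command line, and leaving it at zero turns it into a fuzzed parameter.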

facebook-github-bot · 2024-06-19T03:50:41Z

This pull request was exported from Phabricator. Differential Revision: D58715904

xiaoxmeng

@zacw7 thanks for adding the cache fuzzer % comments.

velox/docs/develop/testing/async-data-cache-fuzzer.rst

velox/exec/fuzzer/AsyncDataCacheFuzzer.h

velox/exec/fuzzer/AsyncDataCacheFuzzer.cpp

xiaoxmeng · 2024-06-24T05:53:27Z

velox/exec/fuzzer/AsyncDataCacheFuzzer.cpp

+  cache_ = AsyncDataCache::create(allocator_, std::move(ssdCache), {});
+}
+
+void AsyncDataCacheFuzzer::initializeInputs() {


How about each read thread runs a loop like this:

1. Pick a file.
2. Create a cache buffer input.
3. Enqueue randomly selected read offsets.
4. Call load on the cache buffer input.
5. Randomly read from a subset or all of the streams enqueued in step 3.
6. For each selected stream, read from start to end and verify the bytes read.

Thanks!
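
The loop suggested above can be sketched without any Velox types. This is only a model under stated assumptions: files are in-memory byte vectors with deterministic content, the cache buffer input and the load call (steps 2 and 4) are not modeled, and every name here (`expectedByte`, `makeFile`, `Region`, `readLoop`) is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Expected byte at `offset` of file `fileId`. Deterministic content lets a
// reader verify any range without keeping a second copy of the data.
inline uint8_t expectedByte(size_t fileId, uint64_t offset) {
  return static_cast<uint8_t>((fileId * 131 + offset * 31) & 0xff);
}

// In-memory stand-in for a data file (assumption: no real file or cache here).
std::vector<uint8_t> makeFile(size_t fileId, uint64_t size) {
  std::vector<uint8_t> data(size);
  for (uint64_t i = 0; i < size; ++i) {
    data[i] = expectedByte(fileId, i);
  }
  return data;
}

struct Region {
  uint64_t offset;
  uint64_t length;
};

// Body of one hypothetical reader thread. Returns the number of bytes that
// did not match the expected content.
uint64_t readLoop(
    const std::vector<std::vector<uint8_t>>& files,
    uint64_t alignment,
    int32_t numIterations,
    uint64_t seed) {
  assert(!files.empty() && alignment > 0);
  std::mt19937_64 rng(seed);
  uint64_t mismatches = 0;
  for (int32_t i = 0; i < numIterations; ++i) {
    // Step 1: pick a file.
    const size_t fileId = rng() % files.size();
    const auto& file = files[fileId];
    if (file.empty()) {
      continue;
    }
    // Step 3: enqueue a few regions that start at aligned offsets.
    const uint64_t numSlots = (file.size() + alignment - 1) / alignment;
    std::vector<Region> regions(1 + rng() % 4);
    for (auto& region : regions) {
      region.offset = (rng() % numSlots) * alignment;
      const uint64_t maxLength =
          std::min<uint64_t>(alignment, file.size() - region.offset);
      region.length = 1 + rng() % maxLength;
    }
    // Steps 5 and 6: read a random subset of the regions from start to end
    // and compare every byte with the expected content.
    for (const auto& region : regions) {
      if (rng() % 2 == 0) {
        continue; // This region was enqueued but is not read.
      }
      for (uint64_t j = region.offset; j < region.offset + region.length;
           ++j) {
        if (file[j] != expectedByte(fileId, j)) {
          ++mismatches;
        }
      }
    }
  }
  return mismatches;
}
```

Running `readLoop` on several threads with different seeds would approximate the parallel reads; deriving the content from the file id and offset is what makes step 6 cheap to check.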

velox/exec/fuzzer/AsyncDataCacheFuzzer.cpp

facebook-github-bot · 2024-06-25T00:18:06Z

This pull request was exported from Phabricator. Differential Revision: D58715904

facebook-github-bot · 2024-06-25T00:32:07Z

This pull request was exported from Phabricator. Differential Revision: D58715904

xiaoxmeng

@zacw7 thanks for the update % minors!

velox/exec/fuzzer/CacheFuzzer.cpp

xiaoxmeng · 2024-06-25T05:15:56Z

velox/exec/fuzzer/CacheFuzzer.cpp

+void CacheFuzzer::initializeCache() {
+  // We have up to 20 threads and 16 threads are used for reading so
+  // there are some threads left over for SSD background write.
+  executor_ = std::make_unique<folly::IOThreadPoolExecutor>(20);


I think we shall separate them?

readerExecutor_ -> cpu executor: 64
prefetchExecutor_ -> io executor which is passed to the buffered input: 4
ssdExecutor_ -> io executor which is passed to the SSD cache for SSD staging writes: 4?

velox/exec/fuzzer/CacheFuzzer.cpp

facebook-github-bot · 2024-06-25T18:12:35Z

This pull request was exported from Phabricator. Differential Revision: D58715904

facebook-github-bot · 2024-06-25T18:45:31Z

This pull request was exported from Phabricator. Differential Revision: D58715904

facebook-github-bot · 2024-06-25T20:08:47Z

This pull request was exported from Phabricator. Differential Revision: D58715904

xiaoxmeng

@zacw7 a few more comments.

velox/exec/fuzzer/CacheFuzzer.cpp

xiaoxmeng · 2024-06-25T20:18:26Z

velox/exec/fuzzer/CacheFuzzer.cpp

+  for (auto i = 0; i < FLAGS_num_source_files; ++i) {
+    // Initialize buffered input.
+    auto readFile = fs_->openFileForRead(fileNames_[i]);
+    groupIds_.emplace_back(


Do we need the file group support? Thanks!

Probably not. Let me remove it.

velox/exec/fuzzer/CacheFuzzer.cpp

facebook-github-bot · 2024-06-26T05:32:56Z

This pull request was exported from Phabricator. Differential Revision: D58715904

xiaoxmeng

@zacw7 LGTM. Thanks % minors!

velox/exec/fuzzer/CacheFuzzer.cpp

facebook-github-bot · 2024-06-26T21:33:08Z

This pull request was exported from Phabricator. Differential Revision: D58715904

Summary: Introduce a basic fuzzer for the async data cache. Each iteration involves: 1. Creating a set of data files of varying sizes. 2. Setting up the async data cache with an SSD using a specified configuration. 3. Performing parallel random reads from these data files. In the initial setup, most of the parameters are defined as gflags and we'll decide later which parameters should be fuzzed during the tests. Pull Request resolved: facebookincubator#10244 Reviewed By: xiaoxmeng Differential Revision: D58715904 Pulled By: zacw7

xiaoxmeng

@zacw7 thanks for the update!

facebook-github-bot · 2024-06-27T01:52:15Z

This pull request was exported from Phabricator. Differential Revision: D58715904

facebook-github-bot · 2024-06-27T05:48:31Z

@zacw7 merged this pull request in d26cb1d.

conbench-facebook · 2024-06-27T06:12:34Z

Conbench analyzed the 1 benchmark run on commit d26cb1df.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

zacw7 requested a review from kewang1024 June 18, 2024 06:38

facebook-github-bot added the CLA Signed label Jun 18, 2024

zacw7 requested review from amitkdutta and xiaoxmeng June 18, 2024 06:38

zacw7 marked this pull request as ready for review June 18, 2024 06:38

zacw7 force-pushed the cache-fuzzer branch from 65922f4 to 6d762ea June 18, 2024 06:53

facebook-github-bot added the fb-exported label Jun 18, 2024

zacw7 force-pushed the cache-fuzzer branch from 6d762ea to 550594a June 18, 2024 07:15

zacw7 force-pushed the cache-fuzzer branch from 550594a to 427581c June 18, 2024 18:53

zacw7 force-pushed the cache-fuzzer branch from 427581c to 4552292 June 18, 2024 22:59

kewang1024 reviewed Jun 19, 2024

zacw7 force-pushed the cache-fuzzer branch from 4552292 to 7aeb378 June 19, 2024 03:50

zacw7 requested a review from kewang1024 June 20, 2024 06:19

xiaoxmeng reviewed Jun 24, 2024

zacw7 force-pushed the cache-fuzzer branch from 7aeb378 to d6f9af4 June 25, 2024 00:18

zacw7 requested a review from xiaoxmeng June 25, 2024 00:28

zacw7 force-pushed the cache-fuzzer branch from 4df2d78 to 8252d80 June 25, 2024 00:41

xiaoxmeng reviewed Jun 25, 2024

zacw7 force-pushed the cache-fuzzer branch from 8252d80 to 9f94e28 June 25, 2024 18:12

zacw7 force-pushed the cache-fuzzer branch from 9f94e28 to ff611f4 June 25, 2024 18:45

zacw7 force-pushed the cache-fuzzer branch from ff611f4 to c15c492 June 25, 2024 20:08

xiaoxmeng reviewed Jun 25, 2024

zacw7 force-pushed the cache-fuzzer branch from c15c492 to 82e0ef5 June 26, 2024 05:32

xiaoxmeng approved these changes Jun 26, 2024

xiaoxmeng reviewed Jun 26, 2024

zacw7 force-pushed the cache-fuzzer branch from 82e0ef5 to 08dbab6 June 26, 2024 21:33

xiaoxmeng approved these changes Jun 26, 2024

zacw7 force-pushed the cache-fuzzer branch from 08dbab6 to edb7c76 June 27, 2024 01:52

facebook-github-bot closed this in d26cb1d Jun 27, 2024

facebook-github-bot added the Merged label Jun 27, 2024

zacw7 deleted the cache-fuzzer branch June 27, 2024 06:01
