Commit
Shard test infra deriver input by source file path id
Summary:

**Background**

The testinfra fb-deriver step often OOMs for Hack coverage: it uses up to 250GB of memory on sandcastle, which is at the limit of the available machine sizes.

Let's look at what the deriver is doing:

1. Loads all [CoveredFileOnly](https://www.internalfb.com/code/fbsource/[4510a3d1cb9f9f8fe07b6e8260724afd2f6d6c4a]/fbcode/glean/schema/source/facebook/testinfra.angle?lines=101-108) into (File id, CoveredFile id)
2. Loads all [CoveredFolder](https://www.internalfb.com/code/fbsource/[4510a3d1cb9f9f8fe07b6e8260724afd2f6d6c4a]/fbcode/glean/schema/source/facebook/testinfra.angle?lines=93-99): map from CoveredFolder id -> ([CoveredFolder ids], [File ids])
   1. Only File ids that are in the set of file ids loaded in the previous step are kept
   2. *(Note: Could we add another space/time optimization and store nothing in this map when the File ids have been filtered down to empty?)*
3. Loads all [CoveredAssembly](https://www.internalfb.com/code/fbsource/[4510a3d1cb9f9f8fe07b6e8260724afd2f6d6c4a]/fbcode/glean/schema/source/facebook/testinfra.angle?lines=125-130): map from CoveredAssembly id -> ([CoveredFolder ids], [File ids]) from the previous step
4. Inverts assemblies using that data:
   a. For each (File id, CoveredFile id) loaded in the first step, add the relevant [CoveredAssembly] ids to that tuple, giving (File id, CoveredFile id, [CoveredAssembly ids])

**This diff**

* We want to shard the deriver into sets of file paths it cares about, so that the inverting assemblies step does not need to hold so much data in memory
* We do this by filePathId % numShards == thisShardNum
* We pass numShards and thisShardNum as flags to the deriver

Reviewed By: pepeiborra

Differential Revision: D51347271

fbshipit-source-id: 1ec6187eedbd4c503fce40328e04a07e8557d7ec
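The sharding rule described in the summary can be illustrated with a minimal Python sketch. This is not the deriver's actual code: the function names `in_shard` and `invert_assemblies`, the snake_case parameters, and the plain-dict data shapes are hypothetical, folders are assumed to be already flattened into file ids per assembly, and the file id stands in for the file path id.

```python
from collections import defaultdict


def in_shard(file_path_id: int, num_shards: int, this_shard_num: int) -> bool:
    """Sharding predicate from the summary: filePathId % numShards == thisShardNum."""
    return file_path_id % num_shards == this_shard_num


def invert_assemblies(covered_files, assembly_files, num_shards=1, this_shard_num=0):
    """Invert assembly -> files into file -> assemblies for one shard.

    covered_files:  iterable of (file_id, covered_file_id) pairs (hypothetical shape)
    assembly_files: dict of assembly_id -> iterable of file_ids, with folders
                    assumed to be already flattened into file ids
    """
    # Keep only the files owned by this shard, so the inverted map stays small.
    kept = {
        file_id: covered_file_id
        for file_id, covered_file_id in covered_files
        if in_shard(file_id, num_shards, this_shard_num)
    }

    # Attach each assembly to every kept file it covers.
    assemblies_by_file = defaultdict(list)
    for assembly_id, file_ids in assembly_files.items():
        for file_id in file_ids:
            if file_id in kept:
                assemblies_by_file[file_id].append(assembly_id)

    return [
        (file_id, covered_file_id, sorted(assemblies_by_file[file_id]))
        for file_id, covered_file_id in sorted(kept.items())
    ]
```

Because every file id falls into exactly one shard, running the function once per shard and concatenating the results should give the same tuples as a single unsharded run, while each run only holds the inverted data for its own slice of files.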