Shard test infra deriver input by source file path id
Summary:
**Background**
The testinfra fb-deriver step often OOMs for Hack coverage: it uses up to 250GB of memory on Sandcastle, reaching the limits of available machine sizes.

Let's look at what the deriver is doing:
1. Loads all [CoveredFileOnly](https://www.internalfb.com/code/fbsource/[4510a3d1cb9f9f8fe07b6e8260724afd2f6d6c4a]/fbcode/glean/schema/source/facebook/testinfra.angle?lines=101-108) into (File id, CoveredFile id)
2. Loads all [CoveredFolder](https://www.internalfb.com/code/fbsource/[4510a3d1cb9f9f8fe07b6e8260724afd2f6d6c4a]/fbcode/glean/schema/source/facebook/testinfra.angle?lines=93-99): map from CoveredFolder id -> ([CoveredFolder ids], [File ids])
   a. Only File ids that are in the set of file ids loaded in the previous step are kept
   b. *(Note: could we have another space/time-saving optimization to not save anything in this map if the File list has been filtered to empty?)*
3. Loads all [CoveredAssembly](https://www.internalfb.com/code/fbsource/[4510a3d1cb9f9f8fe07b6e8260724afd2f6d6c4a]/fbcode/glean/schema/source/facebook/testinfra.angle?lines=125-130): map from CoveredAssembly id -> ([CoveredFolder ids], [File ids]) from previous step
4. Inverts assemblies using that data:
   a. For each (File id, CoveredFile id) loaded in the first step, add the relevant [CoveredAssembly] to that tuple for (File id, CoveredFile id, [CoveredAssembly ids])
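
The inversion in step 4 can be sketched roughly as follows. This is a hypothetical standalone sketch over plain `IntMap`s keyed by `Int` ids, not the actual deriver code (which works over Glean fact ids and `PredMap`):

```haskell
import qualified Data.IntMap.Strict as IntMap
import Data.IntMap.Strict (IntMap)

-- Hypothetical sketch of step 4: given a map from CoveredAssembly id
-- to the File ids it covers, produce a map from File id to the list
-- of CoveredAssembly ids covering it.
invertAssemblies :: IntMap [Int] -> IntMap [Int]
invertAssemblies assemblies =
  IntMap.fromListWith (++)
    [ (fileId, [assemblyId])
    | (assemblyId, fileIds) <- IntMap.toList assemblies
    , fileId <- fileIds
    ]
```

The whole-repo version of this step needs every (File id, CoveredFile id) pair plus the full folder and assembly maps resident at once, which is what drives the memory peak.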

**This diff**
* We want to shard the deriver by the set of file paths it cares about, so that the assembly-inverting step does not require so much data in memory
* A file belongs to a shard when filePathId % numShards == thisShardNum
* We pass numShards and thisShardNum as flags to the deriver
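
The shard membership test amounts to the following minimal sketch (argument names are illustrative; the real values come from the `--tested-file-shards` and `--tested-file-shard` flags):

```haskell
-- Minimal sketch of the sharding predicate: a file path id belongs to
-- this shard iff its id modulo the shard count equals this shard's
-- number. numShards and thisShardNum correspond to the
-- --tested-file-shards and --tested-file-shard flags.
inShard :: Int -> Int -> Int -> Bool
inShard numShards thisShardNum filePathId =
  filePathId `mod` numShards == thisShardNum
```

With numShards = 1 and thisShardNum = 0 (the flag defaults) every file is selected, so unsharded runs behave as before.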

Reviewed By: pepeiborra

Differential Revision: D51347271

fbshipit-source-id: 1ec6187eedbd4c503fce40328e04a07e8557d7ec
Will Hughes authored and facebook-github-bot committed Nov 28, 2023
1 parent e0163f1 commit 6d90f14
Showing 2 changed files with 18 additions and 0 deletions.
4 changes: 4 additions & 0 deletions glean/hs/Glean/Util/PredMap.hs
@@ -18,6 +18,7 @@ module Glean.Util.PredMap
, toList, toAscList
, size
, member, notMember
, filterValues
) where

import Control.DeepSeq (NFData)
@@ -109,3 +110,6 @@ member p pm = IntMap.member (one p) (predMap pm)

notMember :: IdOf p -> PredMap p v -> Bool
notMember p pm = IntMap.notMember (one p) (predMap pm)

filterValues :: (v -> Bool) -> PredMap p v -> PredMap p v
filterValues f (PredMap m) = PredMap $ IntMap.filter f m
14 changes: 14 additions & 0 deletions glean/lang/clang/Derive/Types.hs
@@ -52,6 +52,8 @@ data Config = Config
, cfgDebugPrintReferences :: Bool -- ^ function-calls pass
, cfgMaxQueueSize :: Int -- ^ function-calls pass
, cfgIncremental :: Bool -- ^ derive incrementally
, cfgTestedFileShards :: Int -- ^ number of shards for the --tested-file deriver
, cfgTestedFileShard :: Int -- ^ shard number for the --tested-file deriver
}

-- | Only used for regression testing derived passes
@@ -76,6 +78,8 @@ testConfig repo = Config
, cfgDebugPrintReferences = False
, cfgMaxQueueSize = 100000
, cfgIncremental = False
, cfgTestedFileShards = 1
, cfgTestedFileShard = 0
}

-- | Command-line argument parser for "Derive" to get 'Config'
@@ -132,4 +136,14 @@ options = do
<> O.help "function-calls: maximum queue size for cxx.FileXRefMap"
cfgIncremental <- O.switch $
O.long "incremental"
cfgTestedFileShards <- O.option O.auto $
O.long "tested-file-shards"
<> O.metavar "N"
<> O.value 1
<> O.help "number of file shards for the --tested-file deriver"
cfgTestedFileShard <- O.option O.auto $
O.long "tested-file-shard"
<> O.metavar "N"
<> O.value 0
<> O.help "the shard number to filter files for the --tested-file deriver"
return Config{..}
