Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Growing memory and goroutines #142

Closed
hannahhoward opened this issue Nov 15, 2022 · 3 comments
Closed

Growing memory and goroutines #142

hannahhoward opened this issue Nov 15, 2022 · 3 comments
Assignees

Comments

@hannahhoward
Copy link
Collaborator

It appears the longer autoretrieve runs, the more goroutines run and the more likely autoretrieve is to crash.

I've attached goroutine dump and heap profile for the bedrock instance currently.

dumps.zip

@hannahhoward
Copy link
Collaborator Author

Largest go-routine counts appear to be running state machines and channel monitoring.
Memory appears to be in memory copies of the data transfer states

profile001

@rvagg
Copy link
Collaborator

rvagg commented Nov 28, 2022

This is the memory issue we're chasing, my thoughts and current hypothesis follows.

profile002

@rvagg
Copy link
Collaborator

rvagg commented Nov 28, 2022

To describe this memory path, it's like this:

There are 3 Bytes terminals in a Graphsync message, there is a UUID (32 bytes), a CID prefix (3-4 bytes), but the most interesting is the data field of a GraphSyncBlock, which can be quite large. For autoretrieve, we're mostly dealing with UnixFS data (either dag-pb or raw) which are going to be up to ~1Mb in size; so these are potentially quite large chunks and it wouldn't take much to compound up to the sizes we're seeing. We were getting regular crashes in our bedrock instance at our 32G memory max, now we have double that but we still get regular crashes, they just happen less frequently.

So block data eh? What's the mechanism for leaking these since we can confirm it's not whole message data that we're leaking, it's specifically the block portion.

My hypothesis: Graphsync is dealing with DAGs, after it gets these blocks, it stores them but also decodes them so it can traverse and transfer more of the DAG. The obvious candidate here is decoding a Cid, since that's all Graphsync really cares about inside block data. We have nothing else in the path above Graphsync that would care about anything else inside these blocks. If we decode dag-cbor blocks, we're going to be hitting the same Readn() callsite because a Cid is just a tagged byte section in CBOR, so even if we were leaking via Cids then we'd have to have a lot of them because we're talking about ~34 bytes each. But, we're not dealing mainly with dag-cbor data, this is mostly dag-pb data because it's IPFS dht requests which are mostly UnixFS. dag-pb does it differently, it does slice-of-slice to decode its Data section and its Links array Hash elements which are Cids, which Graphsync will be wanting for traversal. This whole block here performs no new allocations: https://github.com/ipld/go-codec-dagpb/blob/061699f10dbb9e9bc8e8ed3552cbd8597730f412/unmarshal.go#L191-L197 - the Cid generated holds on to the underlying slice for the entire block.

Graphsync could be holding on to these somewhere, I don't see an obvious place, perhaps something in the new reconciledloader (which could even be holding on to the entire block) or maybe even in the traverser which cares about Cids. But I found a better candidate closer to home, the autoretrieve RandomPruner: https://github.com/application-research/autoretrieve/blob/master/blocks/randompruner.go#L50; when you have any pruning option enabled (which we do for bedrock: prune-threshold: 45 GB); the RandomPruner will act as a Blockstore wrapper and will hold on to any Cid that's passed in via Put() and PutMany(). The Cids are put in a map along with their timestamp and only cleared out when that time expires.

For this to work, Graphsync would need to be using the Cid from a previous block in the DAG when storing the bytes of that block that come from a later message (if it was just storing blocks in isolation as they come, with newly generated Cids for it then those would be new slices). I believe this is what happens: https://github.com/ipfs/go-graphsync/blob/386a1d8990e59f4b3ac66e633960efa7b9d2934b/requestmanager/reconciledloader/load.go#L134-L146 - the ReconciledLoader maintains the list of Cids it needs, then when it gets the bytes for that block, it writes those bytes and commits them with the original Cid that it was looking for, which would be a sub-slice of a previous dag-pb block and hence be associated with that original allocated data. Filclient uses Graphsync's LinkSystemForBlockstore to wrap a Blockstore, and that uses the Cid in question directly for the Put() call: https://github.com/ipfs/go-graphsync/blob/386a1d8990e59f4b3ac66e633960efa7b9d2934b/storeutil/storeutil.go#L42

So:

  1. Graphsync receives a message
  2. Graphsync decodes 1 or more block data
  3. Graphsync decodes block data to find more Cids to traverse; Cids from dag-pb blocks are sub-slices of the whole dag-pb block
  4. Graphsync receives blocks for those Cids it's looking for
  5. Graphsync stores block data along with original Cid
  6. Filclient passes the storage operation through a blockstore Put(cid, bytes)
  7. Autoretrieve's blockstore, wrapped by a RandomPruner, holds on to the cid, and therefore also the associated underlying block data for 1 hour
  8. With enough volume, we have a lot of dag-pb blocks in memory, held on to just by one or more Cids which are used in association with the data that was found to match the Cid

To get to 32G, we'd need ~65k x 500k blocks. We're only doing ~5k successful "retrievals" per hour, but that's for a root of a potentially large DAG, and that doesn't count failed but partial retrievals. Maybe those numbers could get us there? Let's find out.

I'm about to submit a PR that will test this hypothesis by cloning the Cid; then we should only accumulate Cid memory (which is still going to be a lot .. but shouldn't be unmanageable). If it's not this then I reckon it's something close to this. If it is this, then it raises interesting questions about our stack and whether we should "fix" this at one or more points, including possibly dag-pb and graphsync.

rvagg added a commit that referenced this issue Nov 28, 2022
…ger memory range (#150)

* fix: create a new CID from a clean slice to avoid holding on to a larger memory range

Ref: #142 (comment)

* Update blocks/randompruner.go

Co-authored-by: Kyle Huntsman <3432646+kylehuntsman@users.noreply.github.com>

Co-authored-by: Kyle Huntsman <3432646+kylehuntsman@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants