Adding an option to wait sounds good, but it seems something else was happening, which suggests a bug in the current logic or an issue at some other level. The wait should have returned on errors too.
Are you sure that you saw what you think you saw? The item was probably not yet pinned in all the places it should have been pinned.
I remember trying multiple things and it looked reproducible: I had pin add --wait --replication 2 waiting forever, even though the --debug responses used for status polling revealed that >2 peers had the CID in pinned status.
There was only one peer in unpinned state (infra confirmed we had issues with the cluster at that time), and everything else was pinned, which feels like a bug in situations where the cluster is partially out of sync, perhaps?
Ok, that was probably the problem. I'll call this a bug.
hsanjuan added the kind/bug and P1 labels and removed the need/triage label on Aug 11, 2021.
Fixes #1427. Currently, if --wait is used when pinning, it will wait until all
statuses reported for a pin are either Pinned or Remote. If a peer was lagging
behind and not syncing the state properly (reporting "unpinned" for example),
that would be enough to block waiting.
This modifies the behaviour of wait to return when replication_factor_min is
reached, regardless of what other statuses are.
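The difference between the two behaviours can be sketched as follows. This is a minimal illustration in Python, not the actual ipfs-cluster code: the function names, the lowercase status strings, and the assumption that replication_factor_min is a positive number are all made up for the example.

```python
from typing import List

# Statuses that the old logic accepted as "done" for a peer.
DONE_STATUSES = {"pinned", "remote"}


def wait_done_old(statuses: List[str]) -> bool:
    """Old behaviour: every reported status must be pinned or remote."""
    return all(s in DONE_STATUSES for s in statuses)


def wait_done_new(statuses: List[str], replication_factor_min: int) -> bool:
    """New behaviour: enough peers report pinned, whatever the others say.

    Assumes replication_factor_min is positive (hypothetical simplification).
    """
    pinned = sum(1 for s in statuses if s == "pinned")
    return pinned >= replication_factor_min


# One lagging peer reports "unpinned" while three peers hold the pin.
reported = ["pinned", "pinned", "pinned", "unpinned"]
print(wait_done_old(reported))     # the lagging peer keeps this False
print(wait_done_new(reported, 2))  # the minimum of 2 is reached, so True
```

With the old check, a single out-of-sync peer is enough to keep the wait blocked, which matches the report in this issue.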
Current behavior
The current behavior of pin add --wait seems to be waiting for "pinned" status everywhere (no matter what replication factor was set? I tried to pass --replication-min|max and those seem to be ignored as well).
This effectively breaks the CI build if any of the cluster peers is slow/unresponsive, and even when all of them are online, makes the build unnecessarily long.
In recent days, it was causing issues for both project teams and stewards (dist.ipfs.io).
I ended up not using --wait and decided to poll for status in userland instead.
Proposed enhancement
I believe the most useful feature for CI would be to allow for a "relaxed" --wait where we block only until "n" replica confirmations are received.
I propose we add a feature where a number of replica confirmations can be passed, and then we wait until that number of "pinned" confirmations is received:
The above would pin with replication factor 4 but wait only for 2 confirmations before unblocking the CI job.
I imagine that when the number is missing, we would default to --replication[-min] (if provided), and then fall back to the number of peers when no replication is passed.
Alternative approach
An alternative is to make --wait just follow the implicit/explicit --replication[-min], but I believe decoupling waiting from the replication factor makes it more useful / flexible.
cc @olizilla @hsanjuan @dholms (#1311)
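The defaulting rule from the proposal, together with the userland polling workaround mentioned earlier, could look roughly like this. It is only a sketch under stated assumptions: resolve_wait_threshold, wait_for_pins and the fetch_statuses callback are hypothetical names, and the callback stands in for whatever status query a CI script would actually run against the cluster.

```python
import time
from typing import Callable, List, Optional


def resolve_wait_threshold(wait_n: Optional[int],
                           replication_min: Optional[int],
                           peer_count: int) -> int:
    """Pick how many "pinned" confirmations to wait for."""
    if wait_n is not None:
        return wait_n           # explicit number passed by the user
    if replication_min is not None:
        return replication_min  # default to the replication minimum
    return peer_count           # no replication given: wait for all peers


def wait_for_pins(fetch_statuses: Callable[[], List[str]],
                  threshold: int,
                  timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll until `threshold` peers report "pinned" or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        pinned = sum(1 for s in fetch_statuses() if s == "pinned")
        if pinned >= threshold:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the scenario in the proposal (replication factor 4, wait for 2), resolve_wait_threshold(2, 4, peer_count) yields 2, so the CI job unblocks as soon as two peers confirm, while the remaining replicas keep pinning in the background.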