Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p: ban bad peers #4548

Merged
merged 17 commits into from
Mar 12, 2020
Merged

p2p: ban bad peers #4548

merged 17 commits into from
Mar 12, 2020

Conversation

cmwaters
Copy link
Contributor

Closes: #1614

Description

This is just a MVP implementation of having a blacklist that bans peers for a bantime (default is 1 day) and I do feel that this is potentially a bandage to be replaced when the entire p2p module gets redone. Below is a illustration of the implementation.

p2p

Reinstated peers are added to the "new" (unvetted) bucket.

Thoughts:

The ReinstateBadPeers() function is run by the synchronous function ensurePeers()every 30 seconds if the NeedMoreAddrs() fails. The alternative would be to have isBanned() check within parts of the code that want to use the address.

It might be a good idea to also keep a count of the amount of bans such that repeat offenders can serve longer bans / be banned forever.


For contributor use:

  • Wrote tests
  • Updated CHANGELOG_PENDING.md
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Updated relevant documentation (docs/) and code comments
  • Re-reviewed Files changed in the Github PR explorer

@cmwaters cmwaters requested a review from melekes March 10, 2020 17:57
@cmwaters cmwaters added the C:p2p Component: P2P pkg label Mar 10, 2020
@cmwaters cmwaters requested a review from erikgrinaker March 10, 2020 18:09
@tac0turtle tac0turtle changed the title Ban Bad Peers p2p: ban bad peers Mar 10, 2020
@codecov-io
Copy link

codecov-io commented Mar 10, 2020

Codecov Report

Merging #4548 into master will decrease coverage by 0.22%.
The diff coverage is 91.66%.

@@            Coverage Diff             @@
##           master    #4548      +/-   ##
==========================================
- Coverage   65.56%   65.33%   -0.23%     
==========================================
  Files         229      229              
  Lines       20287    20326      +39     
==========================================
- Hits        13302    13281      -21     
- Misses       5937     5990      +53     
- Partials     1048     1055       +7     
Impacted Files Coverage Δ
p2p/pex/errors.go 11.11% <0.00%> (-13.89%) ⬇️
p2p/pex/pex_reactor.go 84.26% <75.00%> (-0.48%) ⬇️
p2p/pex/addrbook.go 70.39% <97.36%> (+1.55%) ⬆️
p2p/pex/known_address.go 56.36% <100.00%> (+3.42%) ⬆️
crypto/sr25519/pubkey.go 32.14% <0.00%> (-7.15%) ⬇️
consensus/reactor.go 76.93% <0.00%> (-2.89%) ⬇️
privval/signer_endpoint.go 78.37% <0.00%> (-2.71%) ⬇️
privval/signer_listener_endpoint.go 86.95% <0.00%> (-2.18%) ⬇️
consensus/state.go 77.16% <0.00%> (-1.76%) ⬇️
... and 7 more

@@ -8,7 +8,7 @@ import (

"github.com/pkg/errors"

amino "github.com/tendermint/go-amino"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. My main guess would be that my IDE did it in the background

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach makes sense. 👍 We should have some tests for this, and fix some minor style nits, but let's put that off until the final review.

func (a *addrBook) ReinstateBadPeers() {
for _, ka := range a.badPeers {
if !ka.isBanned(defaultBanTime) {
a.mtx.Lock()
Copy link
Contributor

@erikgrinaker erikgrinaker Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutex lock must be taken out at the start of the function, otherwise a concurrent thread can modify badPeers while you are iterating over it and then bad things happen.

func (a *addrBook) MarkBad(addr *p2p.NetAddress) {
a.RemoveAddress(addr)
if a.addBadPeer(addr) {
a.RemoveAddress(addr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addBadPeer() returns false if the peer is already considered bad, in which case it will not be removed from the address book. Shouldn't we always remove it regardless, since we do not want bad peers there under any circumstances? We may also want to refresh the ban timer in that case.

Also, since both of these calls take out mutex locks separately, the data can change between them, which may cause problems. Not sure if that will be an actual problem here, but in general it's a good idea to take out mutex locks in public methods, and have internal methods that assume locks are already held such that they can be combined and composed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if it's bad but still in part of the address book then we should return true so that it still runs RemoveAddress.

For your second paragraph about the mutexes, that makes sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I remember why I didn't add the mutexes in the public function because RemoveAddress has it's own mutex

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Which means that addBadPeer and RemoveAddress can't be called in a single critical section. If we had removeAddress() which did not take out a lock, and RemoveAddress() which does take out a lock and call removeAddress() internally, then we could use removeAddress() in a critical section together with addBadPeer().

Again, not sure if it's worth spending time on in this case, since I suspect it's safe to call RemoveAddress() regardless of any state changes. But this is a general problem across the code base which we should keep in mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes I understand. I think I will change it like that anyway

ka.LastBanTime = time.Now()
}

func (ka *knownAddress) isBanned(banTime time.Duration) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary to pass banTime as a parameter here, maybe just use the constant directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering whether we want to make it something that is either a variant - or at least configurable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense. But I don't think it's something that all callers should have to deal with every time time they want to check if someone is banned, it's more convenient if isBanned() just takes all conditions into account automatically. But I'm not sure what the best way is to inject variables into the current architecture - I guess this is fine for a first iteration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option may be to have ban() take a parameter for how long to ban, and store bannedUntil as the time when the ban ends (the max of the current and given). Then isBanned() can simply compare time.Now() against bannedUntil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that could be a good idea

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's necessary in the first iteration. But it did occur to me that passing the ban time to ban() would allow us to vary the ban duration for different offences, if that's something we would need later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we perhaps want to expose Ban() so other modules could use it?

Copy link
Contributor

@erikgrinaker erikgrinaker Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. Let's consider that when there's an actual need for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay we can leave it for later. I was talking to Tess how each module sort of has their own way of categorising and dealing with peers that makes things at a whole inconsistent and potentially repetitive

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure. That's something to consider for the p2p refactor.

@cmwaters
Copy link
Contributor Author

Current usages of the ban list:

  • If a peer is supposed to be a persistent peer but doesn't respond after x amount of attempts
  • when ErrSwitchAuthenticationFailure happens which I don't think ever happens

This makes me think do we differentiate based on whether a peer timed out or if they actually misbehaved (#4415 (comment)) and whether we want a banlist and a blacklist - one to not talk to a node for a while and the other to permanently never talk to that node - note that removing the address at the moment isn't enough because we don't keep track of it and the node can simply join again #1444

@cmwaters cmwaters marked this pull request as ready for review March 11, 2020 16:35
@cmwaters cmwaters requested a review from ebuchman as a code owner March 11, 2020 16:35
Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ this PR

func (a *addrBook) IsBanned(addr *p2p.NetAddress) bool {
a.mtx.Lock()
defer a.mtx.Unlock()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a.mtx.Lock()
_, ok := a.badPeers[addr.ID]
a.mtx.Unlock()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain to me the difference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need in defer if you don't call other functions and code is not complex

bucket := a.calcNewBucket(ka.Addr, ka.Src)
a.addToNewBucket(ka, bucket)
delete(a.badPeers, ka.ID())
a.Logger.Info("Reinstated Address", "addr", ka.Addr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
a.Logger.Info("Reinstated Address", "addr", ka.Addr)
a.Logger.Info("Reinstated address", "addr", ka.Addr)

@@ -63,3 +63,11 @@ type ErrAddrBookInvalidAddr struct {
func (err ErrAddrBookInvalidAddr) Error() string {
return fmt.Sprintf("Cannot add invalid address %v: %v", err.Addr, err.AddrErr)
}

type ErrAddressBanned struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about for other errors - there isn't any documentation for them either

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please write doc for them too?

@@ -529,7 +538,7 @@ func (r *Reactor) dialPeer(addr *p2p.NetAddress) error {
// failed to connect to. Then we can clean up attemptsToDial, which acts as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove this comment?

I think we need both attemptsToDial and blacklist

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

// check it exists in addrbook
ka := a.addrLookup[addr.ID]
// check address is not already there
if ka != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can simplify this with an early return, i.e. if ka == nil { return false }

@@ -54,6 +55,14 @@ func (ka *knownAddress) markGood() {
ka.LastSuccess = now
}

func (ka *knownAddress) ban(banTime time.Duration) {
ka.LastBanTime = time.Now().Add(banTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should only extend the ban, not shorten it. For example, if someone banned the address until 16:00, and then someone else wants to ban them until 13:00, they should remain banned until 16:00. This will matter when we start having different ban periods for various offences.

@@ -8,7 +8,7 @@ import (

"github.com/pkg/errors"

amino "github.com/tendermint/go-amino"
"github.com/tendermint/go-amino"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should keep the old formatting. It's considered good form to use an explicit name if the package name differs from the URL name (amino vs go-amino), and to use a blank line between external and internal dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some reason my ide doesn't see the difference between amino and go-amino but I will make the change regardless

@@ -63,3 +63,12 @@ type ErrAddrBookInvalidAddr struct {
func (err ErrAddrBookInvalidAddr) Error() string {
return fmt.Sprintf("Cannot add invalid address %v: %v", err.Addr, err.AddrErr)
}

// Err is thrown when the address is banned and therefore cannot be used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go comments should start with name of what they're describing, i.e. ErrAddressBanned is thrown .... I thought the linter would catch this, but we may have disabled it.

@cmwaters cmwaters merged commit 61a9ec1 into master Mar 12, 2020
@cmwaters cmwaters deleted the callum/p2p-blacklist branch March 12, 2020 14:52
@melekes
Copy link
Contributor

melekes commented Mar 12, 2020

Where's the changelog pending entry?

@melekes
Copy link
Contributor

melekes commented Mar 12, 2020

Also, tendermint/spec needs to be updated.

cmwaters added a commit that referenced this pull request Mar 12, 2020
cmwaters added a commit that referenced this pull request Mar 12, 2020
p2p: Update Changelog with ban list PR - #4548
cmwaters added a commit that referenced this pull request Mar 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:p2p Component: P2P pkg
Projects
None yet
Development

Successfully merging this pull request may close these issues.

p2p: prevent bad peer from connecting to us for some time
6 participants