
feat: Add rate limiting to the lotus gateway #8517

Merged: 17 commits into master, Jun 10, 2022

Conversation

coryschwartz

Add rate limiting to the lotus gateway.

This lets us limit the number of requests the lotus gateway will accept.

Each API call can have its own cost, depending on how much load it is likely to induce. An API call is permitted to go through as long as enough tokens become available in the bucket before the timeout expires.

If there are insufficient tokens, an error is returned and the API call is not performed.
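
For illustration, a minimal sketch of that flow using Go's golang.org/x/time/rate token bucket; the method costs, rate, burst, and timeout below are made-up values, not the ones used in this PR:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// Hypothetical per-call costs: heavier calls draw more tokens from the bucket.
var apiCosts = map[string]int{
	"Filecoin.Version":        1,
	"Filecoin.ChainGetTipSet": 3,
}

func handleCall(limiter *rate.Limiter, timeout time.Duration, method string) error {
	cost, ok := apiCosts[method]
	if !ok {
		cost = 1
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	// WaitN blocks until `cost` tokens are available, or fails immediately if
	// the context deadline would expire before they could be.
	if err := limiter.WaitN(ctx, cost); err != nil {
		return fmt.Errorf("server busy: %w", err)
	}
	// ... perform the actual API call here ...
	return nil
}

func main() {
	limiter := rate.NewLimiter(100, 200) // 100 tokens/s, burst of 200
	fmt.Println(handleCall(limiter, 5*time.Second, "Filecoin.Version"))
}
```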

@coryschwartz (Author)

Ran a couple of quick and dirty tests.

Without the rate limiter, queries to get the genesis tipset respond quickly, at about 6.7k qps on my test setup.

With the rate limiter in place but set to --rate-limit 0, there was no impact on performance.

Actually, performance improved a little bit, but it's close enough to consider it identical.


codecov bot commented Apr 21, 2022

Codecov Report

Merging #8517 (e679083) into master (effee8c) will decrease coverage by 0.17%.
The diff coverage is 14.03%.

❗ Current head e679083 differs from pull request most recent head b305483. Consider uploading reports for the commit b305483 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8517      +/-   ##
==========================================
- Coverage   40.95%   40.77%   -0.18%     
==========================================
  Files         687      686       -1     
  Lines       75809    75817       +8     
==========================================
- Hits        31049    30918     -131     
- Misses      39418    39538     +120     
- Partials     5342     5361      +19     
Impacted Files Coverage Δ
cmd/lotus-gateway/main.go 0.00% <0.00%> (ø)
metrics/metrics.go 100.00% <ø> (ø)
gateway/node.go 34.72% <14.95%> (-10.42%) ⬇️
node/modules/lp2p/rcmgr.go 4.51% <0.00%> (-25.57%) ⬇️
itests/kit/blockminer.go 70.65% <0.00%> (-19.03%) ⬇️
extern/sector-storage/worker_tracked.go 80.50% <0.00%> (-6.78%) ⬇️
miner/miner.go 54.75% <0.00%> (-4.27%) ⬇️
chain/stmgr/call.go 69.18% <0.00%> (-3.25%) ⬇️
markets/loggers/loggers.go 89.28% <0.00%> (-1.90%) ⬇️
node/impl/full/mpool.go 46.95% <0.00%> (-1.74%) ⬇️
... and 34 more

@coryschwartz coryschwartz marked this pull request as ready for review April 21, 2022 04:10
@coryschwartz coryschwartz requested a review from a team as a code owner April 21, 2022 04:10
gateway/node.go Outdated
"github.com/filecoin-project/lotus/node/impl/full"
)

const (
DefaultLookbackCap = time.Hour * 24
DefaultStateWaitLookbackLimit = abi.ChainEpoch(20)
DefaultRateLimitTimeout = time.Minute * 10
Contributor

Does this mean a hanging request for 10 minutes?

@coryschwartz (Author) Apr 25, 2022

We would override this.

This is a leaky-bucket rate limiter, so when there is a burst of traffic, requests are queued and handled at a fixed rate to avoid overloading the backend.

Without the timeout, requests would be allowed to queue indefinitely. This timeout sets the maximum amount of time we permit a request to wait, after which the gateway responds with a "server busy" error.

It's worth noting that requests do not actually wait for this entire time. Because they are handled at a fixed rate during bursty periods, the system can predict whether the context timeout will expire before a request is handled, and in that case the error is returned immediately:

  • If a request comes in and there are enough tokens available in the bucket, the request is handled right away.

  • If a request comes in during high load and there are not enough tokens, the request is queued until enough tokens are available, and then processed.

  • If there are not enough tokens available in the bucket, and so many requests are already enqueued that the context timeout will expire before the request would be handled, an error is returned right away. Don't wait for the timeout.

With that explanation, 10 minutes is unreasonable for a real timeout; a more reasonable value is probably 1-5 seconds. By default I didn't want the rate limiter to be enabled, but I needed the context to be set to something. I can change this to 1 second if that makes more sense, but it's definitely something you'd have to tune to your situation.
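
To make the fail-fast behaviour concrete, here is a hedged sketch of that prediction step using x/time/rate reservations (WaitN performs an equivalent check internally; the limits and error messages are illustrative):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// acquire decides up front whether `cost` tokens can be obtained before the
// context deadline; if not, it rejects immediately instead of letting the
// request hang for the full timeout.
func acquire(ctx context.Context, limiter *rate.Limiter, cost int) error {
	deadline, hasDeadline := ctx.Deadline()
	if !hasDeadline {
		return limiter.WaitN(ctx, cost)
	}
	res := limiter.ReserveN(time.Now(), cost)
	if !res.OK() {
		return errors.New("server busy: cost exceeds burst size")
	}
	delay := res.Delay()
	if time.Now().Add(delay).After(deadline) {
		res.Cancel() // give the tokens back; we won't be using them
		return errors.New("server busy: queue longer than request timeout")
	}
	select { // otherwise wait our turn in the queue, then proceed
	case <-time.After(delay):
		return nil
	case <-ctx.Done():
		res.Cancel()
		return ctx.Err()
	}
}

func main() {
	limiter := rate.NewLimiter(10, 10) // 10 tokens/s, burst of 10
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	fmt.Println(acquire(ctx, limiter, 3))
}
```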

Contributor

I'd set this to some reasonably high value; 5s is probably good enough

@travisperson (Contributor)

Just a note: this appears to be a global rate limit. So while it will protect our backend, it does mean that the system can still be DoS'd very easily by simply making requests to Filecoin.Version, each of which eats 1 token. This can easily exhaust the system of tokens for all other users.

@coryschwartz (Author)

> Just a note: this appears to be a global rate limit. So while it will protect our backend, it does mean that the system can still be DoS'd very easily by simply making requests to Filecoin.Version, each of which eats 1 token. This can easily exhaust the system of tokens for all other users.

This is true. It will protect the backend from being overworked so it can continue processing messages under heavy load, but it does not prevent DoS attacks.

There is still some benefit to load shedding at the API layer, even if only to keep the backend from becoming overwhelmed.

We can add some additional limits for each connection, which would partially mitigate the problem. At least then a single individual cannot DoS the service by issuing Filecoin.Version as you suggest. Is that what you are thinking, @travisperson?

DDoS mitigation would likely have to be handled with scalability. As you suggest, this rate limiter would not help in that case, except to keep the backend working.

@travisperson (Contributor)

I think it's just something we need to understand and determine whether it's acceptable. If one user exhausting all tokens is unacceptable, then we need a better solution that limits per connection/IP.

@coryschwartz (Author)

I'm moving this back to draft due to concerns about using only global rate limits.

I'll implement per-connection rate limiting and produce some performance benchmarks before I put this back out for review.

@coryschwartz coryschwartz marked this pull request as draft May 9, 2022 14:26
@coryschwartz (Author)

@travisperson can you give this a try?

There are two new rate limiters in addition to the global one.

One limits the number of HTTP connections a client is allowed to open in a single minute.

The other limits RPC calls made over an established websocket.
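
Roughly, the per-connection RPC limiter can be thought of as a second bucket layered on top of the global one; this sketch uses hypothetical names and numbers rather than the PR's actual types:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// connLimits bundles the limiters that apply to a single websocket connection:
// the shared process-wide limiter plus one created when the connection opens.
type connLimits struct {
	global *rate.Limiter // shared by all connections
	conn   *rate.Limiter // lives and dies with this connection
}

// allow charges one RPC call against both buckets; either can reject it.
// (A real implementation might reserve from both before committing.)
func (c *connLimits) allow(ctx context.Context, cost int) error {
	if err := c.conn.WaitN(ctx, cost); err != nil {
		return err
	}
	return c.global.WaitN(ctx, cost)
}

func main() {
	global := rate.NewLimiter(1000, 2000) // shared across the whole gateway
	cl := &connLimits{global: global, conn: rate.NewLimiter(50, 100)}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	fmt.Println(cl.allow(ctx, 1))
}
```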

@coryschwartz coryschwartz marked this pull request as ready for review May 20, 2022 10:45
@magik6k magik6k changed the title Add rate limiting to the lotus gateway feat: Add rate limiting to the lotus gateway May 31, 2022
@magik6k (Contributor) left a comment

Just a high-level review pass, generally looks good.

Conflicts need resolving.

}

h.mu.Lock()
seen, ok := h.ipmap[host]
Contributor

We should make this not abusable with IPv6.

Author

This works the same way on IPv4 and IPv6; the map is keyed simply by a string, which will be something like "1.2.3.4" for IPv4 or "1111:2222:3333:4444:5555:6666:7777:8888" for IPv6.

This is the third kind of rate limiting in this PR, and I could be convinced that it should be done differently.

The global rate limiter and per-connection limiter protect the backend. The former persists for the lifetime of the process and the latter for the lifetime of a single connection.

The connection rate limiter, however, is intended to prevent abuse from scripts opening several connections in quick succession. The ipmap grows as new connections arrive and shrinks again some time later.

Alternatively, we could do this as a maximum number of simultaneous connections rather than "connections per minute".
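
A hedged sketch of that connections-per-minute bookkeeping; the connTracker type, its fields, and the cleanup strategy here are illustrative simplifications, not the PR's actual code:

```go
package main

import (
	"net"
	"net/http"
	"sync"
	"time"
)

// connTracker counts new connections per remote host over a one-minute window.
type connTracker struct {
	mu     sync.Mutex
	ipmap  map[string]int // host string -> connections seen in the last minute
	perMin int            // new connections allowed per host per minute
}

func newConnTracker(perMin int) *connTracker {
	return &connTracker{ipmap: make(map[string]int), perMin: perMin}
}

// allow records a connection from remoteAddr and reports whether it is within
// budget. Each entry is decremented a minute later, so the map grows with new
// connections and shrinks back over time.
func (t *connTracker) allow(remoteAddr string) bool {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.ipmap[host] >= t.perMin {
		return false
	}
	t.ipmap[host]++
	time.AfterFunc(time.Minute, func() {
		t.mu.Lock()
		defer t.mu.Unlock()
		if t.ipmap[host] <= 1 {
			delete(t.ipmap, host)
		} else {
			t.ipmap[host]--
		}
	})
	return true
}

func main() {
	tracker := newConnTracker(30) // e.g. 30 new connections per host per minute
	http.HandleFunc("/rpc/v0", func(w http.ResponseWriter, r *http.Request) {
		if !tracker.allow(r.RemoteAddr) {
			http.Error(w, "too many connections", http.StatusTooManyRequests)
			return
		}
		// ... upgrade to websocket / hand off to the JSON-RPC handler ...
	})
	// http.ListenAndServe would go here in a real server.
}
```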

Contributor

My main worry is that with IPv6 you get a /64 subnet, which is a lot of IPs that are tracked separately; but yes, the other limits should help here.

@magik6k magik6k enabled auto-merge June 10, 2022 11:01
@magik6k magik6k merged commit e899f73 into master Jun 10, 2022
@magik6k magik6k deleted the rate-limit-gateway branch June 10, 2022 11:18