Feature Request/Idea: implement :RateLimitingDefaultCapacityTiers in the new front-end #610

Open
donsizemore opened this issue Feb 24, 2025 · 4 comments

Comments

@donsizemore

Overview of the Feature Request

We're seeing increased traffic from what we believe are LLM scrapers. The current rate-limiting logic applies only to API endpoints, but could similar logic help protect the new front-end from such attacks?

What kind of user is the feature intended for?

Sysadmin

What inspired the request?

A deluge of GET requests from a broad swath of client IPv4s, each client making some small (<20) number of requests before switching addresses.

What existing behavior do you want changed?

Database servers lit up like Christmas trees.

What brand-new behavior do you want to add to Dataverse?

Configurable rate limiting in the new front-end, if its current architecture might make this possible in any way.

Any open or closed issues related to this feature request?

IQSS/dataverse#10211

@scolapasta
Contributor

@landreev should be able to confirm, but I think :RateLimitingDefaultCapacityTiers is actually applied at the Command layer, which means it would limit both API calls and (current JSF) front-end UI calls, at least for any activity that calls a command.

@landreev

landreev commented Feb 24, 2025

Correct, as currently implemented, the tier levels are configured for commands.
Two special dummy commands were added, CheckRateLimitForCollectionPageCommand and CheckRateLimitForDatasetPageCommand, specifically to be able to rate-limit the 2 workhorse pages in the current JSF UI. ("Dummy" = the commands don't do anything; their sole purpose is to be called from the init() methods of the 2 pages, providing a way to rate-limit the pages.)
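To make the command-layer model concrete, here is a minimal Python sketch of tier-based limiting with a "dummy" page-check command. It is an illustration under stated assumptions, not Dataverse's Java implementation: it assumes one token bucket per (user, command) pair, a capacity setting shaped like "guest,authenticated" in requests per hour, and tier 0 = guest. `CommandRateLimiter`, `RateLimitExceeded` and `check_rate_limit_for_dataset_page` are hypothetical names.

```python
import time
from typing import Callable


class RateLimitExceeded(Exception):
    pass


class CommandRateLimiter:
    """Hypothetical sketch: one token bucket per (user, command), sized by the caller's tier."""

    def __init__(self, capacity_tiers: str, clock: Callable[[], float] = time.monotonic):
        # Assumed setting format: "tier0_capacity,tier1_capacity,...", requests per hour.
        self.capacities = [int(c) for c in capacity_tiers.split(",")]
        self.clock = clock
        # (user_id, command) -> (remaining tokens, time of last refill)
        self.buckets: dict[tuple[str, str], tuple[float, float]] = {}

    def allow(self, user_id: str, tier: int, command: str) -> bool:
        # Tiers beyond the configured list fall back to the last configured capacity.
        capacity = self.capacities[min(tier, len(self.capacities) - 1)]
        now = self.clock()
        tokens, last = self.buckets.get((user_id, command), (float(capacity), now))
        # Refill continuously so an empty bucket is full again after one hour.
        tokens = min(capacity, tokens + (now - last) * capacity / 3600.0)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self.buckets[(user_id, command)] = (tokens, now)
        return allowed


def check_rate_limit_for_dataset_page(limiter: CommandRateLimiter, user_id: str, tier: int) -> None:
    # Mirrors the "dummy command" idea: it does no work and only consumes a token.
    if not limiter.allow(user_id, tier, "CheckRateLimitForDatasetPageCommand"):
        raise RateLimitExceeded(user_id)
```

Injecting a fake clock makes the behavior easy to check: with a guest capacity of 3 per hour, the fourth page load within the same instant raises `RateLimitExceeded`, and one token comes back after 1200 seconds.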
For the purposes of the new front end, one way to go would be to use this same model - and create dedicated dummy API calls, that don't do anything but call the dummy commands that don't do anything... etc. and have the SPA start all the page sessions with the calls to the corresponding APIs.
But it also may be possible to achieve the same result by rate-limiting the actual workload commands like GetDataverseCommand and GetLatestPublishedDatasetVersionCommand, since, presumably, the SPA will be consistently calling the API methods utilizing these commands. (The existing jsf Dataverse page does not call any commands to initialize; the dataset page does, but in some non-consistent manner - hence we just added the dedicated dummy commands).
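The two options above can be sketched side by side in Python. Everything here is hypothetical and simplified: `FixedWindowLimiter` is a plain per-hour counter standing in for the real limiter, `dataset_page_check_endpoint` stands in for a dedicated no-op API, and `get_latest_published_version` stands in for an API method that executes GetLatestPublishedDatasetVersionCommand. The 429 status is used as the conventional "Too Many Requests" code; whether Dataverse returns exactly that is not confirmed here.

```python
import functools
from typing import Callable


class FixedWindowLimiter:
    """Hypothetical counter: at most `capacity` calls per (user, command) per window."""

    def __init__(self, capacity: int, window_seconds: float, clock: Callable[[], float]):
        self.capacity = capacity
        self.window_seconds = window_seconds
        self.clock = clock
        self.counts: dict[tuple[str, str, int], int] = {}  # sketch only: old windows are never pruned

    def allow(self, user_id: str, command: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        key = (user_id, command, window)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.capacity


# Option 1: a dedicated no-op endpoint the SPA calls first when it opens a dataset page.
def dataset_page_check_endpoint(limiter: FixedWindowLimiter, user_id: str) -> int:
    allowed = limiter.allow(user_id, "CheckRateLimitForDatasetPageCommand")
    return 200 if allowed else 429


# Option 2: count the workload command itself, so the SPA needs no extra request.
def rate_limited(command_name: str):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(limiter: FixedWindowLimiter, user_id: str, *args, **kwargs):
            if not limiter.allow(user_id, command_name):
                return 429, None
            return 200, fn(*args, **kwargs)
        return wrapper
    return decorator


@rate_limited("GetLatestPublishedDatasetVersionCommand")
def get_latest_published_version(persistent_id: str) -> str:
    return f"latest published version of {persistent_id}"  # stand-in for the real lookup
```

Option 1 costs an extra round-trip per page session but gives one predictable counter per page; option 2 adds no requests, but the limit is only as consistent as the SPA's use of the underlying commands.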

@qqmyers
Member

qqmyers commented Feb 24, 2025

FWIW: We (QDR) tried rate limiting for unauthenticated users to stop this, but that causes normal unauthenticated users to be rate limited as well, so we went with blocking/throttling IPs/blocks elsewhere. I'm not sure how the current rate-limiting code can handle bad users without impacting good users if they're using the same calls.
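Per-user or per-IP limits struggle with the traffic described in the request, where each address makes fewer than 20 requests before rotating. One way to express "throttling IPs/blocks" is to aggregate counts by network prefix. The Python sketch below assumes IPv4 clients, /24 blocks and made-up thresholds (`per_ip_limit`, `per_block_limit`); it uses documentation-range addresses and is not QDR's or IQSS's actual setup, which the comments say lives outside Dataverse.

```python
import ipaddress
from collections import Counter


def requests_per_block(client_ips: list[str], prefix_len: int = 24) -> Counter:
    # Map every IPv4 address to its enclosing network, e.g. 203.0.113.5 -> 203.0.113.0/24.
    counts: Counter = Counter()
    for ip in client_ips:
        counts[ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)] += 1
    return counts


def flag_clients(client_ips: list[str], per_ip_limit: int = 20,
                 per_block_limit: int = 200, prefix_len: int = 24):
    # Hypothetical thresholds: flag single addresses and whole blocks separately.
    over_ip = {ip for ip, n in Counter(client_ips).items() if n > per_ip_limit}
    over_block = {net for net, n in requests_per_block(client_ips, prefix_len).items()
                  if n > per_block_limit}
    return over_ip, over_block
```

With 50 addresses in 203.0.113.0/24 making 15 requests each, no single address crosses the per-IP limit, but the block (750 requests) does, while a guest at 198.51.100.7 making 10 requests is untouched. A legitimate guest who shares a flagged block (for example behind a campus NAT) would still be throttled, which is the same collateral-damage tradeoff raised above, just at a coarser grain.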

@landreev

Same here (at IQSS), we haven't been using it much because of how much of a nuclear option it is, with all the human guest users being the collateral damage. But it is something we still consider as a last resort/serious emergency option.
