This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

Validating large arrays is too slow #2212

Open
halaei opened this issue Apr 28, 2020 · 15 comments

Comments

@halaei

halaei commented Apr 28, 2020

  • Laravel Version: 7.5.2
  • PHP Version: 7.2.29
  • Database Driver & Version: None

Description:

Validating large arrays using Laravel is too slow. For example, validating an array of 30000 records takes 35 seconds, while validating an array of 50000 records takes 2 minutes and 4 seconds. Validating invalid arrays is slightly slower.

Steps To Reproduce:

Artisan::command('profile', function () {
    \Illuminate\Support\Facades\Validator::validate([
        'array' => range(1, 50000),
    ], [
        'array.*' => 'required|integer',
    ]);
});
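For timing runs like the ones quoted above, a plain microtime() harness is enough; this is a minimal sketch where the closure is a stand-in workload, not the actual Validator::validate() call:

```php
<?php
// Minimal timing harness: run a callable and report elapsed wall-clock seconds.
function timeIt(callable $fn): float
{
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

// Stand-in workload; replace the body with the Validator::validate() call above.
$elapsed = timeIt(function (): void {
    $sum = 0;
    foreach (range(1, 50000) as $n) {
        $sum += $n;
    }
});

printf("took %.4f seconds\n", $elapsed);
```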

A PR for 7.x was rejected because of backward-compatibility concerns (see laravel/framework#32532). I am raising the issue here in the hope of a further, successful optimization in 8.0, in case I can't get a PR accepted. Documentation about the time complexity of validation would also be highly appreciated.

@driesvints driesvints transferred this issue from laravel/framework Apr 28, 2020
@driesvints
Member

Moving this to the ideas repo as it's an improvement rather than a bug.

@mfn

mfn commented Apr 28, 2020

What's the use case for validating such big arrays, and why does it have to be the Laravel validator?

@halaei
Author

halaei commented Apr 28, 2020

Thanks for the move. Still, I think this has important impacts on applications that accept arrays from users and pass them to validation. That is why I also consider it a very dangerous bug.

@driesvints
Member

Tbh, I think if you're using such big arrays you have different problems. I don't know your use case, but it's better to batch them up or to try a different approach.

@halaei
Author

halaei commented Apr 28, 2020

@mfn I have no use case. But an attacker can send a small number of requests per minute with large arrays to take the server down. Checking the size of the array inside the same validation that checks the items does not help either.
In other words, the size should be checked before validating the items.
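A minimal sketch of that ordering in plain PHP, so the guard runs before any framework code; the limit of 10000 is an arbitrary number chosen for illustration (in Laravel itself, a separate first pass with an `'array' => 'array|max:10000'` rule would have a similar effect):

```php
<?php
// Hypothetical guard: reject oversized payloads before any per-item rule runs.
// count() on a PHP array is O(1), while per-item validation is at least O(n).
function withinSizeLimit(array $input, string $key, int $maxItems = 10000): bool
{
    return isset($input[$key])
        && is_array($input[$key])
        && count($input[$key]) <= $maxItems;
}

$payload = ['array' => range(1, 50000)];

if (!withinSizeLimit($payload, 'array')) {
    // Abort cheaply here, before handing the payload to the validator.
    echo "payload rejected before item validation\n";
}
```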

@halaei
Author

halaei commented Apr 28, 2020

@driesvints did you know that you must validate the size of an array before validating its items, so that no performance issue happens with invalidly large inputs? I didn't know, and that is why I think it is a bug.

@driesvints
Member

That should be done in userland imo.

@halaei
Author

halaei commented Apr 28, 2020

@driesvints have you ever done that before passing data to validation? How likely is it that other Laravel developers know that?

@mfn

mfn commented Apr 28, 2020

The idea about the limit makes sense to me!

@halaei
Author

halaei commented Apr 28, 2020

Also, breaking the input into small batches, validating each batch separately, and then merging the results sounds like an interesting idea. It would probably speed things up a lot. Maybe that could even be done inside the framework without a breaking change.
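A rough sketch of that batching idea, assuming a hypothetical per-batch validate callback that returns an array of error messages keyed by index; `array_chunk` with `preserve_keys` keeps the original indices so the merged errors still point at the right items:

```php
<?php
// Hypothetical batch validator: split the input, validate each chunk with the
// supplied callable, and merge the per-chunk error arrays back together.
function validateInBatches(array $items, int $batchSize, callable $validateBatch): array
{
    $errors = [];
    foreach (array_chunk($items, $batchSize, true) as $chunk) {
        // preserve_keys=true keeps original indices, so error keys stay meaningful.
        $errors += $validateBatch($chunk);
    }
    return $errors;
}

// Toy stand-in for a real per-batch validator: flag non-integer values.
$checkIntegers = function (array $chunk): array {
    $errors = [];
    foreach ($chunk as $index => $value) {
        if (!is_int($value)) {
            $errors[$index] = 'must be an integer';
        }
    }
    return $errors;
};

$result = validateInBatches([1, 2, 'three', 4, 5.5], 2, $checkIntegers);
// $result: [2 => 'must be an integer', 4 => 'must be an integer']
```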

@joelharkes

joelharkes commented May 9, 2020

@halaei that's exactly what we are doing too.

We do bulk data uploads, and to ensure integrity they should of course be validated.

The only downside now is that if there is an issue in the data halfway through, half of it is already processed.

But Laravel duplicates all the rules to match the number of items in the request. That is why it is so mega slow. Everything around the validator is horrible design in Laravel, imo.

Honestly, I don't get why sub-requests/list requests aren't a better-supported feature. I don't think it's a niche need, because it often makes processes much more efficient.

What is horrible about the Laravel validator:

  • unclear how `*` in rules actually works, or at least it is not intuitive
  • bad performance for bulk uploads
  • no type safety: you can validate on integer and still get a string; the same goes for booleans
  • relies on a random byte string to decide whether null is an uploaded value or means the value was not provided in the request
  • half the rules are functions on the validator, while the other half are subclasses of Rule (inconsistency)

@duxthefux

@halaei could you try benchmarking your code against the PR I created, laravel/framework#33705?

I would be interested to know whether it gets you better performance as well.

@halaei
Author

halaei commented Aug 8, 2020

@duxthefux my benchmark is really simple. Just run the code in the issue description above and time it with the microtime() function, once with your PR and once without.
Also note that there are a couple of places in the code causing it to be slow. To find out what slows the process down the most, you may use a profiling tool like https://github.com/adsr/phpspy

@tpetry

tpetry commented Feb 24, 2021

I was just researching why we got a PHP timeout in our application. We're validating the API response of a webservice we call, to check whether the returned data is valid. A simple rule ('events.*.timestamp' => ['required']) now crashed the whole request because the array has 7000 values.

But checking 7000 array elements for a required attribute is no problem; PHP does it in less than a millisecond. Laravel's validation parser, however, does a lot of work for these 7000 items.
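For comparison, a plain-PHP approximation of that required check is a single linear pass; this sketch uses a made-up `$events` payload, and only approximates Laravel's `required` semantics by treating null and the empty string as missing:

```php
<?php
// Plain-PHP approximation of 'events.*.timestamp' => ['required']:
// one linear pass, with no per-element rule parsing or message resolution.
function allHaveTimestamp(array $events): bool
{
    foreach ($events as $event) {
        // isset() also rejects null; '' is treated as missing, like Laravel's rule.
        if (!isset($event['timestamp']) || $event['timestamp'] === '') {
            return false;
        }
    }
    return true;
}

// Hypothetical payload of 7000 events, mirroring the scenario above.
$events = array_map(
    fn (int $i) => ['timestamp' => $i],
    range(1, 7000)
);

var_dump(allHaveTimestamp($events)); // bool(true)
```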

@ernst77

ernst77 commented Feb 26, 2021

Yeah, we have the same issue. We had to ditch all the Laravel validators, because plain PHP handles it in no time while Laravel gives us either a timeout or a 500 error... I don't know what's happening inside, but it seems broken, as a few simple ifs in the controller do the same validation in milliseconds.
