[10.x] The validator is very slow with large datasets when the key contains asterisks #49375
Comments
Thanks, but I just don't think these are realistic examples, sorry. Sending that much data at once through the validator will slow things down. There are other ways to work with data and validate it. Of course, if you could ever find ways to improve validation speed, we'd be open to a PR.
Thank you very much for your reply @driesvints. In any case, with all due respect to you and your work, to say that the validator of the best PHP framework does not work with a large amount of data (10k rows, which I don't know can even be considered a large amount of data) is to disregard a problem that clearly exists in the code. The chunk examples show that there really is an n^n problem that, in my opinion, should be fixed. As for realistic examples: I'm using the validator to validate CSV and Excel sheets, to accept product listings in ecommerce import systems, bulk pricing updates, log analysis, etc. Right now I am moving that same validation to a validator of my own, but it does exactly the same job the framework system should do. Again, thank you very much for your great work, and I think it would be a good idea to reconsider the decision to close this issue, or at least take a look to see what can be done. Best regards!
The performance problems aren't just because of a big file. I have a JSON object with only 50 items in it, but a lot of fields to be validated, and I can see the same behaviour. It would be really awesome if you could consider this and take a look at whether it can be improved.
Having the same issue when validating CSV files with a few hundred lines. I feel like there is some room for performance improvements in the validator. I agree with @eusonlito: please reconsider this issue, as it is a realistic example.
Encountered the same problem in the past with an API response that had just 7k rows and needed to be validated to ensure it was valid against the spec. The performance was sometimes so slow that execution was stopped by the time limit. @driesvints, are these realistic enough examples to reopen this and let the community think of a solution? It affects many people, and I guess each one wasted hours researching the issue (in production) and many more hours trying to track down the problem, and in the end abandoned Laravel's validation functionality for this part.
Referencing this PR here: #49386
To close the circle, @eusonlito, with the improved …
Look everyone, for performance improvements we really just appreciate any PRs you could send. It seems @bastien-phi did just that and already sent in a PR which seems to help. If there are any additional speed improvements to be made, we'd appreciate more PRs with proper benchmarks.
@tpetry I just don't feel this is a realistic example, sorry. 7K rows for an API response is just too much and should be paginated. Same for 10K-row Excel or CSV tables, etc. These should be batched, and the validations of these rows should be aggregated in a result set. I know everyone in this thread feels I'm a bit quick to judge here, but it seems we don't see eye to eye and just have a difference of opinion. That's fine, and the best thing you can do in this situation is either attempt a PR to improve performance if there are any bottlenecks, or try to find a different way to solve the problem for your use case. We don't go into performance improvements ourselves, as we have other priorities and work of our own. In these situations we appreciate an effort from the community. I also realise most of you think I'm wrong in this situation, but I'd appreciate it if you also respected my opinion even if it doesn't align with your own (mostly referring to the people who downvoted me). I appreciate all of you who replied with thoughtful responses here in this thread.
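A hedged sketch of the batching approach just described, for readers who want to try it (the chunk size, field names, and rules are illustrative, not from this thread): validate the rows in slices and aggregate the failures into one error bag, preserving the original row indices so the messages still point at the right rows.

```php
use Illuminate\Support\Facades\Validator;
use Illuminate\Support\MessageBag;

$errors = new MessageBag;

// preserve_keys keeps the original row indices in the error messages.
foreach (array_chunk($rows, 500, preserve_keys: true) as $chunk) {
    $validator = Validator::make(['items' => $chunk], [
        'items.*.id' => 'required|integer',
        'items.*.email' => 'required|email',
    ]);

    if ($validator->fails()) {
        $errors->merge($validator->errors());
    }
}

if ($errors->isNotEmpty()) {
    // Handle the aggregated failures for the whole dataset.
}
```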
@driesvints I'm sorry if you felt that downvoting you was disrespectful; that was not the intention. I apologize. When I downvoted your feedback, it was only to show that I indeed disagree, based on the information we have (that it seems to be exponential complexity). However, I don't expect you to change your mind based on downvotes and say "We'll look at it and fix it"; I understand the core team has other priorities. 🙏
Hej folks, I'm the guy who figured out the issue behind the first bottleneck. I spent a little time these days solving the second bottleneck, and I was able to boil it down to a few lines of code within `explodeWildcardRules`:

```php
protected function explodeWildcardRules($results, $attribute, $rules)
{
    $pattern = str_replace('\*', '[^\.]*', preg_quote($attribute, '/'));

    $data = ValidationData::initializeAndGatherData($attribute, $this->data);

    foreach ($data as $key => $value) {
        if (Str::startsWith($key, $attribute) || (bool) preg_match('/^'.$pattern.'\z/', $key)) {
            foreach ((array) $rules as $rule) {
                if ($rule instanceof NestedRules) {
                    $compiled = $rule->compile($key, $value, $data);

                    $this->implicitAttributes = array_merge_recursive(
                        $compiled->implicitAttributes,
                        $this->implicitAttributes,
                        [$attribute => [$key]]
                    );

                    $results = $this->mergeRules($results, $compiled->rules);
                } else {
                    $this->implicitAttributes[$attribute][] = $key;

                    $results = $this->mergeRules($results, $key, $rule);
                }
            }
        }
    }

    return $results;
}
```

Mainly, I've created a simple benchmark within the validation test suite:

```php
public function testAsterisk()
{
    $subset = [];

    for ($i = 0; $i < 550; $i++) {
        $subset[] = [
            'id' => $i,
            'name' => 'Name ' . $i,
            'email' => 'email' . $i . '@example.com',
            'date' => now()->addDays($i)->format('Y-m-d'),
            'another_sub_array' => [
                'street' => 'Street ' . $i,
                'city' => 'City ' . $i,
                'country' => 'Country ' . $i,
            ],
        ];
    }

    $data = ['items' => $subset];

    $rules = [
        'items.*' => Rule::forEach(function () {
            return [
                'id' => 'integer',
                'name' => ['required', 'string'],
                'email' => ['required', 'email'],
                'date' => ['required', 'date'],
            ];
        }),
    ];

    $trans = $this->getIlluminateArrayTranslator();

    // $v = new Validator($trans, $data, $rules);
    $v = new Validator($trans, $data, []);

    $start = microtime(true);

    $res = (new ValidationRuleParser($data))->explode($rules);

    // dd($res);
    dd(microtime(true) - $start);

    $this->assertFalse($v->fails());
}
```

If one of you can figure out how to improve this code, I'll do my best to contribute all my knowledge. Maybe it is the countless calls of `preg_match`. If we feed the validator with the precompiled ruleset instead, this whole expansion step is skipped. I learned this trick from the amazing @calebporzio, who did it in Livewire: https://github.com/livewire/livewire/blob/2.x/src/ComponentConcerns/ValidatesInput.php#L217-L238
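A minimal sketch of that precompiling idea, reusing the names from the benchmark above (the expansion loop is my illustration, not the actual Livewire code): if the wildcard is expanded against the data once, up front, the parser never has to regex-match every rule pattern against every data key.

```php
$expanded = [];

// Expand 'items.*' into 'items.0', 'items.1', ... using the real keys.
foreach (array_keys($data['items']) as $i) {
    $expanded["items.{$i}.id"] = 'integer';
    $expanded["items.{$i}.name"] = ['required', 'string'];
    $expanded["items.{$i}.email"] = ['required', 'email'];
    $expanded["items.{$i}.date"] = ['required', 'date'];
}

// With no asterisks left, explodeWildcardRules() has nothing to resolve.
$v = new Validator($trans, $data, $expanded);
```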
I am wondering: instead of calling `array_merge_recursive` with three arguments, what about something like this?

```php
$mergeArrays = $compiled->implicitAttributes + $this->implicitAttributes + [$attribute => [$key]];

$this->implicitAttributes = array_merge_recursive(...$mergeArrays);
```

Could the array union become a problem?
`ArgumentCountError: array_merge_recursive() does not accept unknown named parameters`
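For anyone reading along, a hedged guess at what goes wrong there: the `+` union produces a single array whose keys are attribute names, and on PHP 8 spreading an array with string keys passes them as named arguments, which `array_merge_recursive()` rejects. A variant that keeps the spread positional:

```php
// Keep the three sources in a plain list so the spread stays positional
// instead of turning "items.*"-style string keys into named arguments.
$mergeArrays = [
    $compiled->implicitAttributes,
    $this->implicitAttributes,
    [$attribute => [$key]],
];

$this->implicitAttributes = array_merge_recursive(...$mergeArrays);
```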
Thank you for your help. I couldn't try it from my vacation, but I am really looking forward to a performance improvement in this, as I always have problems with the validator.

EDIT: This is what I have done, and performance seems to be a lot better, even when I am using the asterisk, but I am no longer using the `Rule::forEach` closure:

```php
// $rules = [
//     'items.*' => Rule::forEach(function () {
//         return [
//             'id' => 'integer',
//             'name' => ['required', 'string'],
//             'email' => ['required', 'email'],
//             'date' => ['required', 'date'],
//         ];
//     }),
// ];

$rules = [
    'items' => ['required', 'array'],
    'items.*.id' => ['integer'],
    'items.*.name' => ['required', 'string'],
    'items.*.email' => ['required', 'email'],
    'items.*.date' => ['required', 'date'],
];
```
I noticed that validating an array with 1000 rows takes about 1 minute, but validating them one row at a time only takes about 3 seconds. There are no rules that require anything from multiple rows, so it feels like there should be a way to improve this, though I have no idea what part of the code is causing the slowdown.
When validating large data such as CSV data, the following tips can greatly improve the processing speed. When I validate large data in production, I use my own …
@333sec can you share your …
For those finding this thread on Google because they are sending large payloads over HTTP, let me share an idea and a quick benchmark: skip the Laravel validation and, instead, new up native PHP objects. The difference is remarkable.

```php
<?php

declare(strict_types=1);

readonly class RoomBulkUploaderSpeedTest
{
    public function __construct(public int $room_id, public string $external_id)
    {
        throw_if($this->room_id < 0, new \DomainException("Example of a validation rule"));
    }
}

$payload = array_map(fn ($i) => ['room_id' => $i, 'external_id' => "string_{$i}"], range(1, 5000));

\Illuminate\Support\Benchmark::dd([
    'validation' => fn () => \Illuminate\Support\Facades\Validator::validate([
        'array' => $payload,
    ], [
        'array.*.room_id' => 'required|integer',
        'array.*.external_id' => 'required|string',
    ]),
    'newing_up' => fn () => array_map(fn ($i) => new RoomBulkUploaderSpeedTest(...$i), $payload),
], 1);
```

```
"validation" => "636.862ms"
```

If you want a 422 instead of a 500, you can catch the exception thrown in this code and return a 422. Depending on your use case, you might omit or be sure to include `declare(strict_types=1)`.
Well, you lose the power of validation. Instead of

```php
[
    'array.*.room_id' => 'required|integer',
    'array.*.external_id' => 'required|string',
]
```

explode the `*` yourself:

```php
$rules = array_merge(...array_map(fn ($i) => [
    "array.{$i}.room_id" => 'required|integer',
    "array.{$i}.external_id" => 'required|string',
], array_keys($payload)));
```

It will validate way faster. The slow part in validation is to resolve the `*` against the data:

```php
readonly class RoomBulkUploaderSpeedTest
{
    public function __construct(public int $room_id, public string $external_id)
    {
        throw_if(
            $this->room_id < 0,
            new \DomainException("Example of a validation rule")
        );
    }
}

$payload = array_map(
    fn ($i) => ["room_id" => $i, "external_id" => "string_{$i}"],
    range(1, 5000)
);

// Expand the wildcard against the payload's real keys so the rules
// and the data line up exactly, and flatten into one rules array.
$rules = array_merge(
    ...array_map(
        fn ($i) => [
            "array.{$i}.room_id" => "required|integer",
            "array.{$i}.external_id" => "required|string",
        ],
        array_keys($payload)
    )
);

\Illuminate\Support\Benchmark::dd(
    [
        "validation" => fn () => \Illuminate\Support\Facades\Validator::validate(
            ["array" => $payload],
            $rules
        ),
        "newing_up" => fn () => array_map(
            fn ($i) => new RoomBulkUploaderSpeedTest(...$i),
            $payload
        ),
    ],
    1
);
```

Not blazing fast, but way faster.
This is what I came up with; it goes from 60 seconds to 10 seconds with a large dataset (still bad):

```php
public static function optimizeRules(array $rules, array $data): array
{
    $optimizedRules = [];

    foreach ($rules as $ruleKey => $ruleValue) {
        $explodedRuleKey = explode('.', $ruleKey);

        self::recursiveRuleBuilder($explodedRuleKey, $data, '', $ruleValue, $optimizedRules);
    }

    return $optimizedRules;
}

private static function recursiveRuleBuilder($ruleParts, $data, $currentKey, $ruleValue, &$optimizedRules)
{
    $part = array_shift($ruleParts);

    // Last segment of the rule key: record the fully expanded rule.
    if (empty($ruleParts)) {
        $optimizedRules[$currentKey . $part] = $ruleValue;

        return;
    }

    // Expand '*' into one concrete key per element at this data level.
    if ($part === '*') {
        foreach ($data as $key => $value) {
            self::recursiveRuleBuilder($ruleParts, $data[$key], $currentKey . $key . '.', $ruleValue, $optimizedRules);
        }
    } elseif (isset($data[$part])) {
        self::recursiveRuleBuilder($ruleParts, $data[$part], $currentKey . $part . '.', $ruleValue, $optimizedRules);
    }
}
```
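Hypothetical usage of the helper above (`RuleOptimizer` is a placeholder class name; the comment doesn't say where these statics live):

```php
$rules = [
    'items.*.id' => ['required', 'integer'],
    'items.*.email' => ['required', 'email'],
];

// Expand the wildcards against the actual payload once, up front...
// e.g. ['items.0.id' => [...], 'items.0.email' => [...], 'items.1.id' => ...]
$flatRules = RuleOptimizer::optimizeRules($rules, $data);

// ...then hand the validator a ruleset with no asterisks left in it.
$validator = \Illuminate\Support\Facades\Validator::make($data, $flatRules);
```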
@eusonlito, which validator are you using instead?
Laravel Version
10.37.3
PHP Version
8.3.0
Database Driver & Version
No response
Description
As reported on previous Laravel versions, the Laravel Validator is very slow with large datasets:
I have tried to understand and review the validation process, but it is too complex to be able to send a PR. I could only find that it iterates thousands of times over all values when calculating the validation rules. The problem seems to come only when using asterisks in the fields to validate.
Here a tweet with some tests: https://twitter.com/lito_ordes/status/1735082009666494594
And the repo to test: https://github.com/eusonlito/laravel-validator-slow
Steps To Reproduce
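A minimal reproduction in the spirit of the linked repo (sizes, fields, and rules are illustrative; assumes a booted Laravel app for the facade):

```php
use Illuminate\Support\Facades\Validator;

$rows = array_map(
    fn ($i) => ['id' => $i, 'email' => "user{$i}@example.com"],
    range(1, 10000)
);

$start = microtime(true);

// Every 'items.*' rule pattern gets matched against every data key,
// so the cost grows much faster than linearly with the row count.
Validator::make(['items' => $rows], [
    'items.*.id' => 'required|integer',
    'items.*.email' => 'required|email',
])->fails();

printf("wildcard rules: %.2fs\n", microtime(true) - $start);
```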
Tests Results
Added memory usage and chunks