Implement ZEND_ARRAY_KEY_EXISTS opcode to speed up array_key_exists() #3360
Thanks for taking this on! I've left a few implementation notes, in particular in regard to how performance may be improved.
Zend/zend_vm_def.h
Outdated
subject = GET_OP2_ZVAL_PTR(BP_VAR_R);
key = GET_OP1_ZVAL_PTR(BP_VAR_R);

if (UNEXPECTED(Z_ISREF_P(subject))) {
A more efficient (and for VM code, more idiomatic) way to handle this is to check for ISREF prior to the error case and use a goto try_again pattern. See https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L4419 for an example.
Zend/zend_vm_def.h
Outdated
SAVE_OPLINE();

subject = GET_OP2_ZVAL_PTR(BP_VAR_R);
It would be preferable to use GET_OP2_ZVAL_PTR_UNDEF here and check the UNDEF case for IS_CV inside the branch that generates the type error. For an example of delayed UNDEF handling see https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L59.
Zend/zend_vm_def.h
Outdated
zend_error(E_WARNING, "array_key_exists(): The first argument should be either a string or an integer");
result = 0;
}
ZVAL_BOOL(EX_VAR(opline->result.var), result);
The code between IS_ARRAY and IS_OBJECT is duplicated here (which is actually rather dubious and will handle some cases incorrectly, but that's an existing issue and best solved by deprecating array_key_exists on objects entirely). It would be better to only extract the hashtable using Z_ARRVAL_P and Z_OBJPROP_P in these branches and have only one copy of the common key handling code afterwards.
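As a rough illustration of the suggested restructuring, here is a minimal self-contained sketch. It does not use the Zend API: `toy_subject`, `toy_table` and `toy_key_exists` are hypothetical names invented for this example. The container type is inspected once to pick the table, and a single copy of the key handling follows (a null key is looked up as the empty string, as in the patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types; these are not part of the Zend engine. */
typedef enum { TOY_ARRAY, TOY_OBJECT, TOY_OTHER } toy_kind;

typedef struct {
    const char *const *keys;  /* flat list of string keys */
    size_t count;
} toy_table;

typedef struct {
    toy_kind kind;
    toy_table elements;       /* used when kind == TOY_ARRAY */
    toy_table properties;     /* used when kind == TOY_OBJECT */
} toy_subject;

/* Linear search stands in for a real hash lookup. */
static bool toy_table_has(const toy_table *t, const char *key)
{
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->keys[i], key) == 0) {
            return true;
        }
    }
    return false;
}

/* Returns 1 if the key exists, 0 if it does not, -1 on a type error. */
static int toy_key_exists(const toy_subject *subject, const char *key)
{
    const toy_table *ht;

    assert(subject != NULL);

    /* Step 1: only extract the table in the type-specific branches. */
    if (subject->kind == TOY_ARRAY) {
        ht = &subject->elements;
    } else if (subject->kind == TOY_OBJECT) {
        ht = &subject->properties;
    } else {
        return -1;
    }

    /* Step 2: one shared copy of the key handling. */
    if (key == NULL) {
        key = "";             /* a null key is looked up as "" */
    }
    return toy_table_has(ht, key) ? 1 : 0;
}
```

With this shape, a fix to the key handling only has to be made in one place, regardless of whether the subject is an array or an object.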
@@ -2394,6 +2394,7 @@ static int zend_update_type_info(const zend_op_array *op_array,
case ZEND_ISSET_ISEMPTY_STATIC_PROP:
case ZEND_ASSERT_CHECK:
case ZEND_IN_ARRAY:
case ZEND_ARRAY_KEY_EXISTS:
UPDATE_SSA_TYPE(MAY_BE_FALSE|MAY_BE_TRUE, ssa_ops[i].result_def);
array_key_exists() can also return null for the zpp failure case.
Additionally, ARRAY_KEY_EXISTS needs to be handled in https://github.com/php/php-src/blob/master/ext/opcache/Optimizer/sccp.c#L1328. Looking at the ISSET_ISEMPTY_DIM_OBJ case should provide inspiration here.
FREE_OP2();
FREE_OP1();
ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
array_key_exists() is a function that would benefit from smart branch handling. See https://github.com/Majkl578/php-src/blob/2265ca4f783b9e5e3a575ebe8c3b1f21f3009015/Zend/zend_vm_def.h#L8085 for how this is done for ZEND_IN_ARRAY. You'll also have to add it to https://github.com/php/php-src/blob/master/Zend/zend_compile.c#L2181.
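For readers unfamiliar with the term, the idea behind a smart branch can be modelled roughly as follows. This is a toy sketch, not the actual Zend macro: `toy_op`, `toy_opcode` and `toy_smart_branch` are hypothetical names. When the instruction after a boolean-producing opcode is a conditional jump, the handler takes the jump directly instead of first storing the boolean and dispatching the jump as a separate instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction set; not the Zend opcode layout. */
typedef enum { OP_TEST, OP_JMPZ, OP_JMPNZ, OP_OTHER } toy_opcode;

typedef struct {
    toy_opcode opcode;
    size_t target;            /* jump target, used by OP_JMPZ / OP_JMPNZ */
} toy_op;

/*
 * Called by the handler of a boolean-producing instruction at index pc.
 * Returns the index of the next instruction to execute. If the following
 * instruction is a conditional jump, it is fused: the jump is resolved here
 * and the result is never stored. Otherwise the result is written to
 * *result_slot and execution continues at pc + 1.
 */
static size_t toy_smart_branch(const toy_op *ops, size_t count, size_t pc,
                               bool result, bool *result_slot, bool *stored)
{
    assert(pc < count);
    *stored = false;

    if (pc + 1 < count) {
        const toy_op *next = &ops[pc + 1];
        if (next->opcode == OP_JMPZ) {
            return result ? pc + 2 : next->target;
        }
        if (next->opcode == OP_JMPNZ) {
            return result ? next->target : pc + 2;
        }
    }

    /* No conditional jump follows: materialize the boolean as usual. */
    *result_slot = result;
    *stored = true;
    return pc + 1;
}
```

In the common `if (array_key_exists(...))` case this saves one store and one instruction dispatch per check, which is why the pattern matters on hot paths.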
@nikic Thanks for your review 🥇, I'll try to address it later this week.
Also you should just revert the change to NEWS, usually the merger of the PR will do this part =)
For this PR an (additional) entry in UPGRADING appears to be appropriate (UPGRADING usually serves as the base for the migration guide in the PHP manual, but NEWS is likely to be overlooked, and this improvement seems to be quite noteworthy).
Although isset and array_key_exists behave slightly differently, I think that in most cases array_key_exists can be replaced with isset. Thus, I doubt such an optimization is worth doing...
@laruence array_key_exists() is common in foundational libraries that need to support null values. These are often very hot code paths and currently the usual way to handle them is
@cmb69 Currently we usually do not include performance changes in UPGRADING, unless they come with some observable behavior change. But I think it's a good idea to start doing this, maybe under a new "Performance Improvements" section. We do tend to have a fair bit of improvements in each release, but they aren't really called out anywhere.
@nikic Even if we won't document performance changes, I still think this very change should be documented, since it can affect how userland code will be written. You already gave a typical example which could be improved wrt. this PR.
By the way, it would be nice to have a comment above the code for the normal function
2265ca4
to
6a76ae4
Compare
6a76ae4
to
4ff60bb
Compare
@nikic I addressed all your comments, except one: SCCP for this new opcode, since it's a bit beyond my current knowledge of PHP internals. May I ask you or anyone else to help me with this one? Thanks!
gcc-8 -O0
gcc-8 -O3
@nikic Can you please do another round of review? Thanks.
4ff60bb
to
16c5a7b
Compare
/cc @cmb69 for whether this can still land in 7.3. It introduces a new opcode, not sure what our policy is there. I guess as it may require further extension changes (Xdebug support etc.) it's probably too late?
Zend/zend_vm_def.h
Outdated
ZEND_VM_C_LABEL(try_again):

if (UNEXPECTED(Z_ISREF_P(subject)) || UNEXPECTED(Z_ISREF_P(key))) {
This isn't quite what I had in mind regarding the reference handling. The idea is to do something like this:
ZEND_VM_C_LABEL(try_again_subject):
	if (EXPECTED(Z_TYPE_P(subject) == IS_ARRAY)) {
		ht = Z_ARRVAL_P(subject);
	} else if (UNEXPECTED(Z_TYPE_P(subject) == IS_OBJECT)) {
		ht = Z_OBJPROP_P(subject);
	} else if (Z_ISREF_P(subject)) {
		subject = Z_REFVAL_P(subject);
		ZEND_VM_C_GOTO(try_again_subject);
	} else {
		if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(key) == IS_UNDEF)) {
			key = GET_OP1_UNDEF_CV(key, BP_VAR_R);
		}
		if (OP2_TYPE == IS_CV && UNEXPECTED(Z_TYPE_INFO_P(subject) == IS_UNDEF)) {
			subject = GET_OP2_UNDEF_CV(subject, BP_VAR_R);
		}
		zend_internal_type_error(EX_USES_STRICT_TYPES(), "array_key_exists() expects parameter 2 to be array, %s given", zend_get_type_by_const(Z_TYPE_P(subject)));
		FREE_OP2();
		FREE_OP1();
		ZEND_VM_SMART_BRANCH(result, 0);
		ZVAL_NULL(EX_VAR(opline->result.var));
		ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
	}
The point is that the check for the reference is only done after the most likely case (subject is an array) is already handled, so that rare cases (references) don't have a runtime impact on common ones (array).
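The same ordering can be shown outside the VM with a small standalone sketch. The types and the function `toy_extract_table` are hypothetical stand-ins, not the Zend API; the only point is the control flow: the array test comes first, and the reference is unwrapped in a later branch that jumps back to the top.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy value model; names are illustrative only. */
typedef enum { T_NULL, T_ARRAY, T_OBJECT, T_REFERENCE } toy_type;

typedef struct toy_val {
    toy_type type;
    const struct toy_val *ref;  /* target when type == T_REFERENCE */
    const void *table;          /* table when type is T_ARRAY or T_OBJECT */
} toy_val;

/* Returns the table to search, or NULL when the subject is not array-like. */
static const void *toy_extract_table(const toy_val *subject)
{
    assert(subject != NULL);

try_again:
    if (subject->type == T_ARRAY) {
        /* Most likely case, tested first: pays for no reference check. */
        return subject->table;
    } else if (subject->type == T_OBJECT) {
        return subject->table;
    } else if (subject->type == T_REFERENCE) {
        /* Rare case: unwrap once, then re-run the dispatch. */
        subject = subject->ref;
        goto try_again;
    }

    /* Error case: neither array, object, nor reference to one. */
    return NULL;
}
```

A plain array never reaches the reference test, so the cost of reference handling is paid only by callers that actually pass a reference.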
Hah, got it!
Zend/zend_vm_def.h
Outdated
zend_internal_type_error(EX_USES_STRICT_TYPES(), "array_key_exists() expects parameter 2 to be array, %s given", zend_get_type_by_const(Z_TYPE_P(subject)));
FREE_OP2();
FREE_OP1();
ZEND_VM_SMART_BRANCH(result, 0);
I believe result is uninitialized here. It should be fine to just drop the smart branch line here, as this is a rare error case anyway.
16c5a7b
to
bc6db1f
Compare
Regarding PHP 7.3: I cannot assess how many extensions and SAPIs (phpdbg?) would be affected, and how much work it would be to update them, but given that Xdebug is not ready for PHP 7.3 yet, and the long UPGRADING.INTERNALS, it might indeed be better to delay this to PHP 7.4. Otherwise this issue should be discussed on internals@.
I only quickly checked the code of Xdebug and phpdbg and found no usages of e.g. ZEND_COUNT or ZEND_IN_ARRAY (which is basically the same thing). Of course I don't know if there are any indirect implications for these tools, but it doesn't seem so, unlike e.g. new syntax.
(Waiting on author label can be removed.)
bc6db1f
to
c9a918b
Compare
Alright, retargeted to master/7.4.
This looks fine to me and doesn't require an RFC.
It may be merged after renaming GET_OP?_UNDEF_CV(...) to ZVAL_UNDEFINED_OP?().
I may take care of some final tuning after the merge.
Zend/zend_vm_def.h
Outdated
result = zend_symtable_exists_ind(ht, ZSTR_EMPTY_ALLOC());
} else if (Z_TYPE_P(key) <= IS_NULL) {
if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_P(key) == IS_UNDEF)) {
GET_OP1_UNDEF_CV(key, BP_VAR_R);
GET_OP1_UNDEF_CV() and GET_OP2_UNDEF_CV() have to be changed into ZVAL_UNDEFINED_OP1() and ZVAL_UNDEFINED_OP2().
Zend/zend_vm_def.h
Outdated
} else if (EXPECTED(Z_TYPE_P(key) == IS_LONG)) {
result = zend_hash_index_exists(ht, Z_LVAL_P(key));
} else if (UNEXPECTED(Z_TYPE_P(key) == IS_NULL)) {
result = zend_symtable_exists_ind(ht, ZSTR_EMPTY_ALLOC());
This may be zend_hash_exists_ind(). "" is not numeric.
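The reasoning is that the symtable lookup variants first check whether a string key looks like a canonical integer and, if so, look it up as an integer index; for the empty string that check can never succeed, so the plain hash lookup gives the same answer. A simplified standalone model of that check is sketched below. `toy_is_integer_key` is a hypothetical helper, and unlike the engine it ignores integer overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model of the "is this string key really an integer?" test.
 * Accepts canonical decimal integers only: an optional leading '-', no
 * leading zeros, and not "-0". Overflow is NOT checked in this sketch.
 */
static bool toy_is_integer_key(const char *s)
{
    size_t i = 0;

    assert(s != NULL);

    if (s[i] == '-') {
        i++;
    }
    if (s[i] == '\0') {
        return false;                       /* "" and "-" */
    }
    if (s[i] == '0') {
        /* Only the exact string "0" qualifies; "-0" and "012" do not. */
        return i == 0 && s[i + 1] == '\0';
    }
    for (; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9') {
            return false;
        }
    }
    return true;
}
```

Since `toy_is_integer_key("")` is always false, running the numeric check for the null-key case is wasted work, which is what the review comment points out.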
Not the best place to report the problem, but thanks anyway.
@Majkl578 would you like me to finalize and merge this?
I'll handle the feedback over the next few days. 👍
9d43818
to
f473132
Compare
@dstogov Addressed your comments and also rebased again. 👍
result = zend_hash_exists_ind(ht, ZSTR_EMPTY_ALLOC());
} else if (Z_TYPE_P(key) <= IS_NULL) {
if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_P(key) == IS_UNDEF)) {
ZVAL_UNDEFINED_OP1();
Must be OP2_TYPE and ZVAL_UNDEFINED_OP2.
It seems you changed OP1->OP2 in the wrong place. I'll take care of this and merge.
Oh, my bad. Everything is fine.
Key is OP1 and haystack is OP2 so it should be fine.
https://github.com/php/php-src/compare/9d438180daf644b9937ba4177c27b7e9e95e05cf..f4731326d340d9f67d608201a95ef182af223db2#diff-3054389ad750ce9a9f5895cd6d27800fR6400
Yeah. My mistake. It's already merged into master.
I've added the perf improvement section for UPGRADING which I mentioned above: f1c0e67
Make opcache replace the result with false if the array argument is known to be empty. This may be useful when a codebase has placeholders, e.g. `if (!in_array($method, self::ALLOWED_METHODS)) { return; }` In zend_inference.c: In php 8, array_key_exists will throw a TypeError instead of returning null. I didn't see any discussion of this optimization (for/against) after a quick search on github, e.g. phpGH-3360 Potential future optimizations: - convert `in_array($needle, ['only one element'], true)` to `===`? (or `==` for strict=false) - When the number of elements is less than 4, switch to looping instead of hash lookup. (exact threshold for better performance to be determined) Also support looping for `in_array($value, [false, 'str', 2.5], true/false)`
Besides being faster (since php/php-src#3360), array_key_exists is also safer as it narrows down key existence to arrays only and performs type-checking on both parameters.
This change makes array_key_exists() actually faster than isset() by ~25% (tested with GCC 8, -O3, march=native, mtune=native).
addresses https://bugs.php.net/bug.php?id=76148
addresses https://bugs.php.net/bug.php?id=71239
With a synthetic benchmark:
It yields results like:
7.2 (unpatched):
7.3 + patch: