Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support "R" (Reference) type #18

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kaorukobo
Copy link
Contributor

While the "R" (Reference) type rarely appears in real-world PHP serialization, certain applications may use it. (at least I met it and saw Unable to unserialize type 'R' error.)

Serialization — PHP Internals Book

$a[1] =& $a[0];

a:2:{i:0;s:3:"foo";i:1;R:2;}

This pull request introduces support for the "R" type by adding the PHP::Reference object,
which can be re-serialized to what it was before unserializing.

@jqr
Copy link
Owner

jqr commented Feb 24, 2024

In PHP unserializing that object results in an Array with two elements, both which are the the same object. Modifications to either element modify both elements.

php > $u = unserialize('a:2:{i:0;s:3:"foo";i:1;R:2;}');
php > print_r($u);
Array
(
    [0] => foo
    [1] => foo
)

php > $u[0] .= "bar";
php > print_r($u);
Array
(
    [0] => foobar
    [1] => foobar
)

Shouldn't the Ruby version do the same instead of making a PHP::Reference object?

ruby> u = PHP.unserialize('a:2:{i:0;s:3:"foo";i:1;R:2;}');
   => ["foo", "foo"]

ruby> u[0] << "bar"
   => "foobar"

ruby> u
   => ["foobar", "foobar"]

@kaorukobo
Copy link
Contributor Author

kaorukobo commented Feb 25, 2024

Reading your reply, I also noticed the level of support is crucially insufficient in production.
As shown in the example below:

  • it lacks support for "r" (represents repeating the same object, not a PHP reference = "R")
  • breaks a reference easily when we change the position of a referred object before re-serializing the entire data.
deserialized = PHP.unserialize(
    # $a = (object)[]; echo serialize([$a,$a]);
    'a:2:{i:0;O:8:"stdClass":0:{}i:1;r:2;}'     # <== needs support for "r"
)
deserialized.unshift "foo" # <== breaks `nth` of "r:nth"

I would try another PR or improve this branch.

@kaorukobo kaorukobo force-pushed the support-r-type branch 2 times, most recently from fcd0f49 to 9ba7966 Compare February 25, 2024 08:34
@kaorukobo
Copy link
Contributor Author

I have improved the patch. Could you take a look?

  • The first commit is a refactoring:
    • This patch needs more values in the state retained during serialization/deserialization. It will make the code very unreadable. So firstly I refactored the code by introducing a state object that is passed thru along recursion calls.
  • then the second commit is an essential change:
    • on PHP.unserialize:
      • Instead of introducing PHP::Reference, simply dereference R/r to the deserialized value before in the same session.
    • on PHP.serialize:
      • Generate an object reference ('r') for a recurring object instead of serializing it again.
      • The behavior is limited to Struct or objects having #to_assoc method. A simple value like String or a data structure like Array/Hash are dereferenced to a plain value. (that is, it does not generate R type.)
        • IMO, I cannot see advantages in holding non-object references inside a PHP array/object using =&... much less in retaining them in a serialized form.

@kaorukobo
Copy link
Contributor Author

Please be noted that the improved patch enables "Modifications to either element modify both elements."

def test_reference_of_object
  # ...
  assert_same unserialized[0], unserialized[1]

@kaorukobo
Copy link
Contributor Author

Rebased on v1.4.1

@nonnenmacher
Copy link

do we have an update on this PR if it pass 'tests' ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants