Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug where a % in a serialized array can lead to the data being broken #16694

Merged
merged 1 commit into from
Mar 12, 2020

Conversation

eileenmcnaughton
Copy link
Contributor

@eileenmcnaughton eileenmcnaughton commented Mar 5, 2020

Overview

Fixes obscure dedupe bug

Before

Dedupe will fail on a contact with '%2' in their name

After

Dedupe can cope

Technical Details

Basically we have a contact in our DB who had a weird string (carriage returns & a .) in his name.
The dedupe script fell over on him because it could not unserialize what it had serialised for
his display name.

It turns out that is a field in a serialized array has a %2 (for example) this gets swapped in executeQuery for the
%2 value (in this case srcID - rendering the serialized array invalid. This proposes that we
explicitly handle arrays as a data type in compose query.

Some thoughts

we could make serialized arrays valid types in validate (not done here)
we could iterate through the array keys & values escaping them -
at this stage it's left in the calling function
there are whole bikeshed factories to keep in business on discussion of whether
'Array-1', 'Array-2' etc are the right format

---- notes below written before @pfigel pointed to where it was going awol-----

INSERT INTO civicrm_prevnext_cache (entity_table, entity_id1, entity_id2, cacheKey, data) VALUES
        ('civicrm_contact', 4, 5, 'merge_Individual_4_0_0_0_0', 'a:6:{s:5:"dstID";s:1:"4";s:7:"dstName";s:23:"Mr. Anthony Anderson II";s:5:"srcID";s:1:"5";s:7:"srcName";s:31:"Mr. Gerrit%0a4e%0a Anderson II";s:6:"weight";s:2:"10";s:8:"canMerge";b:1;}')

It then retrieves with this query


SELECT SQL_CALC_FOUND_ROWS pn.*
FROM   civicrm_prevnext_cache pn
       
      LEFT JOIN civicrm_dedupe_exception de
        ON (
          pn.entity_id1 = de.contact_id1
          AND pn.entity_id2 = de.contact_id2 )
       
        WHERE (pn.cachekey = 'merge_Individual_4_0_0_0_0') AND de.id IS NULL
       
 LIMIT 0, 1

And the value in data is

a:6:{s:5:"dstID";s:1:"4";s:7:"dstName";s:23:"Mr. Anthony Anderson II";s:5:"srcID";s:1:"5";s:7:"srcName";s:31:"Mr. Gerrit%0a4e%0a Anderson II";s:6:"weight";s:2:"10";s:8:"canMerge";b:1;}

which fails to unserialize here

if (self::is_serialized($dao->data)) {

Comments

@pfigel @ejegg @seamuslee001 if anyone has thoughts - this IS pretty obscure & I'm comfortable it's not a security hole

@civibot
Copy link

civibot bot commented Mar 5, 2020

(Standard links)

@pfigel
Copy link
Contributor

pfigel commented Mar 6, 2020

@eileenmcnaughton I've traced this a bit and I think this is what's going on:

The query in CRM_Core_BAO_PrevNextCache::setItem looks like this:

CRM_Core_DAO::executeQuery("INSERT INTO civicrm_prevnext_cache (entity_table, entity_id1, entity_id2, cacheKey, data) VALUES
        (%1, %2, %3, %4, '{$data}')", [
          1 => [$entity_table, 'String'],
          2 => [$entity_id1, 'Integer'],
          3 => [$entity_id2, 'Integer'],
          4 => [$cacheKey, 'String'],
        ]);

Note that the first 4 parameters use placeholders with escaping, while $data is passed in as-is. If $data contains something like %2 (like your example), CRM_Core_DAO::composeQuery() would replace that with the value of $entity_id1 (i.e. 4). This changes the length of the serialized field and thus breaks unserialize().

Not sure if there's a workaround for this or if we want to change the contract so that $data is expected as an unescaped value and use a placeholder for that as well? I imagine that might break a lot of callers (and we probably had a reason not to do that in the first place?).

I suppose we could add a $dataIsEscaped = FALSE param ... but that makes me feel like I need a shower.

@eileenmcnaughton
Copy link
Contributor Author

I've added a commit that fixes this based on digging by @pfigel into what the problem is.

Commit comments :

Fix bug where a % in a serialized array can lead to the data being broken

It turns out that is a field in a serialized array has a %2 (for example) this gets swapped in executeQuery for the
%2 value (in this case srcID - rendering the serialized array invalid. This proposes that we
explicitly handle arrays as a data type in compose query.

Some thoughts

  1. we could make serialized arrays valid types in validate (not done here)
  2. we could iterate through the array keys & values escaping them -
    at this stage it's left in the calling function
  3. there are whole bikeshed factories to keep in business on discussion of whether
    'Array-1', 'Array-2' etc are the right format

@eileenmcnaughton eileenmcnaughton changed the title argh what's the answer - Possible regression from serialize tightening Fix bug where a % in a serialized array can lead to the data being broken Mar 11, 2020
@pfigel
Copy link
Contributor

pfigel commented Mar 11, 2020

@eileenmcnaughton I don't fully understand why handling serialized values differently is necessary. I might be missing something, but for all intents and purposes, a serialized array should just be a string, so this should work:

CRM_Core_DAO::executeQuery("INSERT INTO civicrm_prevnext_cache (entity_table, entity_id1, entity_id2, cacheKey, data) VALUES
  ('civicrm_contact', %1, %2, %3, %4)", [
    1 => [$dstID, 'Integer'],
    2 => [$srcID, 'Integer'],
    3 => [$cacheKeyString, 'String'],
    4 => [serialize($row), 'String'],
  ]
);

For $row, I wouldn't apply CRM_Core_DAO::escapeString to any of the values - that would essentially double-escape them.

@seamuslee001
Copy link
Contributor

@pfigel the problem is that if the seralized content contains a %1 or %2 etc Execute Query would then add in the value of either the dstId or srcID etc

@eileenmcnaughton
Copy link
Contributor Author

@pfigel what you suggest would work I think - but generally where we have added serrialize handling for arrays we've taken into account the various 'serialize' types we have

$this->individualCreate(['first_name' => 'Gerrit%0a%2e%0a']);
$this->callAPISuccess('Job', 'process_batch_merge', []);
// As serialized
//a:6:{s:5:"dstID";s:1:"4";s:7:"dstName";s:23:"Mr. Anthony Anderson II";s:5:"srcID";s:1:"5";s:7:"srcName";s:31:"Mr. Gerrit%0a4e%0a Anderson II";s:6:"weight";s:2:"10";s:8:"canMerge";b:1;}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these get uncommented out?

@pfigel
Copy link
Contributor

pfigel commented Mar 11, 2020

the problem is that if the seralized content contains a %1 or %2 etc Execute Query would then add in the value of either the dstId or srcID etc

I think this is a problem specific to the current implementation, where $data is passed in as a variable as-is, rather than via a typed parameter. With the typed parameter, the implementation in CRM_Core_DAO::composeQuery would substitute %4 with $row without recursively substituting %2 with the second parameter - otherwise, we would be running into this issue all the time.

what you suggest would work I think - but generally where we have added serrialize handling for arrays we've taken into account the various 'serialize' types we have

I have some reservations on doing this in composeQuery. My expectation of what parameterized queries do is to encode values for use in an SQL query. Converting arrays to strings in PHP's serialization format (or JSON, etc.) feels a bit like scope creep for this method and something that should be the responsibility of the callee.

@eileenmcnaughton
Copy link
Contributor Author

@pfigel so I can generally accept that - where it would break down is if $row has any legitimate slashes in it as they would be escaped AFTER serialization

@pfigel
Copy link
Contributor

pfigel commented Mar 11, 2020

so I can generally accept that - where it would break down is if $row has any legitimate slashes in it as they would be escaped AFTER serialization

I think that's fine? Escaping the value is a way to encode the value so MySQL understands it. Take this example:

$x = serialize(['first_name' => 'foo\\bar\'baz']);

$x is now a:1:{s:10:"first_name";s:11:"foo\bar'baz";}.

When we run that through CRM_Core_DAO::escapeString, we end up with a:1:{s:10:\"first_name\";s:11:\"foo\\bar\'baz\";}. This is how MySQL expects the original value of $x to be encoded in an SQL statement, so what ends up being written to the table (and what you get back with a SELECT) is a:1:{s:10:"first_name";s:11:"foo\bar'baz";} - i.e. what we want.

Basically: any interaction with MySQL should always look like this (in really weird pseudo-code):

SELECT .... WHERE x = escape(something(somethingelse(value)))

and generally never:

SELECT .... WHERE x = something(escape(somethingelse(value)))

@eileenmcnaughton
Copy link
Contributor Author

@pfigel @seamuslee001 I added @pfigel's example to the test with @pfigel's suggestions - test passes so I think it's right

…oken

It turns out that is a field in a serialized array has a %2 (for example) this gets swapped in executeQuery for the
%2 value (in this case srcID - rendering the serialized array invalid. This proposes that we
explicitly handle arrays as  a data type in compose query.

Some thoughts
1) we could make serialized arrays valid types in validate (not done here)
2) we could iterate through the  array keys  & values escaping them -
at this stage it's left in the calling function
3) there are whole bikeshed factories to keep in business on discussion of whether
'Array-1', 'Array-2' etc are the right format
@seamuslee001
Copy link
Contributor

This looks good to me and its passing tests so lets merge this in

@seamuslee001 seamuslee001 merged commit 62cc144 into civicrm:master Mar 12, 2020
@seamuslee001 seamuslee001 deleted the dedupebug branch March 12, 2020 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants