[REF] Use random_bytes instead of uniqid/rand for random hex strings #32205

Merged
merged 1 commit into from
Feb 25, 2025

Conversation

Sjord
Contributor

@Sjord Sjord commented Feb 24, 2025

Using random_bytes is both faster and more secure than md5(uniqid(rand(), TRUE)). It is arguably also easier to read, since it is more obvious that the result is hexadecimal-encoded random bytes.
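
To make the comparison concrete, here is a minimal sketch of the two patterns side by side. It is not copied from the diff; the 16-byte length is chosen only so the output matches MD5's 32 hex characters, and the helper name `randomHex` is hypothetical.

```php
<?php
// Old pattern quoted in the PR: an MD5 digest of a time-based ID prefixed
// with a non-cryptographic random number. Always 32 hex characters.
$old = md5(uniqid(rand(), TRUE));

// New pattern: 16 bytes from the operating system's CSPRNG, hex-encoded.
// bin2hex() emits two characters per byte, so this is also 32 characters.
$new = bin2hex(random_bytes(16));

// Hypothetical helper (not part of the PR): hex encoding of $bytes random bytes.
function randomHex(int $bytes): string {
  return bin2hex(random_bytes($bytes));
}

printf("%d %d %d\n", strlen($old), strlen($new), strlen(randomHex(8)));
// prints "32 32 16"
```

Both expressions produce a lowercase hex string of the same length, so a call site that only needs "some random hex" should behave the same after the swap. The difference is where the randomness comes from.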

I did not find an instance where guessing the random identifier would result in a security vulnerability, so as far as I know this change has no direct security impact. It is more of a best-practice change: I hope people copy and paste the new, secure way of generating random bytes when creating identifiers for security-sensitive code, instead of copying the old, insecure way.

In some test files the random strings are now one character longer. For example, I replaced `substr(sha1(rand()), 0, 7)` with `bin2hex(random_bytes(4))`. The length did not seem important here, so I don't think this matters.
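
The one-character difference follows from how `bin2hex` works: it always emits two hex characters per byte, so an odd length such as 7 cannot be produced directly. A small sketch of the lengths involved (the truncated variant at the end is a hypothetical alternative, not something this PR does):

```php
<?php
// Old test-identifier pattern: the first 7 hex characters of a SHA-1 digest.
$old = substr(sha1(rand()), 0, 7);

// Replacement used in the PR: 4 random bytes, hex-encoded to 8 characters.
$new = bin2hex(random_bytes(4));

// Hypothetical variant (an assumption, not in the PR): truncate to keep
// exactly 7 characters where a fixed length matters.
$sameLength = substr(bin2hex(random_bytes(4)), 0, 7);

printf("%d %d %d\n", strlen($old), strlen($new), strlen($sameLength));
// prints "7 8 7"
```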

I haven't tested all of the changed code. I rely on the unit tests, and on the fact that the code generates a random hex string of a certain length both before and after the replacement.

I also looked into the SQL statements that use MD5(RAND()). These should be replaced by HEX(RANDOM_BYTES()), but this is only available starting in MariaDB 10.10, and we require 10.2.


civibot bot commented Feb 24, 2025

🤖 Thank you for contributing to CiviCRM! ❤️ We will need to test and review this PR. 👷

Introduction for new contributors...
  • If this is your first PR, an admin will greenlight automated testing with the command ok to test or add to whitelist.
  • A series of tests will automatically run. You can see the results at the bottom of this page (if there are any problems, it will include a link to see what went wrong).
  • A demo site will be built where anyone can try out a version of CiviCRM that includes your changes.
  • If this process needs to be repeated, an admin will issue the command test this please to rerun tests and build a new demo site.
  • Before this PR can be merged, it needs to be reviewed. Please keep in mind that reviewers are volunteers, and their response time can vary from a few hours to a few weeks depending on their availability and their knowledge of this particular part of CiviCRM.
  • A great way to speed up this process is to "trade reviews" with someone: find an open PR that you feel able to review, and leave a comment like "I'm reviewing this now, could you please review mine?" (include a link to yours). You don't have to wait for a response to get started (and you don't have to stop at one!). The more you review, the faster this process goes for everyone 😄
  • To ensure that you are credited properly in the final release notes, please add yourself to contributor-key.yml
  • For more information about contributing, see CONTRIBUTING.md.
Quick links for reviewers...

➡️ Online demo of this PR 🔗

@civibot civibot bot added the master label Feb 24, 2025
@totten
Member

totten commented Feb 25, 2025

Woot. This looks pretty good to me.

I read 85-95% of the diffs, and I spot-checked (copied/executed) several of the earlier ones (which affect runtime code), and confirmed that the resulting outputs were similar.

In all the cases where I saw elongated outputs, it seemed to be generating a test-identifier (not runtime code). In fact, a huge proportion of updates were test-related. So as long as those tests pass, they're good.

(Aside: Thank goodness for random_bytes() in PHP 7+. I believe these hacky formulas all originated circa PHP 5.x, where it was hard to get a reliable source of entropy.)

@totten totten merged commit 7bf3b31 into civicrm:master Feb 25, 2025
1 check passed