Adjust hex/octal string decoding #627

GreyWyvern · 2023-08-04T19:33:37Z

Add a second check to be sure a string is hexadecimal before applying the `pack()` function. This avoids the `illegal hex digit` warning and resolves #499.
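
The idea behind the extra check can be sketched in Python. This is not the project's PHP code, and `decode_hex_string` is a hypothetical name. It assumes the input is the text between `<` and `>` of a PDF hex string. It also applies two rules from the PDF Reference: white space is ignored, and an odd digit count means the final digit is taken as `0`. Anything that is not hexadecimal is rejected before decoding is attempted.

```python
import re
import string
from typing import Optional

HEX_DIGITS = frozenset(string.hexdigits)


def decode_hex_string(body: str) -> Optional[bytes]:
    """Hypothetical helper: decode a PDF hex string body only if it really is hex."""
    # White space inside a PDF hex string is ignored.
    digits = re.sub(r"\s+", "", body)
    # The "second check": refuse non-hex input instead of letting the decoder
    # fail on it (PHP's pack() reports "illegal hex digit" in that case).
    if not set(digits) <= HEX_DIGITS:
        return None
    # An odd number of digits means the final digit is assumed to be 0.
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


print(decode_hex_string("901FA"))  # b'\x90\x1f\xa0'
print(decode_hex_string("4G"))     # None
```

Returning `None` lets the caller fall back to treating the text as an ordinary string rather than aborting.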

PdfParser currently decodes only three-digit escaped octal codes, although one-, two- and three-digit codes are all allowed. See PDF Reference 1.7, Section 3.2 Objects (page 55): https://ia801001.us.archive.org/1/items/pdf1.7/pdf_reference_1-7.pdf
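
In a PDF literal string, `\53` and `\053` both denote `+` (octal 53 is 43). The Python sketch below contrasts two patterns: one that accepts only three-digit escapes and one that accepts one to three digits. Both pattern names are hypothetical, and neither is the project's actual regexp. The sketch deliberately ignores escaped backslashes.

```python
import re

# Hypothetical patterns, for illustration only.
THREE_DIGITS_ONLY = re.compile(r"\\([0-7]{3})")  # the old behaviour described above
ONE_TO_THREE = re.compile(r"\\([0-7]{1,3})")     # what the PDF Reference allows


def decode_octal(pattern: re.Pattern, text: str) -> str:
    """Replace each escaped octal code matched by `pattern` with its character."""
    return pattern.sub(lambda m: chr(int(m.group(1), 8)), text)


print(decode_octal(THREE_DIGITS_ONLY, r"\53\053"))  # \53+
print(decode_octal(ONE_TO_THREE, r"\53\053"))       # ++
```

Because `{1,3}` is greedy, `\0053` is read as `\005` followed by a literal `3`, which matches the three-digit maximum in the specification.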

Modify the regexp to match escaped octal codes of one to three digits while excluding escaped backslashes. In sections of text that aren't escaped octal codes, un-escape backslashes and parentheses as described in PDF Reference 1.7, Section 3.2, Table 3.2. This resolves #470.
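
One way to meet both requirements is to scan escapes left to right with a single pattern. A backslash consumed as part of `\\` can then never start an octal code. The Python sketch below illustrates that approach under stated assumptions. It is not the regexp this PR uses, `ESCAPE` and `unescape_literal` are hypothetical names, and other escapes such as `\n` are left untouched.

```python
import re

# One escape per match, scanned left to right: a backslash followed either by
# 1-3 octal digits or by an escaped backslash/parenthesis. The second backslash
# of "\\" is consumed by the match, so it can never start an octal code.
ESCAPE = re.compile(r"\\([0-7]{1,3}|[\\()])")


def unescape_literal(text: str) -> str:
    """Hypothetical helper: decode octal escapes plus escaped backslashes and parentheses."""

    def replace(match: re.Match) -> str:
        token = match.group(1)
        if token in ("\\", "(", ")"):
            # Escaped backslash or parenthesis: keep the character itself.
            return token
        # Escaped octal code; PDF ignores high-order overflow, so keep the low byte.
        return chr(int(token, 8) & 0xFF)

    return ESCAPE.sub(replace, text)


print(unescape_literal(r"\(\101\102\)"))  # (AB)
```

With this approach, `AB \\199` decodes to `AB \199`. By contrast, `AB \199` decodes to `AB ` followed by the character with code 1 and then `99`.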

Adjust the unit test `testDecodeOctal()` to escape the backslash of the valid octal code `\1` (writing it as `\\1`), so that the output still matches the existing expected value `AB \199`.


k00ni added the fix and de-/encoding issue labels Aug 5, 2023

k00ni merged commit f97e38c into smalot:master Aug 7, 2023

GreyWyvern deleted the hexa-octal branch August 8, 2023 13:13

GreyWyvern mentioned this pull request Aug 21, 2023

Incorrect output for some non UTF-8 characters #584

Open

This was referenced Sep 19, 2023

Major Update to PDFObject.php + Ancillary #634

Merged

Better octal and hex-entity decode #640

Merged

