Add font fallback + Support for font IDs containing hyphens #614
Merged
If a text stream has been "decoded" but still contains control characters, it probably wasn't decoded using the proper font code page. Add a loop that cycles through all available fonts to see whether there is a better decoding choice. Resolves #586. Also add the ability to parse font IDs containing hyphens (-). Resolves #145.
Simplify these tests in case future edits change spacing rules.
k00ni requested changes on Jul 13, 2023
It's good to see you are still with us!
Just a few remarks/questions.
This was referenced Jul 13, 2023
Let PCRE handle the conversion rather than PHP. This should hopefully fix PHPStan's complaints about a null byte.
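In PHP, a double-quoted "\x00" is turned into a literal null byte before PCRE ever sees the pattern, while a single-quoted pattern passes the escape through for PCRE to interpret. This commit presumably switches to the second form. Below is a rough Python analogue, where a raw string plays the role of the single-quoted PHP pattern. The names CONTROL_CHARS and has_control_chars are hypothetical. The character range is the one given in the PR description.

```python
import re

# r"..." keeps the backslash escapes as text, so the pattern string holds
# the characters "\x00", not an actual NUL byte; the regex engine converts them.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def has_control_chars(text: str) -> bool:
    """Return True if text contains a C0 control character (0x00-0x1F) or DEL (0x7F)."""
    return CONTROL_CHARS.search(text) is not None


if __name__ == "__main__":
    print("\x00" in CONTROL_CHARS.pattern)  # False: no literal NUL in the pattern
    print(has_control_chars("Hello"))       # False
    print(has_control_chars("He\x00llo"))   # True
```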
k00ni reviewed on Jul 14, 2023
Remove the test-case PDF with a hyphenated font ID, as we could not contact the submitter to get permission to use it. Change the unit test to check directly that a font ID containing a hyphen is parsed correctly.
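The thread does not show pdfparser's actual PHP regex for font commands. The Python sketch below uses two hypothetical patterns for the PDF Tf ("set font") operator to illustrate the kind of bug described. A \w-only character class cannot match a hyphenated ID such as TT-1 (a made-up example name), and adding "-" to the class fixes it. OLD_TF, NEW_TF and font_id are illustrative names only.

```python
import re
from typing import Optional

# Hypothetical patterns for a PDF "set font" operator such as "/F1 12 Tf".
# \w matches letters, digits and "_" but not "-", so "/TT-1 9.5 Tf" fails.
OLD_TF = re.compile(r"/(\w+)\s+(\d+(?:\.\d+)?)\s+Tf")
# Adding "-" to the character class accepts hyphenated font IDs.
NEW_TF = re.compile(r"/([\w-]+)\s+(\d+(?:\.\d+)?)\s+Tf")


def font_id(command: str, pattern: "re.Pattern[str]") -> Optional[str]:
    """Return the font resource name from a Tf command, or None if it does not match."""
    match = pattern.fullmatch(command.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(font_id("/F1 12 Tf", OLD_TF))     # F1
    print(font_id("/TT-1 9.5 Tf", OLD_TF))  # None
    print(font_id("/TT-1 9.5 Tf", NEW_TF))  # TT-1
```

A test in this style checks the parsed ID directly, with no sample PDF needed.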
Add one more test for font fallback. This addition also resolves smalot#495: it catches situations where a null byte (\x00) may not be found by preg_match in a Unicode context. Null bytes in the text string usually mean that a CIDMap-encoded string has been passed through as UTF-8 bytes without being translated by any matching CIDMap pairs.
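To see where those null bytes come from, consider a font that uses two-byte character codes. When the codes are mapped to Unicode, the high zero bytes disappear. When the raw bytes are passed straight through as UTF-8, the zero bytes survive as NUL characters. The Python sketch below makes this concrete. The byte string, the tiny mapping table, and decode_with_cmap are all hypothetical illustrations, not pdfparser code.

```python
from typing import Dict

# Hypothetical two-byte CID string; with the CMap below it should read "Hi".
raw = b"\x00\x48\x00\x69"

# Hypothetical CMap: two-byte character codes -> Unicode text.
cmap: Dict[int, str] = {0x0048: "H", 0x0069: "i"}


def decode_with_cmap(data: bytes, mapping: Dict[int, str]) -> str:
    """Split data into big-endian two-byte codes and translate each via the CMap.

    Unmapped codes are simply dropped here to keep the sketch short.
    """
    codes = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(mapping.get(code, "") for code in codes)


# Passing the same bytes straight through as UTF-8 keeps the high zero bytes.
passthrough = raw.decode("utf-8")

if __name__ == "__main__":
    print(repr(decode_with_cmap(raw, cmap)))  # 'Hi'
    print(repr(passthrough))                  # '\x00H\x00i'
```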
Closed
Are you done here @GreyWyvern?
k00ni approved these changes on Jul 26, 2023
Yes, sorry. I was on vacation this week. :)
All good, hope you had a good one.
If part of a text stream is positioned after an incorrect font command, undecoded, jumbled bytes will appear in the getText() output. This change adds code to check that output for control characters (\x00-\x1f and \x7f). If any appear, it loops through all available fonts to see whether one of them decodes the output properly. If none is found, the original string is used. Resolves #586.

Also add support for font IDs containing hyphens. Previously these were ignored as invalid. Resolves #145.