Encoding mismatches in text boxes with truncation #777
Comments
It looks like when That seems counter-intuitive to me ;) Example debugging output:
|
Ah, here's an alternative approach which appears to work. Instead of using
text = text[0, (text.length - overflow.length - 1)]
text = "#{text}..."
(The Of course the dots may still get truncated...but with |
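For illustration, the length-based approach above could be wrapped in a small helper. This is only a sketch and not part of Prawn: truncate_with_dots is a hypothetical name, and it assumes overflow is the string returned by rendering the text box and that both strings report the same character count for the shared tail.

```ruby
# encoding: utf-8

# Hypothetical helper (not Prawn API): drop the overflowing tail from `text`
# by length, plus one extra character, then append three dots.
def truncate_with_dots(text, overflow)
  # Nothing overflowed, so the text fits as it is.
  return text if overflow.nil? || overflow.empty?

  # Work with character counts so no regexp match between the two strings
  # (and therefore no encoding comparison) is needed.
  keep = [text.length - overflow.length - 1, 0].max
  "#{text[0, keep]}..."
end

text = "A quick brown fox jumped über the lazy dog."
overflow = "fox jumped über the lazy dog."
puts truncate_with_dots(text, overflow)
```

Because nothing is matched against anything else, this sidesteps the regexp encoding error, although the appended dots can of course still be truncated by the box itself.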
This may be related to #603; I'm not sure. |
@airblade Thanks, I'll take a closer look at this in the next few days, hopefully. We have some lingering encoding issues (mainly on Ruby 1.9.3, but some affect all Ruby versions), and I'd like to get those sorted out if we can. |
@sandal Thanks. I'm on Ruby 1.9.3p286 and Prawn 1.0.0 – let me know if I can provide any further information. |
@airblade: Would it be possible for you to try to reproduce on Ruby 2.0 or 2.1? Even if it's not feasible for you to upgrade Ruby in your production code, it'll help us narrow this down. |
@sandal I'll have a go and let you know what I find. |
Here's a short program that demonstrates the problem. I ran it on Ruby 1.9.3p286 and Ruby 2.1.3, and Prawn 1.0.0 and Prawn 1.3.0. The results are below.
# encoding: utf-8
require 'prawn'

@doc = Prawn::Document.new page_size: 'A4'

def debug(name, string)
  puts "#{name}: (#{string.encoding}) #{string.inspect}"
end

def render_text_box(string)
  Prawn::Text::Box.new(
    string,
    width: 100,
    height: 20,
    document: @doc
  ).render
end

text = "A quick brown fox jumped over the lazy dog."
overflow = render_text_box text
debug 'text', text
debug 'overflow', overflow

text = "A quick brown fox jumped über the lazy dog."
overflow = render_text_box text
debug 'text', text
debug 'overflow', overflow
And here are the results:
I would expect the overflow to always have the same encoding as the original text, i.e. UTF-8. |
@airblade: On closer investigation, behavior isn't exactly a bug, at least in 1.3.0. Here's the summary of why:
Using DejaVuSans, I was able to get the following output on both Prawn 1.0 and Prawn 1.3 (I don't think Ruby version matters):
text: (UTF-8) "A quick brown fox jumped over the lazy dog."
overflow: (UTF-8) "fox jumped over the lazy dog."
text: (UTF-8) "A quick brown fox jumped über the lazy dog."
overflow: (UTF-8) "fox jumped über the lazy dog."
I think that's what you were looking for, right? We need to do a better job of informing people that Prawn's default font selection (and not coincidentally, the PDF format's defaults) are NOT unicode friendly, even though Prawn itself handles UTF-8 text fine given fonts that support it. I think this may involve raising a warning or error when non-compatible glyphs are found, and also probably a guide explaining this. I'll open a ticket for those issues. |
Note about need for better documentation / warning behavior is in #779. |
@airblade Upon closer investigation, the plot thickens! Here's a summary of what's going wrong here:
In WinAnsi, the byte value and codepoint are the same (
>> "ü".codepoints
=> [252]
>> "ü".bytes
=> [195, 188]
So when we attempt to convert this text back into UTF-8 in various places throughout the text call chain, we're losing information and attempting to treat WinAnsi byte values as if they are equivalent to UTF-8 byte values: They're not! This is going to take further investigation, but it seems like this gets us at least a little closer. Sorry for the long and probably fuzzy explanation above. |
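The difference described above can be seen with plain Ruby strings, no Prawn required. This is a minimal sketch that assumes Ruby's "Windows-1252" encoding as a stand-in for WinAnsi; the variable names are illustrative only.

```ruby
# encoding: utf-8

utf8 = "ü"

# Transcode to Windows-1252 (assumed here to stand in for WinAnsi).
winansi = utf8.encode("Windows-1252")

p utf8.codepoints   # the Unicode codepoint: [252]
p utf8.bytes        # two bytes in UTF-8: [195, 188]
p winansi.bytes     # a single byte in Windows-1252: [252]

# Relabelling the single WinAnsi byte as UTF-8 changes no bytes and
# produces an invalid UTF-8 string.
mislabelled = winansi.dup.force_encoding("UTF-8")
p mislabelled.valid_encoding?   # false

# Transcoding converts the bytes and recovers the original text.
roundtrip = winansi.encode("UTF-8")
p roundtrip == utf8             # true
```

The distinction that matters is between force_encoding, which only changes the label on the bytes, and encode, which rewrites the bytes for the target encoding.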
@sandal Thank you very much for investigating this and for the explanations. I had no idea that the PDF standard specifies default fonts which only support Win-1252. That's surprising in this day and age but I suppose it's an antique standard. I'll leave a comment on #779. You're right, using a TTF font solves my immediate problem – so thank you. As for the thickening plot, I shall follow with interest and try to contribute where I can. |
@airblade: I'm working on a fix that would convert the remaining text back into UTF-8, which I think is a better behavior. But as it turns out, the existing behavior is documented with specs, and there is a way of getting things to work on released versions of Prawn. By passing the |
@sandal That looks like as simple a solution as can be. |
Here's my use case: I have a fixed size text box and I want to write text in it. Often that text is too long for the text box and so the text should be truncated. This all works fine for me.
However I would like to indicate when the text has been truncated, perhaps with an ellipsis or simply three full stops (periods). This is how I am trying to achieve it:
However I often, though not always, get
incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
errors. Sometimes text is UTF-8, sometimes it's US-ASCII. Sometimes overflow is UTF-8, sometimes it's US-ASCII.
I have tried re-encoding both text and overflow to UTF-8 but then the substitution doesn't work because the strings no longer match.
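One way this kind of error can arise, sketched in plain Ruby without Prawn: a regexp built from a binary-tagged string containing non-ASCII bytes is fixed to ASCII-8BIT, and matching it against a UTF-8 string that contains non-ASCII characters raises. The assumption that overflow arrives tagged as binary is made purely for illustration and is not confirmed as what Prawn returns.

```ruby
# encoding: utf-8

# Assumption for illustration: the overflow string is tagged ASCII-8BIT.
overflow = "über the lazy dog.".dup.force_encoding("ASCII-8BIT")
pattern = Regexp.new(Regexp.escape(overflow))

# A UTF-8 string with a non-ASCII character cannot be matched against it.
raised = false
begin
  "A quick brown fox jumped über the lazy dog." =~ pattern
rescue Encoding::CompatibilityError => e
  raised = true
  puts e.message
end

# A string containing only ASCII characters is compatible, so no error is
# raised (the pattern simply fails to match).
ascii_result = ("A quick brown fox jumped over the lazy dog." =~ pattern)

puts raised
p ascii_result
```

That only strings with non-ASCII characters trigger the exception would be consistent with the error appearing often but not always.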