The last paragraph in a section is not terminated with a paragraph mark... #105
Comments
Some new information today: Wordinator creates a Word file that Word opens in Compatibility Mode. The problem is manifest. Creating a new file creates a file not in Compatibility Mode, and the problem is not manifest. Saving that new file as a ".doc" instead of ".docx" changes the status to be in Compatibility Mode, and the problem is manifest without changing any data in the file. Saving that new file with the problem manifest as yet another new ".docx" file, Word presents the dialogue about upgrading the file, the file is saved, and the problem is not manifest anymore. Can Wordinator be configured to produce Word files that are opened as ".docx" not in Compatibility Mode, rather than ".docx" in Compatibility Mode? |
I'll look into this tomorrow. |
What Word actually outputs is the below. As you can see, the only difference between paragraphs that are formatted correctly and those that are not is that the ones that are not formatted correctly have It's not immediately obvious whether this is a Word bug or what it is.
This However, it seems this is something to do with the predefined style I don't think the paragraph marks and pressing enter have anything to do with this. Pressing enter just creates a new paragraph and moves the problem to the next paragraph (which doesn't show, as it's empty). Unfortunately, I don't have a working version of Word here (would need to pay for license), so I can't investigate further in Word. But it seems that the problem is neither Word nor Wordinator, but the
The
If I edit the docx file so that So the problem is the |
And yet the data I've provided uses the same P style on paragraphs that are not the last in a section as well as on those that are. And when the generated "Compatibility Mode" output is upgraded to DOCX without changing the P style, the result works without any change. I'm unclear why the unchanged setting for "jc" wouldn't work everywhere in "Compatibility Mode", nor why revealing the paragraph marks shows a paragraph mark at the end of every paragraph EXCEPT the paragraph at the end of the section. Surely if Wordinator consistently added the paragraph marker at the end of every paragraph we create, it would work both in Compatibility Mode and outside it. That was what I identified regarding the paragraph marks: the lack of consistency. The SWPX markup, as I illustrated, is identical for every paragraph I marked up. |
If you look at the OOXML that Wordinator produces there is nothing like a paragraph mark in there. So that mark is some sort of visualization that Word produces, and I guess the absence of it is another symptom of the problem, rather than a cause. I agree it looks like a bug, but it looks like a bug in Word. As far as I can tell, Wordinator is doing exactly what it's supposed to do. Do we need |
I can find no user interface manipulation of |
I think |
Of course, yes, it corresponds to paragraph alignment, but the controls in Word regarding paragraph alignment are not so finely tunable to modify any concept of "both"/"start" for an attribute of the name "jc". The coarse adjustment only selects between "Left", "Centered", "Right", and "Justified". XSL-FO specifically addresses what jc= addresses, and that is the alignment specifically of the last line of a paragraph of text. So I'm not surprised there exists a property in OOXML supporting a traditional layout concept ... I stated that I could find no user interface manipulation of that concept in the Word tool. |
My Mac version of Word has these buttons: Left to right they seem to correspond to Ultimately, whatever the solution to this is, it has to be something that we can express in OOXML. |
Can you confirm when you set jc="start" that the OTHER lines of the paragraph remain justified? The user requirement is that all lines except the last line of each paragraph be justified. If not, then this has been a red herring and the drop-down list I found, and the buttons you found, all correspond only to jc= and are irrelevant to the problem because there are no other fine tunings available, as there is in XSL-FO, for the alignment of the last line of a paragraph. When Word in "Compatibility Mode" presents the last justified paragraph of a section, Word justifies the last line of a justified paragraph. Word does not justify the last line of other justified paragraphs in the section. When upgrading the file out of "Compatibility Mode" by saving the document appropriately, the appearance of the unmodified last justified paragraph of a section is correct. Word modifies the paragraph and the last line is presented properly. I suppose I could hack every file we create by adding an empty paragraph of zero or one pt font at the end of every section ... but that doesn't reflect the user data and, likely, the conversion of the Word file to STS XML downstream will then add that paragraph element that never was created by the user. |
They don't. If I use
That's definitely a reasonable requirement, and there doesn't seem to be a
Hmmm. Word for Mac doesn't create any warnings about compatibility mode. I'm not sure how important this is. It sounds like it might be just a question of Word versions.
Yes. We also don't know what other problems this might create in other use cases. Part of the difficulty here is that the settings which appear to cause the problem do not actually appear in the file that the Wordinator code is creating (
As you can see we don't even have the There is one thing we can perhaps try to explore a little more, and that's this hint:
Is there something about having the section properties stored on it that makes Word treat the last paragraph differently? I could try removing the sections just to see, but of course that's not really a solution, either. But at least we might learn more about what's going on. Overall, this looks like a very tricky problem, because it's not at all clear what's really causing the problem. I think there are only two viable routes toward a solution here:
Unfortunately, I suspect we could end up spending a lot of time before we get to the bottom of this. So we need to consider how much time this is worth. |
Where you say...
... in fact we are getting exactly what we need for all paragraphs except the last paragraph of a section, even when all paragraphs are marked up the same. My very first post has the DOCX that shows this, created using Wordinator, from an SWPX file that also is in the ZIP. This is the PDF that is created from Word: paragraph-marker-20230213.pdf ... and one sees that the second and third pages show the first paragraph of the page correctly formatted as we need: justification of all lines except the last line. And the third page shows the second paragraph correctly formatted as we need, even though it presented incorrectly on the second page, using the identical markup. |
Could you try a couple of experiments? First, solve the problem the way you described in your initial comment, then save the Word file and send it to me (or attach here). Second, start with a newly converted document, then try to apply the ctrl+shift+j trick to see if that lets you fix the last para. Also try it on one of the paras that is OK, to see what happens. Then save and share the document. I don't have a Word where I can edit files, but this way we might be able to learn more about what OOXML achieves the desired result. |
The original generated Word file and PDF: paragraph-marker-20230213.docx - paragraph-marker-20230213.pdf
Pressing <Enter> at the end of each last paragraph of the section: paragraph-marker-20230213-with-Enter.docx - paragraph-marker-20230213-with-Enter.pdf
Saving the original file anew as an upgraded file: paragraph-marker-20230213-upgraded.docx - paragraph-marker-20230213-upgraded.pdf
Reading the trick page you cited, I see that c+s+j expands the given line of a paragraph; it doesn't do the contraction that we need. Sure enough, that is the result that I see: paragraph-marker-20230213-tricked.docx - paragraph-marker-20230213-tricked.pdf
OOXML is achieving the desired result in the paragraph when the paragraph isn't the last paragraph of the section. There are no markup changes to compare between any non-last paragraph and the last paragraph. And when I add a new paragraph at the end of an existing section, the paragraph that was incorrectly formatted magically becomes correctly formatted. But that is simply an extension of my "add <Enter>" technique above. |
In the docx where you pressed enter the first paragraph turns into this monstrosity:
The reason it works, however, is that there is an empty
I'm not sure that can be used for anything, but the next
So perhaps this The upgraded document looks the same. The document with the trick has Okay, but then it looks like the way to solve this is to not convert sections the way we have done so far, but to use this other markup instead. I'll see if I can do that, but I'm not sure when I'll be able to work on it. |
The problem goes away when one opens the file, selects File/Info, and turns off "Compatibility Mode" ... no changes to the file and all paragraphs look correct. If there is no way to configure the environment to produce a file that is not in "Compatibility Mode", I suggest shelving this ticket until a client complains about having to do the conversion manually after the fact. |
If you review older issues, you'll see that managing the details of final paragraphs in sections is a significant challenge because of the way Word uses the last paragraph of a section to define the rules for the entire section. If memory serves, this also affects how things like justification of the last line of the last paragraph of a section get handled. |
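To make the "last paragraph carries the section" point concrete: in WordprocessingML, the properties of every section except the final one are stored as a `w:sectPr` inside the `w:pPr` of that section's last paragraph, while the final section's `w:sectPr` is a direct child of `w:body`. The sketch below lists each body-level paragraph with its style and whether it ends a section. The function name `describe_paragraphs`, the sample XML, and the style id `P` in the sample are assumptions for illustration, not Wordinator output.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def describe_paragraphs(document_xml):
    """Return (text, style_id, ends_section) for each body-level paragraph."""
    body = ET.fromstring(document_xml).find(f"{W}body")
    rows = []
    for p in body.findall(f"{W}p"):
        ppr = p.find(f"{W}pPr")
        style = ppr.find(f"{W}pStyle") if ppr is not None else None
        sect = ppr.find(f"{W}sectPr") if ppr is not None else None
        text = "".join(t.text or "" for t in p.iter(f"{W}t"))
        style_id = style.get(f"{W}val") if style is not None else None
        rows.append((text, style_id, sect is not None))
    return rows


# Hypothetical two-section body: the second paragraph closes section 1,
# and the body-level w:sectPr describes the final section.
SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:pPr><w:pStyle w:val="P"/></w:pPr><w:r><w:t>first</w:t></w:r></w:p>
<w:p><w:pPr><w:pStyle w:val="P"/><w:sectPr/></w:pPr><w:r><w:t>last of section 1</w:t></w:r></w:p>
<w:p><w:pPr><w:pStyle w:val="P"/></w:pPr><w:r><w:t>final</w:t></w:r></w:p>
<w:sectPr/>
</w:body></w:document>"""

for row in describe_paragraphs(SAMPLE):
    print(row)
```

Run against `word/document.xml` from the attached files, a listing like this would show that the misformatted paragraphs are exactly the ones with `ends_section` set, even though their style is the same as their neighbours'. The sketch only reads direct paragraph properties; alignment inherited from the style definition is not resolved.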
Okay. The I tried the method ChatGPT suggested. I can't see anything in the output files that say what .docx version is used, so I'm not convinced this can be done at all. The code from ChatGPT is basically nonsensical when you compare it to the actual Apache POI API. The classes and methods quite simply don't fit together this way, and some of the methods appear not to exist at all. So this seems to be a dead end. Trying to google for a way to set the version of the .docx format yields nothing. (As expected, since that doesn't seem to be a meaningful thing at all.) In fact, there only appear to be two versions: the ECMA and the ISO version, and there doesn't seem to be any way to declare which one you're producing. So this doesn't seem to be a meaningful thing to try to do. |
Yes, my initial attempts to "correct" or at least control the behavior of sections under specific conditions ran into some significant rework issues. Part of the issue, if I'm remembering correctly, was that if there is exactly one section then one set of markup in the DOCX is required, but if there are two or more, then different markup is required, and the XML cursor technique used with POI doesn't make lookahead easy, so the problem is best solved in the SWPX generation, where you have all knowledge. |
Closing as fixed: Turning off compatibility mode corrects the issue. Note that this became possible with POI 5.2.5, which enabled modifying the document-level settings using the XWPF API. |
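The fix itself was made through Apache POI and is not reproduced here. As a rough illustration of what "turning off compatibility mode" could amount to at the XML level, the following Python sketch sets a `compatibilityMode` setting of 15 in a `word/settings.xml` string. The element and attribute names reflect my understanding of how Word records the mode; the function name `set_compatibility_mode` is hypothetical, and this is not the code Wordinator uses.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
ET.register_namespace("w", W_NS)  # keep the conventional "w:" prefix on output


def set_compatibility_mode(settings_xml, value=15):
    """Return settings_xml with compatibilityMode set to `value` (assumed: 15 = no compat mode)."""
    root = ET.fromstring(settings_xml)
    compat = root.find(f"{W}compat")
    if compat is None:
        # Simplification: appended at the end. A strict implementation would
        # insert w:compat at its schema-defined position among the settings.
        compat = ET.SubElement(root, f"{W}compat")
    for setting in compat.findall(f"{W}compatSetting"):
        if setting.get(f"{W}name") == "compatibilityMode":
            setting.set(f"{W}val", str(value))
            break
    else:
        # No existing setting was found, so add one.
        ET.SubElement(compat, f"{W}compatSetting", {
            f"{W}name": "compatibilityMode",
            f"{W}uri": "http://schemas.microsoft.com/office/word",
            f"{W}val": str(value),
        })
    return ET.tostring(root, encoding="unicode")


print(set_compatibility_mode(f'<w:settings xmlns:w="{W_NS}"/>'))
```

The function is idempotent: applying it to its own output updates the existing setting instead of adding a second one.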
* Fixes #133, #105: Set compatibility mode setting to turn off compatibility mode. Signed-off-by: eliot.kimber <eliot.kimber@servicenow.com>
* Fixes #109: Incorporate fix from Lars Marius to ensure table cell ends with <p> Signed-off-by: eliot.kimber <eliot.kimber@servicenow.com>
* WIP: Added multi-section test cases from Lars Marius Signed-off-by: eliot.kimber <eliot.kimber@servicenow.com>
* Fixes #117: Last section handling from Lars Marius Signed-off-by: eliot.kimber <eliot.kimber@servicenow.com>
--------- Signed-off-by: eliot.kimber <eliot.kimber@servicenow.com>
... resulting in the last line of a justified paragraph being justified instead of aligned left.
I diagnosed this by turning on paragraph marks and noting the absence of the paragraph mark at the end of the paragraph. Adding it by hand made the justified paragraph format correctly, with the last line not being justified.
To illustrate this, the attached is a three-section SWPX file whose sections contain one, two, and three paragraphs respectively. The order and content of the paragraphs are the same in the three sections. The last paragraph of each section is not formatted correctly when brought into Word as is. Pressing <Enter> at the end of the last paragraph of the section formats that paragraph correctly.
paragraph-marker-20230213.zip